Settings and Basic Customisation

Indexing Settings

Preventing Common Content from Appearing in the Index

On most sites you won't want menus, common titles, etc. appearing in search results. Because we index the full page HTML, that can happen here, so we've provided a few ways to prevent it.

For Macros:

The GeneralExtensions class has a helper method that tells you whether or not the page is being requested by the search indexer: simply call GeneralExtensions.IsIndexingActive() from your Razor. It doesn't do anything particularly sophisticated; it just checks for a parameter that can be passed by query string or cookie. By default that parameter is called FullTextActive, and this can be changed in the FullTextSearch.config file. To prevent your macros from producing output while indexing is active, wrap them in the following if block...

if (!GeneralExtensions.IsIndexingActive())
{
	// macro output goes here; it is skipped when the indexer requests the page
}
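For illustration, here is a minimal sketch of the kind of check described above. It is not the package's code: IndexingFlagSketch is a hypothetical name, and instead of reading the live request it takes the query string and cookies as plain dictionaries. It also assumes that any non-empty value for the parameter counts as "indexing active".

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the check GeneralExtensions.IsIndexingActive() performs.
// The real helper reads the current request; this version takes plain dictionaries.
public static class IndexingFlagSketch
{
    // Default parameter name; the real one is configurable in FullTextSearch.config.
    public const string DefaultParameterName = "FullTextActive";

    // Returns true if the flag is present in either the query string or the cookies.
    public static bool IsIndexingActive(
        IDictionary<string, string> queryString,
        IDictionary<string, string> cookies,
        string parameterName = DefaultParameterName)
    {
        return HasValue(queryString, parameterName) || HasValue(cookies, parameterName);
    }

    // Assumption: any non-empty value counts as "indexing active".
    private static bool HasValue(IDictionary<string, string> source, string name)
    {
        return source != null
            && source.TryGetValue(name, out var value)
            && !string.IsNullOrEmpty(value);
    }
}
```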

More generally...

The indexer can also be configured to strip defined parts of the HTML from the page output.
This is controlled by the <TagsToRemove> and <IdsToRemove> settings in the FullTextSearch.config file. For example, by default any HTML element with an ID of "mainNavigation", along with all its children, is stripped from the output and will not appear in the index. This is a quick and simple way to keep certain content out of search results.
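A minimal sketch of how tag- and ID-based stripping can work, assuming the page is well-formed XHTML so that System.Xml.Linq can parse it (real-world HTML usually needs a more tolerant parser). HtmlStripperSketch is a hypothetical name, and the package's own implementation may differ.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sketch of <TagsToRemove>/<IdsToRemove>-style stripping.
// Assumes well-formed XHTML; real-world HTML usually needs a tolerant parser.
public static class HtmlStripperSketch
{
    public static string Strip(string xhtml, ISet<string> tagsToRemove, ISet<string> idsToRemove)
    {
        XDocument doc = XDocument.Parse(xhtml);

        // Collect matches first; modifying the tree while enumerating it is unsafe.
        // Only descendants of the root are considered, so the root itself is never removed.
        List<XElement> matches = doc.Root.Descendants()
            .Where(e => tagsToRemove.Contains(e.Name.LocalName)
                     || idsToRemove.Contains((string)e.Attribute("id") ?? string.Empty))
            .ToList();

        // Removing an element removes all of its children with it. A match nested inside
        // an already-removed element is detached along with it, so removing it again
        // only affects that detached subtree.
        foreach (XElement element in matches)
        {
            element.Remove();
        }

        return doc.Root.ToString(SaveOptions.DisableFormatting);
    }
}
```

Passing sets built with `new HashSet<string>(StringComparer.OrdinalIgnoreCase)` makes tag matching case-insensitive, which is usually what you want for HTML.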

Preventing Certain Document Types from Appearing in the Index

There are two ways to do this. Adding the node type alias to ExcludeNodeTypes in the ExamineIndex.config file disables all indexing for that node type.
Adding the node type alias to the <NoFullTextNodeTypes> list in FullTextSearch.config prevents the page HTML from being rendered and added to the index, but the node's other properties are still indexed.
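The difference between the two settings can be summed up as a small decision function. This is a hypothetical sketch (IndexingMode and NodeTypeRulesSketch are illustrative names), not the indexer's actual code.

```csharp
using System.Collections.Generic;

// Hypothetical summary of what each setting does to a node of a given type.
public enum IndexingMode
{
    Skip,             // alias in ExcludeNodeTypes: nothing about the node is indexed
    PropertiesOnly,   // alias in <NoFullTextNodeTypes>: properties indexed, page HTML not rendered
    PropertiesAndHtml // neither list: properties and rendered HTML are both indexed
}

public static class NodeTypeRulesSketch
{
    public static IndexingMode Decide(
        string nodeTypeAlias,
        ISet<string> excludeNodeTypes,
        ISet<string> noFullTextNodeTypes)
    {
        // ExcludeNodeTypes is checked first, because it disables indexing for the type entirely.
        if (excludeNodeTypes.Contains(nodeTypeAlias)) return IndexingMode.Skip;
        if (noFullTextNodeTypes.Contains(nodeTypeAlias)) return IndexingMode.PropertiesOnly;
        return IndexingMode.PropertiesAndHtml;
    }
}
```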

Preventing Certain Pages from Appearing in the Index

It's common practice in Umbraco to have an "umbracoNaviHide" property that prevents pages from showing up in the main navigation. We've adapted this: any page with an "umbracoSearchHide" property set won't show up in the index or in search results, which makes it a useful way to keep individual pages out of search. The names of the properties that disable indexing for a page are specified in FullTextSearch.config under <DisableSearchPropertyNames>.
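A sketch of the per-page check, assuming the hide properties are Umbraco true/false properties stored as "1" (or "true") when ticked. SearchHideSketch is a hypothetical name, and the node's properties are modelled as a plain dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SearchHideSketch
{
    // True if any of the configured hide properties (e.g. umbracoSearchHide) is set on the node.
    // Assumption: a ticked true/false property is stored as "1" or "true".
    public static bool IsHiddenFromSearch(
        IDictionary<string, string> nodeProperties,
        IEnumerable<string> disableSearchPropertyNames)
    {
        return disableSearchPropertyNames.Any(name =>
            nodeProperties.TryGetValue(name, out var value)
            && (value == "1" || string.Equals(value, "true", StringComparison.OrdinalIgnoreCase)));
    }
}
```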

Changing the Indexing Method

FullTextSearch can index pages in two ways.

  1. HTTP Renderer (default)
    This fires off web requests to an address and hostname specified in FullTextSearch.config.
  2. Programmatic Renderer
    This uses the ASP.NET Server.Execute method to render page content, so it can only run with an active HTTP context (i.e. synchronously while publishing is taking place). It's theoretically neater, and was initially intended to be the only renderer, but because it has to run synchronously it can make publishing painfully slow on large sites.
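As a rough illustration of the HTTP Renderer approach, the sketch below builds the URL a renderer might request for a page. The base address stands in for the address and hostname configured in FullTextSearch.config; the page path, the "=1" value and the idea that the indexing flag travels on the query string (it could equally be a cookie) are assumptions, and HttpRendererSketch is a hypothetical name.

```csharp
using System;

public static class HttpRendererSketch
{
    // baseAddress stands in for the address/hostname configured in FullTextSearch.config.
    // pagePath and the "=1" flag value are illustrative assumptions.
    public static Uri BuildRenderUri(string baseAddress, string pagePath,
        string indexingParameter = "FullTextActive")
    {
        var pageUri = new Uri(new Uri(baseAddress), pagePath);
        var builder = new UriBuilder(pageUri)
        {
            // Mark the request so an IsIndexingActive()-style check can recognise the indexer.
            Query = Uri.EscapeDataString(indexingParameter) + "=1"
        };
        return builder.Uri;
    }
}
```

The resulting URI could then be fetched with System.Net.Http.HttpClient and the returned HTML passed on for stripping and indexing.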

The default HTTP Renderer should generally work well enough. If you're having problems with it, switch to the Programmatic Renderer in the FullTextSearch.config file, remembering to also set PublishEventRendering to true so that there is an active HTTP context for it to work with.

Further Customisation

It's possible to override the default renderers from your own code, and to hook into several indexing events, in order to customise the indexing process to your needs. See the advanced customisation instructions for details.

Search Settings

Macro Parameters

There are a lot of macro parameters for the FullTextSearch.cshtml file; they control how the index is searched and how the results are output. A list of them, and their uses, follows...

queryType

Type of search to perform. Possible values are:

MultiRelevance ->

The default. The index is searched for the following, in order of decreasing relevance:

  1. the exact phrase entered in any of the title properties
  2. any of the terms entered in any of the title properties
  3. a fuzzy match for any of the terms entered in any of the title properties
  4. the exact phrase entered in any of the body properties
  5. any of the terms entered in any of the body properties
  6. a fuzzy match for any of the terms entered in any of the body properties
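The following sketch shows how a MultiRelevance-style query could be expressed in classic Lucene query syntax, using ^ boosts to encode the ordering above and ~ for fuzzy matches. The boost values (phrase 3, term 2, fuzzy 1, multiplied by 10 for title fields after the factor documented under titleProperties) are illustrative assumptions, as is MultiRelevanceSketch itself; it also assumes the terms contain no Lucene special characters that would need escaping.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class MultiRelevanceSketch
{
    public static string Build(string input, IList<string> titleFields, IList<string> bodyFields,
        double fuzzyness = 0.8)
    {
        var terms = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (terms.Length == 0) return string.Empty;
        var phrase = string.Join(" ", terms);
        var clauses = new List<string>();

        // Title fields get a x10 group boost, body fields x1.
        AddGroup(clauses, titleFields, phrase, terms, fuzzyness, 10.0);
        AddGroup(clauses, bodyFields, phrase, terms, fuzzyness, 1.0);

        // Space-separated clauses are OR-ed under Lucene's default operator.
        return string.Join(" ", clauses);
    }

    private static void AddGroup(List<string> clauses, IEnumerable<string> fields,
        string phrase, string[] terms, double fuzzyness, double groupBoost)
    {
        foreach (var field in fields)
        {
            // Exact phrase: strongest match within the group.
            clauses.Add($"{field}:\"{phrase}\"^{Fmt(3 * groupBoost)}");
            foreach (var term in terms)
            {
                // Any single term.
                clauses.Add($"{field}:{term}^{Fmt(2 * groupBoost)}");
                // Fuzzy variant of the term: weakest match within the group.
                clauses.Add($"{field}:{term}~{Fmt(fuzzyness)}^{Fmt(groupBoost)}");
            }
        }
    }

    // Invariant culture so decimals are written with "." regardless of server locale.
    private static string Fmt(double d) => d.ToString(CultureInfo.InvariantCulture);
}
```

Because the clauses are joined with spaces, a document matching any clause is returned and the boosts decide the ranking. A MultiAnd-style variant could instead group each term's clauses and mark each group as required, e.g. `+(nodeName:red FullTextSearch:red) +(nodeName:car FullTextSearch:car)`.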

MultiAnd ->
Similar to MultiRelevance, but requires all terms to be present.

SimpleOr ->
Similar to MultiRelevance again, but the exact phrase does not get boosted; we just search for any term.

AsEntered ->
Searches for the exact phrase as entered, if more than one term is present.

Note that quoted queries are correctly processed in all search modes except AsEntered.

Other special query types (Boolean, wildcard, etc.) are not yet supported.
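Handling quoted queries comes down to tokenising the input so that a quoted run stays together as a phrase while bare words become separate terms. A minimal regex-based sketch of that step (QueryTokenizerSketch is a hypothetical name; unbalanced quotes simply fall back to ordinary words):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QueryTokenizerSketch
{
    // Either a double-quoted run (captured without the quotes) or any run of non-space characters.
    private static readonly Regex Token = new Regex("\"([^\"]*)\"|(\\S+)");

    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        foreach (Match m in Token.Matches(input))
        {
            // Group 1 holds a quoted phrase, group 2 a bare word.
            var text = m.Groups[1].Success ? m.Groups[1].Value.Trim() : m.Groups[2].Value;
            if (text.Length > 0) tokens.Add(text); // skip empty quotes ("")
        }
        return tokens;
    }
}
```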

titleProperties

A comma-separated list of properties that make up the page title. Matches in these properties have their relevance boosted by a factor of 10. Defaults to nodeName. Set to "ignore" to skip searching titles.

bodyProperties

A comma-separated list of properties that make up the page body. Both these properties and the titleProperties are searched.
Defaults to using only the full text index field (FullTextSearch by default).
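A minimal sketch of resolving these two parameters, with the defaults and the "ignore" value described above. PropertyListSketch is a hypothetical name; "FullTextSearch" is used as the default body field on the assumption that the full text field keeps its default name.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PropertyListSketch
{
    // titleProperties: defaults to nodeName; "ignore" means titles are not searched.
    public static List<string> TitleProperties(string parameter)
    {
        if (parameter != null && parameter.Trim() == "ignore")
            return new List<string>();
        return Split(parameter, "nodeName");
    }

    // bodyProperties: assumed default is the full text field under its default name.
    public static List<string> BodyProperties(string parameter) =>
        Split(parameter, "FullTextSearch");

    // Splits a comma-separated list, falling back to the default when the parameter is blank.
    private static List<string> Split(string parameter, string defaultValue)
    {
        string raw = string.IsNullOrWhiteSpace(parameter) ? defaultValue : parameter;
        return raw.Split(',')
            .Select(p => p.Trim())
            .Where(p => p.Length > 0)
            .ToList();
    }
}
```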

summaryProperties

The properties, comma-separated and in order of preference, that you wish to use to create the summary shown under the title. All selected properties must be in the index, because that's where the data is pulled from.
Defaults to the full text field.

titleLinkProperties

The properties, comma-separated and in order of preference, that you wish to use to create the title link for each search result.
Defaults to titleProperties or, if that isn't set, nodeName.
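Both summaryProperties and titleLinkProperties work "in order of preference". One plausible reading, sketched below, is to take the first listed property that has a non-empty value in the indexed fields. PreferenceFallbackSketch is a hypothetical name, and the indexed document is modelled as a dictionary.

```csharp
using System.Collections.Generic;

public static class PreferenceFallbackSketch
{
    // Returns the value of the first listed property that has content, or null if none do.
    public static string FirstAvailable(
        IDictionary<string, string> indexedFields,
        IEnumerable<string> propertiesInPreferenceOrder)
    {
        foreach (var name in propertiesInPreferenceOrder)
        {
            if (indexedFields.TryGetValue(name, out var value) && !string.IsNullOrWhiteSpace(value))
                return value;
        }
        return null;
    }
}
```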

rootNodes

A comma-separated list of root node IDs. Only nodes that have one of these nodes as a parent will be returned.
Defaults to searching all nodes.

contextHighlighting

Set this to false to disable context highlighting in the summary/title. You may wish to do this if you are having performance issues as context highlighting is (relatively) slow.
Defaults to on.

summaryLength

The maximum number of characters to show in the summary.
Defaults to 300

pageLength

Number of results on a page. Defaults to 20. Set to zero to disable pagination.
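The paging arithmetic is straightforward; this hypothetical sketch uses 1-based page numbers and treats a pageLength of zero as "return everything on one page".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PaginationSketch
{
    // Returns the results for a 1-based page number; pageLength 0 disables pagination.
    public static List<T> Page<T>(IReadOnlyList<T> results, int pageLength, int page)
    {
        if (pageLength == 0) return results.ToList();
        int safePage = Math.Max(page, 1); // page numbers below 1 are treated as page 1
        return results.Skip((safePage - 1) * pageLength).Take(pageLength).ToList();
    }

    // Number of pages needed: ceiling division, or a single page when pagination is off.
    public static int PageCount(int totalResults, int pageLength)
    {
        if (totalResults == 0) return 0;
        if (pageLength == 0) return 1;
        return (totalResults + pageLength - 1) / pageLength;
    }
}
```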

fuzzyness

Lucene queries can be "fuzzy" or exact.
A fuzzy query will match close variations of the search terms, such as plurals. This parameter sets how close the search term must be to a term in the index, as a value from 0 to 1, where 1.0 means exact matching. Note that fuzzy matching is slow compared to exact or even wildcard matching; if you're having performance issues, this is the first thing to switch off.
Defaults to 0.8

useWildcards

Adds a wildcard "*" to the end of every search term so that it matches anything starting with that term. This is a slightly faster but less accurate way of achieving the same ends as fuzzy matching. Note that fuzzyness is automatically set to 1.0 if wildcards are enabled.
Defaults to off
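In the classic (2.x/3.x-era) Lucene query syntax, a fuzzy term is written term~0.8, where the number is a minimum similarity between 0 and 1, and a prefix match is written term*. The sketch below shows how fuzzyness and useWildcards could map onto that syntax, including wildcards overriding fuzzyness; TermSyntaxSketch is a hypothetical name.

```csharp
using System.Globalization;

public static class TermSyntaxSketch
{
    // Formats one search term for a classic Lucene query parser.
    public static string Format(string term, double fuzzyness, bool useWildcards)
    {
        // Wildcards take precedence: fuzzyness is treated as 1.0, so no "~" suffix.
        if (useWildcards) return term + "*";
        // 1.0 means exact matching, so the term is used as-is.
        if (fuzzyness >= 1.0) return term;
        return term + "~" + fuzzyness.ToString(CultureInfo.InvariantCulture);
    }
}
```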

Further Customisation

If the existing macro isn't flexible enough for you, have a look at the advanced customisation file, which contains some information on how to control output. There's an event that's called just before output, which allows you to modify search results from your own code, or you can call the search functions from your own user controls rather than relying on the Razor helper.

Last edited Jul 2, 2013 at 2:43 PM by governor, version 10