Refinement with FAST Search Server 2010 for SharePoint is considerably more powerful than refinement in SharePoint Server 2010.
SharePoint Server 2010 automatically generates ‘shallow’ refinement for search results, which enables a user to apply additional filters to the results based on the values returned by the query. ‘Shallow’ refinement is based on the managed property values returned from the first 50 results of the original query.
FAST Search Server 2010 for SharePoint enables you to specify whether a managed property can be used in ‘shallow’ or ‘deep’ refinement. ‘Deep’ refinement is based on statistical aggregation of managed property values across the entire result set; ‘shallow’ refinement is based, by default, on only the first 50 results returned by the query. Using ‘deep’ refinement, you can find exactly what you are looking for, such as a person who has written a document about a subject, even if that document would otherwise appear far down the result list. ‘Deep’ refinement can also display exact counts, so the user can see the number of results in each refinement category.
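To make the difference concrete, here is a small Python sketch (not FAST code; the result set, property name, and 50-hit cutoff are illustrative) that computes refiner counts the ‘shallow’ way over the first 50 hits versus the ‘deep’ way over the whole result set:

```python
from collections import Counter

def refiner_counts(results, prop, depth=None):
    """Aggregate managed-property values into refiner buckets.

    depth=None aggregates over the whole result set ("deep");
    depth=50 mimics "shallow" refinement over the first 50 hits.
    """
    subset = results if depth is None else results[:depth]
    return Counter(hit[prop] for hit in subset if prop in hit)

# Toy result set: 200 hits where every third hit is by "Bob".
results = [{"author": "Bob"} if i % 3 == 0 else {"author": "Ann"}
           for i in range(200)]

shallow = refiner_counts(results, "author", depth=50)
deep = refiner_counts(results, "author")
```

The ‘deep’ counts are exact over all 200 hits, while the ‘shallow’ counts only reflect the first 50 — which is why only ‘deep’ refiners can guarantee a refinement choice never leads to zero results.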
You can also use the statistical data returned for numeric refinements in other types of analysis.
As content is crawled, the indexer discovers document metadata, and the FAST content processing pipeline extracts entities from the content. These properties are called crawled properties.
You can map these crawled properties to managed properties by using Central Administration or Windows PowerShell.
These managed properties, which make up the index profile, can then be used in a number of scenarios in your search solution, such as in refiners, scopes, ranking profiles, and advanced queries.
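Conceptually, the mapping projects one or more crawled properties onto each managed property. The Python sketch below is only an illustration of that projection (the crawled property names and first-value-wins rule are assumptions for the example; in the product you would define mappings with Central Administration or PowerShell):

```python
# Hypothetical mapping: several crawled properties can feed one managed
# property, e.g. collapsing multiple author-like fields into "author".
MAPPINGS = {
    "author": ["ows_Author", "mail_from"],
    "title": ["ows_Title"],
}

def to_managed(crawled):
    """Project a dict of crawled properties onto managed properties."""
    managed = {}
    for mp, sources in MAPPINGS.items():
        values = [crawled[cp] for cp in sources if cp in crawled]
        if values:
            managed[mp] = values[0]  # first mapped value wins in this sketch
    return managed
```

A document crawled from SharePoint might supply `ows_Author`, while a mail item supplies `mail_from`; both end up in the same `author` managed property, so a single refiner covers both sources.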
Consistent metadata is critical to the quality of the search experience, and this is precisely where the content processing and property extraction capabilities in FAST Search for SharePoint come into play.
Deep refiners are one of the most powerful ways such metadata is exposed to users. These refiners establish a dialogue with the end user by offering an at-a-glance overview of the search result set and by guiding the user toward possible choices, so that they can quickly zoom in on the information required to complete the task at hand.
Exact counts in deep refiners are computed across the whole result set. This is important because it helps prevent users from drilling down into a “dead end” with zero results, and it is a significant differentiating factor between FAST and the standard search offering in SharePoint.
We will not go into the details here, but consistent metadata has further uses in FAST Search for SharePoint: relevance tuning, multi-level sorting, and advanced search (also known as fielded search).
One of the key differentiators between FAST Search Server 2010 for SharePoint and other search products is the ability for you to add logic to the content processing pipeline, which is used at crawl and indexing time.
The first stage in the pipeline is to extract the text from content, regardless of the document type and format.
Then the language and encoding of the text are determined, which helps with later stages in the pipeline (such as deciding which dictionaries and language rules to use for word analysis).
The tokenization stage breaks the stream of text into individual words by using language-specific word breakers. FAST Search Server 2010 for SharePoint includes powerful word-breaking capabilities based not only on spaces and punctuation but also on language-specific rules for dealing with compound words.
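A toy Python sketch of this idea (not the actual word breakers — the compound table here is an invented stand-in for the per-language lexicons the product ships):

```python
import re

# Invented compound table; real word breakers use per-language lexicons
# and rules to decide where a compound word splits.
COMPOUND_PARTS = {"weltmeisterschaft": ["welt", "meisterschaft"]}

def tokenize(text):
    """Break text into words on spaces/punctuation, then split known compounds."""
    tokens = []
    for word in re.findall(r"[a-zA-Z]+", text.lower()):
        tokens.extend(COMPOUND_PARTS.get(word, [word]))
    return tokens
```

For example, the German compound "Weltmeisterschaft" (world championship) is split into "welt" and "meisterschaft", so a query on either part can match the document.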
After word breaking, FAST Search Server 2010 for SharePoint reduces each word to its base form. This process, which is called lemmatization, is language specific, and FAST Search Server 2010 for SharePoint includes comprehensive rules for applying this stage.
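Lemmatization can be pictured as a lookup from each surface form to its dictionary form; the tiny table below is an invented illustration, not the product's language rules:

```python
# Invented lemma table; the real rules are comprehensive and per-language.
LEMMAS = {"ran": "run", "running": "run", "mice": "mouse", "better": "good"}

def lemmatize(tokens):
    """Map each token to its dictionary (lemma) form where one is known."""
    return [LEMMAS.get(t, t) for t in tokens]
```

Because both "ran" and "running" lemmatize to "run", a query containing any of those forms can match documents containing the others.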
As I discussed earlier, you can create your own entity extractors to enable your users to search in the terms and language of your business. The entity extraction process happens at this stage in the pipeline. FAST Search Server 2010 for SharePoint includes some common entity extractors by default, but you can add to those.
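As a rough illustration of what a custom entity extractor does (in the product, extractors are configured with dictionaries and rules rather than written like this; the part-number pattern is hypothetical):

```python
import re

# Hypothetical pattern for business-specific part numbers like "AB-1234".
PART_NUMBER = re.compile(r"\b[A-Z]{2}-\d{4}\b")

def extract_entities(text):
    """Return part-number entities found in an item's text."""
    return PART_NUMBER.findall(text)
```

The extracted values would then surface as crawled properties, ready to be mapped to a managed property and offered as a refiner.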
The next stage is to normalize dates and times that were found in document contents and metadata, so that features such as sorting, filtering, and refinement will work consistently.
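The point of normalization is that dates arriving in many source formats end up in one canonical, sortable representation. A minimal Python sketch (the list of accepted input formats is an assumption for the example):

```python
from datetime import datetime, timezone

# A few source formats we might encounter; real normalization handles many more.
FORMATS = ["%m/%d/%Y", "%d %B %Y", "%Y-%m-%d"]

def normalize_date(raw):
    """Normalize a raw date string to ISO 8601 (UTC) so that sorting,
    filtering, and refinement compare values consistently."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw, fmt).replace(tzinfo=timezone.utc).isoformat()
        except ValueError:
            continue
    return None  # unrecognized format
```

After this step, "03/15/2010" and "15 March 2010" are the same value, so a date-range refiner treats them identically.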
Then a document vector is generated for each document, which represents an overall analysis of the contents and metadata of the item. This vector is used to compare document similarity, which enables the ‘Similar Results’ feature.
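The idea behind the document vector can be sketched with a plain term-frequency vector and cosine similarity (the actual FAST vector is a more sophisticated analysis; this is only a conceptual stand-in):

```python
import math
from collections import Counter

def doc_vector(tokens):
    """Term-frequency vector: a crude stand-in for the document vector."""
    return Counter(tokens)

def cosine(a, b):
    """Cosine similarity between two term vectors; 1.0 means same direction."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Two documents whose vectors point in nearly the same direction are good candidates for the ‘Similar Results’ feature.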
Hyperlinks that point to the document being indexed are then analyzed in terms of their anchor text. Anchor text is a good indication of how authoritative the document is when it is returned by a query whose terms match that anchor text.
The final step is to map the metadata and entity values to managed properties, so that the search schema is kept up-to-date.
You can configure optional processing steps in the pipeline, such as building your own entity extractors and XML property mappers, and you can extend the offensive content filter to exclude specific terms from the indexing process.
Even more powerfully, you can extend the pipeline by calling external applications for custom item processing.
You can collapse multiple fields into a single managed property.
Some examples of custom processing include:
• Geo-tagging content with latitude and longitude values (or other spatial attributes)
• Machine translation between languages
• Sentiment analysis, where your own rules for what specific words actually mean are applied
This custom processing approach uses a specially defined stage that enables you to inspect the set of crawled properties and manipulate their values or map the output to another crawled property. The executables and temporary files are automatically handled in a sandbox with timeouts, so this is a safe process. This stage runs before crawled properties are mapped to managed properties.
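A custom processing stage is essentially a small executable that receives an input file of crawled properties and writes an output file with any additions or changes. The sketch below shows the shape of such a program in Python; the element and attribute names in the XML are a simplified assumption for illustration, not the documented interchange format:

```python
import sys
import xml.etree.ElementTree as ET

def process(input_path, output_path):
    """Read crawled properties, add a derived property, write the output.

    Assumed simplified format: <Document> containing <CrawledProperty
    propertyName="...">value</CrawledProperty> elements.
    """
    tree = ET.parse(input_path)
    doc = tree.getroot()
    # Look up an existing crawled property...
    title = next((cp.text for cp in doc.iter("CrawledProperty")
                  if cp.get("propertyName") == "title"), "")
    # ...and emit a new derived crawled property alongside it.
    derived = ET.SubElement(doc, "CrawledProperty", propertyName="titlelength")
    derived.text = str(len(title or ""))
    tree.write(output_path)

if __name__ == "__main__" and len(sys.argv) == 3:
    process(sys.argv[1], sys.argv[2])
```

The derived crawled property (`titlelength` here, a made-up name) is then available for mapping to a managed property like any other, because this stage runs before the property-mapping step.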
One of the most important requirements of an enterprise search system is the ability to index data from external line-of-business applications. The new and enhanced features of Business Connectivity Services provided by the SharePoint platform now make this easier than ever. Administrators can use SharePoint Designer 2010, or developers can use Visual Studio, to create connections to external data and have that data exposed as external lists. SharePoint Server and FAST Search Server 2010 for SharePoint can then both index that data through the BDC connector.
The slide shows the architectural components that are provided by the SharePoint 2010 platform, and also shows how FAST Search Server 2010 adds to and integrates with those components to enable you to build high-value search solutions.
FAST Search Server 2010 for SharePoint is built on a highly modular architecture where the services can be scaled individually.
By partitioning into multiple indexes, you can index over a billion documents within a single farm.
You can scale the query matching components in a row/column matrix, where the columns reflect the index partitioning and the rows add query performance and fault tolerance for query evaluation.
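The column side of that matrix can be pictured as scatter-gather: the query goes to every index partition, and the partial hit lists are merged by score. This Python sketch is purely conceptual (partition contents and scores are invented):

```python
def search_partitioned(columns, query):
    """Scatter the query to every index column (partition), then merge hits.

    Each column searches its own slice of the index and returns
    (doc_id, score) pairs. Rows would be replicas of these columns that
    share query load and provide fault tolerance.
    """
    hits = []
    for column in columns:
        hits.extend(column(query))
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Two toy columns, each holding a different part of the index.
col1 = lambda q: [("doc1", 0.9)] if q == "fast" else []
col2 = lambda q: [("doc7", 0.5)] if q == "fast" else []
```

Adding a column grows index capacity; adding a row (a replica of all columns) grows query throughput and resilience, which is why the two dimensions scale independently.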
FAST Search Server 2010 for SharePoint enables you to optimize for low latency from the moment a document is changed in the source repository to the moment it is searchable. This can be done by properly dimensioning the crawling, item processing, and indexing components to meet your requirements. These three parts of the system can be scaled independently through the modular architecture.