Have you ever wanted to know whether you could use your search terms to greater effect? Or how search results that are not exact matches still make a lot of sense? If you have wondered why your search results are what they are, join us for a sneak peek at search behind the scenes. This blog post is the first in a series of articles that shed light on how search finds your info, sifts through it, and organizes it, so you can use this insight to your advantage in your everyday interactions with Salesforce search.
It’s Multiple Searches Really
If you’ve used Salesforce even briefly, chances are that you have been on the search results page. When you enter your search terms and click Search, separate searches are actually triggered for all the objects in your default search scope, and their results are collated onto your search results page. Also, bear in mind that some objects store information differently. Files, articles, documents, and Chatter feed posts contain large chunks of unstructured text, while Accounts, Contacts, and Opportunities store information in a more structured fashion (fields). Further, the amount of data that search has to index and mine through affects search speed.
Searchable fields are different for every object.
See Search Fields in the Salesforce help for details.
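To make the "multiple searches" idea concrete, here is a minimal sketch in Python. It is not how Salesforce implements search. The objects, records, field lists, and the function names search_object and search_all are all made up for illustration. It only shows one search running per object in scope, each looking at that object's own searchable fields, with the results collated at the end.

```python
from typing import Dict, List

# Hypothetical searchable fields per object (illustrative, not the real lists).
SEARCHABLE_FIELDS: Dict[str, List[str]] = {
    "Account": ["name", "phone"],
    "Contact": ["name", "email"],
}

# Hypothetical sample records, keyed by object.
RECORDS: Dict[str, List[dict]] = {
    "Account": [{"name": "Acme Corp", "phone": "555-0100"}],
    "Contact": [
        {"name": "Charlie Smith", "email": "charlie@acme.example"},
        {"name": "Dana Jones", "email": "dana@other.example"},
    ],
}


def search_object(obj: str, term: str) -> List[dict]:
    """Run one search against a single object, using only its searchable fields."""
    needle = term.lower()
    return [
        record
        for record in RECORDS[obj]
        if any(needle in str(record[field]).lower() for field in SEARCHABLE_FIELDS[obj])
    ]


def search_all(term: str, scope: List[str]) -> Dict[str, List[dict]]:
    """Trigger a separate search per object in scope and collate the results."""
    return {obj: search_object(obj, term) for obj in scope}


results = search_all("acme", ["Account", "Contact"])
```

Here results holds one list of matches per object, which is roughly what the search results page shows you section by section.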
It’s More Than You Think …
Your search terms are not used as is in our search queries; we reformulate them using natural language analysis algorithms (tokenization, lemmatization, and so on) and other techniques to refine your search query. This process is called query expansion or query reformulation.
The first important process to affect your searches is tokenization, because it defines what is stored in your search indexes. Tokenization breaks down your search text into smaller pieces at spaces, punctuation, and alphanumeric boundaries. Your organization data is similarly tokenized and stored in the organization indexes. So when you perform a search, your results actually include matches on the tokens in your search string. See how, in the example in the diagram, acme2 is broken down to acme and 2. Tokenization is different for languages where there are no clearly defined word boundaries, such as Chinese, Japanese, Korean, and Thai (CJKT). For CJKT searches, your organization data and search terms are tokenized at bisyllabic boundaries.
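As a rough illustration of the splitting described above, the following Python sketch breaks text at spaces, punctuation, and letter/digit boundaries. It is a simplification, not the actual Salesforce tokenizer. The function name tokenize is hypothetical, lowercasing the tokens is our assumption, and the sketch does not attempt the CJKT handling.

```python
import re
from typing import List

# A token is a run of letters or a run of digits; anything else
# (spaces, punctuation, underscores) acts as a separator.
_TOKEN_RE = re.compile(r"[^\W\d_]+|\d+")


def tokenize(text: str) -> List[str]:
    """Split text into lowercase tokens at spaces, punctuation, and letter/digit boundaries."""
    return [match.group(0).lower() for match in _TOKEN_RE.finditer(text)]


print(tokenize("acme2"))         # ['acme', '2']
print(tokenize("Acme-2 Corp."))  # ['acme', '2', 'corp']
```

Because both the indexed data and the search text go through the same splitting, a search for acme2 can match items that contain the tokens acme and 2.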
For the files, articles, and solutions query types, search uses a stopword list for every supported language in Salesforce to remove words such as “the”, “to”, and “for” in your searches. Removing these fairly common words from the search query prevents the dilution of your search results.
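Stopword removal can be sketched in a few lines of Python. The list below contains only the three example words from this post, not the real per-language lists, and the names STOPWORDS and remove_stopwords are made up for this example.

```python
from typing import Dict, FrozenSet, List

# Illustrative only: a tiny English list built from the examples in this post.
STOPWORDS: Dict[str, FrozenSet[str]] = {
    "en": frozenset({"the", "to", "for"}),
}


def remove_stopwords(tokens: List[str], language: str = "en") -> List[str]:
    """Drop common words for the given language; unknown languages are left untouched."""
    stop = STOPWORDS.get(language, frozenset())
    return [token for token in tokens if token.lower() not in stop]


print(remove_stopwords(["the", "guide", "to", "search"]))  # ['guide', 'search']
```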
We also pay close attention to any operators (AND, AND NOT, OR, parentheses, quotation marks) in your search text. So you can use operators to your advantage to really focus your search results. When we process your search text, we check for the default operator associated with each searched object and inject it into the search query. When you explicitly specify an operator, you override the default operator. So if you typed Charlie AND Smith into your search string, we only return items that contain both Charlie and Smith. The default operator for most objects is AND, except for articles, solutions, and documents (default OR searches).
Operators are not case-sensitive.
We recognize lowercase operators in your search text as well.
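Here is a small Python sketch of the operator handling described above, under simplifying assumptions. It only looks at whitespace-separated words, ignores parentheses and quotation marks, and uses hypothetical object names and a hypothetical inject_default_operator function. The real query processing is certainly more involved.

```python
# Objects that default to OR in this sketch; every other object defaults to AND.
DEFAULT_OPERATOR = {"article": "OR", "solution": "OR", "document": "OR"}
OPERATORS = {"AND", "OR", "NOT"}


def inject_default_operator(search_text: str, obj: str) -> str:
    """Insert the object's default operator between adjacent terms that have none."""
    default = DEFAULT_OPERATOR.get(obj, "AND")
    out = []
    previous_was_term = False
    for word in search_text.split():
        if word.upper() in OPERATORS:
            # Explicit operators (in any case) override the default.
            out.append(word.upper())
            previous_was_term = False
        else:
            if previous_was_term:
                out.append(default)
            out.append(word)
            previous_was_term = True
    return " ".join(out)


print(inject_default_operator("Charlie Smith", "contact"))     # Charlie AND Smith
print(inject_default_operator("Charlie Smith", "article"))     # Charlie OR Smith
print(inject_default_operator("charlie or smith", "contact"))  # charlie OR smith
```

The same search text therefore turns into a different query depending on which object it runs against, unless you spell out the operator yourself.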
Another step in the reformulation involves processing the wildcards in your search query. The asterisk (*) is used to match one or more characters. The question mark (?) wildcard matches only a single character, and only one of the same type as the character it is placed after. Used this way, wildcards help broaden your search results.
So for the example in the diagram, if you wanted to match acme2 using a wildcard search, the best way to do it would be to use acme* and not acme?.
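The acme* versus acme? difference can be illustrated with a short Python sketch that translates a wildcard pattern into a regular expression. It follows the description in this post and is not the actual implementation. In particular, reading "same type" as "letter versus digit" is our assumption, and the names wildcard_to_regex and matches are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate * and ? into a regular expression, per the description in this post."""
    parts = []
    previous_literal = ""
    for ch in pattern:
        if ch == "*":
            parts.append(".+")  # one or more characters of any kind
        elif ch == "?":
            # Assumption: one character of the same type (letter or digit)
            # as the literal character the wildcard follows.
            if previous_literal.isdigit():
                parts.append(r"\d")
            elif previous_literal.isalpha():
                parts.append(r"[^\W\d_]")
            else:
                parts.append(".")
        else:
            parts.append(re.escape(ch))
            previous_literal = ch
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(text) is not None


print(matches("acme*", "acme2"))  # True
print(matches("acme?", "acme2"))  # False: ? follows a letter, but 2 is a digit
print(matches("acme?", "acmes"))  # True
```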
We use lemmatization to match on inflected forms of the verbs and nouns in your search. Your search results include matches on the actual terms you provide, the root term (called the lemma), and the inflected forms of the root, as long as they are the same part of speech. So in the example in the diagram, a search for run matches items containing run, running, and ran, but not runner. There are a couple of other query expansion steps that come into play for Salesforce Knowledge queries, but we will not go into those details at this point. Finally, after the reformulation process is complete, the search query is sent to the Salesforce search query servers to find matching items in your indexes.
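To illustrate the lemmatization step described above, here is a toy Python sketch. A real lemmatizer relies on large dictionaries and part-of-speech analysis. The tiny hand-written table WORD_INFO and the function expand_term below are hypothetical and exist only to show why run matches ran but not runner.

```python
from typing import Dict, Set, Tuple

# Hand-written toy dictionary: word -> (lemma, part of speech).
WORD_INFO: Dict[str, Tuple[str, str]] = {
    "run": ("run", "VERB"),
    "runs": ("run", "VERB"),
    "running": ("run", "VERB"),
    "ran": ("run", "VERB"),
    "runner": ("runner", "NOUN"),
    "runners": ("runner", "NOUN"),
}


def expand_term(term: str) -> Set[str]:
    """Return the term plus every known word sharing its lemma and part of speech."""
    key = WORD_INFO.get(term.lower())
    if key is None:
        return {term.lower()}  # unknown words are searched as typed
    return {word for word, info in WORD_INFO.items() if info == key}


print(sorted(expand_term("run")))     # ['ran', 'run', 'running', 'runs']
print(sorted(expand_term("runner")))  # ['runner', 'runners']
```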
We hope that this insight into the query reformulation process gives you some ideas to tweak your search terms to your advantage. Stay tuned for more information on how we find and organize your search results!