Context Filtering With Solr Suggester
Q: What did the Filter Query say to the Solr Suggester?
Introduction
The available literature on the Solr Suggester primarily centers on surface-level configuration and common use cases. This article provides a thorough introduction to the Solr Suggester and discusses its history, design, and implementation; it even provides some comprehensive examples of its usage.
This blog post aims to showcase the versatility of the Solr Suggester and the process of achieving context-filtered suggestions in Solr.
A Little Context…
Suppose you have a collection made up of several datasources. For this example, let’s choose two of them, “datasource_A” and “datasource_B”. The goal is to enable suggestions in your search application, but to return suggestions only for documents from datasource_A, excluding every document from datasource_B.
Enter “suggest.cfq”, the parameter which to some degree emulates the well-known Solr fq param. The widely-used fq parameter, however, does not filter results rendered by the suggest component. So issuing a query such as /suggest?q=do&fq=_lw_data_source_s:datasource_A is essentially equivalent to /suggest?q=do, ignoring the filter query completely.
If you look to the Solr documentation you’ll see a note about how Context filtering (suggest.cfq) is currently only supported by AnalyzingInfixLookupFactory and BlendedInfixLookupFactory, and only when backed by a Document*Dictionary. All other implementations will return unfiltered matches as if filtering was not requested.
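To make the difference between the two parameters concrete, here is a minimal Python sketch that builds a /suggest URL with an optional context filter. The function name suggest_url and the collection name myCollection are illustrative assumptions, not part of Solr or Fusion; only the parameter names suggest, suggest.q and suggest.cfq come from Solr itself.

```python
from urllib.parse import urlencode


def suggest_url(base, collection, prefix, context=None):
    """Build a /suggest URL; `context` is sent as suggest.cfq when given."""
    params = {"suggest": "true", "suggest.q": prefix}
    if context is not None:
        # suggest.cfq is matched against the suggester's contextField,
        # unlike fq, which the suggest component does not apply.
        params["suggest.cfq"] = context
    return f"{base}/{collection}/suggest?{urlencode(params)}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # example Solr base URL
    # "myCollection" is a hypothetical collection name.
    print(suggest_url(base, "myCollection", "do"))
    print(suggest_url(base, "myCollection", "do", context="datasource_A"))
```

The second URL carries suggest.cfq=datasource_A, which is the form a supported lookup implementation can actually filter on.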
May I Suggest a Solution?
In the following example, I create a Fusion collection called suggestTest and assign it two datasources, “art” and “tv”. In Fusion, datasources are distinguished by a _lw_data_source_s field. After indexing documents to both datasources, I would like to enable suggestions on one of the datasources, but not the other.
I created a script which automates the process I’m about to describe – you can run the script by cloning the following repo: https://github.com/essiequoi/suggestTest.git
STEP ONE
Make sure to set the following environment variables or else defaults will be used:
$FUSION_HOME (ex: $HOME/Lucid/fusion/fusion2.4.3/)
$FUSION_API_BASE (ex: http://localhost:8764/api/apollo)
$SOLR_API_BASE (ex: http://localhost:8983/solr)
$FUSION_API_CREDENTIALS (ex: admin:password123)
$ZK_HOST (ex: localhost:9983)
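If you would rather follow along from Python than from the script, the same settings can be read with fallbacks, roughly as sketched below. This is an assumption-laden sketch: the fallback values are simply the example values listed above, and the script's real defaults may differ. The names EXAMPLE_DEFAULTS and load_settings are made up for illustration.

```python
import os

# Fallbacks are the example values from the list above, NOT necessarily
# the defaults used by the suggestTest script.
EXAMPLE_DEFAULTS = {
    "FUSION_HOME": os.path.join(os.path.expanduser("~"), "Lucid", "fusion", "fusion2.4.3"),
    "FUSION_API_BASE": "http://localhost:8764/api/apollo",
    "SOLR_API_BASE": "http://localhost:8983/solr",
    "FUSION_API_CREDENTIALS": "admin:password123",
    "ZK_HOST": "localhost:9983",
}


def load_settings(env=None):
    """Return each setting from `env`, falling back to the example value."""
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the example value.
    return {name: env.get(name) or default for name, default in EXAMPLE_DEFAULTS.items()}


if __name__ == "__main__":
    for name, value in load_settings().items():
        print(f"{name}={value}")
```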
STEP TWO
Create the suggestTest collection
STEP THREE
Edit solrconfig.xml to enable suggestions. As mentioned previously, you have the choice of using the AnalyzingInfixLookupFactory or BlendedInfixLookupFactory as your lookup implementation. In my example, I use the former, backed by a document dictionary. We will be suggesting on the title field. The contextField parameter designates the field on which you’ll be filtering. I use the _lw_data_source_s field, which holds the name of Fusion datasources.
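For reference, a suggester definition along these lines might look like the fragment held in the Python string below; the small helper only parses it to confirm it is well-formed and to read the settings back. Treat it as a sketch under assumptions: the suggester name mySuggester, the text_general analyzer field type, and the handler defaults are illustrative choices, not the exact configuration used in this post. The wrapping config element stands in for the root of solrconfig.xml.

```python
import xml.etree.ElementTree as ET

# Illustrative solrconfig.xml fragment. "mySuggester" and "text_general"
# are assumed names; adjust them to your own schema and configuration.
SUGGEST_CONFIG = """
<config>
  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">title</str>
      <str name="contextField">_lw_data_source_s</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
      <str name="buildOnStartup">false</str>
    </lst>
  </searchComponent>
  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.count">10</str>
      <str name="suggest.dictionary">mySuggester</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>
</config>
"""


def suggester_settings(config_xml):
    """Parse the fragment and return the suggester's <str> settings as a dict."""
    root = ET.fromstring(config_xml.strip())
    suggester = root.find("./searchComponent/lst[@name='suggester']")
    return {el.get("name"): el.text for el in suggester.findall("str")}


if __name__ == "__main__":
    for key, value in suggester_settings(SUGGEST_CONFIG).items():
        print(f"{key}: {value}")
```

The two settings that matter for context filtering are lookupImpl (one of the two infix factories) and contextField (the field that suggest.cfq is matched against).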
STEP FOUR
Create the datasources “art” and “tv”. I use the local filesystem connector for both, but connector type is arbitrary in this example. And because I am indexing CSV files, I use the default CSV index pipeline which ships with Fusion. I index the following documents:
art.csv
tv.csv
STEP FIVE
Run the datasources. Once both datasources have finished running, you should have 6 documents total in your entire collection.
STEP SIX
In the suggestTest-default query pipeline, edit the Query Solr stage to allow the /suggest requestHandler.
STEP SEVEN
Build the suggester with http://{host}:8764/api/apollo/query-pipelines/suggestTest-default/collections/suggestTest/suggest?suggest.build=true . A “0” status indicates that the suggester built successfully.
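The build call and its status check can be sketched in Python as follows. The helper names build_url and build_succeeded are hypothetical, and the check assumes the pipeline passes through Solr's usual JSON responseHeader; fetching the URL (with your Fusion credentials) is left out so the sketch stays self-contained.

```python
from urllib.parse import urlencode


def build_url(fusion_api_base, pipeline, collection):
    """URL that asks the suggester to build, following the pattern above."""
    path = f"{fusion_api_base}/query-pipelines/{pipeline}/collections/{collection}/suggest"
    return f"{path}?{urlencode({'suggest.build': 'true'})}"


def build_succeeded(response):
    """True when the Solr-style responseHeader reports status 0."""
    return response.get("responseHeader", {}).get("status") == 0


if __name__ == "__main__":
    print(build_url("http://localhost:8764/api/apollo", "suggestTest-default", "suggestTest"))
```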
STEP EIGHT
Query the collection (using the /suggest handler) for “ba” and observe the response. Next query for “be”, then for “ch”. Each query should return two suggestions.
STEP NINE
Let’s say we’d like to get suggestions for “ba” but only those generated from the “art” datasource. The following query fails: http://{host}:8764/api/apollo/query-pipelines/suggestTest-default/collections/suggestTest/suggest?suggest.q=ba&fq=_lw_data_source_s:art . As you can see, it still returns suggestions from both datasources. Using the suggest.cfq parameter, and entering the appropriate datasource as the value, gets us the correct result.
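When comparing the fq and suggest.cfq responses, it can help to pull out just the suggested terms. The sketch below assumes Solr's standard JSON layout for suggest responses (suggest, then the dictionary name, then the query text, then a suggestions list); the function name suggested_terms is hypothetical, and the dictionary name must match whatever your suggester is called in solrconfig.xml.

```python
def suggested_terms(response, dictionary, prefix):
    """Return the suggested terms for `prefix` from a parsed suggest response.

    Assumes the layout: suggest -> <dictionary> -> <prefix> -> suggestions.
    Missing keys yield an empty list rather than an error.
    """
    entry = response.get("suggest", {}).get(dictionary, {}).get(prefix, {})
    return [suggestion["term"] for suggestion in entry.get("suggestions", [])]
```

Running this over both responses should show suggestions from both datasources for the fq query, and only those from “art” for the suggest.cfq query.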
STEP TEN
Keep searching!
Some Known Issues
It’s important to note that the following JIRAs exist in reference to the suggest.cfq parameter, none of which directly affect the use-case mentioned above:
SOLR-8928 – “suggest.cfq does not work with DocumentExpressionDictionaryFactory/weightExpression”
SOLR-7963 – “Suggester context filter query to accept local params query”
SOLR-7964 – “suggest.highlight=true does not work when using context filter query”
Conclusion
Out of the box, the Solr Suggester is capable of solving even the most specialized of use cases. With a little tweaking of your configuration files and just as much experimentation, you could open up your application to worlds of possibility.
A: Con-text me some time.