Solr spellcheck compounded from several fields
Solr’s SpellCheck component is designed to provide inline spell checking of queries (i.e. query suggestions or “Did You Mean”) when it thinks the input query might have been misspelled. The words can be loaded from text files, from a single field in Solr, or from several fields.
To make the spellcheck load its words from several fields, you need to:
Declare a new field and copy into it all the fields whose words should be part of the spellcheck index
Both the declaration of the new field and the copy are configured in the schema.xml file.
New field declaration
You should pay attention to the following properties:
- Field type: declare the field as type textSpell, the same type that queryAnalyzerFieldType refers to in solrconfig.xml. The spellchecker needs a lightly analyzed type (no stemming), otherwise stemmed word forms end up in the spellcheck index.
- MultiValued: because several source fields are copied into it, the field must be declared as multiValued.
- Stored: to save space, declare the field as not stored (in my case the index grew by 13% instead of 43%).
<!-- multiple field spell check -->
<field name="didYouMean" type="textSpell" indexed="true" stored="false" multiValued="true"/>
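The didYouMean field refers to a field type named textSpell, which itself has to be defined in schema.xml. The following is only a minimal sketch of such a definition, assuming a simple analyzer chain (standard tokenizer, lowercasing, duplicate removal). The choice of filters is my assumption and may differ from the textSpell type in your own schema:

```xml
<!-- Sketch of a lightly analyzed field type for spell checking.
     The analyzer chain below is an assumption; adjust it to your schema. -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split the text into words -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase so that "Hello" and "hello" become the same dictionary entry -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Drop duplicate tokens at the same position -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

There is deliberately no stemming filter in this chain, so the suggestions returned are real words from the source fields rather than stems.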
Copy the fields into the new field
Suppose the following are the fields you would like your spell check to be built from:
<field name="q" type="text" indexed="true" stored="true" />
<field name="tn" type="text" indexed="true" stored="true" />
<field name="an" type="text" indexed="true" stored="true" />
The following statements copy each of these fields into the new field, so that their words become part of the spellcheck index.
<copyField source="q" dest="didYouMean"/>
<copyField source="tn" dest="didYouMean"/>
<copyField source="an" dest="didYouMean"/>
Configure Solr to use the new field
The field on which the spell check operates is specified in the solrconfig.xml file.
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">didYouMean</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" lazy="true">
  <lst name="defaults">
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
How to put the configuration changes into effect
To put the spellcheck configuration into effect, follow these steps:
- Restart your server (e.g. sudo /etc/init.d/jetty restart)
- Reload the config and run a full import with cleaning (http://localhost:8983/solr/test/admin/dataimport.jsp?handler=/dataimport)
- Test your regular index (http://localhost:8983/solr/test/select/?q=*:*&start=0&rows=10&indent=on)
- Test the spell check index (http://localhost:8983/solr/test/spell/?q=helllo&version=2.2&start=0&rows=10&indent=on&spellcheck=true&spellcheck.build=true&spellcheck.collate=true)
- If you query for a sentence that contains more than one word, the spellcheck response will contain alternatives for each word not found in the index. The spellcheck.collate=true parameter causes a modified version of the original query (the sentence) to be returned with the most likely alternatives.
- Note the spellcheck.build=true parameter, which is needed only once to build the spellcheck index from the main Solr index. Building takes time, so it should not be specified with each request. Alternatively, the SpellCheckComponent can (re)build its indices automatically from the fields in the Solr index whenever a commit is done. To enable this, set buildOnCommit to true in the SpellCheckComponent configuration of each spellchecker it should apply to, as in the default spellchecker defined above.
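For reference, this is where the setting sits inside the spellchecker definition. The sketch repeats the default spellchecker from the solrconfig.xml above with buildOnCommit enabled. The commented-out buildOnOptimize line is an alternative that I am adding as an assumption for indexes that commit very often; it is not part of my original setup:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">didYouMean</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- Rebuild the spellcheck index automatically after every commit -->
    <str name="buildOnCommit">true</str>
    <!-- Assumption, not in the original setup: rebuild only on optimize instead
    <str name="buildOnOptimize">true</str>
    -->
  </lst>
</searchComponent>
```

With buildOnCommit set this way, the spellcheck.build=true parameter can be left out of the test URL once the first commit has gone through.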
I will be happy to receive any comment from you.