Solr spellcheck compounded from several fields

Solr’s SpellCheck component provides inline spell checking of queries (i.e. query suggestions, or “Did You Mean”) when it thinks the input query might have been misspelled. The words can be loaded from text files, from a single field in Solr, or from several fields.

To make the spellcheck load its words from several fields, you need to:

Declare a new field and copy into it all the fields whose words should be part of the spellcheck index

Both the declaration of the new field and the copy statements are configured in the schema.xml file.

New field declaration

You should pay attention to the following properties:

  • Field type: declare the field with a type meant for spell checking, such as textSpell. textSpell is only the conventional name used in Solr’s spellcheck examples, so the type must itself be defined in schema.xml; what matters is that its analysis is light (no stemming), so that the suggestions are real words.
  • MultiValued: because the field receives values from several source fields (not one), you must declare it as multiValued.
  • Stored: to save space, declare the field as not stored (in my case it was the difference between a 13% and a 43% increase in space).
<!-- multiple field spell check -->
<field name="didYouMean" type="textSpell" indexed="true" stored="false" multiValued="true"/>
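
If Solr reports an unknown field type for textSpell when it starts, the type is probably missing from your schema.xml. The following is a minimal sketch of such a definition, modeled on the one shown in Solr’s SpellCheckComponent documentation. The exact tokenizer and filters are an assumption and should be adapted to your data and Solr version:

```xml
<!-- Goes inside the field type section of schema.xml. -->
<!-- Light analysis only: tokenize, lowercase, drop duplicate tokens. -->
<!-- No stemming, so the spellcheck index contains real words. -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

The same type name is then used for the didYouMean field and for queryAnalyzerFieldType in solrconfig.xml, so that queries are analyzed the same way as the indexed words.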

Copy the fields into the new field

Suppose these are the fields you would like your spellcheck index to be built from:

<field name="q" type="text" indexed="true" stored="true" />
<field name="tn" type="text" indexed="true" stored="true" />
<field name="an" type="text" indexed="true" stored="true" />

The following statements copy each of those fields into the new field:

<copyField source="q" dest="didYouMean"/>
<copyField source="tn" dest="didYouMean"/>
<copyField source="an" dest="didYouMean"/>

Configure Solr to use the new field

The name of the field on which the spellcheck operates is specified in the solrconfig.xml file.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">didYouMean</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" lazy="true">
  <lst name="defaults">
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

How to put the configuration changes into effect

To put the spellcheck configuration into effect, follow these steps:

  1. Restart your server (e.g. sudo /etc/init.d/jetty restart)
  2. Reload config and full import with cleaning (http://localhost:8983/solr/test/admin/dataimport.jsp?handler=/dataimport)
  3. Test your regular index (http://localhost:8983/solr/test/select/?q=*:*&start=0&rows=10&indent=on)
  4. Test the spell check index (http://localhost:8983/solr/test/spell/?q=helllo&version=2.2&start=0&rows=10&indent=on&spellcheck=true&spellcheck.build=true&spellcheck.collate=true)

Notes

  • If you query for a sentence that contains more than one word, the spellcheck response will contain alternatives for each word not found in the index. Setting spellcheck.collate=true causes a modified version of the original query (the sentence) to be returned with the most likely alternatives.
  • Note the spellcheck.build=true in the test URL. It is needed only once, to build the spellcheck index from the main Solr index. Building takes time, so it should not be specified with each request. SpellCheckComponent can instead be configured to (re)build its index automatically from the fields in the Solr index whenever a commit is done. To enable this, add the following line to the SpellCheckComponent configuration of each spellchecker it should apply to:
    <str name="buildOnCommit">true</str>
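
If rebuilding on every commit turns out to be too expensive, SpellCheckComponent also accepts a buildOnOptimize setting, which rebuilds the spellcheck index only when the main index is optimized. Here is a sketch of the spellchecker entry from the configuration above with that setting swapped in. Whether it suits you depends on how often you optimize, which is an assumption about your setup:

```xml
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">didYouMean</str>
  <str name="spellcheckIndexDir">./spellchecker</str>
  <!-- Rebuild on optimize instead of on every commit. -->
  <str name="buildOnOptimize">true</str>
</lst>
```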

I will be happy to receive any comments.

3 Comments
  1. Thibault says:

    Hi, I’ve followed your instructions but when I restart Jetty, I get the following error: “Unknown fieldtype ‘textSpell’ specified on field didYouMean”

    Are you sure that it exists in schema.xml? If so, do you think I just have a deprecated version of Solr?

  2. Thanks for the article!

    Actually I was looking for something else, I need spell check for queries with several fields on it, e.g. “first_name:Jonh and middle_name:Josehp and nationality:Franc and country:Ierland”, where some fields have different index (e.g. first_name and middle_name; country and nationality).

    In this case, I suppose we copy the first_name and middle_name into a new “didYouMean_Names”, and we copy country and nationality into “didYouMean_Nations”.

    Next to that we need two searchcomponents, how can we add both to the request handler?

    It would be nice to have that in this article too…
