Solr provides the facility to configure the synonyms that are used when textual data is queried. In i2 Analyze, you can use this facility to apply a customized list of synonyms at query time. If synonyms are not accounted for, a search for a keyword that is present in alternative forms in your index can return less relevant results.
About this task
The synonyms file is the part of the Solr configuration that accounts for the presence of synonyms in your data. For example, your data might contain the words "bag", "handbag", "pocketbook", and "purse" for the concept "bag". Users who search for one of these words are likely to expect results for all four. To meet that expectation, you might want to create a customized synonyms file that accommodates similar variations that are specific to your data. The words in a synonyms list that are most useful in your deployment depend on the content of your data. You can also mix languages, which might be useful in some contexts, for example for names: 'George, Γεώργιος, Jorge'. The default synonyms file and synonyms list are in US English. The synonyms files that are associated by default with each supported language are supplied in the toolkit\configuration\solr directory.
To customize the alternative terms that are used in search operations for your data, you can create files that contain terms that differ from those in the supplied synonyms files.
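The following Python sketch illustrates the effect of an equivalent-synonym group at query time: a search for any term in the group also matches the other terms. It is a conceptual model only. Solr applies synonyms inside its analyzer chain, and the names SYNONYM_GROUPS, build_synonym_map, and expand_query_term are hypothetical.

```python
# Conceptual illustration only: Solr performs synonym expansion inside its
# analyzer chain. This sketch just models the effect of synonym groups.

# Hypothetical groups, taken from the examples in this topic.
SYNONYM_GROUPS = [
    ["bag", "handbag", "pocketbook", "purse"],
    ["George", "Γεώργιος", "Jorge"],
]


def build_synonym_map(groups):
    """Map every term to the full set of equivalent terms in its group."""
    synonym_map = {}
    for group in groups:
        equivalents = set(group)
        for term in group:
            # A term that appears in several groups collects all equivalents.
            synonym_map.setdefault(term, set()).update(equivalents)
    return synonym_map


def expand_query_term(term, synonym_map):
    """Return the set of terms that a search for `term` should match."""
    return set(synonym_map.get(term, {term}))


synonyms = build_synonym_map(SYNONYM_GROUPS)
print(sorted(expand_query_term("purse", synonyms)))
# ['bag', 'handbag', 'pocketbook', 'purse']
print(sorted(expand_query_term("wallet", synonyms)))
# ['wallet']
```

A term with no synonyms expands only to itself, which is why a search for "wallet" in this sketch matches nothing beyond "wallet".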
The customized file must adhere to
the following guidelines:
- The file must be UTF-8 encoded.
- The terms in the file must match the terms that are produced by the analyzer chain that Solr uses before the synonym filter is applied.
- If multiple forms of a word exist, you must specify all of the forms for synonym matching to work on each form.
- Words from Latin-script languages, for example French or Italian, must be specified without diacritics. For example, use the following substitutions:
  - a instead of á
  - c instead of ç
- Arabic and Hebrew words must be specified
exactly as they are written.
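Before you add Latin-script terms to a synonyms file, you might normalize them with a small helper such as the following Python sketch. The function name strip_latin_diacritics is hypothetical. The sketch assumes that removing Unicode combining marks from Latin letters gives the substitution that you want (á becomes a, ç becomes c), and it leaves marks on Arabic, Hebrew, and other non-Latin letters untouched, because those words must be specified exactly as they are written.

```python
import unicodedata


def strip_latin_diacritics(term):
    """Remove diacritics from Latin-script letters only (hypothetical helper).

    Combining marks that follow a non-Latin base character, such as Arabic
    or Hebrew vowel points, are kept unchanged.
    """
    # NFD splits a letter such as "á" into "a" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", term)
    kept = []
    base_is_latin = False
    for ch in decomposed:
        if unicodedata.combining(ch):
            if base_is_latin:
                continue  # drop the accent that belongs to a Latin letter
        else:
            # Remember whether the current base character is a Latin letter.
            base_is_latin = unicodedata.name(ch, "").startswith("LATIN")
        kept.append(ch)
    # Recompose any marks that were kept, for example on Greek letters.
    return unicodedata.normalize("NFC", "".join(kept))


print(strip_latin_diacritics("façade"))  # facade
```

Letters that have no Unicode decomposition, such as ø or ß, are returned unchanged by this sketch, so review such terms by hand.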
Procedure
- Create a text file that defines synonyms in the required Solr format. For more information, see https://lucene.apache.org/core/8_2_0/analyzers-common/org/apache/lucene/analysis/synonym/SolrSynonymParser.html.
Note:
- You cannot search for multi-word terms. However, if your data contains the terms "USA" and "United States of America", you can search for "USA" and use a synonym to ensure a match with "United States of America".
- You can provide synonyms for terms that include punctuation. However, a search on such a term might not work correctly, because a filter is applied before synonyms. For example, "Mary-Ann" becomes "Mary,Ann", and synonyms are then expanded from "Mary" and "Ann", not from "Mary-Ann" or "Maryann".
- Save the file with a .txt extension, for example custom-synonyms.txt.
- Complete the instructions in Configuring the Solr index to deploy your synonyms file.
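The file-creation steps can be sketched in Python as follows: write a UTF-8 encoded file with a .txt extension in the Solr synonyms format, then run some basic checks on it. The entries reuse the examples from this topic and are placeholders for terms from your own data. The helper names are hypothetical, and find_format_problems is a simplified check (for example, it ignores backslash-escaped commas), not a replacement for the Solr parser.

```python
from pathlib import Path

# Placeholder entries that reuse the examples in this topic. Terms must match
# the output of the analyzer chain that runs before the synonym filter.
SYNONYM_LINES = [
    "# Equivalent synonyms: a search for any term also matches the others",
    "bag, handbag, pocketbook, purse",
    "George, Γεώργιος, Jorge",
    "USA, United States of America",
]


def write_synonyms_file(path, lines):
    """Write the synonyms file as UTF-8 text, one rule per line."""
    path = Path(path)
    if path.suffix != ".txt":
        raise ValueError("the synonyms file must have a .txt extension")
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def find_format_problems(text):
    """Report obvious format errors; a simplified check, not a full parser."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and '#' comment lines are ignored
        if stripped.count("=>") > 1:
            problems.append(f"line {number}: more than one '=>'")
            continue
        # Each side of an optional '=>' mapping is a comma-separated list.
        for side in stripped.split("=>"):
            if any(not term.strip() for term in side.split(",")):
                problems.append(f"line {number}: empty term")
                break
    return problems


if __name__ == "__main__":
    target = write_synonyms_file("custom-synonyms.txt", SYNONYM_LINES)
    # Decoding with "utf-8" raises UnicodeDecodeError if the file is not UTF-8.
    content = target.read_bytes().decode("utf-8")
    print(find_format_problems(content))  # [] when no problems are found
```

Reading the bytes back and decoding them explicitly as UTF-8 is a cheap way to confirm the encoding requirement before you deploy the file.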