Configuring the Solr index

By default, the Solr index assumes that text data in i2 Analyze follows the conventions of Western European languages, and uses a list of synonyms that contains US English terms. It also treats letters like o, ó, and ö as equivalent for the purposes of filtering search results. You can change these settings so that the index meets the language requirements of your data.

About this task

To configure the Solr index, modify the schema-template-definition.xml file. In the template definition, you can control the following aspects of Solr:

  • How Solr returns data based on the language of the data in the index

  • The Solr synonyms file

  • Whether to preserve diacritics in search result filters; for example, whether to treat a, å, and ä as equivalent

By default, the Solr search index is configured for Western European languages, and uses synonym lists that contain US English terms. The template definition file also includes templates for the ar_EG (Arabic) and he_IL (Hebrew) languages.

Each language template has a default synonyms file that is associated with the template language. You can change the default synonyms file to use a customized synonyms file. For more information, see Creating a synonyms file.

By default, diacritics are not preserved in filters. This means that "Amélie" and "Amelie" both contribute to the "amelie" filter. To ensure that "Amélie" and "Amelie" create separate filters, you can preserve the diacritics.

Note: Changing the diacritic behavior for filters does not affect the behavior when searching.

Make any changes to the behavior of the Solr index in your configuration development environment. If your system contains data, you must reindex after changing the Solr configuration.

Procedure

  1. To modify the Solr index configuration, open the configuration\solr\schema-template-definition.xml file in an XML editor.

  2. Specify the language template and synonyms file for the index.

    1. To specify the language template and default synonyms file, uncomment the section of the template file for the language you want to use.

      For example, to use Arabic, uncomment the <Definition> and <SynonymsFile> elements in the ar_EG config section:

      <!-- ar_EG config -->
        <Definition Analyzer="free_text">
          <AnalyzerChain>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
          </AnalyzerChain>
          <PostSynonyms>
            <filter class="solr.ArabicNormalizationFilterFactory"/>
            <filter class="solr.ArabicStemFilterFactory"/>
          </PostSynonyms>
        </Definition>
      
        <Definition Analyzer="text_facet">
          <AnalyzerChain>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
            <filter class="solr.ArabicNormalizationFilterFactory"/>
            <filter class="solr.ArabicStemFilterFactory"/>
          </AnalyzerChain>
        </Definition>
      
        <SynonymsFile Path="synonyms-ar_EG.txt" />

    2. To specify a different synonyms file, place the file in the configuration\solr directory and provide the file name in the <SynonymsFile> element.

      For example:

      <SynonymsFile Path="custom-synonyms.txt" />

      For more information about creating a custom synonyms file, see Creating a synonyms file.

  3. To preserve diacritics in facets, first complete the previous step for your chosen template, and then remove the following line from the text_facet analyzer:

    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
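
As an illustration, this is how the text_facet definition from the ar_EG example above might look after the folding filter is removed. It is a sketch derived from that example, not a copy of the shipped file, and the comment line is added here for explanation only. The free_text definition is left unchanged, which is consistent with the earlier note that the change does not affect searching.

```xml
<!-- Sketch: ar_EG text_facet analyzer with the ASCIIFoldingFilterFactory line removed,
     so that values such as "Amélie" and "Amelie" produce separate filters -->
<Definition Analyzer="text_facet">
  <AnalyzerChain>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
  </AnalyzerChain>
</Definition>
```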

Update the application with your configuration changes.

Note: This procedure removes the data from your Solr index. When Liberty is started, Solr reindexes the data in the Information Store database.

  1. On the Liberty server, open a command prompt and go to the toolkit\scripts directory.

  2. Stop i2 Analyze.

    1. If you are using a single server deployment, run setup -t stop.

    2. If you are using a multiple server deployment, complete the steps to stop the components of i2 Analyze in Stopping and starting i2 Analyze.

  3. Redeploy i2 Analyze:

    setup -t deployLiberty

  4. Create and upload the Solr configuration to ZooKeeper:

    setup -t createAndUploadSolrConfig --hostname 'liberty.host-name'

    Here, liberty.host-name is the hostname of the Liberty server where you are running the command. It matches the value of the host-name attribute of the <application> element in the topology.xml file.

  5. Clear the search index:

    setup -t clearSearchIndex --hostname 'liberty.host-name'

    Here, liberty.host-name is the hostname of the Liberty server where you are running the command. It matches the value of the host-name attribute of the <application> element in the topology.xml file.

  6. Start i2 Analyze.

    1. If you are using a single server deployment, run setup -t start.

    2. If you are using a multiple server deployment, complete the steps to start the components of i2 Analyze in Stopping and starting i2 Analyze.

What to do next

Run a selection of queries against your deployment server to test the configuration.