Solr and ZooKeeper

i2 Analyze uses Solr for text indexing and search. The ZooKeeper service maintains configuration information and provides distributed synchronization across Solr and Liberty.

The topology.xml file for a deployment that includes the opal-server application also includes the <solr-clusters> and <zookeepers> elements. These elements define the Solr cluster that is used in the deployment and the ZooKeeper instance that manages it.

<solr-clusters>

In the supplied topology.xml file that includes the opal-server application with the opal-services-is WAR, the <solr-clusters> definition is:

<solr-clusters>
  <solr-cluster id="is_cluster" zookeeper-id="zoo">
    <solr-collections>
      <solr-collection
        id="main_index" type="main"
        lucene-match-version=""
        num-shards="4" num-replicas="1"
      />
      <solr-collection
        id="match_index1" type="match"
        lucene-match-version=""
        num-shards="4" num-replicas="1"
      />
      <solr-collection
        id="match_index2" type="match"
        lucene-match-version=""
        num-shards="4" num-replicas="1"
      />
      <solr-collection
        id="highlight_index" type="highlight"
        lucene-match-version=""
        num-shards="4" num-replicas="1"
      />
      <solr-collection
        id="chart_index" type="chart"
        lucene-match-version=""
        num-shards="4" num-replicas="1"
      />
      <solr-collection
        id="vq_index" type="vq"
        lucene-match-version=""
        num-shards="4" num-replicas="1"
      />
      <solr-collection
        id="recordshare_index" type="recordshare"
        lucene-match-version=""
        num-shards="4" num-replicas="1"
      />
    </solr-collections>
    <solr-nodes>
      <solr-node
        memory="2g"
        id="node1"
        host-name=""
        data-dir=""
        port-number="8983"
      />
    </solr-nodes>
  </solr-cluster>
</solr-clusters>

The <solr-clusters> element includes a child <solr-cluster> element. The id attribute of the <solr-cluster> element is a unique identifier for the Solr cluster. To associate the Solr cluster with the ZooKeeper instance, the value of the zookeeper-id attribute must match the value of the id attribute of the <zookeeper> element.
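
The association is easiest to see with the two definitions side by side. The following abbreviated sketch reuses the identifiers from the supplied file and omits the child elements. It shows only the two attributes that must agree, and it does not imply where the elements are located within topology.xml.

```xml
<!-- The id attribute of the <zookeeper> element ... -->
<zookeepers>
  <zookeeper id="zoo">
    <!-- <zkhosts> omitted for brevity -->
  </zookeeper>
</zookeepers>

<!-- ... must match the zookeeper-id attribute of the <solr-cluster> element. -->
<solr-clusters>
  <solr-cluster id="is_cluster" zookeeper-id="zoo">
    <!-- <solr-collections> and <solr-nodes> omitted for brevity -->
  </solr-cluster>
</solr-clusters>
```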

<solr-collections>

The <solr-collections> element is a child of the <solr-cluster> element. The <solr-collections> element has child <solr-collection> elements.

The number and type of required child <solr-collection> elements depend on the WARs that are included in the application.

  • opal-services-is

    For the opal-services-is WAR, you must have <solr-collection> elements with each of the following values for the type attribute:

    • main - you must have either one or two collections of type main

    • match - you must have two collections of type match

    • highlight - you must have one collection of type highlight

    • chart - you must have one collection of type chart

    • vq - you must have one collection of type vq

    • recordshare - you must have one collection of type recordshare

  • opal-services-daod

    For the opal-services-daod WAR, you must have one <solr-collection> element with a value of daod for the type attribute.

  • opal-services-is-daod

    For the opal-services-is-daod WAR, you must have <solr-collection> elements with each of the following values for the type attribute:

    • main - you must have either one or two collections of type main

    • daod - you must have one collection of type daod

    • match - you must have two collections of type match

    • highlight - you must have one collection of type highlight

    • chart - you must have one collection of type chart

    • vq - you must have one collection of type vq

    • recordshare - you must have one collection of type recordshare
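
For example, a deployment that contains only the opal-services-daod WAR needs a single collection of type daod. The following fragment is a sketch under assumptions: the daod_index identifier is a hypothetical name, and the shard and replica counts are copied from the supplied opal-services-is example rather than from a supplied daod topology.

```xml
<solr-collections>
  <!-- Hypothetical id; the stated requirement is one collection with type="daod". -->
  <solr-collection
    id="daod_index" type="daod"
    lucene-match-version=""
    num-shards="4" num-replicas="1"
  />
</solr-collections>
```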

The <solr-collection> element has the following attributes:

| Attribute | Description |
| --- | --- |
| id | An identifier that is used to identify the Solr collection. |
| type | The type of the collection. The possible values are: main, daod, match, highlight, chart, vq, recordshare. |
| lucene-match-version | The version of the Lucene matching behavior for the collection. At this release, the value is populated when you deploy i2 Analyze. |
| num-shards | The number of logical shards that are created as part of the Solr collection. |
| num-replicas | The number of physical replicas that are created for each logical shard in the Solr collection. |
| min-replication-factor | The minimum number of replicas that an update must be replicated to for the operation to succeed. This optional value must be greater than 0 and less than or equal to the value of num-replicas. |
| num-csv-write-threads | The number of threads that are used to read from the database and write to the temporary CSV file when indexing data in the Information Store. This attribute is optional, and applies to Solr collections of type main and match only. The total of num-csv-write-threads and num-csv-read-threads must be less than the number of cores available on the Liberty server. |
| num-csv-read-threads | The number of threads that are used to read from the temporary CSV file and write to the index when indexing data in the Information Store. This attribute is optional, and applies to Solr collections of type main and match only. This value must be less than the value of num-shards. The total of num-csv-write-threads and num-csv-read-threads must be less than the number of cores available on the Liberty server. |
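
The optional attributes are not present in the supplied file, so the following fragment is only an illustration of values that satisfy the stated constraints. It assumes a Liberty server with 8 available cores and a collection with two replicas per shard; neither figure comes from the supplied topology.

```xml
<!-- Illustrative values only, assuming 8 cores on the Liberty server. -->
<solr-collection
  id="main_index" type="main"
  lucene-match-version=""
  num-shards="4" num-replicas="2"
  min-replication-factor="1"
  num-csv-write-threads="2"
  num-csv-read-threads="2"
/>
<!--
  min-replication-factor (1) is greater than 0 and not more than num-replicas (2).
  num-csv-read-threads (2) is less than num-shards (4).
  The thread total (2 + 2 = 4) is less than the assumed 8 cores.
-->
```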

<solr-nodes>

The <solr-nodes> element is a child of the <solr-cluster> element. The <solr-nodes> element can have one or more child <solr-node> elements. Each <solr-node> element has the following attributes:

| Attribute | Description |
| --- | --- |
| id | A unique identifier that is used to identify the Solr node. |
| memory | The amount of memory that can be used by the Solr node. |
| host-name | The hostname of the Solr node. |
| data-dir | The location where Solr stores the index. |
| port-number | The port number of the Solr node. |
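
Because <solr-nodes> can contain more than one <solr-node> element, a cluster that is spread across two servers might be sketched as follows. The host names and data directories are hypothetical placeholders; the memory and port values are carried over from the supplied single-node example.

```xml
<solr-nodes>
  <!-- Hypothetical hosts and paths; each node has its own unique id. -->
  <solr-node
    memory="2g"
    id="node1"
    host-name="solr1.example.com"
    data-dir="/opt/i2/data/solr"
    port-number="8983"
  />
  <solr-node
    memory="2g"
    id="node2"
    host-name="solr2.example.com"
    data-dir="/opt/i2/data/solr"
    port-number="8983"
  />
</solr-nodes>
```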

<zookeepers>

In the supplied topology.xml file that includes the opal-server application, the <zookeepers> definition is:

<zookeepers>
  <zookeeper id="zoo">
    <zkhosts>
      <zkhost
        id="1"
        host-name=""
        data-dir=""
        port-number="9983"
        quorum-port-number=""
        leader-port-number=""
      />
    </zkhosts>
  </zookeeper>
</zookeepers>

The <zookeepers> element includes a child <zookeeper> element. The id attribute of the <zookeeper> element is a unique identifier for the ZooKeeper instance. To associate the ZooKeeper instance with the Solr cluster, the value of the id attribute must match the value of the zookeeper-id attribute of the <solr-cluster> element.

The <zkhosts> element is a child of the <zookeeper> element. The <zkhosts> element can have one or more child <zkhost> elements. Each <zkhost> element has the following attributes:

| Attribute | Description |
| --- | --- |
| id | A unique identifier that is used to identify the ZooKeeper host. This value must be an integer in the range 1 - 255. |
| host-name | The hostname of the ZooKeeper host. |
| data-dir | The location that ZooKeeper uses to store data. |
| port-number | The port number of the ZooKeeper host. |
| quorum-port-number | The port number that is used for ZooKeeper quorum communication. By default, the value is 10483. |
| leader-port-number | The port number that is used by ZooKeeper for leader election communication. By default, the value is 10983. |
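
Because <zkhosts> can contain more than one <zkhost> element, a ZooKeeper instance that runs on three servers might be sketched as follows. The host names and data directories are hypothetical placeholders. The quorum and leader ports are written out explicitly with the default values described above, and each id is a distinct integer in the permitted range.

```xml
<zookeepers>
  <zookeeper id="zoo">
    <zkhosts>
      <!-- Hypothetical hosts and paths; ids must be unique integers from 1 to 255. -->
      <zkhost
        id="1"
        host-name="zk1.example.com"
        data-dir="/opt/i2/data/zookeeper"
        port-number="9983"
        quorum-port-number="10483"
        leader-port-number="10983"
      />
      <zkhost
        id="2"
        host-name="zk2.example.com"
        data-dir="/opt/i2/data/zookeeper"
        port-number="9983"
        quorum-port-number="10483"
        leader-port-number="10983"
      />
      <zkhost
        id="3"
        host-name="zk3.example.com"
        data-dir="/opt/i2/data/zookeeper"
        port-number="9983"
        quorum-port-number="10483"
        leader-port-number="10983"
      />
    </zkhosts>
  </zookeeper>
</zookeepers>
```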