Solr and ZooKeeper

i2 Analyze uses Solr for text indexing and search capabilities. The ZooKeeper service maintains configuration information and provides distributed synchronization across Solr and Liberty.

The topology.xml file for a deployment that includes the opal-server application also includes the <zookeepers> and <solr-clusters> elements. These elements define the Solr cluster that is used in the deployment and the ZooKeeper instance that manages it.

<solr-clusters>

In the supplied topology.xml file that includes the opal-server application with the opal-services-is WAR, the <solr-clusters> definition is:
<solr-clusters>
  <solr-cluster id="is_cluster" zookeeper-id="zoo">
    <solr-collections>
      <solr-collection
        id="main_index" type="main"
        lucene-match-version=""
        max-shards-per-node="4" num-shards="4" num-replicas="1"
      />
      <solr-collection
        id="match_index1" type="match"
        lucene-match-version=""
        max-shards-per-node="4" num-shards="4" num-replicas="1"
      />
      <solr-collection
        id="match_index2" type="match"
        lucene-match-version=""
        max-shards-per-node="4" num-shards="4" num-replicas="1"
      />
      <solr-collection
        id="highlight_index" type="highlight"
        lucene-match-version=""
        max-shards-per-node="4" num-shards="4" num-replicas="1"
      />
      <solr-collection
        id="chart_index" type="chart"
        lucene-match-version=""
        max-shards-per-node="4" num-shards="4" num-replicas="1"
      />
      <solr-collection
        id="vq_index" type="vq"
        lucene-match-version=""
        max-shards-per-node="4" num-shards="4" num-replicas="1"
      />
    </solr-collections>
    <solr-nodes>
      <solr-node
        memory="2g"
        id="node1"
        host-name=""
        data-dir=""
        port-number="8983"
      />
    </solr-nodes>
  </solr-cluster>
</solr-clusters>

The <solr-clusters> element includes a child <solr-cluster> element. The id attribute of the <solr-cluster> element is a unique identifier for the Solr cluster. To associate the Solr cluster with the ZooKeeper instance, the value of the zookeeper-id attribute must match the value of the id attribute of the <zookeeper> element.
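As a minimal sketch of that association, the following fragment reduces the supplied definitions to the two attributes that must agree. The child elements are omitted for brevity, so this is an illustration rather than a complete topology.

```xml
<!-- The zookeeper-id value on the cluster ("zoo") ... -->
<solr-clusters>
  <solr-cluster id="is_cluster" zookeeper-id="zoo">
    <!-- <solr-collections> and <solr-nodes> omitted for brevity -->
  </solr-cluster>
</solr-clusters>

<!-- ... must equal the id value on the ZooKeeper instance. -->
<zookeepers>
  <zookeeper id="zoo">
    <!-- <zkhosts> omitted for brevity -->
  </zookeeper>
</zookeepers>
```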

<solr-collections>
The <solr-collections> element is a child of the <solr-cluster> element. The <solr-collections> element has child <solr-collection> elements.
The number and type of required child <solr-collection> elements depend on the WARs that are included in the application.
opal-services-is
In the opal-services-is WAR, you must have <solr-collection> elements with each of the following values for the type attribute:
  • main - you must have either one or two collections of type main
  • match - you must have two collections of type match
  • highlight - you must have one collection of type highlight
  • chart - you must have one collection of type chart
  • vq - you must have one collection of type vq
opal-services-daod
In the opal-services-daod WAR, you must have one <solr-collection> element with a value of daod for the type attribute.
opal-services-is-daod
In the opal-services-is-daod WAR, you must have <solr-collection> elements with each of the following values for the type attribute:
  • main - you must have either one or two collections of type main
  • daod - you must have one collection of type daod
  • match - you must have two collections of type match
  • highlight - you must have one collection of type highlight
  • chart - you must have one collection of type chart
  • vq - you must have one collection of type vq
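For illustration, a daod collection might be declared as in the following fragment, which would sit alongside the other <solr-collection> elements inside <solr-collections>. The id value daod_index is a hypothetical name chosen for this sketch, and the shard and replica values simply mirror the supplied definition.

```xml
<!-- Hypothetical id; shard settings copied from the supplied collections. -->
<solr-collection
  id="daod_index" type="daod"
  lucene-match-version=""
  max-shards-per-node="4" num-shards="4" num-replicas="1"
/>
```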
The <solr-collection> element has the following attributes:
Attribute Description
id An identifier that is used to identify the Solr collection.
type The type of the collection. The possible values are:
  • main
  • daod
  • match
  • highlight
  • chart
  • vq
lucene-match-version The Lucene version that is used for the collection. At this release, the value is populated when you deploy i2 Analyze.
num-shards The number of logical shards that are created as part of the Solr collection.
num-replicas The number of physical replicas that are created for each logical shard in the Solr collection.
max-shards-per-node The maximum number of shards that are allowed on each Solr node. This value is the result of num-shards multiplied by num-replicas.
min-replication-factor The minimum number of replicas that an update must be replicated to for the operation to succeed. This value must be greater than 0 and less than or equal to the value of num-replicas.

This attribute is optional.

num-csv-write-threads The number of threads that are used to read from the database and write to the temporary .csv file when indexing data in the Information Store.

This attribute is optional, and applies to Solr collections of type main and match only.

The total of num-csv-write-threads and num-csv-read-threads must be less than the number of cores available on the Liberty server.

num-csv-read-threads The number of threads that are used to read from the temporary .csv file and write to the index when indexing data in the Information Store.

This attribute is optional, and applies to Solr collections of type main and match only.

This value must be less than the value of num-shards.

The total of num-csv-write-threads and num-csv-read-threads must be less than the number of cores available on the Liberty server.
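The attribute constraints can be checked against a worked example. The following sketch shows a main collection that sets the optional attributes. The specific numbers are assumptions chosen to satisfy the rules, not recommended values, and they presume a Liberty server with at least six available cores.

```xml
<!--
  Illustrative values only:
  max-shards-per-node    = num-shards x num-replicas = 4 x 2 = 8
  min-replication-factor : greater than 0, at most num-replicas (1 <= 2)
  num-csv-read-threads   : less than num-shards (3 < 4)
  write + read threads   : 2 + 3 = 5, below an assumed 6 available cores
-->
<solr-collection
  id="main_index" type="main"
  lucene-match-version=""
  max-shards-per-node="8" num-shards="4" num-replicas="2"
  min-replication-factor="1"
  num-csv-write-threads="2" num-csv-read-threads="3"
/>
```

As in the supplied definition, lucene-match-version is left empty because the value is populated when you deploy i2 Analyze.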

<solr-nodes>
The <solr-nodes> element is a child of the <solr-cluster> element. The <solr-nodes> element can have one or more child <solr-node> elements. Each <solr-node> element has the following attributes:
Attribute Description
id A unique identifier that is used to identify the Solr node.
memory The amount of memory that can be used by the Solr node.
host-name The hostname of the Solr node.
data-dir The location where Solr stores the index.
port-number The port number of the Solr node.
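Because <solr-nodes> can hold more than one <solr-node> element, a cluster that is spread over two servers might look like the following sketch. The host names and data directories are placeholders invented for this example; substitute the values for your environment.

```xml
<solr-nodes>
  <!-- Hypothetical hosts and paths; each node has its own unique id. -->
  <solr-node
    memory="2g"
    id="node1"
    host-name="solr1.example.com"
    data-dir="/data/solr/node1"
    port-number="8983"
  />
  <solr-node
    memory="2g"
    id="node2"
    host-name="solr2.example.com"
    data-dir="/data/solr/node2"
    port-number="8983"
  />
</solr-nodes>
```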

<zookeepers>

In the supplied topology.xml file that includes the opal-server application, the <zookeepers> definition is:
<zookeepers>
  <zookeeper id="zoo">
    <zkhosts>
      <zkhost
        id="1"
        host-name=""
        data-dir=""
        port-number="9983"
        quorum-port-number=""
        leader-port-number=""
      />
    </zkhosts>
  </zookeeper>
</zookeepers>

The <zookeepers> element includes a child <zookeeper> element. The id attribute of the <zookeeper> element is a unique identifier for the ZooKeeper instance. To associate the ZooKeeper instance with the Solr cluster, the value of the id attribute must match the value of the zookeeper-id attribute of the <solr-cluster> element.

The <zkhosts> element is a child of the <zookeeper> element. The <zkhosts> element can have one or more child <zkhost> elements. Each <zkhost> element has the following attributes:
Attribute Description
id A unique identifier that is used to identify the ZooKeeper host. This value must be an integer in the range 1 - 255.
host-name The hostname of the ZooKeeper host.
data-dir The location that ZooKeeper uses to store data.
port-number The port number of the ZooKeeper host.
quorum-port-number The port number that is used for ZooKeeper quorum communication. By default, the value is 10483.
leader-port-number The port number that is used by ZooKeeper for leader election communication. By default, the value is 10983.
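As a sketch of a multi-host configuration, the following fragment defines three <zkhost> elements for one ZooKeeper instance. The host names and data directories are hypothetical, and the quorum and leader ports are written out explicitly with the default values from the attribute descriptions.

```xml
<zookeepers>
  <zookeeper id="zoo">
    <zkhosts>
      <!-- Each id is a unique integer in the range 1 - 255. -->
      <zkhost
        id="1"
        host-name="zk1.example.com"
        data-dir="/data/zookeeper/1"
        port-number="9983"
        quorum-port-number="10483"
        leader-port-number="10983"
      />
      <zkhost
        id="2"
        host-name="zk2.example.com"
        data-dir="/data/zookeeper/2"
        port-number="9983"
        quorum-port-number="10483"
        leader-port-number="10983"
      />
      <zkhost
        id="3"
        host-name="zk3.example.com"
        data-dir="/data/zookeeper/3"
        port-number="9983"
        quorum-port-number="10483"
        leader-port-number="10983"
      />
    </zkhosts>
  </zookeeper>
</zookeepers>
```

Three hosts are used here because ZooKeeper decides by majority, so an odd number of hosts is the usual choice for an ensemble.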