Configuring Solr for HADR

The i2 Analyze configuration specifies the structure of your Solr cluster. To provide high availability, configure a Solr cluster over at least two Solr servers.

About this task

In the i2 Analyze configuration, provide the details about the Solr servers in your environment. To provide high availability of the content of the Solr indexes, specify the number of replicas, the minimum replication factor, and the location of the replicas.

For more information about SolrCloud, see How SolrCloud Works.

Procedure

  1. Specify the Solr nodes.
    To deploy Solr for high availability, you must have at least two Solr nodes on separate hosts.
    1. Add the Solr nodes to your topology.xml file.
      For example:
      <solr-nodes>
        <solr-node
          memory="2g"
          data-dir="C:/i2/i2analyze/data/solr"
          host-name="solr_server1_host_name"
          id="node1"
          port-number="8983"
          />
        <solr-node
          memory="2g"
          data-dir="C:/i2/i2analyze/data/solr"
          host-name="solr_server2_host_name"
          id="node2"
          port-number="8983"
          />
      </solr-nodes>
      Where solr_server1_host_name and solr_server2_host_name are the host names of your Solr servers.

      For more information about the possible values for each attribute, see Solr and ZooKeeper.

  2. Configure the Solr replicas.
    In Solr, the data is stored as documents in shards. Every shard consists of at least one replica. For a highly available solution, you must have more than one replica of each shard and these replicas must be distributed across the servers that host the Solr nodes.
    1. Specify the replication factor in your topology.xml file.
      The replication factor is the number of replicas to create for each shard. For high availability, this value must be 2 or more. You specify the replication factor in the num-replicas attribute of the <solr-collection> element.
    2. Specify the minimum replication factor in your topology.xml file.
      The minimum replication factor is the number of replicas that must receive a write before the data is considered to be successfully replicated in Solr. For example, if you have three replicas for a shard and a minimum replication factor of 2, a write operation is deemed successful if the data is written to at least two replicas. You specify the minimum replication factor in the min-replication-factor attribute of the <solr-collection> element.
      The following extract from a topology.xml file shows an example of the num-replicas and min-replication-factor attributes:
      <solr-collections>
        <solr-collection 
          num-replicas="2"
          min-replication-factor="1"
          id="main_index"
          type="main"
          max-shards-per-node="4"
          num-shards="1"
          />
        ...
      </solr-collections>
    3. Specify the Solr autoscaling policies.
      To ensure that your system can still operate if a Solr server or node fails, at least one replica of each shard must remain available. To achieve this, you can create a replica placement policy that specifies that each replica of a shard must be placed on a distinct host.
      • Create the configuration\environment\opal-server\solr.autoscaling.policy.json file.
      • In the file, provide your Solr autoscaling policies. For more information about the policies, see Autoscaling Policy and Preferences.
      • To create a cluster policy that creates each replica of a shard on a distinct host, add the following lines to the file:
        {
           "set-cluster-policy":[
              {
                 "replica":"<2",
                 "shard":"#EACH",
                 "host":"#EACH"
              }
           ]
        }
      • The cluster policy is set when you create the Solr cluster as part of the deployment steps. Additionally, you can update the cluster policy by running the setSolrAutoscalingClusterPolicy toolkit task.
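
Before you deploy, it can help to check the topology.xml values from steps 1 and 2 against the requirements above. The following Python sketch is not part of the i2 Analyze toolkit. The function name check_topology_for_ha and the <topology> wrapper in the sample are hypothetical, and the sketch assumes that the element names are not namespace-qualified and that a missing attribute can be treated as 1:

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; the <topology> root is a hypothetical wrapper.
SAMPLE = """
<topology>
  <solr-nodes>
    <solr-node id="node1" host-name="solr_server1_host_name" port-number="8983"/>
    <solr-node id="node2" host-name="solr_server2_host_name" port-number="8983"/>
  </solr-nodes>
  <solr-collections>
    <solr-collection id="main_index" type="main" num-shards="1"
                     num-replicas="2" min-replication-factor="1"/>
  </solr-collections>
</topology>
"""


def check_topology_for_ha(xml_text):
    """Return a list of problems found; an empty list means none were found."""
    root = ET.fromstring(xml_text.strip())
    problems = []

    # Step 1: at least two Solr nodes on separate hosts.
    hosts = {node.get("host-name") for node in root.iter("solr-node")}
    if len(hosts) < 2:
        problems.append("fewer than two distinct Solr hosts")

    # Step 2: replication settings on every <solr-collection> element.
    for coll in root.iter("solr-collection"):
        cid = coll.get("id")
        # Assumption: a missing attribute is treated as 1 in this sketch.
        replicas = int(coll.get("num-replicas", "1"))
        min_rf = int(coll.get("min-replication-factor", "1"))
        if replicas < 2:
            problems.append(f"{cid}: num-replicas is {replicas}, expected 2 or more")
        # A minimum above the replica count could never be satisfied.
        if not 1 <= min_rf <= replicas:
            problems.append(
                f"{cid}: min-replication-factor {min_rf} is outside 1..{replicas}"
            )
    return problems


if __name__ == "__main__":
    for problem in check_topology_for_ha(SAMPLE) or ["no problems found"]:
        print(problem)
```

To check a real file, pass its contents to the function, for example check_topology_for_ha(open("topology.xml").read()), and adjust the element lookup if your file uses namespaces.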

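The effect of the cluster policy in step 2.3 can be illustrated with a short Python sketch. It only mimics the intent of the rule (fewer than two replicas of any one shard on any one host) and is not how Solr evaluates policies. The function name violations and the shard and host names are hypothetical, and only the "<N" form of the replica value is handled:

```python
import json
from collections import Counter

# The cluster policy from the procedure, as it would appear in
# solr.autoscaling.policy.json.
POLICY = """
{
   "set-cluster-policy":[
      {"replica":"<2", "shard":"#EACH", "host":"#EACH"}
   ]
}
"""


def violations(policy_text, placements):
    """Return the (shard, host) pairs that hold too many replicas.

    placements is an iterable of (shard, host) pairs, one pair per replica.
    """
    rule = json.loads(policy_text)["set-cluster-policy"][0]
    # "<2" means "fewer than 2"; this sketch handles only that form.
    limit = int(rule["replica"].lstrip("<"))
    counts = Counter(placements)
    return sorted(pair for pair, n in counts.items() if n >= limit)


if __name__ == "__main__":
    spread = [("shard1", "hostA"), ("shard1", "hostB")]
    stacked = [("shard1", "hostA"), ("shard1", "hostA")]
    print(violations(POLICY, spread))   # replicas on distinct hosts
    print(violations(POLICY, stacked))  # both replicas on the same host
```

With the replicas spread across two hosts, the loss of one host leaves a replica of the shard available. With both replicas on the same host, the loss of that host makes the shard unavailable, which is the situation that the policy is intended to prevent.
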
What to do next

Continue with the rest of the i2 Analyze configuration. For more information, see Deploying i2 Analyze with high availability.