In a deployment that provides high availability, ZooKeeper is used to manage the Solr cluster and maintain Liberty leadership information. ZooKeeper uses an active/active pattern that allows all ZooKeeper servers to respond to requests.

Detecting server failure

To detect ZooKeeper server failures:
  • The IBM_i2_Component_Availability.log contains messages that report the status of the ZooKeeper quorum. For more information, see ZooKeeper status messages.
  • The Solr Web UI contains the status of the ZooKeeper quorum: http://localhost:8983/solr/#/~cloud?view=zkstatus, where localhost and 8983 are replaced with the host name and port number of one of your Solr nodes.
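In addition to the log and the Solr Web UI, you can probe a ZooKeeper server directly with its four-letter-word commands. This is a hedged sketch, not part of the product's tooling: it assumes the default client port 2181, that the srvr command is allowlisted on the server (the 4lw.commands.whitelist setting), and that nc is available. The zk_mode helper and the sample response below are illustrative only.

```shell
# Probe a live server (placeholders - adapt host name and port):
#   echo srvr | nc zookeeper.hostname 2181
#
# The response includes a "Mode:" line (leader / follower / standalone).
# zk_mode is a small illustrative helper that extracts that line:
zk_mode() {
  printf '%s\n' "$1" | awk -F': ' '/^Mode:/ {print $2}'
}

# Sample srvr response, for demonstration only:
sample_response='Zookeeper version: 3.8.4
Latency min/avg/max: 0/0.1/5
Mode: follower
Node count: 120'

zk_mode "$sample_response"   # follower
```

A server that does not answer srvr at all, or that reports an unexpected mode, is a candidate for the recovery steps later in this topic.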

Manual and automatic failover

The ZooKeeper quorum continues to function while more than half of the total number of ZooKeeper hosts in the quorum are available. For example, if you have three ZooKeeper servers, your system can sustain one ZooKeeper server failure.
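The majority rule above means an ensemble of n servers tolerates (n - 1) / 2 failures (integer division). A minimal sketch of that arithmetic, with a hypothetical helper name:

```shell
# Majority quorum: an ensemble of n ZooKeeper servers keeps functioning
# while more than n/2 servers are available, so it tolerates (n-1)/2
# failures. tolerable_failures is an illustrative helper.
tolerable_failures() {
  echo $(( ($1 - 1) / 2 ))
}

tolerable_failures 3   # 1 - three servers survive one failure
tolerable_failures 5   # 2 - five servers survive two failures
```

Note that adding a fourth server does not increase fault tolerance: four servers still tolerate only one failure, which is why ensembles are usually sized with an odd number of servers.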

Recovering failed servers

A server might fail for a number of reasons. Use the logs from the server to diagnose and resolve the issue.

ZooKeeper logs:
  • The ZooKeeper logs are located in the i2analyze\data\zookeeper\8\ensembles\zoo\zkhosts\1\logs directory on the failed ZooKeeper server, where 1 is the identifier of the ZooKeeper host on the server.
  • The application logs are in the deploy\wlp\usr\servers\opal-server\logs directory.

For more information about the different log files and their contents, see Deployment log files.
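When working through a log directory, a quick first pass is to pull out the ERROR lines. This is a generic sketch, not a documented procedure: it assumes a Unix-style shell, and the directory and log content below are created only to demonstrate the pattern (adapt the path separators and location to your deployment's layout, such as the zkhosts logs directory above).

```shell
# Illustrative setup: a temporary directory standing in for a ZooKeeper
# host's logs directory, with one sample log file.
logs=$(mktemp -d)
printf '2024-01-01 INFO  server started\n2024-01-01 ERROR quorum lost\n' \
  > "$logs/zookeeper.log"

# scan_logs is a hypothetical helper: print every ERROR line from all
# files under the given directory (-r recurse, -h omit file names).
scan_logs() {
  grep -rh 'ERROR' "$1"
}

scan_logs "$logs"   # 2024-01-01 ERROR quorum lost
```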

To recover the failed server, you might need to restart the server, increase the hardware specification, or replace hardware components.

Reinstating high availability

On the recovered ZooKeeper server, run setup -t startZkHosts -hn zookeeper.hostname to start the ZooKeeper host on the server, where zookeeper.hostname is the host name of the recovered server.
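After running the setup command, the restarted host can take a moment to rejoin the quorum, so a small polling loop is useful before declaring the system recovered. This is a hedged sketch: wait_for_host is a hypothetical helper, and check_host is a stand-in for a real probe of the server (for example, the ruok four-letter command shown earlier); it is stubbed out here so the sketch is self-contained.

```shell
# Poll until the ZooKeeper host responds, up to a given number of
# attempts; print "up" on success or "timeout" on failure.
wait_for_host() {
  attempts=$1
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if check_host; then
      echo up
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo timeout
  return 1
}

# Stand-in probe: assume the recovered host responds immediately.
# Replace with a real check, e.g.: echo ruok | nc zookeeper.hostname 2181
check_host() { true; }

wait_for_host 5   # up
```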

You can use the IBM_i2_Component_Availability.log and Solr Web UI ZooKeeper status view to ensure that the system returns to its previous state.