Data synchronization in i2 Analyze

In a deployment of i2 Analyze that contains an Information Store database, data is stored in both the database and the Solr search index. In i2 Analyze, the database is considered the source of truth for data in the deployment. When data is added to, modified in, or removed from the deployment, the change is applied to the database before it is reflected in the search index.

When data changes occur, the database and Solr retain watermarks that record whether the operation was successfully completed in the database and in the Solr index. i2 Analyze routinely checks these watermarks. i2 Analyze can respond to the state of the deployment in a number of ways, and most inconsistencies can be resolved without manual intervention.
Note: There is no mechanism in i2 Analyze to ensure that data is written to any standby database instances before the watermarks are updated.

Changes to the Solr collection health

If at least one replica for a shard is unavailable, the collection is marked as unhealthy. If there are data changes in the database during this time, the Solr index is updated on the replicas that are available, as long as the minimum replication factor is achieved.

When all of the replicas are available and the Solr collection is marked as healthy again, all of the data changes that occurred while it was in the unhealthy state are reindexed. As a result, every data change is eventually indexed on every replica.
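
The following Python sketch models the behavior described above for a single shard. It is a conceptual illustration only, not i2 Analyze or Solr code: the classes Shard and Collection, the fields min_rf, pending_reindex, and indexed, and the methods apply_change and on_healthy are all hypothetical. The source does not describe what happens when the minimum replication factor cannot be met, so in this sketch apply_change simply returns False in that case.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical model of one Solr shard and its replicas."""
    replicas: dict[str, bool]  # replica name -> is the replica available?
    min_rf: int                # minimum replication factor for the collection

    def available(self) -> list[str]:
        return [name for name, up in self.replicas.items() if up]

    def is_healthy(self) -> bool:
        # The collection is unhealthy if at least one replica is unavailable.
        return all(self.replicas.values())


@dataclass
class Collection:
    shard: Shard
    pending_reindex: list[str] = field(default_factory=list)
    indexed: dict[str, set[str]] = field(default_factory=dict)

    def apply_change(self, change_id: str) -> bool:
        """Index a change on the available replicas; return True if min_rf was met."""
        targets = self.shard.available()
        if len(targets) < self.shard.min_rf:
            return False  # assumption: the sketch does not model this case further
        for name in targets:
            self.indexed.setdefault(name, set()).add(change_id)
        if not self.shard.is_healthy():
            # Remember changes made while unhealthy so they can be reindexed later.
            self.pending_reindex.append(change_id)
        return True

    def on_healthy(self) -> None:
        """Reindex every change recorded while unhealthy onto every replica."""
        if not self.shard.is_healthy():
            return
        for change_id in self.pending_reindex:
            for name in self.shard.replicas:
                self.indexed.setdefault(name, set()).add(change_id)
        self.pending_reindex.clear()
```

For example, with three replicas, a minimum replication factor of 2, and replica r3 offline, a change is indexed on r1 and r2 only. When r3 returns and on_healthy is called, the change is also indexed on r3.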

At i2 Analyze application start

When the i2 Analyze application starts, i2 Analyze checks the watermarks.
  • If the deployment is in sync, then the deployment starts as usual.
  • If there is data in the database that isn't in the Solr index, the indexing process indexes the missing data from the database.
  • If there is data in the Solr index that isn't in the database, i2 Analyze does not start and the following message is displayed:
    The 'main_index' index reports that it is ahead of the database

    This situation can occur if you restore the database from a backup that was taken before the current Solr index. In your backup and restore procedure, ensure that you restore the database from a backup that was taken after the Solr index backup.
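
The three startup outcomes can be sketched as a comparison of two watermark values. This is a hypothetical model, not the i2 Analyze implementation: it assumes that each watermark is a monotonically increasing integer that identifies the latest data change recorded by the database and by the Solr index. The names StartupAction and check_at_startup are invented for illustration.

```python
from enum import Enum


class StartupAction(Enum):
    START = "deployment is in sync; start as usual"
    CATCH_UP = "index data from the database that is missing from Solr"
    REFUSE = "The 'main_index' index reports that it is ahead of the database"


def check_at_startup(db_watermark: int, index_watermark: int) -> StartupAction:
    """Compare hypothetical integer watermarks for the database and the Solr index."""
    if index_watermark == db_watermark:
        return StartupAction.START
    if index_watermark < db_watermark:
        # The database is the source of truth, so the index catches up to it.
        return StartupAction.CATCH_UP
    # The index records changes that the database does not have, for example
    # after an older database backup is restored. There is nothing to index from.
    return StartupAction.REFUSE
```

The asymmetry reflects the rule that the database is the source of truth: missing index data can always be rebuilt from the database, but index data that is ahead of the database cannot be reconciled automatically.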

Implementation details

The values that i2 Analyze uses to record the state of the deployment are called watermarks. Whenever data is changed in the database, the change is then indexed in Solr. When Solr has completed indexing the changed data, the watermarks are updated. The watermarks are stored in the database and in ZooKeeper. Two watermarks are maintained: a high watermark and a low watermark.
High watermark
The high watermark is updated in both locations when Solr achieves the minimum replication factor, that is, when the index is updated on a specified number of replicas for a shard. When you configure Solr for high availability, you specify the minimum replication factor for each Solr collection.
Low watermark
The low watermark is updated in both locations when Solr achieves the maximum replication factor, that is, when the index is updated on all of the replicas for a shard. Because the low watermark is only updated after the high watermark, the low watermark can never have a higher value than the high watermark.
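
The relationship between the two watermarks can be sketched as follows. This is a simplified, hypothetical model rather than the i2 Analyze implementation: it assumes that each data change has a monotonically increasing sequence number, that changes are acknowledged in order, and that the minimum replication factor does not exceed the replica count. It does not model writing the values to the database and ZooKeeper. The class Watermarks and the method record_acks are invented names.

```python
from dataclasses import dataclass


@dataclass
class Watermarks:
    """Hypothetical model of the high and low watermarks for one shard."""
    min_rf: int         # minimum replication factor for the collection
    replica_count: int  # total number of replicas for the shard
    high: int = 0       # latest change indexed on at least min_rf replicas
    low: int = 0        # latest change indexed on every replica

    def record_acks(self, change_seq: int, acked_replicas: int) -> None:
        """Advance the watermarks for a change indexed on `acked_replicas` replicas."""
        if acked_replicas >= self.min_rf:
            self.high = max(self.high, change_seq)
        if acked_replicas >= self.replica_count:
            # The low watermark only advances to a change that the high
            # watermark already covers, so it can never overtake it.
            self.low = max(self.low, min(change_seq, self.high))
        assert self.low <= self.high  # invariant described in the text above
```

For example, with three replicas and a minimum replication factor of 2, a change acknowledged by only two replicas advances the high watermark but leaves the low watermark behind until the third replica has also indexed that change.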