Merge

When two or more pieces of data are determined to represent the same real-world object, the data is merged into a single i2 Analyze record. For the Information Store to merge data, the correlation identifiers must match, the implicit discriminators must be compatible, and the origin identifiers must be different.
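
The three conditions can be read as a single predicate. The following Python sketch is illustrative only: the Item class, its field names, and the can_merge function are hypothetical and are not part of i2 Analyze. In particular, comparing item_type is an assumed stand-in for the implicit discriminator check, whose actual rules are not described in this topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical, simplified view of a staging row or an existing record."""
    correlation_id: str
    origin_ids: frozenset  # origin identifiers in the item's provenance
    item_type: str         # assumed stand-in for the implicit discriminators


def can_merge(incoming: Item, existing: Item) -> bool:
    """Return True when all three merge conditions hold."""
    # 1. The correlation identifiers must match.
    same_correlation = incoming.correlation_id == existing.correlation_id
    # 2. The implicit discriminators must be compatible (assumption: here this
    #    is reduced to an equality check on a single item_type field).
    compatible = incoming.item_type == existing.item_type
    # 3. The origin identifiers must be different.
    different_origin = incoming.origin_ids.isdisjoint(existing.origin_ids)
    return same_correlation and compatible and different_origin
```

If the incoming data shares an origin identifier with the existing record, the predicate returns False; in that case the data refers to source information that the record already has provenance for, so it is not a merge.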

During ingestion, a merge operation can occur in the following scenarios:
  • New data in the staging table contains the same correlation identifier as an existing record in the Information Store. The new data has an origin identifier that is not associated with the existing record.
  • An update to an existing record with a single piece of provenance in the Information Store causes the correlation identifier of that record to change. The new correlation identifier matches that of another record in the Information Store.
  • Multiple rows of data in the staging table contain the same correlation identifier. The Information Store either ingests the rows as a single new i2 Analyze record, or merges them with an existing record in the Information Store.
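
For the third scenario, the rows that merge with each other can be found by grouping the staging data by correlation identifier. The following Python sketch is a rough illustration rather than the Information Store's implementation: the merge_candidates function and the dictionary keys "correlation_id" and "origin_id" are hypothetical names, not the actual staging table columns.

```python
from collections import defaultdict


def merge_candidates(rows):
    """Group staging rows by correlation identifier.

    Returns only the groups that span more than one origin identifier,
    because rows with the same origin identifier do not merge with each other.
    """
    by_correlation = defaultdict(dict)
    for row in rows:
        # Keying the inner dict by origin identifier keeps one row per
        # origin identifier (the last one seen).
        by_correlation[row["correlation_id"]][row["origin_id"]] = row
    return {
        correlation_id: list(group.values())
        for correlation_id, group in by_correlation.items()
        if len(group) > 1
    }
```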
After a merge operation, the following statements are true for the merged record:
  • The record has a piece of provenance for all of the source information that contributed to the merged record.
  • By default, the property values for the merged i2 Analyze record are taken from the source information associated with the provenance that has the most recent value for the source_last_updated column.

    If only one piece of provenance for a record has a value for the source_last_updated column, the property values from the source information that is associated with that provenance are used. Otherwise, the property values to use are determined by the ascending order of the origin identifier keys that are associated with the record. The piece of provenance that is last in the order is chosen. To ensure data consistency, update your existing records with a value for the source_last_updated column before you start to use correlation, and continue to update the value.

    If the default behavior does not match the requirements of your deployment, you can change the method for defining property values for merged records. For more information, see Define how property values of merged records are calculated.

  • If an existing record to be merged contained any notes, the notes are moved to the merged record.
  • If an existing record to be merged was an entity record at the end of any links, the links are updated to reference the merged record.
    Note: Any links that were created through Analyst's Notebook Premium are also updated to reference the merged record.
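
The default rule for choosing property values, described in the list above, can be sketched in Python as follows. The Provenance class and the choose_property_source function are hypothetical names used for illustration. The sketch assumes that "most recent" means the latest source_last_updated timestamp, that the origin identifier order applies when no piece of provenance has a timestamp, and that ties between equal timestamps are also broken by that order; the last point is an assumption that this topic does not confirm.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Provenance:
    """Hypothetical, simplified piece of provenance for a merged record."""
    origin_id_keys: tuple                    # compared in ascending order
    source_last_updated: Optional[datetime]  # None when no value was supplied
    properties: dict                         # property values from the source


def choose_property_source(provenance):
    """Pick the piece of provenance whose property values the record uses.

    `provenance` must be a non-empty list of Provenance objects.
    """
    dated = [p for p in provenance if p.source_last_updated is not None]
    if dated:
        # Most recent source_last_updated wins. If only one piece has a
        # value, it is chosen automatically. Ties fall back to the origin
        # identifier order (assumption).
        return max(dated, key=lambda p: (p.source_last_updated, p.origin_id_keys))
    # No timestamps at all: the piece that is last in ascending order of
    # origin identifier keys is chosen.
    return max(provenance, key=lambda p: p.origin_id_keys)
```

Because the fallback depends only on the ordering of origin identifier keys, it can select older source information, which is why supplying source_last_updated consistently is recommended.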

During ingestion, the number of merge operations that occur is reported in the merge_count column of the ingestion report.

The following diagrams demonstrate the merge operation.

In the first example of a merge operation, data in the staging table is merged into an existing i2 Analyze entity record because the correlation identifiers match and the origin identifiers are different.
Figure 1. Incoming staging data merges with an existing i2 Analyze record.


In the diagram, the correlation identifier of the data in the staging table matches that of the existing i2 Analyze record (a), which causes a merge operation. The existing i2 Analyze record (a) is not associated with the origin identifier of the incoming data. In this example, it is assumed that the staging table data is more recent than the existing data, so the merge uses the property values from the staging table row. This changes the value of the first name property from "John" to "Jon". The merged i2 Analyze record (a) now contains two pieces of provenance: one for the existing origin identifier, OI.12, and one for the new data, OI.22.

In the second example of a merge operation, data in the staging table causes an update to an existing record (a) that changes the correlation identifier to match another record (b) in the Information Store, causing a merge.
Figure 2. Incoming data updates an existing record, which causes the existing record to merge with another existing record in the Information Store. One of the existing records now has no provenance, and it is removed.


In the diagram, the data in the staging table shares an origin identifier with the existing record (a), but it has a different correlation identifier from that record and the same correlation identifier as another existing record (b). This causes a merge operation. The first existing record (a) no longer has any provenance associated with it, and is removed. In this example, it is assumed that the staging table data is more recent than the existing data, so the merge uses the property values from the staging table row. This changes the value of the first name property from "Jon" to "John". The merged i2 Analyze record (b) contains two pieces of provenance: one for the origin identifier OI.32 and one for the new data, OI.12.
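
The two examples can be reproduced with a small in-memory model. The following Python sketch is not how the Information Store is implemented: the Record class, the ingest_row function, and the correlation identifier values "C1" and "C2" are hypothetical. It assumes, as the examples do, that the staging row is more recent than the existing data, and it ignores implicit discriminators.

```python
import itertools
from dataclasses import dataclass, field

_new_ids = itertools.count(1)  # source of ids for newly created records


@dataclass
class Record:
    correlation_id: str
    properties: dict
    provenance: set = field(default_factory=set)  # origin identifiers


def ingest_row(store, origin_id, correlation_id, properties):
    """Apply one staging row to `store` (a dict of record id -> Record).

    Returns "insert", "update", or "merge".
    """
    owner = next((rid for rid, rec in store.items()
                  if origin_id in rec.provenance), None)
    if owner is not None:
        if store[owner].correlation_id == correlation_id:
            # Same origin identifier and correlation identifier: plain update.
            store[owner].properties = dict(properties)
            return "update"
        # The correlation identifier changed: detach the provenance, and
        # remove the record if no provenance is left.
        store[owner].provenance.discard(origin_id)
        if not store[owner].provenance:
            del store[owner]

    target = next((rid for rid, rec in store.items()
                   if rec.correlation_id == correlation_id), None)
    if target is None:
        new_id = f"new-{next(_new_ids)}"
        store[new_id] = Record(correlation_id, dict(properties), {origin_id})
        return "insert"

    # Merge: the target gains provenance and takes the newer property values.
    store[target].provenance.add(origin_id)
    store[target].properties = dict(properties)
    return "merge"


# Second example: the row for OI.12 now carries the correlation identifier
# of record (b), so record (a) is removed and (b) gains the provenance.
store = {
    "a": Record("C1", {"first_name": "John"}, {"OI.12"}),
    "b": Record("C2", {"first_name": "Jon"}, {"OI.32"}),
}
result = ingest_row(store, "OI.12", "C2", {"first_name": "John"})
print(result, sorted(store), sorted(store["b"].provenance))
# prints: merge ['b'] ['OI.12', 'OI.32']
```

The first example follows the same path without the removal step: ingesting a row with origin identifier OI.22 and the correlation identifier of record (a) adds a second piece of provenance to (a) and replaces its property values.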