Correlation identifiers

The role of a correlation identifier is to indicate that data is about a specific real-world object. If multiple pieces of data are about the same specific real-world object, they have the same correlation identifier. At ingestion time, the correlation identifier of incoming data informs the Information Store how to process that data. When the correlation identifier on an inbound row of data matches an existing i2 Analyze record, the outcome of the association depends on the current state of that record.

You specify the values for the correlation identifier in the staging table that you are ingesting the data from. The correlation identifier is made up of two parts: the correlation identifier type and the correlation identifier key.

type

The type of a correlation identifier specifies the type of correlation key that you are using as part of the correlation identifier. If you are generating correlation keys by using different methods, you might want to distinguish them by specifying the name of the method as the correlation identifier type. If your correlation keys are consistent regardless of how they are created, you might want to use a constant value for the correlation identifier type.

When you specify the identifier type, consider that this value might be seen by analysts.

The length of the value for the type must not exceed 100 bytes. This limit is equivalent to 50 two-byte Unicode characters.

key

The key of a correlation identifier contains the information necessary to identify whether multiple pieces of data represent the same real-world object. If multiple pieces of data represent the same real-world object, they have the same correlation identifier key.

The length of the correlation identifier key must not exceed the following sizes:

On Db2: 1000 bytes. This limit is equivalent to 500 two-byte Unicode characters.
On SQL Server: 692 bytes. This limit is equivalent to 346 two-byte Unicode characters.
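
The limits above can be checked before data is written to the staging table. The following is a minimal sketch, not part of i2 Analyze: the function names and the database labels "db2" and "sqlserver" are hypothetical, and the two-bytes-per-character encoding is an assumption taken from the byte-to-character equivalence stated above. The database might measure length differently in practice.

```python
from typing import List

# Limits in bytes, taken from the text above.
TYPE_MAX_BYTES = 100
KEY_MAX_BYTES = {"db2": 1000, "sqlserver": 692}  # labels are hypothetical


def encoded_length(value: str) -> int:
    # Assumption: two bytes per character (UTF-16 code units), which matches
    # the "100 bytes is 50 characters" equivalence stated in the text.
    return len(value.encode("utf-16-le"))


def check_lengths(cid_type: str, cid_key: str, database: str) -> List[str]:
    """Return a list of problems; an empty list means both values fit."""
    problems = []
    if encoded_length(cid_type) > TYPE_MAX_BYTES:
        problems.append(f"type exceeds {TYPE_MAX_BYTES} bytes")
    limit = KEY_MAX_BYTES[database]
    if encoded_length(cid_key) > limit:
        problems.append(f"key exceeds {limit} bytes")
    return problems
```

A key that fits on Db2 can still be too long for SQL Server, so it might be safest to validate against the smaller limit if the same data is ingested into both.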

To prepare your data for correlation by i2 Analyze, you might choose to use a matching engine or context computing platform. Such tools can identify when data that is stored in multiple sources represents a single entity, and you can provide the identifiers that they produce to the Information Store at ingestion time. An example of such a tool is IBM InfoSphere Identity Insight, which provides resolved entities with an entity identifier. If you are using such a platform, you can populate the correlation identifier type with a value that records the source of the key, for example identityInsight. You might populate the correlation identifier key with the entity identifier, for example 1234. This generates a correlation identifier of identityInsight.1234. For more information about IBM InfoSphere Identity Insight, see Overview of IBM InfoSphere Identity Insight.

Alternatively, as part of the data processing to add data to the staging tables, you might populate the correlation identifier with values from property fields that distinguish entities. For example, to distinguish People entities you might combine the values for their date of birth and an identification number, and you might specify the type as manual. This generates a correlation identifier of manual.1991-02-11123456.
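
The two examples above can be reproduced with a short sketch. The helper names here are hypothetical and are not part of i2 Analyze. In the staging table the type and key are supplied as separate values, and the dotted form is only the notation that this topic uses to show the complete identifier.

```python
def manual_person_key(date_of_birth: str, id_number: str) -> str:
    # Concatenate the distinguishing property values, as in the example above.
    # The ISO date has a fixed width, so the boundary between the two parts
    # stays unambiguous even without a separator.
    return f"{date_of_birth}{id_number}"


def make_correlation_id(cid_type: str, cid_key: str) -> str:
    # Show the complete identifier in the "type.key" notation used in the text.
    return f"{cid_type}.{cid_key}"


# Key supplied by a matching engine (entity identifier 1234).
engine_id = make_correlation_id("identityInsight", "1234")

# Key built from property values (date of birth plus identification number).
manual_id = make_correlation_id("manual", manual_person_key("1991-02-11", "123456"))
```

With these inputs, engine_id is identityInsight.1234 and manual_id is manual.1991-02-11123456, matching the identifiers given above.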

The complete correlation identifier is used for comparison. Only data with correlation identifiers of the same type is correlated.

For more information about specifying a correlation identifier during the ingestion process, see Information Store staging tables.

Implicit discriminators

In addition to the correlation identifier that is created from the type and key values that you provide, the following implicit discriminators are compared during the matching process. The implicit discriminators must be compatible for correlation to occur.

Item type: The item type of the data that you are ingesting must be the same as the item type of the existing i2 Analyze record that is matched by the correlation identifier. If the item types are not the same, then no correlation operations occur.
Security dimension values: The security dimension values of the data that you are ingesting must be the same as those of the data that is matched by the correlation identifier. If the security dimension values are not the same, then no correlation operations occur.
Link direction and ends: For link data, the link direction and ends of the data that you are ingesting must be the same as those of the data that is matched by the correlation identifier. If the link direction and ends are not the same, then no correlation operations occur.
The direction and ends of a link are inspected together, and direction is respected. For example, a link from A to B of direction 'WITH' matches a link from B to A of direction 'AGAINST', because both describe the same directed link. A link from A to B of direction 'WITH' does not match a link from A to B of direction 'AGAINST'.
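
The discriminator checks described above can be illustrated with a minimal sketch. This is not the Information Store's implementation: the Row class, its field names, and the functions are hypothetical. Only the 'WITH' and 'AGAINST' cases from the example are modeled. For any other direction value, the sketch simply requires the ends to be in the same order, which is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Row:
    correlation_id: str               # complete identifier, for example "manual.1"
    item_type: str
    security_values: FrozenSet[str]
    # Link-only fields; left as None for entity data.
    from_end: Optional[str] = None
    to_end: Optional[str] = None
    direction: Optional[str] = None


def normalized_link(row: Row) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    # An AGAINST link from B to A describes the same directed link as a
    # WITH link from A to B, so rewrite AGAINST links in WITH form.
    if row.direction == "AGAINST":
        return (row.to_end, row.from_end, "WITH")
    return (row.from_end, row.to_end, row.direction)


def can_correlate(incoming: Row, existing: Row) -> bool:
    # Correlation is considered only when the identifier and every implicit
    # discriminator agree.
    return (
        incoming.correlation_id == existing.correlation_id
        and incoming.item_type == existing.item_type
        and incoming.security_values == existing.security_values
        and normalized_link(incoming) == normalized_link(existing)
    )


security = frozenset({"Unclassified"})
a_with_b = Row("manual.1", "Associate", security, "A", "B", "WITH")
b_against_a = Row("manual.1", "Associate", security, "B", "A", "AGAINST")
a_against_b = Row("manual.1", "Associate", security, "A", "B", "AGAINST")
```

Here can_correlate(a_with_b, b_against_a) is True and can_correlate(a_with_b, a_against_b) is False, mirroring the two cases in the example.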