Identifiers in i2 Analyze records

i2 Analyze records make extensive use of identifiers. Records use identifiers to refer to their type in an i2 Analyze schema, to their original source data, and to other records in ELP relationships. Preparing data for compatibility with i2 Analyze often involves creating or providing the identifiers that form the basis for the reference mechanisms.

Type identifiers

Every i2 Analyze record contains a type identifier, which is a reference to one of the entity types or link types that a schema defines. When you create an ingestion mapping file, an import specification, or a connector, you must arrange for each incoming record to receive the identifier of a type definition.

Every i2 Analyze link record contains two further type identifiers, which are references to the entity types of the records at the ends of the link. You must arrange for incoming link records to receive these identifiers as well.

This strong typing of records in i2 Analyze is key to the analytical functions that the platform provides. It allows users to consider not only the existence of relationships between records, but also the nature of those relationships. Schemas define exactly what relationships to allow between record types, and i2 Analyze enforces those rules during record creation.
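
The rule that a schema constrains which entity types a link type can connect can be sketched in a few lines of Python. This is only an illustration of the idea, not the i2 Analyze schema format or API. The type identifiers (ENT_PERSON, LINK_OWNER, and so on) and the names ALLOWED_LINK_ENDS, LinkRecord, and is_allowed are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical schema fragment: for each link type identifier, the set of
# (from-end entity type, to-end entity type) pairs that the schema allows.
ALLOWED_LINK_ENDS = {
    "LINK_OWNER": {("ENT_PERSON", "ENT_VEHICLE")},
    "LINK_ASSOCIATE": {("ENT_PERSON", "ENT_PERSON")},
}


@dataclass(frozen=True)
class LinkRecord:
    """A link carries three type identifiers: its own and one per end."""
    type_id: str
    from_end_type_id: str
    to_end_type_id: str


def is_allowed(link: LinkRecord) -> bool:
    """Return True if the schema fragment permits this link between these ends."""
    allowed = ALLOWED_LINK_ENDS.get(link.type_id, set())
    return (link.from_end_type_id, link.to_end_type_id) in allowed
```

In this sketch, a LINK_OWNER link from ENT_PERSON to ENT_VEHICLE passes the check, while the same link type between two ENT_VEHICLE ends, or a link of an unknown type, is rejected.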

Record identifiers

i2 Analyze records are created when you ingest data into the Information Store, or when a user creates an item that contains an i2 Analyze record on the chart surface by:

Importing data through an import specification
Adding the results of an operation against an external source
Using an i2 Analyze palette in Analyst's Notebook Premium

At creation, every i2 Analyze record automatically receives a universally unique record identifier that is permanent for the lifetime of that record. Users and administrators of an i2 Analyze deployment can use the record identifier as a convenient way to refer to a record in features such as text search and the Investigate Add-On.

Source identifiers

The role of a source identifier is to refer reproducibly to the data for a record in its original source. If a record represents data from several sources, then it contains several source identifiers. The nature of a source identifier depends on the source and the record creation method, and sometimes on whether the record is a link or an entity.

When you write ingestion mappings or develop connectors for the i2 Connect gateway, you are responsible for providing the identifying information. For example, if the original source is a relational database, then entity data is likely to have ready-made source identifiers: table names and primary keys. Link data can also have ready-made source identifiers, but it might not, especially if the relationship that the link represents exists only as a foreign key.

If the source of a record is a text file, then the file name might form part of the source identifier, along with some reference to the data within the file.
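
As a rough sketch of the cases above, the following Python functions assemble identifying information for an entity row, for a link that exists only as a foreign key, and for a line in a text file. The function names and the tuple layout are assumptions made for illustration. They do not reproduce the format that ingestion mappings or the i2 Connect gateway actually require.

```python
from typing import Tuple


def entity_source_id(table: str, primary_key: object) -> Tuple[str, ...]:
    """Entity rows usually have ready-made identifiers: table name plus primary key."""
    return (table, str(primary_key))


def fk_link_source_id(table: str, primary_key: object, fk_column: str) -> Tuple[str, ...]:
    """A link that exists only as a foreign key has no row of its own.

    One option (an assumption, not a documented rule) is to identify it by the
    row that holds the foreign key together with the name of the key column.
    """
    return (table, str(primary_key), fk_column)


def text_file_source_id(file_name: str, line_number: int) -> Tuple[str, ...]:
    """For a text file, combine the file name with a reference into the file."""
    return (file_name, str(line_number))
```

Whatever form the identifier takes, the same input must always produce the same identifier, so that the data can be found again in its source. In keeping with the note that follows, none of these sketches includes credentials or connection details.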

Note: Source identifiers are not displayed to end users, but they are a part of the data that records contain. Avoid including sensitive information such as passwords, or configuration detail such as IP addresses. Assume that any information you use as part of a source identifier might be read by users of the system.

Origin identifiers

In general, source identifiers are not guaranteed to be unique within a deployment of i2 Analyze. Several users might independently retrieve the same data from an external source, resulting in several records with the same source identifier. However, when you ingest data into the Information Store, i2 Analyze compares the incoming source identifier with those of existing records. If it finds a match, i2 Analyze updates the matching record instead of creating a new one.

The source identifiers that records receive during ingestion are therefore unique within i2 Analyze, and they have a special name in this context: origin identifiers.
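
The match-then-update behavior described above can be modeled with a small in-memory store. This is a toy sketch and not how the Information Store is implemented. The class ToyStore, its ingest method, and the choice to overlay incoming properties on the stored ones are all assumptions. The sketch also shows a record identifier being assigned once, at creation, and staying the same across later updates.

```python
import uuid


class ToyStore:
    """Toy model of ingestion: at most one record per origin identifier."""

    def __init__(self):
        # Maps an origin identifier to the single record created from it.
        self._by_origin = {}

    def ingest(self, origin_id, properties):
        """Create a record, or update the existing one with the same origin identifier."""
        existing = self._by_origin.get(origin_id)
        if existing is not None:
            # Match found: update in place. The record identifier does not change.
            existing["properties"].update(properties)
            return existing["record_id"], "updated"
        # No match: create a record with a new, universally unique identifier.
        record_id = str(uuid.uuid4())
        self._by_origin[origin_id] = {
            "record_id": record_id,
            "properties": dict(properties),
        }
        return record_id, "created"

    def record_count(self):
        return len(self._by_origin)
```

Ingesting the same origin identifier twice in this model returns "created" the first time and "updated" the second, with the same record identifier both times and a record count of one.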

Correlation identifiers

The purpose of a correlation identifier is to indicate that the data in an i2 Analyze record pertains to a particular real-world object. As a result, correlation identifiers are usually related to property values rather than other identifiers. (For example, two Person records from different sources that contain the same social security number are likely to contain data about the same real person.) When two records have the same correlation identifier, they represent the same real-world object, and are candidates to be merged.

When you ingest data into the Information Store, you can provide a correlation identifier for each incoming record. For more information about correlation identifiers and how to create them, see Correlation identifiers.
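
To illustrate how a correlation identifier can be derived from property values, the sketch below normalizes a social security number and groups records that end up with the same identifier. The property name ssn, the person-ssn: prefix, and the functions correlation_id and merge_candidates are hypothetical. Real correlation identifiers have their own required structure, which the referenced topic describes.

```python
import re
from collections import defaultdict


def correlation_id(record):
    """Derive an identifier from a property value, or None if the value is missing."""
    ssn = record.get("ssn")
    if not ssn:
        return None
    # Normalize formatting so that "123-45-6789" and "123 45 6789" agree.
    digits = re.sub(r"\D", "", ssn)
    return f"person-ssn:{digits}" if digits else None


def merge_candidates(records):
    """Group record sources by correlation identifier; keep groups of two or more."""
    groups = defaultdict(list)
    for record in records:
        cid = correlation_id(record)
        if cid is not None:
            groups[cid].append(record["source"])
    return {cid: sources for cid, sources in groups.items() if len(sources) > 1}
```

Records without the property receive no correlation identifier in this sketch, so they are never proposed for merging.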