Origin identifiers
The role of a source identifier is to reference the data for a record reproducibly in its original source. The source identifiers that records receive during ingestion are unique within i2 Analyze, and they have a special name in this context. They are called origin identifiers.
The nature of an origin identifier depends on the source and the creation method, and sometimes on whether the record is a link or an entity. When you ingest data into the Information Store, i2 Analyze compares the incoming origin identifier with those of existing records. If it finds a match, i2 Analyze updates the existing record instead of creating a new one.
After you develop your process for creating origin identifiers, you must continue to use that process. If you change the way that your origin identifiers are created and ingest the same data again, the Information Store creates new records for the data instead of updating the existing records. To ensure that changes to data are processed as updates, you must create your origin identifiers consistently.
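The effect of changing the creation process can be illustrated with a toy in-memory model. This is only a sketch of the matching behavior described above, not how the Information Store is implemented. The function names, the OI.EXAMPLE type, and the CRM and PERSON values are hypothetical.

```python
def make_origin_id(source, table, row_id):
    """Build an origin identifier the same way on every run."""
    return ("OI.EXAMPLE", (source, table, str(row_id)))


def ingest(store, origin_id, data):
    """Update the record if the origin identifier is known, else create one."""
    action = "updated" if origin_id in store else "created"
    store[origin_id] = data
    return action


store = {}
first = ingest(store, make_origin_id("CRM", "PERSON", 1234), {"name": "A. Smith"})
second = ingest(store, make_origin_id("CRM", "PERSON", 1234), {"name": "Alice Smith"})
# A changed scheme (here, a different key order) no longer matches the
# existing record, so the same source data produces a second record.
third = ingest(store, ("OI.EXAMPLE", ("PERSON", "CRM", "1234")), {"name": "Alice Smith"})
print(first, second, third, len(store))  # created updated created 2
```

The second ingestion is treated as an update because the identifier is built identically. The third creates a duplicate because the identifier differs, even though the source data is the same.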
For more information about different identifiers in i2 Analyze, see Identifiers in i2 Analyze records.
The structure of an origin identifier
type
- The "type" of an origin identifier allows the services in an i2 Analyze deployment to determine quickly whether they are interested in (or how to process) a particular row of data. The value of the type element does not have to be meaningful, but data from different sources generally have different values.
For a deployment that uses a Db2 database, the length of the origin identifier type must not exceed 100 bytes, which is equivalent to 50 two-byte Unicode characters. For a SQL Server database, the limit is 200 bytes, or 100 two-byte Unicode characters.
keys
- The "keys" of an origin identifier contain the information necessary to reference the data in its original source. The pieces of information that you use to make up the keys differ depending on the source of the data. For data that originates in relational sources, you might use keys whose values include the source name, the table name, and the unique identifier of the data within that table.
The length of the origin identifier keys must not exceed the following sizes:
- On PostgreSQL: 1000 Unicode characters.
- On SQL Server: 692 bytes. This is equivalent to 346 Unicode characters.
- On Db2: 1000 bytes. This is equivalent to 500 Unicode characters.
- Keep your origin identifiers as short as possible, and place any values that are common to many identifiers at the end of the key.
- The content of your origin identifier keys must conform to Common Format and MIME Type for Comma-Separated Values (CSV) Files (RFC 4180). i2 Analyze uses comma (,) as the delimiter and double-quote (") as the quote character.
- Do not include comma and double-quote characters next to each other in origin identifier keys. For example: 123",456
- Do not use non-printing or control characters in your origin identifiers, because they might not be indexed correctly and can cause your origin identifiers to differ from your intended values.
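The limits and restrictions in this section can be checked before data reaches the staging tables. The following is a minimal sketch, not part of i2 Analyze. The function names are hypothetical. It assumes that length is measured in two-byte (UTF-16) units and that the key limit applies to the comma-joined key values, neither of which this section states explicitly.

```python
import unicodedata

# Limits quoted in this section, expressed in characters.
# No type limit is stated here for PostgreSQL, so none is checked.
TYPE_LIMITS = {"db2": 50, "sqlserver": 100}
KEY_LIMITS = {"postgresql": 1000, "sqlserver": 346, "db2": 500}


def utf16_units(text):
    """Count UTF-16 code units (an assumed way of measuring length)."""
    return len(text.encode("utf-16-le")) // 2


def origin_id_problems(id_type, key_values, database):
    """Return a list of problems found in a candidate origin identifier."""
    problems = []
    type_limit = TYPE_LIMITS.get(database)
    if type_limit is not None and utf16_units(id_type) > type_limit:
        problems.append("type too long")
    # Assumption: the key limit applies to the comma-joined key values.
    if utf16_units(",".join(key_values)) > KEY_LIMITS[database]:
        problems.append("keys too long")
    for value in key_values:
        if '",' in value or ',"' in value:
            problems.append("comma next to double-quote")
        # Unicode categories starting with "C" cover control, format,
        # and other non-printing characters.
        if any(unicodedata.category(ch).startswith("C") for ch in value):
            problems.append("control or non-printing character")
    return problems


print(origin_id_problems("OI.EXAMPLE", ["CRM", "PERSON", "1234"], "db2"))  # []
print(origin_id_problems("OI.EXAMPLE", ['123",456'], "db2"))
```

An empty list means that no problem was detected under these assumptions. It does not guarantee that the identifier is accepted at ingestion.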
Creating origin identifiers for your data
There are two mechanisms for specifying the data for your origin identifiers. You can populate the staging table with all the information required to create the origin identifiers, or you can use a combination of information in the staging table and the ingestion mapping.
When you can provide all the information in the staging tables, there is less processing of the data during ingestion, which can improve ingestion performance.
- All information in the staging table
- If you can populate the staging table with all the information for your origin identifiers, you can use the origin_id_type and origin_id_keys columns to store this information. Populate the origin_id_type column with the type of your origin identifier. Populate the origin_id_keys column with a unique value that is already a composite of key values including the unique identifier from the source. When you use these columns, you must specify them in the ingestion mapping file that you use.
To ingest links, you must specify the origin identifiers at the ends of the link. You specify the "to" end of the link in the to_origin_id_type and to_origin_id_keys columns, and the "from" end in from_origin_id_type and from_origin_id_keys.
When you specify all the information in the staging table, the origin identifier section of your ingestion mapping is simpler. For example:
...
<originId>
  <type>$(origin_id_type)</type>
  <keys>
    <key>$(origin_id_keys)</key>
  </keys>
</originId>
...
To specify the origin identifiers at the link ends:
...
<fromOriginId>
  <type>$(from_origin_id_type)</type>
  <keys>
    <key>$(from_origin_id_keys)</key>
  </keys>
</fromOriginId>
<toOriginId>
  <type>$(to_origin_id_type)</type>
  <keys>
    <key>$(to_origin_id_keys)</key>
  </keys>
</toOriginId>
...
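Preparing staging rows that carry complete origin identifiers might look like the following sketch. The column names come from the text above. The helper names, the OI.EXAMPLE type, and the sample key values are hypothetical. The sketch assumes that several key values are packed into origin_id_keys as one CSV record, which the CSV requirement suggests but this topic does not state outright. Property columns are omitted.

```python
import csv
import io


def encode_keys(values):
    """Join key values into one CSV record (comma delimiter, double-quote quoting)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=",", quotechar='"', lineterminator="\n")
    writer.writerow(values)
    return buffer.getvalue().rstrip("\n")


def entity_row(id_type, key_values):
    """Origin identifier columns for an entity staging row."""
    return {"origin_id_type": id_type, "origin_id_keys": encode_keys(key_values)}


def link_row(id_type, key_values, from_end, to_end):
    """Origin identifier columns for a link staging row, including both ends."""
    row = entity_row(id_type, key_values)
    row.update({
        "from_origin_id_type": from_end["origin_id_type"],
        "from_origin_id_keys": from_end["origin_id_keys"],
        "to_origin_id_type": to_end["origin_id_type"],
        "to_origin_id_keys": to_end["origin_id_keys"],
    })
    return row


person = entity_row("OI.EXAMPLE", ["CRM", "PERSON", "1234"])
account = entity_row("OI.EXAMPLE", ["CRM", "ACCOUNT", "9876"])
holds = link_row("OI.EXAMPLE", ["CRM", "HOLDS", "55"], person, account)
print(person["origin_id_keys"])     # CRM,PERSON,1234
print(holds["to_origin_id_keys"])   # CRM,ACCOUNT,9876
```

Because the end columns of the link are copied from the entity rows, the link ends reference exactly the same origin identifiers that the entities were ingested with.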
- Combination of information in the staging table and the ingestion mapping
- If you cannot populate the staging table with all the information for your origin identifiers, you can use the source_id column in the staging table to contain the unique identifier from the source, and provide any other keys and the origin identifier type in the ingestion mapping.
To ingest links, you must specify the origin identifiers at the ends of the link. You specify the unique identifier of the "to" end of the link in the to_source_id column, and the "from" end in from_source_id. In the ingestion mapping, you specify the other keys that make up the origin identifiers of the link ends.
When you provide the information in both the staging table and the ingestion mapping, the mapping file is more complex. For example:
...
<originId>
  <type>OI.EXAMPLE</type>
  <keys>
    <key>$(source_id)</key>
    <key>PERSON</key>
  </keys>
</originId>
...
To specify the origin identifiers at the link ends, if the to end is an "Account" entity type:
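How a staging value and the constants in the mapping combine into one origin identifier can be modeled in a few lines. This is an illustrative sketch of the substitution only, not the ingestion code of i2 Analyze. The entity fragment is the one shown in this topic. The link-end fragment is hypothetical: the ACCOUNT key, the reuse of OI.EXAMPLE, and the ends wrapper element (present only so that the fragment parses) are assumptions, as are the function names.

```python
import re
import xml.etree.ElementTree as ET

# Entity fragment as shown in this topic.
ENTITY_MAPPING = """
<originId>
  <type>OI.EXAMPLE</type>
  <keys><key>$(source_id)</key><key>PERSON</key></keys>
</originId>
"""

# Hypothetical link-end fragment. The ACCOUNT key and the <ends> wrapper
# are assumptions made for this sketch.
LINK_END_MAPPING = """
<ends>
  <fromOriginId>
    <type>OI.EXAMPLE</type>
    <keys><key>$(from_source_id)</key><key>PERSON</key></keys>
  </fromOriginId>
  <toOriginId>
    <type>OI.EXAMPLE</type>
    <keys><key>$(to_source_id)</key><key>ACCOUNT</key></keys>
  </toOriginId>
</ends>
"""


def resolve(text, row):
    """Replace each $(column) reference with the value from the staging row."""
    return re.sub(r"\$\((\w+)\)", lambda match: str(row[match.group(1)]), text)


def origin_id(element, row):
    """Return (type, keys) for one originId-style element."""
    id_type = resolve(element.findtext("type"), row)
    keys = [resolve(key.text, row) for key in element.find("keys")]
    return id_type, keys


entity = origin_id(ET.fromstring(ENTITY_MAPPING.strip()), {"source_id": "1234"})
link_staging_row = {"from_source_id": "1234", "to_source_id": "9876"}
ends = {
    end.tag: origin_id(end, link_staging_row)
    for end in ET.fromstring(LINK_END_MAPPING.strip())
}
print(entity)              # ('OI.EXAMPLE', ['1234', 'PERSON'])
print(ends["toOriginId"])  # ('OI.EXAMPLE', ['9876', 'ACCOUNT'])
```

In this model, the "from" end of the link resolves to the same type and keys as the entity, which is what allows a link end to be matched to the entity record it refers to.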