Information Store staging tables

At your request, i2 Analyze generates an Information Store staging table that can contain data for i2 Analyze records of a single entity type or link type. To generate the staging table, it uses information from the i2 Analyze schema, which is the same starting point from which it generates the main data tables during deployment.

An entity type staging table contains:

  • At least one column for each property type in the schema.
  • Columns for storing information that uniquely identifies data in its source. During ingestion, i2 Analyze can use this information to construct an origin identifier for the ingested data.
  • Two columns to record when the data was created and updated in the source.
  • Three columns for storing information that describes the source of the data. During ingestion, i2 Analyze uses this information to populate a source reference for the ingested data.
  • Two columns to record the correlation identifier type and correlation identifier key of the data. During ingestion, i2 Analyze uses the information in these columns to construct the correlation identifier for the ingested data.
  • A column for each security dimension that the security schema defines. During ingestion, i2 Analyze can use the information in these columns to implement per-record security.

For example, if the i2 Analyze schema contains this simplified entity type definition:

<EntityType Id="ET5" DisplayName="Person">
        <PropertyType DisplayName="First (Given) Name"
                      LogicalType="SINGLE_LINE_STRING" Id="PER4"/>
        <PropertyType DisplayName="Birth place location"
                      LogicalType="GEOSPATIAL" Id="PER5"/>
        <PropertyType DisplayName="Date of Birth"
                      LogicalType="DATE" Id="PER9"/>
        <PropertyType DisplayName="Date and Time of Death"
                      LogicalType="DATE_AND_TIME" Id="PER10"/>

This SQL statement is then the definition of a corresponding staging table in Db2:

CREATE TABLE "IS_Staging"."E_Person" (
        "source_id" VARCHAR(50),
        "origin_id_type" VARCHAR(100),
        "origin_id_keys" VARCHAR(1000),
        "source_created" TIMESTAMP,
        "source_last_updated" TIMESTAMP,
        "correlation_id_type" VARCHAR(100),
        "correlation_id_key" VARCHAR(1000),
        "p_first_given_name" VARCHAR(250),
        "p_birth_place_location" VARCHAR(250),
        "p_date_of_birth" DATE,
        "p0_date_and_time_of_deat" TIMESTAMP,
        "p1_date_and_time_of_deat" VARCHAR(250),
        "p2_date_and_time_of_deat" SMALLINT,
        "p3_date_and_time_of_deat" TIMESTAMP,
        "security_level" VARCHAR(50),
        "security_compartment" VARCHAR(50),
        "source_ref_source_type" VARGRAPHIC(200),
        "source_ref_source_location" DBCLOB(2000),
        "source_ref_source_image_url" DBCLOB(2000)

And this SQL statement is then the definition of a corresponding staging table in SQL Server:

CREATE TABLE IS_Staging.E_Person(
        source_id nvarchar(50) NULL,
        origin_id_type nvarchar(100),
        origin_id_keys nvarchar(1000),
        source_created datetime2(6) NULL,
        source_last_updated datetime2(6) NULL,
        correlation_id_type nvarchar(100) NULL,
        correlation_id_key nvarchar(1000) NULL,
        p_first_given_name varchar(250) NULL,
        p_birth_place_location varchar(max) NULL,
        p0_date_and_time_of_deat datetime2(4) NULL,
        p1_date_and_time_of_deat nvarchar(250) NULL,
        p2_date_and_time_of_deat smallint NULL,
        p3_date_and_time_of_deat datetime2(6) NULL,
        security_level nvarchar(50) NULL,
        security_compartment nvarchar(50) NULL,
        source_ref_source_type nvarchar(200) NULL,
        source_ref_source_location nvarchar(2000) NULL,
        source_ref_source_image_url nvarchar(2000) NULL
Note: Additionally, staging tables for link types contain a column for the direction of the link, and further columns for the information that uniquely identifies the link end data in the source.

The statements create the staging table in a separate schema from the Information Store data tables. Many of the columns in the staging table have names that are derived from the display names of the property types in the i2 Analyze schema. In most cases, the relationship between the schema and the staging table is obvious, but there are a number of extra columns and differences:

  • The source_id, origin_id_type, origin_id_keys columns of the staging table can be used to store values that reference the rest of the data in its original source and can be used to make up the origin identifier of the resulting record.
    Note: If the staging table definition was for a link type, it would also contain from_ and to_ variations of each of the columns.

    For more information about generating origin identifiers during ingestion, see Origin identifiers.

  • The next two columns of the staging table are source_created and source_last_updated. You can use these columns to store information about when the data to be ingested was created and modified in its source.
  • The next two columns of the staging table are correlation_id_type and correlation_id_key. If you want to correlate data during ingestion into the Information Store, you can use these columns to store values that i2 Analyze uses to generate correlation identifiers. For more information, see Overview of correlation.
    Note: Although populating the correlation identifier columns is not mandatory, doing so acts like a switch. The presence of correlation identifier values in any row of a staging table causes i2 Analyze to perform correlation for all the rows in that table.
  • Any property type in the i2 Analyze schema that has the logical type DATE_AND_TIME occupies four columns in the staging table. These columns always appear in the same order:
    • The "P0" column is for the local date and time as originally recorded, as a DATE_AND_TIME.
    • The "P1" column is for the time zone of the local date and time, as listed in the IANA database. For example, Europe/London.
    • The "P2" column is for an indicator of whether Daylight Saving Time is (1) or is not (0) in effect.
      Note: i2 Analyze considers this value only when the time is ambiguous because it occurs during the hour that is "repeated" when Daylight Saving Time ends.
    • The "P3" column is for the date and time as expressed in Coordinated Universal Time (UTC), as another DATE_AND_TIME.

    For more information about the permitted values for DATE_AND_TIME columns in your database management system, see Information Store property value ranges.

  • The next columns derive from the security schema rather than the i2 Analyze schema. One column exists for each security dimension that the security schema defines. You can use these columns if you want to give different dimension values to each i2 Analyze record that is created or updated as a result of ingestion.
  • In link tables, there is also a direction column to store the direction of links.
  • The final three columns are named source_ref_source_type, source_ref_source_location, and source_ref_source_image_url. These columns are used to populate the source reference that is generated when the data is ingested.

    For more information about implementing source references in your system, see Configuring source references.

The staging tables contain some, but never all, of the data for i2 Analyze records. They do not contain the type identifiers that Information Store records must have, and it is not mandatory to populate the columns for timestamps, security dimension values, or correlation identifiers. You can supply the remainder of the information in an ingestion mapping.