References and system properties
In an ingestion mapping, you can use constants or references to specify values for i2 Analyze records. When you use a reference, the ingestion process retrieves a value from a staging table column or a property in a settings file. The settings file can also set system properties that control some aspects of ingestion into the Information Store.
By default, the settings file does not exist. You must create the settings file and specify the file when you run the ingestion command.
For example, you might call your settings file ingestion_settings.properties and it might contain the following name-value pairs:
SEC_LEVEL_VALUE=UC
SEC_COMPARTMENT_VALUE=HI,OSI
IngestionFailureMode=MAPPING
When you run the ingestion command, you reference your settings file as follows:
setup -t ingestInformationStoreRecords
...
-p importConfigFile=ingestion_settings.properties
System properties
As well as providing values for ingestion mappings, you can use the settings file to configure the behavior of the ingestion process. The file supports a handful of system properties that you can set in the same way as you create and set custom properties.
IngestionFailureMode [ RECORD | MAPPING ]
When the Information Store encounters a problem with a record during ingestion, its default behavior is to log the error and move on to the next record. Failure is record-based. Instead, you can specify that a problem with one record causes the Information Store not to ingest any of the records from that staging table. Failure then is mapping-based.
In the settings file, the possible values for the IngestionFailureMode setting are RECORD or MAPPING. The default value is RECORD.
For example, to change the failure mode to mapping, add the following line to your settings file:
IngestionFailureMode=MAPPING
RecordFailureThreshold
During the ingestion process, if the number of errors that occur is greater than the value for the RecordFailureThreshold property, the process stops and no data is ingested into the Information Store. By default, the value for this property is 1000.
For example:
RecordFailureThreshold=500
IngestionRunstats [ TRUE | FALSE ]
During the ingestion process the RUNSTATS command updates statistics in the system catalog about the characteristics of a table, associated indexes, or statistical views.
In a table that contains many records, it can take the RUNSTATS command a long time to complete. If the number of records that you are ingesting is a small percentage of the total number of records in the table, it is recommended that you configure the ingestion process not to call the RUNSTATS command. You can then manually run the RUNSTATS command after a significant number of records are ingested.
You might also want to run your own RUNSTATS command instead of the one that is used by the ingestion process.
To prevent the ingestion process from calling the RUNSTATS command, set the value of IngestionRunstats to FALSE. For example:
IngestionRunstats=FALSE
The following commands are the RUNSTATS commands to run manually after an ingestion on each database when IngestionRunstats is set to false. Where <table name> is the table name where the data was ingested. For example, if you ingested Person data the table name is IS_DATA.E_PERSON.
For SQL Server:
ANALYZE <table name>
For PostgreSQL:
UPDATE STATISTICS <table name> WITH SAMPLE 10 PERCENT
For Db2:
RUNSTATS ON TABLE <table name> ON KEY COLUMNS AND INDEXES ALL ALLOW WRITE ACCESS TABLESAMPLE SYSTEM(10) INDEXSAMPLE SYSTEM(10)
The following command is a RUNSTATS command that you can run manually after an ingestion on large Db2 deployments. The command generates query output, and saves the output to the file named ISStatisticsCollection.sql that you can run.
db2 connect to ISTORE user <user name> using <password>
db2 -x
"SELECT
'RUNSTATS ON TABLE ' || TRIM(TABSCHEMA) ||'.'|| TRIM(TABNAME) || '
WITH DISTRIBUTION ON ALL COLUMNS
AND INDEXES ALL
ALLOW WRITE ACCESS
TABLESAMPLE BERNOULLI(10)
INDEXSAMPLE BERNOULLI(10);'
FROM SYSCAT.TABLES
WHERE TYPE = 'T' AND TABSCHEMA = 'IS_DATA'"
> ISStatisticsCollection.SQL
ImportBatchSize
The ImportBatchSize controls the number of rows from a staging table that are ingested per batch. By default, the ImportBatchSize is set to 100,000.
For example:
ImportBatchSize=100000
If you are ingesting data into the system while analysts are using the system, it is possible that some analysis operations are blocked from returning results while a batch of data is being ingested. During the ingestion process, up to 100,000 rows (or the value of your batch size) can be locked in the database during each batch of the import. These locks cause a potential for the following analytical operations to stop returning results until the batch is complete: Find Path, Expand, and Visual Query.
To determine how long each batch takes to ingest, inspect the IS_Data.Ingestion_Batches table. The time that the batch takes to complete is the time that analytical operations might be blocked from returning results. If the time is too long for your requirements, you can reduce the value for ImportBatchSize to reduce the batch size and the time that it takes to complete.
For SQL Server, a batch size greater than 1,700 can cause table locks during link ingestion.
References
Many of the pieces of information that you provide in an ingestion mapping are fixed for that mapping. Item types, end types, and some parts of the origin identifier do not change between the i2 Analyze records that one mapping is responsible for. The most appropriate way to specify this kind of information is to use constant values on a per-mapping basis.
The two main reasons for preferring references to constant values lie at opposite ends of the spectrum:
To give different values for the same field in records that are ingested through the same mapping, you can refer to a staging table column. This approach is appropriate for many non-property values that change from one record to the next.
To use the same values across multiple ingestion mappings, refer to a property in a settings file. This approach might be appropriate when you want all the data from a source to get the same security dimension values. You can refer to the same property from every mapping that you write.
A settings file that defines properties for the ingestion process is just a text file that contains a set of name=value pairs, with one pair on each line:
SEC_LEVEL_VALUE=UC
SEC_COMPARTMENT_VALUE=HI,OSI
When you run one of the ingestion commands, you can supply it with the name of the properties file whose values you want to use.
To use a value by reference in an ingestion mapping, you use the $(name) syntax. name is the name of either a column in the staging table or a property in a settings file. For example, $(SOURCE_ID) and $(DIRECTION) refer to staging table columns, while in the previous example $(SEC_LEVEL_VALUE) and $(SEC_COMPARTMENT_VALUE) refer to properties.
Note: Since referring to columns and properties uses the same syntax, a clash can happen if a column and a property have the same name. In that case, the value of the property takes precedence.