References and system properties
In an ingestion mapping, you can use constants or references to specify values for i2 Analyze records. When you use a reference, the ingestion process retrieves a value from a staging table column or a property in a settings file. The settings file can also set system properties that control some aspects of ingestion into the Information Store.
By default, the settings file does not exist. You must create the settings file and specify the file when you run the ingestion command.
SEC_LEVEL_VALUE=UC
SEC_COMPARTMENT_VALUE=HI,OSI
IngestionFailureMode=MAPPING
When
you run the ingestion command, you reference your
settings file as
follows:setup -t ingestInformationStoreRecords
...
-p importConfigFile=ingestion_settings.properties
System properties
As well as providing values for ingestion mappings, you can use the settings file to configure the behavior of the ingestion process. The file supports a handful of system properties that you can set in the same way as you create and set custom properties.
IngestionFailureMode [ RECORD | MAPPING ]
-
When the Information Store encounters a problem with a record during ingestion, its default behavior is to log the error and move on to the next record. Failure is record-based. Instead, you can specify that a problem with one record causes the Information Store not to ingest any of the records from that staging table. Failure then is mapping-based.
In the settings file, the possible values for the
IngestionFailureMode
setting areRECORD
orMAPPING
. The default value isRECORD
.For example, to change the failure mode to mapping, add the following line to your settings file:IngestionFailureMode=MAPPING
RecordFailureThreshold
-
During the ingestion process, if the number of errors that occur is greater than the value for the
RecordFailureThreshold
property the process stops and no data is ingested into the Information Store. By default, the value for this property is 1000.For example:RecordFailureThreshold=500
IngestionRunstats [ TRUE | FALSE ]
-
During the ingestion process the RUNSTATS command updates statistics in the system catalog about the characteristics of a table, associated indexes, or statistical views.
In a table that contains many records, it can take the RUNSTATS command a long time to complete. If the number of records that you are ingesting is a small percentage of the total number of records in the table, it is recommended that you configure the ingestion process not to call the
RUNSTATS
command. You can then manually run the RUNSTATS command after a significant number of records are ingested.To prevent the ingestion process from calling theRUNSTATS
command, set the value ofIngestionRunstats
to FALSE. For example:IngestionRunstats=FALSE
The following command is the RUNSTATS command to run manually after an ingestion. The command generates query output, and saves the output to the file named ISStatisticsCollection.sql that you can run.db2 connect to ISTORE user <user name> using <password> db2 -x "SELECT 'RUNSTATS ON TABLE ' || TRIM(TABSCHEMA) ||'.'|| TRIM(TABNAME) || ' WITH DISTRIBUTION ON ALL COLUMNS AND INDEXES ALL ALLOW WRITE ACCESS TABLESAMPLE BERNOULLI(10) INDEXSAMPLE BERNOULLI(10);' FROM SYSCAT.TABLES WHERE TYPE = 'T' AND TABSCHEMA = 'IS_DATA'" > ISStatisticsCollection.SQL
ImportBatchSize
The
ImportBatchSize
controls the number of rows from a staging table that are ingested per batch. By default, theImportBatchSize
is set to 100,000.For example:If you are ingesting data into the system while analysts are using the system, it is possible that some analysis operations are blocked from returning results while a batch of data is being ingested. During the ingestion process, up to 100,000 rows (or the value of your batch size) can be locked in the database during each batch of the import. These locks cause a potential for the following analytical operations to stop returning results until the batch is complete: Find Path, Expand, and Visual Query.ImportBatchSize=100000
To determine how long each batch takes to ingest, inspect the
IS_Data.Ingestion_Batches
table. The time that the batch takes to complete is the time that analytical operations might be blocked from returning results. If the time is too long for your requirements, you can reduce the value forImportBatchSize
to reduce the batch size and the time that it takes to complete.For SQL Server, a batch size greater than 1,700 can cause table locks during link ingestion.
References
Many of the pieces of information that you provide in an ingestion mapping are fixed for that mapping. Item types, end types, and some parts of the origin identifier do not change between the i2 Analyze records that one mapping is responsible for. The most appropriate way to specify this kind of information is to use constant values on a per-mapping basis.
The two main reasons for preferring references to constant values lie at opposite ends of the spectrum:
- To give different values for the same field in records that are ingested through the same mapping, you can refer to a staging table column. This approach is appropriate for many non-property values that change from one record to the next.
- To use the same values across multiple
ingestion mappings, refer to a property in a
settings file. This approach might be appropriate
when you want all the data from a source to get
the same security dimension values. You can refer
to the same property from every mapping that you
write.
A settings file that defines properties for the ingestion process is just a text file that contains a set of
name=value
pairs, with one pair on each line:SEC_LEVEL_VALUE=UC SEC_COMPARTMENT_VALUE=HI,OSI
When you run one of the ingestion commands, you can supply it with the name of the properties file whose values you want to use.
To use a value by reference in an ingestion
mapping, you use the
$(name)
syntax. name is the name of
either a column in the staging table or a property
in a settings file. For example,
$(SOURCE_ID)
and
$(DIRECTION)
refer to staging
table columns, while in the previous example
$(SEC_LEVEL_VALUE)
and
$(SEC_COMPARTMENT_VALUE)
refer to
properties.