Understanding ingestion reports

Every attempt to add, update, or delete data in the Information Store through the deployment or ETL toolkit adds rows to the IS_Public.Ingestion_Deletion_Reports view. You can use the contents of this view to track the history of all such operations, and to examine the impact of a particular operation.

Each time that you run a command that might change the contents of the Information Store, you create a job in the database. Each job acts on one or more batches of i2 Analyze records. There is always one batch per item type that the command affects, but there can also be several batches for the same type if the number of affected records is large.

For example, consider a command that processes updates for deleted Person entity data. The first batch in the resulting job is for Person records, and there might be more such batches if there are many records to be deleted. If the Person data has links, then the job has further batches for each type of link that might get deleted as a result of the entity deletion.

The IS_Public.Ingestion_Deletion_Reports view contains information about every batch from every toolkit operation that creates, updates, or deletes data in the Information Store. When you query the view, include ORDER BY job_id to keep the entries for each job together.
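For example, a query along the following lines returns one row per batch, with the batches of each job grouped together. This is a sketch only; the view and column names are those documented in this topic, but tailor the SELECT list to your needs:

```sql
-- One row per batch; ORDER BY job_id keeps each job's batches together.
SELECT job_id, ingestion_mode, primary_item_type,
       batch_item_type, insert_count, update_count,
       delete_record_count, reject_count, status
FROM IS_Public.Ingestion_Deletion_Reports
ORDER BY job_id, batch_start_time;
```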

Note: Deletion-by-rule operations also create jobs and batches, and populate the view, according to the same rules. For more information, see the Deletion Guide.

The first few columns in the view have the same value for all batches within a job:

label
    The value that you passed in the importLabel parameter of a toolkit command, or the value that a deletion-by-rule operation generates, or null.
job_id
    The server-assigned identifier for this ingestion or deletion job. This identifier is also a cross-reference to the Deletion_By_Rule_Log view if the job originated from a deletion-by-rule operation.
ingestion_mode
    The value that you passed in the importMode parameter, or Delete for all deletion-by-rule operations.
validation_mode
    A description of how the job was configured to react to errors during the operation.
error_threshold
    The threshold that applies to some of the validation modes.
primary_item_type
    The i2 Analyze schema ID of the item type that was specified at job creation.
primary_record_count
    The number of records of the primary item type that were affected by the job. (Deleting entity data can affect link records too.)
start_time
    The start time of the job as a whole.
end_time
    The end time of the job as a whole.
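Because job_id cross-references the Deletion_By_Rule_Log view, deletion-by-rule jobs can be traced back to their log entries with a join. This is a sketch only: it assumes that the log view lives in the IS_Public schema and exposes the job identifier in a column named job_id, so check the actual schema and column names in your deployment:

```sql
-- Trace deletion-by-rule jobs to their log entries (join column assumed).
SELECT r.job_id, r.primary_item_type, r.primary_record_count, r.start_time
FROM IS_Public.Ingestion_Deletion_Reports AS r
JOIN IS_Public.Deletion_By_Rule_Log AS l
  ON l.job_id = r.job_id
WHERE r.ingestion_mode = 'Delete';
```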

The remaining columns can have different values for different batches of records:

batch_item_type
    The i2 Analyze schema ID of the item type that was acted on in this batch. For at least one batch, the batch_item_type is the same as the primary_item_type.
batch_start_time
    The start time of this batch, which is always later than the start time of the job.
batch_end_time
    The end time of this batch, which is always earlier than the end time of the job.
insert_count
    The number of rows of data from this batch that were inserted into the Information Store, resulting in new i2 Analyze records.
update_count
    The number of rows of data from this batch that updated existing records in the Information Store.
merge_count
    The number of merge operations that occurred in the Information Store as a result of this batch.
unmerge_count
    The number of unmerge operations that occurred in the Information Store as a result of this batch.
delete_count
    The number of pieces of provenance that were deleted from the Information Store as a result of this batch.
delete_record_count
    The number of records that were deleted from the Information Store as a result of this batch.
reject_count
    The number of rows from this batch that were rejected during processing because they were invalid.
status
    An indicator of the result of this batch, from success (all rows processed correctly) through partial success to failure (no rows processed).
reject_view
    The full name of the view that contains details of any rejected rows.
stack_trace
    If i2 Analyze generated a stack trace as a result of errors during ingestion or deletion, this column contains it.
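After an operation, a query like this sketch surfaces any batches with rejected rows, together with the per-batch views that describe the rejections:

```sql
-- Batches with rejected rows, and where to find the details of each.
SELECT job_id, batch_item_type, reject_count, status, reject_view
FROM IS_Public.Ingestion_Deletion_Reports
WHERE reject_count > 0
ORDER BY job_id;
```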
Ingest example
The (abbreviated) report for successful ingestion operations might look like this:
job_id                 1                       2
ingestion_mode         Standard                Standard
primary_item_type      ET10                    ET4
primary_record_count   62                      8
batch_item_type        ET10                    ET4
batch_start_time       2017-11-30 15:27:06.76  2017-11-30 15:27:09.45
batch_end_time         2017-11-30 15:27:09.87  2017-11-30 15:27:09.63
insert_count           57                      7
update_count           0                       0
merge_count            5                       1
unmerge_count          0                       6
delete_count           0                       0
delete_record_count    0                       0
reject_count           0                       0
status                 Succeeded               Succeeded

In this example, two commands to ingest entity records resulted in the creation of two jobs. Each job demonstrates behavior that is possible during ingestion, including correlation operations:
JOB_ID 1
This job demonstrates what the ingestion report can look like when data in the staging table causes merge operations. In this example, five merge operations are completed on the incoming rows of data, as shown in the merge_count column. As a result, the 62 rows of data produce 57 i2 Analyze records, as shown in the insert_count and primary_record_count columns; five of the rows were merged with existing i2 Analyze records in the Information Store.
JOB_ID 2
This job demonstrates what the ingestion report can look like when the data in the staging table causes unmerge and merge operations. In this example, six unmerge operations are completed on the incoming rows of data, as shown in the unmerge_count column, and one merge operation is completed, as shown in the merge_count column. As a result, the eight rows of data produce seven i2 Analyze records, as shown in the insert_count and primary_record_count columns. The primary_record_count value does not include the unmerge_count.
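The arithmetic behind these two jobs can be checked with a short Python sketch. This is an illustration of the relationship inferred from the report figures above, not an i2 Analyze API: each merge absorbs one incoming row, so it produces no new record, while unmerge operations do not change the primary record count.

```python
def records_created(staged_rows: int, merge_count: int) -> int:
    """Records created from a batch of staged rows, as inferred from the
    report figures: each merge absorbs one incoming row, so it produces
    no new record; unmerge operations do not affect this count."""
    return staged_rows - merge_count

# Job 1: 62 staged rows, 5 merges -> 57 records (the insert_count).
print(records_created(62, 5))
# Job 2: 8 staged rows, 1 merge -> 7 records; the 6 unmerges are not counted.
print(records_created(8, 1))
```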
Delete example
The (abbreviated) report for a successful delete operation might look like this:
job_id                26                      26                      26                      26                      26
ingestion_mode        Delete                  Delete                  Delete                  Delete                  Delete
primary_item_type     ET5                     ET5                     ET5                     ET5                     ET5
primary_record_count  324                     324                     324                     324                     324
batch_item_type       ET5                     LAC1                    LAS1                    LEM1                    LIN1
batch_start_time      2017-11-30 15:27:06.76  2017-11-30 15:27:08.60  2017-11-30 15:27:08.60  2017-11-30 15:27:09.43  2017-11-30 15:27:09.45
batch_end_time        2017-11-30 15:27:09.87  2017-11-30 15:27:09.30  2017-11-30 15:27:09.29  2017-11-30 15:27:09.62  2017-11-30 15:27:09.63
insert_count          0                       0                       0                       0                       0
update_count          0                       0                       0                       0                       0
merge_count           0                       0                       0                       0                       0
unmerge_count         0                       0                       0                       0                       0
delete_count          324                     187                     27                      54                      33
delete_record_count   320                     187                     27                      54                      33
reject_count          0                       0                       0                       0                       0
status                Succeeded               Succeeded               Succeeded               Succeeded               Succeeded

In this example, a command to update the Information Store for deleted entity data (with item type ET5) resulted in the creation of a job with five batches. The first few columns of the Ingestion_Deletion_Reports view contain the same values for all batches in the same job. Later columns reveal how deleting entity records results in the deletion of connected link records (with item types LAC1, LAS1, LEM1, LIN1).

In one case, the delete_record_count value is less than the delete_count value. This is because some of the provenance to be deleted was associated with an i2 Analyze record that had more than one piece of provenance. An i2 Analyze record is deleted only when the last associated provenance is deleted.
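The relationship between delete_count and delete_record_count can be modelled in a short Python sketch. The 316/4 split of provenance pieces per record is hypothetical (only the totals are shown in the report above), and the function is an illustration, not an i2 Analyze API:

```python
def deletion_counts(pieces_per_record: list[int]) -> tuple[int, int]:
    """Given the number of provenance pieces attached to each affected
    record (all of which are being deleted), return the batch's
    delete_count and delete_record_count. A record is deleted only when
    its last piece of provenance is deleted, so a record with several
    pieces contributes more than one to delete_count."""
    delete_count = sum(pieces_per_record)         # provenance pieces removed
    delete_record_count = len(pieces_per_record)  # records removed
    return delete_count, delete_record_count

# Hypothetical split: 316 records with one piece, 4 records with two pieces.
# The totals (324 pieces, 320 records) match the ET5 batch in the report.
print(deletion_counts([1] * 316 + [2] * 4))
```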