Troubleshooting the ingestion process

The commands that you run during the ingestion process send information about their progress to the command line and to a log file. If any command encounters errors or does not run to completion, you can read the output to help you diagnose the problem.

When an ingestion process runs to completion, the final output from the command is a report of what happened to the Information Store. The reports appear on the command line and in the ingestion log at toolkit\configuration\logs\importer\i2_Importer.log. The three possible end states are success, partial success, and failure.

Success

If the ingestion command processed all of the rows in the staging table without error, then the Information Store reflects the contents of the staging table. The command reports success, as in this example:

> INFO  [IImportLogger] - Total number of rows processed: 54
> INFO  [IImportLogger] - Number of records inserted: 0
> INFO  [IImportLogger] - Number of records updated: 54
> INFO  [IImportLogger] - Number of merges: 0
> INFO  [IImportLogger] - Number of unmerges: 0
> INFO  [IImportLogger] - Number of rows rejected: 0
> INFO  [IImportLogger] - Duration: 5 s
> INFO  [IImportLogger] -
> INFO  [IImportLogger] - Result: SUCCESS

Partial success

If you ran the command in record-based failure mode and it processed only some of the rows in the staging table without error, it reports partial success, as in this example:

> INFO  [IImportLogger] - Total number of rows processed: 34
> INFO  [IImportLogger] - Number of records inserted: 0
> INFO  [IImportLogger] - Number of records updated: 30
> INFO  [IImportLogger] - Number of merges: 0
> INFO  [IImportLogger] - Number of unmerges: 0
> INFO  [IImportLogger] - Number of rows rejected: 4
> INFO  [IImportLogger] - Duration: 4 s
> INFO  [IImportLogger] -
> INFO  [IImportLogger] - Result: PARTIAL SUCCESS
> INFO  [IImportLogger] -
> INFO  [IImportLogger] - Total number of errors: 4
> INFO  [IImportLogger] - Error categories:
> INFO  [IImportLogger] - ABSENT_VALUE: 4
> INFO  [IImportLogger] -
> INFO  [IImportLogger] - The rejected records and errors are recorded in the database. For details, use the following view:
> INFO  [IImportLogger] - IS_Staging.S20171204122426717092ET5_Rejects_V

The records in the Information Store reflect the rows from the staging table that the command successfully processed. The report includes the name of a database view that you can examine to discover what went wrong with each failed row.

Failure

If you ran the command in mapping-based failure mode, the command stops at the first error that it encounters, so any error that you see is the first one. The report shows failure, as in this example:

> INFO  [IImportLogger] - Total number of rows processed: 1
> INFO  [IImportLogger] - Number of records inserted: 0
> INFO  [IImportLogger] - Number of records updated: 0
> INFO  [IImportLogger] - Number of merges: 0
> INFO  [IImportLogger] - Number of unmerges: 0
> INFO  [IImportLogger] - Number of rows rejected: 0
> INFO  [IImportLogger] - Duration: 0 s
> INFO  [IImportLogger] -
> INFO  [IImportLogger] - Result: FAILURE

When the process fails in this way, the lines of output that follow the report describe the error in more detail. In this case, the command does not change the contents of the Information Store.

Note: If a serious error occurs, the ingestion command might not run to completion. When that happens, it is harder to be certain of the state of the Information Store. The ingestion process uses batching, so the records in the store reflect the most recently completed batch. If you are using the bulk import mode, see Bulk import mode error for more information about recovering from errors at this stage.

If the command reports partial success, you might be able to clean up the staging table by removing the rows that were ingested and fixing the rows that failed. However, the main benefit of record-based failure is that you can find out about multiple problems at the same time.
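If you want to check the outcome of an ingestion from a script, you can read the report lines that the command writes. The following Python sketch is a minimal example that assumes only the report format shown in the examples above (messages that follow "[IImportLogger] -"). The function name parse_import_report and its return structure are hypothetical and are not part of i2 Analyze. The sketch also assumes that the rejects view name ends in _Rejects_V, as it does in the partial success example.

```python
import re

# Captures the message after "[IImportLogger] -", whatever precedes the logger
# name (a timestamp in a log file, or the "> " used in this documentation).
_MESSAGE = re.compile(r"\[IImportLogger\] - ?(.*)$")


def parse_import_report(log_text):
    """Summarize an ingestion report (hypothetical helper, not part of i2 Analyze).

    Returns a dict with the final result (for example SUCCESS, PARTIAL SUCCESS,
    or FAILURE), every "Label: <integer>" counter keyed by its label, and the
    name of the rejects view if the report contains one.
    """
    report = {"result": None, "counts": {}, "rejects_view": None}
    for line in log_text.splitlines():
        match = _MESSAGE.search(line)
        if not match:
            continue  # not an IImportLogger line (for example, a wrapped line)
        message = match.group(1).strip()
        label, sep, value = message.partition(":")
        value = value.strip()
        if sep and label == "Result":
            report["result"] = value
        elif sep and value.isdigit():
            # Counters such as "Number of rows rejected: 4" or "ABSENT_VALUE: 4".
            report["counts"][label] = int(value)
        elif not sep and message.endswith("_Rejects_V"):
            # Assumption: rejects views are named like the example in the report.
            report["rejects_view"] = message
    return report
```

For example, a script might read i2_Importer.log, call parse_import_report on its contents, and query the view in rejects_view only when result is PARTIAL SUCCESS.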

The most consistent approach to addressing failures of all types is to fix the problems in the staging table and run the ingestion command again. The following sections describe how to respond to some of the more common failures.

Link rows in the staging table refer to missing entity records

When the Information Store ingests link data, you might see the following error message in the console output:
Link data in the staging table refers to missing entity records
This message is displayed if the entity record at either end of a link is not present in the Information Store. To resolve the error:
  • Examine the console output for your earlier operations to check that the Information Store ingested all the entity records properly.
  • Ensure that the link end origin identifiers are constructed correctly and exist for each row in the staging table.
  • Ensure that the link type and the entity types at the ends of the links are valid according to the i2 Analyze schema.
Then, rerun the ingestion command.
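The first two checks above amount to confirming that both ends of every link row resolve to an ingested entity record. The following Python sketch is a minimal illustration that assumes you have already exported the link staging rows as dicts and the origin identifiers of the ingested entity records as a set. The field names from_origin_id and to_origin_id, and the tuple form of the identifiers, are hypothetical and do not match the real staging table columns.

```python
def find_links_with_missing_ends(link_rows, entity_origin_ids):
    """Report link ends that cannot be matched to an ingested entity record.

    link_rows: list of dicts with hypothetical "from_origin_id" and
    "to_origin_id" keys. entity_origin_ids: set of origin identifiers of
    entity records already in the Information Store.
    Returns (row_index, end, origin_id) tuples; origin_id is None when the
    row has no identifier for that end.
    """
    problems = []
    for index, row in enumerate(link_rows):
        for end in ("from_origin_id", "to_origin_id"):
            origin_id = row.get(end)
            if not origin_id:
                # The identifier is missing or empty in the staging row.
                problems.append((index, end, None))
            elif origin_id not in entity_origin_ids:
                # The identifier is present but no such entity record exists.
                problems.append((index, end, origin_id))
    return problems
```

An empty result suggests that the link ends are consistent with the entity records; any tuple points at a row to fix before you rerun the ingestion command.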

Rows in the staging table have duplicate origin identifiers

During any ingestion procedure, but especially when a staging table is large, you might see the following error message in the console output:
Rows in the staging table have duplicate origin identifiers
This message is displayed when several rows in a staging table generate the same origin identifier. For example, more than one row might have the same value in the source_id column.

If more than one row in the staging table contains the same provenance information, you must resolve the issue and repopulate the staging table. Alternatively, you can separate the rows so that they are not in the same staging table at the same time.

This problem is most likely to occur during an update to the Information Store that attempts to change the same record (with the same provenance) twice in the same batch. It might be appropriate to combine the changes, or to process only the last change. After you resolve the problem, repopulate the staging table and rerun the ingestion command.
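Before you repopulate the staging table, you can look for duplicates in the source data. The following Python sketch is a minimal example, assuming the rows are available as dicts. By default it uses the source_id column from the example above as the key, but the real origin identifier can depend on more than one column, so the key function is a parameter. keep_last_change shows one possible resolution, processing only the last change; whether that is appropriate depends on your data.

```python
from collections import Counter


def source_id(row):
    """Default key: the source_id column, as in the example above."""
    return row["source_id"]


def find_duplicate_origin_ids(rows, key=source_id):
    """Return each key value that appears in more than one row, in first-seen order."""
    counts = Counter(key(row) for row in rows)
    return [value for value, count in counts.items() if count > 1]


def keep_last_change(rows, key=source_id):
    """Keep only the last row for each key (one possible resolution).

    Each key keeps the position of its first occurrence but the contents of
    its last occurrence.
    """
    latest = {}
    for row in rows:
        latest[key(row)] = row  # a later row replaces an earlier one
    return list(latest.values())
```

If you decide to combine the changes instead, replace keep_last_change with logic that merges the rows for each duplicated key.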

Geospatial data is in the incorrect format

During an ingestion procedure that contains geospatial data, you might see the following error messages in the console output:

On Db2:
SQLERRMC=GSEGEOMFROMWKT;;GSE3052N  Unknown type "FOO(33.3" in WKT.
On SQL Server:
System.FormatException: 24114: The label FOO(33.3 44.0) in the input well-known text (WKT) is not valid.
This message is displayed when data in a geospatial property column is not in the correct format.

Data in geospatial property columns must be in the POINT(longitude latitude) format. For more information, see Information Store property value ranges.
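You can catch badly formatted values before ingestion by checking them against the POINT(longitude latitude) form. The following Python sketch is a conservative pre-check under stated assumptions: it accepts only plain decimal numbers, and it applies the usual geographic bounds (longitude from -180 to 180, latitude from -90 to 90). The exact limits that the Information Store accepts are described in Information Store property value ranges. The name parse_point is hypothetical.

```python
import re

# Decimal number with an optional sign, such as 33.3, -0.5, or 44
_NUMBER = r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)"
_POINT = re.compile(rf"POINT ?\(\s*({_NUMBER})\s+({_NUMBER})\s*\)")


def parse_point(value):
    """Return (longitude, latitude), or raise ValueError (hypothetical pre-check)."""
    match = _POINT.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not in POINT(longitude latitude) format: {value!r}")
    longitude, latitude = float(match.group(1)), float(match.group(2))
    # Assumption: usual geographic bounds; see the documented Information Store ranges.
    if not (-180.0 <= longitude <= 180.0 and -90.0 <= latitude <= 90.0):
        raise ValueError(f"coordinates out of range: {value!r}")
    return longitude, latitude
```

With this check, POINT(33.3 44.0) passes, while the FOO(33.3 44.0) value from the error messages above raises ValueError.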

Error occurred during a correlation operation

During an ingestion procedure with correlated data, you might see the following error message in the console output:
An error occurred during a correlation operation. There might be some data in an unusable state.

This message is displayed if the connection to the database or Solr is interrupted during a correlation operation.

To resolve the problem, you must repair the connection that caused the error, and then run the syncInformationStoreCorrelation toolkit task. This task synchronizes the data in the Information Store with the data in the Solr index so that the data returns to a usable state.

After you run the syncInformationStoreCorrelation task, reingest the data that you were ingesting when the failure occurred. Any attempt to run an ingestion or a deletion command before you run syncInformationStoreCorrelation will fail.

Ingestion with correlated data is still in progress

During an ingestion procedure, you might see the following error message in the console output:
You cannot ingest data because an ingestion with correlated data is still in progress,
or because an error occurred during a correlation operation in a previous ingestion.

If another ingestion is still in progress, you must wait until it finishes. If a previous ingestion failed during a correlation operation, you must run the syncInformationStoreCorrelation toolkit task.

For more information about running the syncInformationStoreCorrelation toolkit task, see Error occurred during a correlation operation.

Ingestion of the same item type is still in progress

During an ingestion procedure, you might see the following error message in the console output:
You cannot ingest data for item type <ET5> because an ingestion is still in progress.
You must wait until the process is finished before you can start another ingestion for this item type.

If another ingestion of the same item type is still in progress, you must wait until it finishes.

If you are sure that the ingestion is complete or not in progress, you can remove the file that is blocking the ingestion. To record that an ingestion is in progress, the ingestion process creates a file in the temporary directory on the server where the ingestion command was run, for example AppData\Local\Temp. The file name is INGESTION_IN_PROGRESS_<item type ID>. After you remove the file, you can run the ingestion command again.
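The following Python sketch shows the file check and removal described above. It is a minimal example under stated assumptions: it builds the file name from the pattern INGESTION_IN_PROGRESS_<item type ID>, and by default it looks in tempfile.gettempdir(), which is the temporary directory of the user who runs the script. That directory matches the one that the ingestion command used only if you run the script as the same user on the same server, so pass temp_dir explicitly if in doubt. The function names are hypothetical, and you should call remove_ingestion_lock only when you are sure that no ingestion is running.

```python
import os
import tempfile


def ingestion_lock_path(item_type_id, temp_dir=None):
    """Path of the INGESTION_IN_PROGRESS_<item type ID> file (hypothetical helper)."""
    directory = temp_dir if temp_dir is not None else tempfile.gettempdir()
    return os.path.join(directory, f"INGESTION_IN_PROGRESS_{item_type_id}")


def is_ingestion_locked(item_type_id, temp_dir=None):
    """True if the in-progress file for this item type exists."""
    return os.path.exists(ingestion_lock_path(item_type_id, temp_dir))


def remove_ingestion_lock(item_type_id, temp_dir=None):
    """Delete the in-progress file; return True if a file was removed.

    Call this only when you are sure that no ingestion for the item type is running.
    """
    path = ingestion_lock_path(item_type_id, temp_dir)
    if os.path.exists(path):
        os.remove(path)
        return True
    return False
```

For the message above, the item type ID is ET5, so the file to look for would be INGESTION_IN_PROGRESS_ET5.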

Bulk import mode error

The symptoms of this type of failure are a stack trace and a failure message in the console and in the importer log. To recover from this type of failure:

  1. Identify the cause of the failure. Use the SQL error codes in the output to determine the cause of the failure.

    You might see error messages about the following issues:

    • Log size or connectivity issues.
    • Invalid data in the staging table.
  2. Fix the problem that caused the failure. For example, you might need to restore connectivity to the database or increase the log size.
  3. After you resolve the problem that caused the error, you can attempt the ingestion again. If any of the rows in the staging table were already ingested into the Information Store, you must remove them from the staging table before you can ingest in bulk mode.
    • In the console or importer log, if the value for Number of rows accepted is 0, run the ingestion command again.
    • In the console or importer log, if the value for Number of rows accepted is greater than 0, you must ensure that these records are not ingested again.

      Before you run the ingestion command again, add the CheckExistingOriginIds=filter setting to the import configuration file. When this value is set, the ingestion process checks whether the origin identifiers in the staging table already exist in the Information Store and does not attempt to ingest them again.

      When CheckExistingOriginIds is set to filter, the ingestion might take longer to complete. After the previously failed ingestion completes, you can remove the CheckExistingOriginIds setting from your import configuration file for future ingestion operations.

      For more information about creating an import configuration file, see References and system properties.
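The decision in step 3 can be sketched in Python as follows. This is a minimal illustration that rests on two assumptions: the console or importer log contains the value as "Number of rows accepted: <n>", like the counters in the reports shown earlier, and the import configuration file uses key=value lines, as the CheckExistingOriginIds=filter setting suggests. The function names are hypothetical.

```python
import re

SETTING = "CheckExistingOriginIds"


def rows_already_accepted(log_text):
    """Return the last 'Number of rows accepted' value found in the log text.

    Assumption: the value is logged as "Number of rows accepted: <n>".
    """
    values = re.findall(r"Number of rows accepted:\s*(\d+)", log_text)
    if not values:
        raise ValueError("no 'Number of rows accepted' value found")
    return int(values[-1])


def set_origin_id_filter(config_text, enabled):
    """Add (enabled=True) or remove (enabled=False) CheckExistingOriginIds=filter.

    Assumption: the configuration file uses key=value lines. Any existing
    CheckExistingOriginIds line is dropped before the new one is added.
    """
    kept = [line for line in config_text.splitlines()
            if line.split("=", 1)[0].strip() != SETTING]
    if enabled:
        kept.append(f"{SETTING}=filter")
    return "\n".join(kept) + "\n" if kept else ""
```

For the retry, you might call set_origin_id_filter(config_text, rows_already_accepted(log_text) > 0), and afterward call set_origin_id_filter(config_text, False) to remove the setting for future ingestion operations.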