Updating the Information Store for deleted data

The data that the Information Store ingests is fixed at the moment of ingestion, and removal of the data in the external source does not automatically delete it from the Information Store. However, you can update the Information Store to reflect the deletion of data in the external source by using staging tables and the deployment toolkit.

When data changes in its original source, you can use the same pipeline that you used for initial ingestion to update the records in the Information Store. If data is deleted from its source, you can use the staging tables and the deployment toolkit to reflect that fact in the Information Store as well.

A single i2 Analyze record can represent data from multiple sources, which results in a record that contains multiple pieces of provenance. As a consequence, responding to source data deletion does not necessarily mean deleting records from the Information Store. When you use the toolkit to reflect deleted source data, the effect is to remove the provenance associated with that data. If the process removes a record's only provenance, the record is deleted. If not, the record remains in the Information Store.
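The rule that a record survives until its last piece of provenance is removed works like reference counting. The following minimal sketch models it in memory. The Record class, the remove_provenance function, and the origin identifier strings are hypothetical illustrations and are not part of the i2 Analyze toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical in-memory stand-in for an i2 Analyze record."""
    record_id: str
    # Origin identifiers of the provenance pieces that support this record.
    provenance: set[str] = field(default_factory=set)


def remove_provenance(records: dict[str, Record], origin_ids: set[str]) -> list[str]:
    """Remove the given provenance and return the IDs of records that were deleted.

    A record is deleted only when the removal leaves it with no provenance.
    """
    deleted = []
    for record_id, record in list(records.items()):
        if record.provenance & origin_ids:
            record.provenance -= origin_ids
            if not record.provenance:
                del records[record_id]
                deleted.append(record_id)
    return deleted


records = {
    "R1": Record("R1", {"src-a:1"}),             # one source only
    "R2": Record("R2", {"src-a:2", "src-b:7"}),  # data from two sources
}
deleted = remove_provenance(records, {"src-a:1", "src-a:2"})
print(deleted)                   # ['R1']
print(records["R2"].provenance)  # {'src-b:7'}
```

R1 loses its only provenance and is deleted, while R2 keeps the provenance from its second source and remains.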

Note: To delete records from the Information Store explicitly, use the deletion-by-rule approach. You can write conditions to determine which records are deleted. For more information about deleting records in this way, see Deleting records by rule.

The commands to update the Information Store for deleted data use the same mapping file and staging tables as the commands for ingesting data, and you call them in a similar way. However, the only information that must be in the staging table is what the mapping file requires to generate the origin identifiers of the data that is no longer in the external source.

When you run the commands to update the Information Store for deleted data, the rules that apply differ from the rules for adding and updating data:

  • Links do not have to be processed before entities, or vice versa.
  • Links can be processed without specifying the origin identifiers of their ends.
  • Deleting a piece of provenance from an entity record also deletes all the link provenance that is connected to it.
  • The process silently ignores any origin identifiers that are not in the Information Store.
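The following sketch illustrates two of these rules: that deleting entity provenance also removes connected link provenance, and that unknown origin identifiers are ignored. It is a simplified model under stated assumptions; the LinkProvenance class and the provenance_to_delete function are hypothetical and do not reflect how the Information Store implements the operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkProvenance:
    """Hypothetical link provenance that names the entity provenance at each end."""
    origin_id: str
    from_end: str
    to_end: str


def provenance_to_delete(
    entity_prov: set[str],
    link_prov: list[LinkProvenance],
    requested: set[str],
) -> tuple[set[str], set[str]]:
    """Return the (entity, link) provenance IDs that a delete request would remove."""
    # Identifiers that are not in the store are silently dropped here.
    entities = requested & entity_prov
    # Link provenance can be requested by its own origin identifier alone,
    # without naming the ends.
    links = {lp.origin_id for lp in link_prov if lp.origin_id in requested}
    # Deleting entity provenance also deletes every link provenance connected to it.
    links |= {
        lp.origin_id
        for lp in link_prov
        if lp.from_end in entities or lp.to_end in entities
    }
    return entities, links


entity_prov = {"p1", "p2", "p3"}
link_prov = [LinkProvenance("l1", "p1", "p2"), LinkProvenance("l2", "p2", "p3")]
entities, links = provenance_to_delete(entity_prov, link_prov, {"p1", "unknown-id"})
print(sorted(entities), sorted(links))  # ['p1'] ['l1']
```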

Because this process might cause significant numbers of i2 Analyze records to be deleted, two commands are provided. The first command previews the effect of running the second command before you commit to doing so. In the deployment toolkit, the two commands have different names but share the same core syntax, and the deleteProvenance command also accepts some additional parameters:

setup -t previewDeleteProvenance
      -p importMappingsFile=ingestion_mapping_file
      -p importMappingId=ingestion_mapping_id

setup -t deleteProvenance
      -p importMappingsFile=ingestion_mapping_file
      -p importMappingId=ingestion_mapping_id
      -p importLabel=ingestion_label
      -p logConnectedLinks
      -p importMode=BULK_DELETE
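Because the preview and the delete must target the same data, it can help to build both command lines from the same mapping parameters. The following sketch only constructs argument lists and does not run anything. The setup_args helper is hypothetical, and the values mapping.xml, Person, and DeletePeople are taken from the examples later in this topic.

```python
def setup_args(task: str, mappings_file: str, mapping_id: str, *extra_params: str) -> list[str]:
    """Build a setup command line; extra_params are passed as additional -p values."""
    args = [
        "setup", "-t", task,
        "-p", f"importMappingsFile={mappings_file}",
        "-p", f"importMappingId={mapping_id}",
    ]
    for param in extra_params:
        args += ["-p", param]
    return args


# Both commands share the same mapping file and mapping identifier.
preview = setup_args("previewDeleteProvenance", "mapping.xml", "Person")
delete = setup_args(
    "deleteProvenance", "mapping.xml", "Person",
    "importLabel=DeletePeople", "logConnectedLinks",
)
print(" ".join(preview))
print(" ".join(delete))
```

If you adapt this approach, you might pass each list to subprocess.run with check=True from the location where you normally run the setup script. How the script is invoked in your deployment is an assumption that you need to verify.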

In the ETL toolkit, you reuse the ingestInformationStoreRecords command. For more information about running the command from the ETL toolkit, see ETL toolkit.

For more information about running the commands and their arguments, see The previewDeleteProvenance and deleteProvenance tasks.

You can use bulk delete mode to improve performance when you are removing provenance from the Information Store that does not contribute to correlated records. If you try to delete any provenance that contributes to correlated records, that provenance is not removed from the Information Store, and it is recorded in a table in the IS_Public database schema instead. The name of the table, for example IS_Public.D22200707130930400326011ET5, is displayed in the console when the delete process finishes. Before you use bulk delete mode, ensure that your database is configured correctly. For more information, see Database configuration for IBM Db2.
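The following sketch models the outcome of a bulk delete as a simple partition of the requested provenance. It assumes that you already know which provenance contributes to correlated records; the bulk_delete function and the correlated set are hypothetical. In the real process, the skipped provenance is written to the IS_Public table that is named in the console rather than returned.

```python
def bulk_delete(requested: set[str], correlated: set[str]) -> tuple[set[str], set[str]]:
    """Split requested provenance into (removed, skipped) under bulk delete mode.

    Provenance that contributes to correlated records is skipped, not removed.
    """
    skipped = requested & correlated
    removed = requested - skipped
    return removed, skipped


removed, skipped = bulk_delete({"a", "b", "c"}, correlated={"b"})
print(sorted(removed), sorted(skipped))  # ['a', 'c'] ['b']
```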

The procedure for updating the Information Store in this way starts with a staging table that contains information about the data that you no longer want to represent in the Information Store.

  1. Ensure that the application server that hosts i2 Analyze is running.
  2. Run the previewDeleteProvenance command to discover the effect of running deleteProvenance.
    For example:
    setup -t previewDeleteProvenance -p importMappingsFile=mapping.xml
          -p importMappingId=Person
    The output to the console window describes the outcome of a delete operation with these settings. High counts or a long list of types might indicate that the operation is going to delete more records than you expected. Previewing the delete operation does not create an entry in the Ingestion_Deletion_Reports view; the output is displayed only in the console.
    >INFO [DeleteLogger] - Delete preview requested at 2017.12.08 11:05:32
    >INFO [DeleteLogger] - Item type: Person
    >INFO [DeleteLogger] - Number of 'Person' provenance pieces to be deleted: 324
    >INFO [DeleteLogger] - Number of 'Person' i2 Analyze records to be deleted: 320
    >INFO [DeleteLogger] - Number of 'Access To' provenance pieces to be deleted: 187
    >INFO [DeleteLogger] - Number of 'Access To' i2 Analyze records to be deleted: 187
    >INFO [DeleteLogger] - Number of 'Associate' provenance pieces to be deleted: 27
    >INFO [DeleteLogger] - Number of 'Associate' i2 Analyze records to be deleted: 27
    >INFO [DeleteLogger] - Number of 'Employment' provenance pieces to be deleted: 54
    >INFO [DeleteLogger] - Number of 'Employment' i2 Analyze records to be deleted: 54
    >INFO [DeleteLogger] - Number of 'Involved In' provenance pieces to be deleted: 33
    >INFO [DeleteLogger] - Number of 'Involved In' i2 Analyze records to be deleted: 33
    >INFO [DeleteLogger] - Duration: 1 s
    Note: When you run the command for entity records, the output can exaggerate the impact of the operation. If the staging table identifies the entities at both ends of a link, the preview might count the link record twice in its report.
  3. Correct any reported problems, and verify that the statistics are in line with your expectations for the operation. If they are not, change the contents of the staging table, and run the preview command again.
  4. Run the deleteProvenance command with the same parameters to update the Information Store.
    For example:
    setup -t deleteProvenance -p importMappingsFile=mapping.xml
          -p importMappingId=Person -p importLabel=DeletePeople
          -p logConnectedLinks
    Note: Do not run multiple deleteProvenance commands at the same time, or while data is being ingested into the Information Store.
  5. Repeat the steps for the types of any other records that you want to process.
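To make step 3 less error-prone, you might check the preview output against limits that you choose in advance. The following sketch parses lines in the format of the sample output in step 2; the regular expression is an assumption based on that sample, and the format might differ in other versions. The parse_preview and over_limit helpers are hypothetical. Remember that an entity preview can count a link record twice, so link counts might be overstated.

```python
import re

# Matches preview lines such as:
# >INFO [DeleteLogger] - Number of 'Person' i2 Analyze records to be deleted: 320
LINE = re.compile(
    r"Number of '(?P<type>[^']+)' (?P<kind>provenance pieces|i2 Analyze records) "
    r"to be deleted: (?P<count>\d+)"
)


def parse_preview(output: str) -> dict[str, dict[str, int]]:
    """Collect provenance and record counts per item type from preview output."""
    counts: dict[str, dict[str, int]] = {}
    for line in output.splitlines():
        match = LINE.search(line)
        if match:
            counts.setdefault(match["type"], {})[match["kind"]] = int(match["count"])
    return counts


def over_limit(counts: dict[str, dict[str, int]], max_records: int) -> list[str]:
    """Return the item types whose record count exceeds a caller-chosen limit."""
    return sorted(
        item_type
        for item_type, kinds in counts.items()
        if kinds.get("i2 Analyze records", 0) > max_records
    )


sample = """\
>INFO [DeleteLogger] - Number of 'Person' provenance pieces to be deleted: 324
>INFO [DeleteLogger] - Number of 'Person' i2 Analyze records to be deleted: 320
>INFO [DeleteLogger] - Number of 'Access To' i2 Analyze records to be deleted: 187
"""
counts = parse_preview(sample)
print(over_limit(counts, 200))  # ['Person']
```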

At the end of this procedure, the Information Store no longer contains the provenance (or any connected link provenance) for the data that you identified through the mapping files and staging tables. Any records that lose all of their provenance, and any connected link records, are deleted as a result. Deleting data is permanent, and the only way to restore it to the Information Store is to add it again through the ingestInformationStoreRecords command.