Database index management

It can be beneficial for performance to drop and re-create indexes in the Information Store database during bulk mode ingestion. Whether doing so helps changes over the lifetime of a deployment, as your Information Store comes to contain more data.

Deciding when to drop and re-create indexes

The situations where you might want to drop and re-create the database indexes include:

  • Completing an initial load for an item type or ingesting data into an empty Information Store

  • Ingesting large amounts of data (for example, more than 20 million rows in the staging table)

The situations where you do not want to drop and re-create the database indexes include:

  • When the Information Store already contains a large amount of data for the item type that you are ingesting. The time that it takes to re-create the indexes can make the total ingestion time longer. For example, if the store already contains about 1 billion records, do not drop and re-create the indexes.

  • When you are ingesting links between entities of the same type and some of those entities might be correlated.

By default, the indexes are kept in place and not dropped during data ingestion.

To determine what is best for your data, run some test ingestions that drop and re-create the indexes and some that do not, and record how long each one takes to complete. As the amount of data in the Information Store increases, so does the time that it takes to create the indexes.
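As a rough way to compare test runs, you might wrap the ingestion command in a small timer. This is a sketch only; the actual ingestion command and its arguments depend on your deployment.

```shell
# Sketch: time a command and report the elapsed seconds.
# Pass your actual ingestion command as the arguments.
time_ingestion() {
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  echo "elapsed: $((end - start))s"
  return $status
}
```

For example, `time_ingestion <your ingestion command>` runs the command and prints how long it took.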

Dropping and re-creating indexes during a bulk import ingestion process

If you decide to drop and re-create the database indexes during large ingestions that use multiple staging tables across both entity and link types, the high-level process consists of the following steps:

  1. Stop Liberty and Solr

  2. Drop the database indexes for the entity types that you are ingesting data for

  3. Ingest all entity data

  4. Create the database indexes for the entity types that you ingested data for

  5. Drop the database indexes for the link types that you are ingesting data for

  6. Ingest all link data

  7. Create the database indexes for the link types that you ingested data for

  8. Start Liberty and Solr

You can use two methods to drop and create indexes:

  • Use the i2 Analyze deployment toolkit to generate scripts that you run against the database manually.

  • Use the import configuration properties file to specify whether the indexes must be dropped or created during a bulk import mode ingestion.

If you are ingesting data for multiple link types or you want to review the database scripts and run them manually, it is best to use the generated scripts to drop and re-create the database indexes. If you do not want to (or cannot) run scripts against the Information Store database, you can use the import configuration file and allow the ingestion process to drop and re-create the database indexes.

If you are completing an ingestion that spans multiple ingestion periods where the system is in use between ingestion periods, ensure that all of the indexes are created at the end of an ingestion period and that you start Liberty and Solr before analysts use the system.

For information about creating the import configuration file, see References and system properties.

Using the generated scripts method

  1. In your import configuration file, set both the dropIndexes and createIndexes settings to FALSE.

    For example:

    dropIndexes=FALSE
    createIndexes=FALSE
  2. Generate the create and drop index scripts for each item type that you are ingesting data for.

    For example, to generate the scripts for the item type with identifier ET5, run the following commands:

    setup -t generateInformationStoreIndexCreationScripts -p schemaTypeId=ET5
    setup -t generateInformationStoreIndexDropScripts -p schemaTypeId=ET5

    The scripts are output in the toolkit\scripts\database\db2\InfoStore\generated directory, in the createIndexes and dropIndexes directories.
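A small wrapper can run every generated script in one of those directories in turn. This is a sketch: it assumes a Db2 deployment where the generated SQL runs through the db2 command line processor, and the script file names shown in the preview are whatever the toolkit generated. Set DRY_RUN=1 to preview the commands without a database connection.

```shell
# Sketch: run every generated index script in a directory.
# Assumes a Db2 deployment (db2 -tvf runs a SQL script file);
# adjust the command for your database. Set DRY_RUN=1 to preview.
run_index_scripts() {
  dir="$1"
  for script in "$dir"/*.sql; do
    [ -e "$script" ] || continue    # skip if the directory is empty
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "would run: db2 -tvf $script"
    else
      db2 -tvf "$script"
    fi
  done
}
```

For example, `DRY_RUN=1 run_index_scripts dropIndexes` lists the scripts that would be run against the database.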

  3. Stop Liberty and Solr.

    On the Liberty server, run:

    setup -t stopLiberty

    On each Solr server, run:

    setup -t stopSolrNodes --hostname solr.host-name

    Where solr.host-name is the host name of the Solr server where you are running the command, and matches the value for the host-name attribute of a <solr-node> element in the topology.xml file.
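If your deployment has several Solr servers, you might issue the stop command on each one from a single machine. The sketch below is an assumption, not part of the product: the hosts file is a hypothetical list of Solr host names, one per line, and the non-preview path assumes passwordless SSH access to each server with the toolkit's setup script on the PATH.

```shell
# Sketch: stop the Solr nodes on every host listed in a hosts file
# (hypothetical file, one host name per line). Set DRY_RUN=1 to preview.
stop_all_solr() {
  hosts_file="$1"
  while read -r host; do
    [ -n "$host" ] || continue
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "would run on $host: setup -t stopSolrNodes --hostname $host"
    else
      ssh "$host" "setup -t stopSolrNodes --hostname $host"
    fi
  done < "$hosts_file"
}
```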

  4. Run the scripts that you generated in Step 2 to drop the indexes for each entity type that you plan to ingest data for.

  5. Complete the process to ingest the entity data using the bulk import mode. For more information, see Using bulk import mode.

  6. Run the scripts that you generated in Step 2 to create the indexes for each entity type that you ingested.

  7. Run the scripts that you generated in Step 2 to drop the indexes for each link type that you plan to ingest data for.

  8. Complete the process to ingest the link data using the bulk import mode. For more information, see Using bulk import mode.

    Only re-create the database indexes after you ingest the link data for every link type that you plan to ingest.

  9. Run the scripts that you generated in Step 2 to create the indexes for each link type that you ingested.

  10. Start Liberty and Solr.

    On each Solr server, run:

    setup -t startSolrNodes --hostname solr.host-name

    Where solr.host-name is the host name of the Solr server where you are running the command, and matches the value for the host-name attribute of a <solr-node> element in the topology.xml file.

    On the Liberty server, run:

    setup -t startLiberty

Using the import configuration properties file method

When you use the import configuration settings to drop and re-create the database indexes, the toolkit drops and creates the indexes for you as part of the ingestion process; you do not run scripts against the database manually. However, you must modify the import configuration file several times during the ingestion of multiple item types. You must still stop Liberty and Solr before you run the ingestion command, and start them again after the indexes are created.

If all of the data for a single item type is ingested with a single ingestion command, set the dropIndexes and createIndexes settings in your import configuration file as follows:

dropIndexes=TRUE
createIndexes=TRUE

If the data for a single item type must be ingested by using multiple ingestion commands, you must modify the import configuration file before the first command, before each intermediate command, and before the final command for each item type. For example, if the data for a single entity type is in more than one staging table:

  1. The first time you call the ingestion command, set the dropIndexes and createIndexes settings as follows:

    dropIndexes=TRUE
    createIndexes=FALSE
  2. For the intermediate times that you call the ingestion command, set the dropIndexes and createIndexes settings as follows:

    dropIndexes=FALSE
    createIndexes=FALSE
  3. The final time you call the ingestion command, set the dropIndexes and createIndexes settings as follows:

    dropIndexes=FALSE
    createIndexes=TRUE

After you ingest all the entity data, repeat the process for the link data.
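The repeated edits to the import configuration file can be scripted. The helper below is a sketch (the function name and the import.properties file name are illustrative); it assumes both properties already appear in the file, one per line, and rewrites their values in place with sed:

```shell
# Sketch: set dropIndexes and createIndexes in an import configuration file.
# Assumes both properties already appear in the file, one per line.
# sed -i.bak edits in place and keeps a .bak copy of the original.
set_index_flags() {
  file="$1"
  drop="$2"
  create="$3"
  sed -i.bak \
    -e "s/^dropIndexes=.*/dropIndexes=$drop/" \
    -e "s/^createIndexes=.*/createIndexes=$create/" \
    "$file"
}

# First ingestion command for an item type:
#   set_index_flags import.properties TRUE FALSE
# Intermediate ingestion commands:
#   set_index_flags import.properties FALSE FALSE
# Final ingestion command:
#   set_index_flags import.properties FALSE TRUE
```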