Database index management
Dropping and re-creating the indexes in the Information Store database during bulk mode ingestion can improve performance. Whether it helps changes over the lifetime of a deployment, as your Information Store comes to contain more data.
Decide when to drop and create indexes
Dropping and re-creating the database indexes is most likely to help in the following situations:
- When completing an initial load for an item type, or ingesting data into an empty Information Store
- When ingesting large amounts of data (for example, more than 20 million rows in the staging table)
Keep the indexes in place in the following situations:
- When the Information Store already contains a large amount of data for the item type you are ingesting. The time that it takes to re-create the indexes can make the total ingestion time longer. If you are ingesting into an Information Store that already contains about 1 billion records, do not drop and re-create the database indexes.
- When you are ingesting links between entities of the same type and some of those entities might be correlated.
By default, the indexes are kept in place and not dropped during data ingestion.
To help determine what is best for your data, you can test some ingestions that drop and re-create indexes and some that do not. For each test ingestion, record the time that it takes to complete. As the amount of data in the Information Store increases, the time it takes to create the indexes also increases.
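The guidance above can be condensed into a rough decision helper. The following Python sketch is illustrative only: the function name, its parameters, and the order in which the rules are checked are assumptions, and the thresholds are the approximate figures quoted above, not hard limits. Test ingestions remain the most reliable way to decide for your data.

```python
def should_drop_indexes(staging_rows, existing_records,
                        same_type_links_may_correlate=False):
    """Return True if dropping and re-creating indexes is likely to help.

    Hypothetical helper. existing_records is the number of records already
    in the Information Store (an assumption about how to measure "large").
    """
    large_staging_rows = 20_000_000    # "more than 20 million rows"
    very_large_store = 1_000_000_000   # "about 1 billion records"

    # Rules against dropping are checked first (an assumption of this sketch).
    if same_type_links_may_correlate:
        return False
    if existing_records >= very_large_store:
        return False

    # Initial load or empty Information Store.
    if existing_records == 0:
        return True

    # Otherwise, only large ingestions are expected to benefit.
    return staging_rows > large_staging_rows
```

When none of the rules applies, the helper returns False, which matches the default behavior of keeping the indexes in place.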
Dropping and re-creating indexes during a bulk import ingestion process
- Stop Liberty and Solr
- Drop the database indexes for the entity types that you are ingesting data for
- Ingest all entity data
- Create the database indexes for the entity types that you ingested data for
- Drop the database indexes for the link types that you are ingesting data for
- Ingest all link data
- Create the database indexes for the link types that you ingested data for
- Start Liberty and Solr
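The ordering of these steps can be sketched as a small checklist generator. This is a minimal Python sketch, not part of the toolkit: the function name is hypothetical, the link type identifier LT1 in the usage example is invented for illustration, and the function only produces text. It does not run any commands.

```python
def bulk_ingestion_plan(entity_type_ids, link_type_ids):
    """Return the ordered steps for a bulk import that drops and re-creates indexes."""
    plan = ["Stop Liberty and Solr"]
    # Entities are handled first, then links; each group gets its own
    # drop / ingest / create cycle.
    for kind, type_ids in (("entity", entity_type_ids), ("link", link_type_ids)):
        if not type_ids:
            continue
        plan += [f"Drop indexes for {kind} type {t}" for t in type_ids]
        plan += [f"Ingest data for {kind} type {t}" for t in type_ids]
        plan += [f"Create indexes for {kind} type {t}" for t in type_ids]
    plan.append("Start Liberty and Solr")
    return plan


if __name__ == "__main__":
    for number, step in enumerate(bulk_ingestion_plan(["ET5"], ["LT1"]), start=1):
        print(f"{number}. {step}")
```

Note that the indexes for a group are created only after all of the data for that group is ingested.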
There are two methods for dropping and creating the database indexes:
- Use the i2 Analyze deployment toolkit to generate scripts that you run against the database manually.
- Use the import configuration properties file to specify whether the indexes must be dropped or created during a bulk import mode ingestion.
If you are ingesting data for multiple link types or you want to review the database scripts and run them manually, it is best to use the generated scripts to drop and re-create the database indexes. If you do not want to (or cannot) run scripts against the Information Store database, you can use the import configuration file and allow the ingestion process to drop and re-create the database indexes.
If you are completing an ingestion that spans multiple ingestion periods where the system is in use between ingestion periods, ensure that all of the indexes are created at the end of an ingestion period and that you start Liberty and Solr before analysts use the system.
For information about creating the import configuration file, see References and system properties.
Using the generated scripts method
- In your import configuration file, set both the dropIndexes and createIndexes settings to FALSE. For example:
    dropIndexes=FALSE
    createIndexes=FALSE
- Generate the create and drop index scripts for each item type that you are ingesting data for. For example, to generate the scripts for the item type with identifier ET5, run the following commands:
    setup -t generateInformationStoreIndexCreationScripts -p schemaTypeId=ET5
    setup -t generateInformationStoreIndexDropScripts -p schemaTypeId=ET5
  The scripts are output in the toolkit\scripts\database\db2\InfoStore\generated directory, in the createIndexes and dropIndexes directories.
- Stop Liberty and Solr. On the Liberty server, run:
    setup -t stopLiberty
  On each Solr server, run:
    setup -t stopSolrNodes --hostname solr.host-name
  Where solr.host-name is the host name of the Solr server where you are running the command, and matches the value for the host-name attribute of a <solr-node> element in the topology.xml file.
- Run the scripts that you generated in Step 2 to drop the indexes for each entity type that you plan to ingest data for.
- Complete the process to ingest the entity data using the bulk import mode. For more information, see Using bulk import mode.
- Run the scripts that you generated in Step 2 to create the indexes for each entity type that you ingested.
- Run the scripts that you generated in Step 2 to drop the indexes for each link type that you plan to ingest data for.
- Complete the process to ingest the link data using the bulk import mode. For more information, see Using bulk import mode.
Only re-create the database indexes after you ingest the link data for every link type that you plan to ingest.
- Run the scripts that you generated in Step 2 to create the indexes for each link type that you ingested.
- Start Liberty and Solr. On each Solr server, run:
    setup -t startSolrNodes --hostname solr.host-name
  Where solr.host-name is the host name of the Solr server where you are running the command, and matches the value for the host-name attribute of a <solr-node> element in the topology.xml file.
  On the Liberty server, run:
    setup -t startLiberty
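When you ingest several item types, it can help to write out the toolkit commands for the whole procedure in advance. The following Python sketch builds the command lines shown in the steps above as strings. The function names and the host names in the usage example are hypothetical. The sketch only prints commands and does not run them, because the Solr commands must be run on each Solr server itself.

```python
def script_generation_commands(item_type_ids):
    """Commands that generate the create and drop index scripts per item type."""
    commands = []
    for type_id in item_type_ids:
        commands.append(
            f"setup -t generateInformationStoreIndexCreationScripts -p schemaTypeId={type_id}")
        commands.append(
            f"setup -t generateInformationStoreIndexDropScripts -p schemaTypeId={type_id}")
    return commands


def stop_commands(solr_host_names):
    """Liberty is stopped first, then the Solr nodes on each Solr server."""
    return ["setup -t stopLiberty"] + [
        f"setup -t stopSolrNodes --hostname {host}" for host in solr_host_names]


def start_commands(solr_host_names):
    """The Solr nodes are started first, then Liberty."""
    return [f"setup -t startSolrNodes --hostname {host}"
            for host in solr_host_names] + ["setup -t startLiberty"]


if __name__ == "__main__":
    # Hypothetical host names; use the host-name values from topology.xml.
    hosts = ["solr1.example.com", "solr2.example.com"]
    for line in script_generation_commands(["ET5"]) + stop_commands(hosts):
        print(line)
    print("# ... run the drop scripts, ingest, run the create scripts ...")
    for line in start_commands(hosts):
        print(line)
```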
Using the import configuration properties file method
When you use the import configuration settings to drop and re-create the database indexes, the toolkit drops and creates the indexes for you as part of the ingestion process, so you do not run scripts against the database manually. However, you must modify the import configuration file a number of times throughout the ingestion of multiple item types. You must still stop Liberty and Solr before you run the ingestion command, and start them again after the indexes are created.
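If you call the ingestion command only once, so that the indexes are dropped and re-created within the same ingestion, set the dropIndexes and createIndexes settings as follows:
    dropIndexes=TRUE
    createIndexes=TRUE
If you call the ingestion command more than once, change the settings between calls: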
- The first time you call the ingestion command, set the dropIndexes and createIndexes settings as follows:
    dropIndexes=TRUE
    createIndexes=FALSE
- For the intermediate times that you call the ingestion command, set the dropIndexes and createIndexes settings as follows:
    dropIndexes=FALSE
    createIndexes=FALSE
- The final time you call the ingestion command, set the dropIndexes and createIndexes settings as follows:
    dropIndexes=FALSE
    createIndexes=TRUE
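Because the settings depend only on whether a call is the first or the last in the sequence, the pattern above can be captured in a few lines. The following Python sketch is illustrative: the function names are hypothetical, and it assumes that a single call uses TRUE for both settings, as in the first example in this section. It produces only the two lines to place in the import configuration file. It does not edit the file or run the ingestion.

```python
def index_settings_for_call(call_number, total_calls):
    """Return the dropIndexes/createIndexes values for one ingestion call.

    call_number is 1-based. Indexes are dropped on the first call and
    created on the final call.
    """
    if total_calls < 1 or not 1 <= call_number <= total_calls:
        raise ValueError("call_number must be between 1 and total_calls")
    return {
        "dropIndexes": "TRUE" if call_number == 1 else "FALSE",
        "createIndexes": "TRUE" if call_number == total_calls else "FALSE",
    }


def as_properties(settings):
    """Format the settings as lines for the import configuration file."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    total = 3
    for call in range(1, total + 1):
        print(f"# call {call} of {total}")
        print(as_properties(index_settings_for_call(call, total)))
```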