Populating the staging tables

The i2 Analyze deployment toolkit and the ETL toolkit create the staging tables for data ingestion in the same database as the Information Store data tables. After you prepare your data, but before you can instruct the Information Store to ingest it, you must populate the staging tables.

The approach that you take to populate the staging tables depends on the database management system that you are using, the form that your source data is in, and the tools that you have available.

Db2 provides the ingest, import, and load utilities:
  • If your data is in comma-separated value (CSV) files, then you can use the IMPORT or INGEST commands.
  • If your data is in the tables or views of another database, then you can use the IMPORT, INGEST, or LOAD commands.
SQL Server provides the bulk insert and insert utilities:
  • If your data is in comma-separated value (CSV) files, then you can use the BULK INSERT command.
    Note: To use the BULK INSERT command, the user that you run the command as must be a member of the bulkadmin server role.
  • If your data is in the tables or views of another database, then you can use the INSERT command.
  • You can use SQL Server Integration Services as a tool to extract and transform data from various sources, and then load it into the staging tables.

Alternatively, regardless of your database management system, you can use IBM InfoSphere DataStage as a tool for transforming your data and loading it into the staging tables. You can specify the database schema that contains the staging tables as the target location for the ETL output.

The subdirectories of the examples\data directory in the deployment toolkit all contain a db2 and a sqlserver directory.

If you are using Db2, inspect the LoadCSVDataCommands.db2 file in the db2 directory. In each case, this file is a Db2 script that populates the example staging tables from the prepared CSV files. The script calls the IMPORT command repeatedly to do its work. In most instances, the command just takes data from columns in a CSV file and adds it to a staging table in a Db2 database schema.

If you are using SQL Server, inspect the LoadCSVDataCommands.sql file in the sqlserver directory. In each case, this file is an SQL script that populates the example tables from the prepared CSV files. The script calls the BULK INSERT command repeatedly to do its work. The BULK INSERT command uses .fmt format files, which are also in the sqlserver directory, to instruct SQL Server how to process the CSV files into the staging tables. For more information about format files, see Non-XML Format Files.