Populating the staging tables
The i2 Analyze deployment toolkit and the ETL toolkit create the staging tables for data ingestion in the same database as the Information Store data tables. After you prepare your data, but before you can instruct the Information Store to ingest it, you must populate the staging tables.
About this task
The approach that you take to populate the staging tables depends on the database management system that you are using, the form that your source data is in, and the tools that you have available.
- If your data is in comma-separated value (CSV) files, you can use either the
psql
client tool's \COPY command, or the SQL COPY command. - If your data is in the tables or views of another database, you can use the SQL INSERT command.
- If your data is in comma-separated value (CSV)
files, then you can use the BULK
INSERT command.Note: To use the BULK INSERT command, the user that you run the command as must be a member of the bulkadmin server role.
- If your data is in the tables or views of another database, then you can use the INSERT command.
- You can use SQL Server Integration Services as a tool to extract and transform data from various sources, and then load it into the staging tables.
- If your data is in comma-separated value (CSV) files, then you can use the IMPORT or INGEST commands.
- If your data is in the tables or views of another database, then you can use the IMPORT, INGEST, or LOAD commands.
Alternatively, regardless of your database management system, you can use IBM InfoSphere DataStage as a tool for transforming your data and loading it into the staging tables. You can specify the database schema that contains the staging tables as the target location for the ETL output.
Example
The subdirectories of the examples\data directory in the deployment toolkit all contain a postgres, a sqlserver, and a db2 directory.
If you are using PostgreSQL, inspect the LoadCSVDataCommands.sql file in the postgres directory. In each case, this file is an SQL script that populates the example staging tables from the prepared CSV files. The script calls the \COPY command repeatedly to do its work.
If you are using SQL Server, inspect the LoadCSVDataCommands.sql file in the sqlserver directory. In each case, this file is an SQL script that populates the example tables from the prepared CSV files. The script calls the BULK INSERT command repeatedly to do its work. The BULK INSERT command uses .fmt format files, which are also in the sqlserver directory, to instruct SQL Server how to process the CSV files into the staging tables. For more information about format files, see Non-XML Format Files.
If you are using Db2, inspect the LoadCSVDataCommands.db2 file in the db2 directory. In each case, this file is a Db2 script that populates the example staging tables from the prepared CSV files. The script calls the IMPORT command repeatedly to do its work. In most instances, the command just takes data from columns in a CSV file and adds it to a staging table in a Db2 database schema.