Deploying the ETL toolkit

If your deployment includes logic that extracts, transforms, and loads data on a different server from the i2® Analyze application or the Information Store, consider deploying the ETL toolkit. The ETL logic can then run ETL toolkit commands to automate loading and ingesting data into the Information Store.

About this task

In an i2 Analyze deployment that uses data from an external source, the ETL logic is the processing that transforms source data for loading into the Information Store staging tables. In mature deployments, it is common for the ETL process to be automated so that loading and ingesting data happen in sequence, on a schedule.

When your ETL logic is colocated with the standard i2 Analyze deployment toolkit, the logic can use that toolkit to drive the ingestion process automatically. When those components are on separate servers, you can deploy the ETL toolkit to the server that hosts the ETL logic. The ETL toolkit provides the ingestion functions of the deployment toolkit in a stand-alone package.

Procedure

The ETL toolkit must be able to communicate with the Information Store with all the same credentials as the deployment toolkit. To enable this behavior, you use the deployment toolkit to create the ETL toolkit, and then copy it to the ETL logic server.

  1. On the server that has the deployment toolkit, open a command prompt and navigate to the toolkit\scripts directory.
  2. Run the createEtlToolkit command to generate the ETL toolkit:
    setup -t createEtlToolkit -p outputPath=output_path
    This command creates the ETL toolkit in a directory that is named etltoolkit in the output path that you specify.
  3. Copy the ETL toolkit to the server that hosts the ETL logic.

If the ETL logic and toolkit are on the same server as the database management system that hosts the Information Store, you do not need to modify the connection configuration. If the database management system is on a different server, then you must ensure that the ETL toolkit can communicate with the remote database.

  1. Depending on your database management system, install the client tools for PostgreSQL, or Microsoft™ Command Line Utilities for SQL Server, or Db2® client software on the server that hosts the ETL toolkit.
    For more information, see Installing a database.
  2. Navigate to the classes directory of the ETL toolkit and open the Connection.properties file in a text editor.
  3. Ensure that the value for the db.installation.dir setting is correct for the path to the PostgreSQL client tools or Microsoft Command Line Utilities for SQL Server or the Db2 client on the server that hosts this ETL toolkit.
    For example:
    db.installation.dir=C:/Program Files/IBM/SQLLIB
  4. If you are using Db2 to host the Information Store, you must catalog the remote Db2 database. Run the following commands to enable the ETL toolkit to communicate with the Information Store:
    db2 catalog tcpip node node-name host-name server port-number
    db2 catalog database instance-name at node node-name

    Here, host-name, port-number, and instance-name are the values that are specified in the topology.xml file. node-name can be any value that you choose, but you must use the same value in both commands.

If the database management system that hosts the Information Store is not using SSL, then the process is complete.

If the database management system is configured to use SSL, you must also enable the ETL toolkit to communicate by using SSL. The detail of this process depends on your choice of database management system.

  1. If you're using PostgreSQL, take the i2-database_management_system-certificate.cer certificate that you exported from the database management system when you configured SSL on the server that hosts the ETL toolkit, and store it in a convenient location on the server.
  2. If you're using SQL Server or Db2:
    1. Register the i2-database_management_system-certificate.der certificate that you exported from the database management system when you configured SSL on the server that hosts the ETL toolkit.
      • On Windows, import the certificate into the Trusted Root Certification Authorities store for the current user.
      • On Linux, copy the certificate to the /etc/pki/ca-trust/source/anchors directory and use update-ca-trust to enable it as a system CA certificate.
    2. Create a truststore and import into the truststore the certificate that you exported from the database management system when you configured SSL.
      For example, run the following command:
      keytool -importcert -alias "dbKey"
              -file C:\i2\etltoolkit\i2-database_management_system-certificate.der
              -keystore "C:\i2\etltoolkit\i2-etl-truststore.jks"
              -storepass "password"

      Enter yes in response to the query, Trust this certificate?

  3. Navigate to the classes directory of the ETL toolkit and open the TrustStore.properties file in a text editor.
  4. If you're using PostgreSQL, populate the DBTrustStoreLocation property with the full path to the certificate that you stored earlier.
  5. If you're using SQL Server or Db2:
    1. Populate the DBTrustStoreLocation and DBTrustStorePassword properties with the full path to the truststore that you created, and the password that is required to access it.
      For example:
      DBTrustStoreLocation=C:/i2/etltoolkit/i2-etl-truststore.jks
      DBTrustStorePassword=password
    2. You can use the Liberty profile securityUtility command to encode the password for the truststore.
      1. Navigate to the bin directory of the Open Liberty deployment that was configured by the deployment toolkit.
      2. In a command prompt, run securityUtility encode password, which generates and displays the encoded password. Use the entire value, including the {xor} prefix, for the DBTrustStorePassword property value. For more information about using the security utility, see securityUtility encode.

Results

The ETL toolkit is ready for use by your ETL logic to modify the Information Store. At key points in the processes of preparing for and performing ingestion, you can use commands in the ETL toolkit in place of deployment toolkit functions.