Storing results using output connectors

i2 TextChart Server worker nodes can store the results of extraction processing using small, dynamically loaded modules called output connectors. Output connectors are attached to a particular cluster, and you can administer them per-cluster through the manager.

i2 TextChart Server is supplied with two output connectors: the file writer and the Elasticsearch connector.

File writer

The file writer output connector instructs the workers in a cluster to write the results of extraction processing to a particular file system directory in XML or JSON format. Where a cluster has several workers, it's common to use a network directory that's mounted in the same location on all worker nodes.

To use the file writer, configure the target cluster and select the Output tab. From the drop-down menu, select FileWriter. Then, click Configure to customize its settings:

  • Directory: The absolute path to the directory where the result files will be written.

  • File Mode: The amount of detail to include in the output files. Choose from the following options:

    • full: Write all data produced by the extraction engine.

  • File Type: Select xml or json output format.

The file writer generates filenames from the name portions of the incoming document identifiers. All output is placed in the specified directory; no subdirectories are created.
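The naming behavior described above can be pictured with a minimal sketch. It is not the connector's actual code: the function name `output_file_path` is hypothetical, and it assumes that the "name portion" of an identifier is its final path component and that the file type is appended as an extension. Neither assumption is confirmed by the product.

```python
from pathlib import PurePosixPath


def output_file_path(directory: str, document_id: str, file_type: str) -> PurePosixPath:
    """Illustrative only: derive a flat output path for one document."""
    if file_type not in ("xml", "json"):
        raise ValueError("File Type must be 'xml' or 'json'")

    target = PurePosixPath(directory)
    if not target.is_absolute():
        raise ValueError("Directory must be an absolute path")

    # Assumption: the name portion is the last path component of the identifier.
    name = PurePosixPath(document_id).name
    if not name:
        raise ValueError("document identifier has no name portion")

    # Assumption: the extension is appended. No subdirectories are created,
    # so every result lands directly in the target directory.
    return target / f"{name}.{file_type}"


print(output_file_path("/mnt/results", "reports/2021/memo.txt", "json"))
```

Under these assumptions, identifiers such as `a/memo.txt` and `b/memo.txt` would map to the same output file, so it may be worth checking that name portions are unique across the documents sent to a cluster.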

Elasticsearch

The Elasticsearch output connector instructs the workers in a cluster to send extraction results to an existing Elasticsearch database instance.

To use the Elasticsearch connector, configure the target cluster and select the Output tab. From the drop-down menu, select Elasticsearch. Then, click Configure to customize its settings:

  • Database host: The hostname or IP address of the server running the Elasticsearch database instance.

  • Database port: The network port on which the Elasticsearch server is listening.

  • Database cluster: The name of the Elasticsearch cluster to use.

  • RFO index: The name of the index in which to store the full data associated with a processed document. ("RFO" stands for RosokaFullObject.) The connector creates one entry per document in this index, using the document identifier as the key.

  • Entity index: The name of the index in which to store the extracted entity data associated with a processed document. The connector creates one entry per document in this index.

  • PSO index: The name of the index in which to store the extracted PSO (relationship) data associated with a processed document. The connector creates one entry per PSO in this index.

  • Store tokens (RFO): Select this option to store all of the individual tokens that were identified during extraction in the document's entry in the RFO index. Because the volume of data can be high, tokens are not stored by default.

  • Store text (RFO): Select this option to store the original text in the document's entry in the RFO index. This is the text that was used as the input to the extraction engine, after encoding and formatting took place. By default, this text is not stored.

  • Store gloss (RFO): Select this option to store the "gloss" (that is, the rough translation) in the document's entry in the RFO index. By default, the gloss is not stored.

  • Store PSO (RFO): Select this option to store the PSO (relationship) data in the document's entry in the RFO index. By default, this information is stored.

You must specify at least one index name. Specifying more than one index causes the connector to populate each index with the appropriate data.
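As a way to reason about these settings, the following sketch models them in Python. The connector itself is configured through the manager, so the class `ElasticsearchOutputSettings`, its field names, and the function `entries_per_index` are all hypothetical. Only the defaults, the "at least one index" rule, and the entries-per-index counts come from the descriptions above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ElasticsearchOutputSettings:
    """Hypothetical mirror of the connector's settings (names are invented)."""
    host: str
    port: int
    cluster: str
    rfo_index: Optional[str] = None
    entity_index: Optional[str] = None
    pso_index: Optional[str] = None
    store_tokens: bool = False  # tokens are not stored by default
    store_text: bool = False    # original text is not stored by default
    store_gloss: bool = False   # gloss is not stored by default
    store_pso: bool = True      # PSO data is stored in the RFO entry by default

    def configured_indexes(self) -> List[str]:
        names = [self.rfo_index, self.entity_index, self.pso_index]
        return [name for name in names if name]

    def validate(self) -> None:
        # At least one index name must be specified.
        if not self.configured_indexes():
            raise ValueError("specify at least one index name")


def entries_per_index(settings: ElasticsearchOutputSettings, pso_count: int) -> Dict[str, int]:
    """Number of entries created for one processed document, per index."""
    settings.validate()
    counts: Dict[str, int] = {}
    if settings.rfo_index:
        counts[settings.rfo_index] = 1          # one entry per document
    if settings.entity_index:
        counts[settings.entity_index] = 1       # one entry per document
    if settings.pso_index:
        counts[settings.pso_index] = pso_count  # one entry per PSO
    return counts


# Example values only; host, port, and names are placeholders.
settings = ElasticsearchOutputSettings(
    host="localhost", port=9200, cluster="example-cluster",
    rfo_index="rfo", pso_index="pso",
)
print(entries_per_index(settings, pso_count=3))
```

In this example the entity index is left unset, so only the RFO and PSO indexes are populated: one RFO entry for the document and one PSO entry for each of its three relationships.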