Managing clusters

To manage clusters in i2 TextChart Server, you use the Clusters page of the manager's user interface:


The manager's Clusters page

Adding a new cluster

To add a new cluster, click New Cluster in the upper-right corner of the page to display a dialog. Enter a unique name for the cluster and an optional short description. Then, click Add to create it.

A new cluster is automatically configured with the default LxBase, the default GxBase, and a default set of properties. Until you configure an output connector for it, the cluster can return its output only in the immediate response to REST API calls.

Setting the default cluster

One cluster in the system is always designated as the default cluster, which means that it's used if no specific cluster is requested when documents are sent for processing through the REST API.

Unless you change it, the first cluster that you created is the default. To change the default cluster, click the drop-down menu in the row that contains the cluster you want to use, and select Set as Default. The asterisk that marks the default moves to the selected cluster, indicating that it is now the default.
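
The fallback rule described above can be sketched as a small function. This is an illustrative model only, not the server's code or its REST API: the function name resolve_cluster and the cluster names are hypothetical, and the handling of an unknown cluster name is an assumption.

```python
from typing import Optional


def resolve_cluster(requested: Optional[str], clusters: list[str], default: str) -> str:
    """Return the cluster that would handle a request (hypothetical model).

    `requested` is the cluster named in the request, or None when the
    caller did not name one.
    """
    if requested is None:
        # No specific cluster requested: fall back to the default cluster.
        return default
    if requested not in clusters:
        # Assumption: an unknown name is an error, not a silent fallback.
        raise KeyError(f"unknown cluster: {requested}")
    return requested


clusters = ["general", "finance"]  # hypothetical cluster names
print(resolve_cluster(None, clusters, default="general"))
print(resolve_cluster("finance", clusters, default="general"))
```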

Removing a cluster

To remove a cluster that you no longer need, click the drop-down menu in the row that contains the cluster and select Remove Cluster. Then, click OK in the confirmation dialog to remove the cluster from the system.

You can remove a cluster only if it has no assigned workers. Use the Workers page to remove all worker nodes from the cluster in question before you attempt to remove the cluster itself.

Configuring a cluster

To configure a cluster, click the drop-down menu in the row that contains the cluster and select Configure. A panel opens below the cluster table that allows you to see and change all available cluster options:


Cluster configuration

The configuration settings are split across three tabs: Assets, Extraction, and Output. After you make changes, click the Save Changes button in the upper-right corner of the panel to propagate them to all the worker nodes in the cluster. Alternatively, click Cancel Changes to discard your changes.

Assets

A cluster's assets are the two large pieces of data that i2 TextChart Server uses for extraction and tagging:

  • LxBase: The LxBase consists of multi-lingual dictionaries and rules that control how the i2 TextChart Server engine finds named entities in unstructured text. The LxBase is a single ZIP file that is provided by i2 or created through TextChart Studio.

  • GxBase: The GxBase consists of a geographic database that is used to augment information on extracted place entities. The GxBase is a single ZIP file that i2 provides.

By default, i2 TextChart Server comes with the latest version of each of these assets. When you create a cluster, these default versions are used.

To use a new LxBase or a new GxBase, upload it through the Assets tab in the cluster configuration panel:


Cluster configuration assets

Click Upload next to the appropriate asset to select a file to upload. After upload, the file is checked to validate the data it contains. If validation is successful, the name of the asset changes:


Asset name change

Extraction

Use the Extraction tab to set run-time operational properties for the i2 TextChart Server engine:


Cluster configuration extraction

You can modify each property by selecting a value from a drop-down or typing into a field:

  • rawinput: Set to TRUE to force the engine never to convert documents from non-text formats such as Word, Excel, or PDF.

  • inlinetext: Set to TRUE to populate the inlinetext field of the RosokaFullObject output.

  • inlinegloss: Set to TRUE to populate the inlinegloss field of the RosokaFullObject output.

  • internalGeoGravy: Set to ON to enable geographic name lookup.

  • geoMode: Set to BEST to disambiguate place names with multiple matches solely by contained priority value and type. Set to COLOCATED to take other locations in the document into account, to find locations in a close geographic area.

  • geoSortPreference: Indicates how multiple geographic name matches are disambiguated. Can be blank (no preference) or one of: CONUS (prefer locations in the continental US); OCONUS (prefer locations outside of the continental US); region (prefer locations in the given region, including AFRCA, AMER, ANTAR, ASIA, BALK, CAFR, CAMER, CARIB, CAUC, CEURA, CEURO, EAFR, EASIA, EURA, EURO, MDEDT, MEAST, NAFR, NAMER, NEURO, OCEAN, OCENA, RUSSA, SAFR, SAMER, SASIA, SEASA, SEURO, and WAFR); country: country (prefer locations in the given country); and N, W, S, E (prefer locations in the region bounded by the given lat/long measures in decimal degrees).

  • loaderChunkHardLimit: Maximum length, in characters, of text that the engine can process in one go. A value of -1 indicates no limit (not recommended).

  • loaderChunkSoftLimit: Minimum length, in characters, to be processed before the loader attempts to break the document at a reasonable location (such as a paragraph break or a sentence break). A value of -1 indicates no limit. Must be set to a value lower than the hard limit.
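
Before you type values into the Extraction tab, it can help to sanity-check them. The following is a minimal sketch, not part of the product: the function names are hypothetical, the region list is copied from the geoSortPreference description (which says "including", so it may not be exhaustive), and the decision to skip the soft/hard comparison when either limit is -1 is an assumption.

```python
# Region codes listed in the geoSortPreference description above.
REGION_CODES = frozenset(
    "AFRCA AMER ANTAR ASIA BALK CAFR CAMER CARIB CAUC CEURA CEURO EAFR "
    "EASIA EURA EURO MDEDT MEAST NAFR NAMER NEURO OCEAN OCENA RUSSA SAFR "
    "SAMER SASIA SEASA SEURO WAFR".split()
)


def classify_geo_sort_preference(value: str) -> str:
    """Classify a geoSortPreference value (hypothetical helper)."""
    value = value.strip()
    if value == "":
        return "none"
    if value in ("CONUS", "OCONUS"):
        return value.lower()
    if value in REGION_CODES:
        return "region"
    # Country and bounding-box forms are not parsed here, because this
    # sketch makes no assumption about their exact syntax.
    return "other"


def check_chunk_limits(soft: int, hard: int) -> list[str]:
    """Return a list of problems with the two loader chunk limits."""
    problems = []
    # Assumption: the "soft lower than hard" rule is only checked when
    # both limits are set, that is, when neither is -1 (no limit).
    if soft >= 0 and hard >= 0 and soft >= hard:
        problems.append(
            "loaderChunkSoftLimit must be lower than loaderChunkHardLimit"
        )
    return problems


print(classify_geo_sort_preference("MEAST"))
print(check_chunk_limits(soft=5000, hard=2000))
```

Values that the sketch classifies as "other" are not necessarily invalid. They might be a country or bounding-box preference that the sketch does not attempt to interpret.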

Output

Use the Output tab to set the output connector for extraction output when documents are processed in "ingest" mode. A cluster that has no output connector can be used only for "immediate" processing, by returning the extraction results through the REST API.
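
The relationship between the output connector and the processing modes can be summarized in a few lines. This is a hedged model of the rule stated above, not server code: the function name and the connector name are hypothetical, and it assumes that a cluster with an output connector still supports "immediate" processing.

```python
from typing import Optional


def available_modes(output_connector: Optional[str]) -> set[str]:
    """Return the processing modes that a cluster supports (hypothetical model)."""
    # "immediate" results come back through the REST API and need no connector.
    modes = {"immediate"}
    if output_connector is not None:
        # "ingest" mode needs somewhere to send the extraction output.
        modes.add("ingest")
    return modes


print(sorted(available_modes(None)))
print(sorted(available_modes("example-connector")))  # hypothetical connector name
```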


Cluster configuration output

To change the output connector for a cluster, select a new one from the drop-down list. If you don't want to use an output connector, select None.

After you select an output connector, click Configure, enter the information that the connector needs, and then click Save below the configuration panel.

Note: For your changes to be applied, you must click Save in the connector configuration before you click Save Changes at the top of the panel.

See Storing Results using Output Connectors for information about the configuration options for the built-in output connectors.