Importing and processing documents

The i2 TextChart plug-in for Analyst's Notebook processes documents to extract information about the things they describe and the connections between them. You can add the extracted information to your charts in the form of entities and links.

Entity processing

Entities are the important, named items that a document describes, such as people, places, and events. i2 TextChart uses the linguistic context of the document to determine which words or phrases to extract as entities.

After processing, you can modify entity extraction results and apply your own knowledge before sending results to the chart.

Note: The LxBase documentation includes a list of all the entity types that TextChart understands, with their corresponding definitions.

Importing documents

The first step in using i2 TextChart is to import the documents that you want to analyze.

  1. In Analyst's Notebook, select the Text Analytics menu in the Home tab, and then click Collections.

    Text Analytics menu

    Note: The Mapping Configuration option is available only if you're not connected to a server. See Appendix A for more information.

  2. i2 TextChart stores documents in collections. A collection is a single document or a batch of documents that you group together for processing. You can start a new collection for the documents you're importing (click New), or add them to an existing one.

    ANB with Collections pane
  3. If you clicked New, provide a name for the new collection and click Create.

    New collection name

    Then specify whether to populate the new collection with files or a folder.

    New collection files or folder
  4. Finally, select the files (or the folder that contains the files) that you want to import.

    Select files to import

Processing documents

After you import documents into a collection, you can instruct TextChart to process them all at the same time, or to process each document separately.

Note: When you process documents for the first time, it can take a few seconds for the engine to load. Also, larger documents generally take longer to process than shorter ones. TextChart displays a spinning cursor while long-running operations take place.

The documents in a particular collection appear as a list in Analyst's Notebook, from which you can perform a number of tasks. Clicking a single document in the list processes that document and opens it in the Document View. Some of the other tasks are described below.

TextChart document list
  1. Process Collection

    When you click Process Collection, TextChart processes all the documents in the list that it has not already processed. The progress bar at the top of the pane shows the status of the operation. After each document is processed, its page icon turns blue.

    Canceling the operation stops processing for the current and subsequent documents in the collection. You can also ask TextChart to reprocess all documents in the collection, regardless of whether they've been processed before.

  2. Reprocess / Remove from this Collection / Remove from Database

    If you right-click a single document in the list, the pop-up menu presents commands to reprocess the document, to remove it from the collection, or to remove it (and the results of processing it) from the database.

  3. Documents

    The Documents tab displays a count of the documents in the collection.

  4. Entities

    The Entities tab contains an entity tree that lists the types of all the entities found in processed documents, as well as the count of each type of entity.

    TextChart collection entity and link lists
  5. Links

    The Links tab contains a link tree that lists the types of all the links found in processed documents, as well as the count of each type of link.

    You can expand the link types in the tree view to see lists of the individual links and the predicates that TextChart identified during processing.

  6. Click Back to Collections to return to the list of collections.

  7. Click Add More Files to add more files to the current collection for processing.

  8. Click Add Directory to Collection to add all the documents in a particular directory to the current collection.
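
The Process Collection behavior and the per-type counts described above can be pictured with a small Python sketch. This is only a conceptual model, not TextChart's implementation or API: the `Document` class, the `process_collection` and `entity_type_counts` functions, and the `extract` callable that stands in for the processing engine are all hypothetical names introduced for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for one document in a collection."""
    name: str
    processed: bool = False
    # (entity_type, text) pairs produced by the stand-in extractor
    entities: list = field(default_factory=list)


def process_collection(docs, extract, reprocess=False,
                       is_cancelled=lambda: False):
    """Process documents in list order and return how many were processed.

    By default, documents that are already processed are skipped.
    With reprocess=True, every document is processed again.
    If is_cancelled() returns True, the current and all subsequent
    documents are left untouched.
    """
    done = 0
    for doc in docs:
        if is_cancelled():
            break
        if doc.processed and not reprocess:
            continue
        doc.entities = list(extract(doc))
        doc.processed = True  # comparable to the page icon turning blue
        done += 1
    return done


def entity_type_counts(docs):
    """Count entities per type across processed documents only."""
    counts = Counter()
    for doc in docs:
        if doc.processed:
            counts.update(entity_type for entity_type, _ in doc.entities)
    return counts
```

In this model, calling `process_collection` a second time without `reprocess=True` does no work, which mirrors the description of Process Collection skipping documents that it has already processed. The entity tree on the Entities tab corresponds to the result of `entity_type_counts`, and a link tree could be modeled in the same way.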

Viewing documents

After TextChart processes a document (or the first of several documents), it opens the Document View window, which presents an enriched view of the processed text.

The Document View highlights all of the found entities in context, using different colors for different types. In this view, you can review and curate the entities that TextChart identified.

TextChart Document View
  1. The button at the upper left of the window opens entity and link trees for the current document.

    TextChart document entity and link lists

    Initially, these trees behave like the ones that cover all the documents in a collection. However, when you select entities and links (or groups of them), TextChart highlights them in the open document and displays a second panel that contains more information.

    TextChart document with entities selected
  2. Use the Tag drop-down to tag a new entity or link.

    Tag drop-down
  3. Through the Options button, you can:

    1. Toggle back and forth between an English version of the document and the text in its original language.

    2. Remove whitespace from the document as displayed.

    3. Change the font size of the text in the document.

    Options drop-down