Importing and processing documents
The i2 TextChart plug-in for Analyst's Notebook processes documents to extract information about the things they describe and the connections between them. You can add the extracted information to your charts in the form of entities and links.
Entities are the important, named items, such as the people, places, and events that a document describes. i2 TextChart uses the linguistic context of the document to determine which words or phrases are extracted as entities.
After processing, you can modify entity extraction results and apply your own knowledge before sending results to the chart.
Note: The LxBase documentation includes a list of all the entity types that TextChart understands, with their corresponding definitions.
To use i2 TextChart, the first step is to import the documents that you want to analyze.
In Analyst's Notebook, select the Text Analytics menu in the Home tab, and then click Collections.
Note: The Mapping Configuration option is available only if you're not connected to a server. See Appendix A for more information.
i2 TextChart stores documents in collections. A collection is a single document or a batch of documents that you group together for processing. You can start a new collection for the documents you're importing, or add them to an existing one.
If you clicked New, provide a name for the new collection and click Create.
Then specify whether to populate the new collection with files or a folder.
Finally, select the files (or the folder that contains the files) that you want to import.
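The choice between importing files and importing a folder can be pictured with a short Python sketch. This is only an illustration: gather_documents is a hypothetical helper, not part of TextChart, and it assumes that a folder contributes only the files directly inside it (the behavior for subfolders is not described here).

```python
from pathlib import Path
from typing import Iterable, List


def gather_documents(selection: Iterable[Path]) -> List[Path]:
    """Expand a selection of files and folders into a sorted list of files.

    Hypothetical helper: it mirrors the "files or a folder" choice made
    during import, but it is not a TextChart function.
    """
    found = set()
    for item in selection:
        if item.is_dir():
            # Assumption: a folder contributes only its top-level files.
            found.update(p for p in item.iterdir() if p.is_file())
        elif item.is_file():
            found.add(item)
    # The set removes duplicates when a file is selected twice.
    return sorted(found)
```

Selecting both a folder and a file inside that folder yields each file once, which is the behavior you would expect from a collection.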
After you import documents into a collection, you can instruct TextChart to process them all at the same time, or to process each document separately.
Note: When you process documents for the first time, it can take a few seconds for the engine to load. Also, larger documents generally take longer to process than shorter ones. TextChart displays a spinning cursor while long-running operations take place.
The documents in a particular collection appear as a list in Analyst's Notebook, from which you can perform a number of tasks. Clicking a single document in the list processes that document and opens it in the Document View. Some of the other tasks are described below.
When you click Process Collection, TextChart processes all the documents in the list that it has not already processed. The progress bar at the top of the pane shows the status of the operation. After each document is processed, its page icon turns blue.
Canceling the operation stops processing for the current and subsequent documents in the collection. You can also ask TextChart to reprocess all documents in the collection, regardless of whether they've been processed before.
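The processing rules described above (skip documents that were already processed, stop when canceled, optionally reprocess everything) can be modeled with a minimal Python sketch. The Document and Collection classes, the extract callback, and the cancelled callback are all hypothetical names chosen for illustration; they are not TextChart APIs. The sketch also simplifies cancellation by checking for it between documents rather than interrupting the document in progress.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    name: str
    processed: bool = False  # stands in for the page icon turning blue


@dataclass
class Collection:
    name: str
    documents: List[Document] = field(default_factory=list)

    def process(self,
                extract: Callable[[Document], None],
                reprocess_all: bool = False,
                cancelled: Callable[[], bool] = lambda: False) -> int:
        """Process pending documents and return how many were processed."""
        done = 0
        for doc in self.documents:
            if cancelled():
                break  # skip this document and every later one
            if doc.processed and not reprocess_all:
                continue  # already processed, nothing to do
            extract(doc)
            doc.processed = True
            done += 1
        return done
```

For example, a collection that holds one processed and one unprocessed document processes exactly one document on the first call, none on a second call, and both when reprocess_all is True.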
Reprocess / Remove from this Collection / Remove from Database
If you right-click a single document in the list, the pop-up menu presents commands to reprocess the document, to remove it from the collection, or to remove it (and the results of processing it) from the database.
The Documents tab displays a count of the documents in the collection.
The Entities tab contains an entity tree, which lists the types of all the entities found in processed documents, along with a count for each type.
The Links tab contains a link tree, which lists the types of all the links found in processed documents, along with a count for each type.
You can expand the link types in the tree view to see lists of the individual links and the predicates that TextChart identified during processing.
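The entity and link trees are essentially counts grouped by type. A minimal Python sketch of that grouping follows. The build_entity_tree function and the type names in the example are hypothetical, and the sketch assumes that each count reflects every occurrence found (TextChart might count distinct entities instead).

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def build_entity_tree(entities: List[Tuple[str, str]]) -> Dict[str, dict]:
    """Group (type, text) pairs into a tree of per-type counts.

    Hypothetical helper for illustration; not a TextChart function.
    """
    tree = defaultdict(lambda: {"count": 0, "items": Counter()})
    for entity_type, text in entities:
        node = tree[entity_type]
        node["count"] += 1         # total shown next to the type
        node["items"][text] += 1   # individual entries under the type
    return dict(tree)


# Example input with made-up entity types and values.
found = [
    ("Person", "Ana Ruiz"),
    ("Place", "Lisbon"),
    ("Person", "Ana Ruiz"),
    ("Person", "Tom Reed"),
]
tree = build_entity_tree(found)
```

In this example, tree["Person"]["count"] is 3 and tree["Place"]["count"] is 1. Expanding a type corresponds to reading its "items" entry.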
Click Back to Collections to return to the list of collections.
Click Add More Files to add more files to the current collection for processing.
Click Add Directory to Collection to add all the documents in a particular directory to the current collection.
After TextChart processes a document (or the first of several documents), it opens the Document View window, which presents an enriched view of the processed text.
The Document View highlights all of the found entities in context, using different colors for different types. In this view, you can review and curate the entities that TextChart identified.
The button at the upper left of the window opens entity and link trees for the current document.
The trees behave initially like the ones in the list for all the documents in a collection. However, when you select entities and links (or groups of them), TextChart highlights them in the open document and displays a second panel that contains more information.
Use the Tag drop-down to tag a new entity or link.
Through the Options button, you can:
Toggle between an English version of the document and the text in its original language.
Remove whitespace from the document as displayed.
Change the font size of the text in the document.