Creating dictionaries

i2 TextChart can only extract information from documents when they contain words and terms that appear in its dictionaries. If your organization regularly uses industry-specific terms that are not present in the standard TextChart dictionaries, you can create your own dictionary and add those terms to it.

To create a dictionary, click the "plus" icon above the tree view in the dictionary editor. TextChart Studio displays a dialog where you can provide the name and location of the new dictionary.


New dictionary dialog

To add a lexical entry to the new dictionary, click the "flag" button in the toolbar to generate an XML template.

<lex><word></word><sv></sv></lex>

In addition to the term itself, which you type in the <word> element, you can add semantic vectors and their attributes in the <sv> element to enrich your extraction results.


Dictionary entries

For example, to normalize results so that all references in documents to "England", "Great Britain", or "United Kingdom" resolve to just "United Kingdom", use the norm attribute on the country_name semantic vector:

<lex><word>great britain</word><sv><country_name norm="united kingdom"/></sv></lex>

<lex><word>england</word><sv><country_name norm="united kingdom"/></sv></lex>

<lex><word>united kingdom</word><sv><country_name norm="united kingdom"/></sv></lex>

Alternatively, you can configure lexical entries so that terms extracted as PERSON entities always produce results with the same form. For example, the following entries extract "Audrey Hepburn" as "Audrey Hepburn" and not "V. Audrey Hepburn", and ensure that "Barack Obama" always has the subtype "elected_official":

<lex><word>audrey hepburn</word><sv><PERSON given_name="audrey" sur_name="hepburn" gender="female"/></sv></lex>

<lex><word>barack obama</word><sv><PERSON given_name="barack" sur_name="obama" subtype="elected_official"/></sv></lex>

You can also create non-English, industry-specific dictionaries. When you click the "plus" icon above the tree view in the dictionary editor, TextChart Studio displays a dialog where you can select the appropriate language for your dictionary.


Select dictionary language

You can add industry-specific knowledge to any available non-English dictionary. To do so, click the "flag" button in the toolbar to generate an XML template:

<lex><word></word><gloss></gloss></lex>

To add an entry to a non-English dictionary, add the term (in the dictionary's language) to the <word> element, and the English gloss to the <gloss> element.


Non-English dictionary

If the English gloss is already an entry in an English dictionary, TextChart uses the semantic vectors that are attached to the English entry. Optionally, you can also add semantic vectors to the non-English entry.
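
As an illustration, a filled-in entry for a French dictionary might look like the following. The French term is a hypothetical example chosen for this sketch, and the entry assumes that the "england" entry shown earlier already exists in an English dictionary:

<lex><word>angleterre</word><gloss>england</gloss></lex>

Under those assumptions, TextChart would be expected to apply the country_name semantic vector attached to the English "england" entry, so that "angleterre" also resolves to "united kingdom".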