Editing TextChart rules

i2 makes all of the rules that control extraction available in a set of XML files. You can use TextChart Studio to modify the rules in the supplied set, or to write new rules of your own.

TextChart rules specify the linguistic patterns that entities must match in order to be extracted. TextChart reads and applies rule files in the order they appear in RuleFileList.xml. To access this file, click Manage LxBase in the LxBase section of the vertical toolbar.


Rule Editor

Within each rule file, TextChart applies the rules in the order in which they occur. To view and edit a particular rule file, click the Edit rules button in the menu.

To add a new rule to the open rule file, click the "Flag" button in the horizontal toolbar. TextChart Studio adds an XML template for the new rule to the file:

<Rule ID="ADD ID">
  <description>ADD DESCRIPTION</description>
  <comment>ADD ANY NECESSARY COMMENTS</comment>
  <example>ADD AN EXAMPLE OF THE DESIRED RESULT</example>
  <result>
    <combine></combine>
    <sv></sv>
    <attributes></attributes>
    <nolonger></nolonger>
  </result>
  <when>
    <T offset="0">
      <IS><sv></sv></IS>
      <ISNOT><sv></sv></ISNOT>
    </T>
  </when>
</Rule>

Every TextChart rule has a unique identifier and a description. You can also provide additional comments, and an example of the desired extraction result.

The rule's logic is composed of a <result> clause and a <when> clause. The <result> clause includes the semantic vector (<sv>) that the rule creates. For example, the matching pattern might be a PERSON entity or a sur_name semantic vector.

The <result> element also optionally includes <combine>, <nolonger>, and <attributes> elements:

  • Use <combine> to merge multiple tokens into the same resulting semantic vector. The value, which can be positive or negative, is the number of tokens to combine together as a match, starting with a count of 0. If the value is negative, combination happens backwards from the 0 token. This is used to look backwards for recursion.

  • Use <nolonger> to remove semantic vectors from tokens that match the pattern.

  • Use <attributes> to assign attributes to a particular token in a match.

The <when> element contains the pattern itself. For each token in the match, semantic vectors that must match are specified in the <IS> element, while semantic vectors that must not match are specified in the <ISNOT> element.

If an <IS> element contains two semantic vectors, they're ORed together during processing. The following example illustrates a token that must be a given name or a surname, but not a verb:

<T offset="0">
  <IS><sv><given_name/><sur_name/></sv></IS>
  <ISNOT><sv><verb/></sv></ISNOT>
</T>

To AND semantic vectors, the <when> element must contain instances of tokens with the same offset. In this example, the token must be both a surname and a word that starts with a capital letter:

<T offset="0">
  <IS><sv><sur_name/></sv></IS>
</T>

<T offset="0">
  <IS><sv><cap_word/></sv></IS>
</T>

You can use a negative offset to find a match in the rule, without including that particular token in the extraction result.