Match rules syntax

A match rules file is an XML document whose structure is validated when i2 Analyze starts. The match rules syntax is the same for both system match and Find Matching Records match rules files.

Root element: matchRules

The root element of a match rules file is a <matchRules> element from the defined namespace. For example:

<tns:matchRules
  xmlns:tns="http://www.i2group.com/Schemas/2019-05-14/MatchRules"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xsi:schemaLocation=
    "http://www.i2group.com/Schemas/2019-05-14/MatchRules MatchRules.xsd"
  version="2"
  enableSourceIdentifierMatching="true">
  ...
</tns:matchRules>

The <matchRules> element has the following customizable attributes:

  • enableSourceIdentifierMatching

    In a deployment that contains both the Information Store and the i2 Connect gateway, this attribute controls whether system matching uses source identifiers to determine whether records match each other, regardless of whether their property values match.

    When this attribute is false or absent, different users can cause duplication by retrieving the same records from external sources, editing them, and uploading them to the Information Store. When it is true, such records are detected by system matching.

  • version

    The version of the match rules file, which must be 2 at this release.

matchRule

Inside the root element, each <matchRule> element defines a match rule for records of a particular entity or link type. The <matchRule> element has the following attributes:

  • id

    An identifier for the match rule, which must be unique within the match rules file.

  • itemTypeId

    The identifier for the entity or link type to which the rule applies, as defined in the i2 Analyze schema.

    Important: For item types that are defined in gateway or connector schemas, i2 Analyze appends the schema short name to the item type identifier. For example, if the gateway schema defines an item type with the identifier ET5, then the identifier to use here might be ET5-external.

    In the modified type identifier, the short name is always in lower-case letters and separated from the original identifier with a hyphen. Any whitespace or non-alphanumeric characters in the short name are converted to single hyphens.

    When you create or edit match rules through Analyst's Notebook, the application handles these modifications to item type identifiers for you. When you edit the XML file yourself, you are responsible for specifying item type identifiers correctly.

  • displayName

    The name of the rule, which Analyst's Notebook displays to analysts in Find Matching Records.

  • description

    The description of the rule, which Analyst's Notebook displays to analysts in Find Matching Records.

  • active

    Defines whether the rule is active. A value of true means that the rule is active; a value of false means that the rule is not active.

  • linkDirectionOperator

    Determines whether two links must have the same direction in order to match. Mandatory for link type rules, where it must have the value EXACT_MATCH or ANY. Must be absent or null for entity type rules.

  • version

    In earlier versions, the version attribute was mandatory on <matchRule> elements. At this release, the per-rule version is optional, and any value is ignored.

For example, an entity type rule:

<matchRule
  id="4a2b9baa-e3c4-4840-a9fd-d204711af50e"
  itemTypeId="ET3"
  displayName="Match vehicles"
  description="Match vehicles with the same license plate when
               either the registered state or region are the same."
  active="true">
  ...
</matchRule>

And a link type rule:

<matchRule
  id="8aa5f6f4-a1a8-41de-b5c5-8701e44bcde7"
  itemTypeId="LAS1"
  displayName="Match duplicate links"
  description="Match link records between the same pair of entity
               records when the links are in the same direction."
  active="true"
  linkDirectionOperator="EXACT_MATCH">
  ...
</matchRule>
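To illustrate the itemTypeId rule for gateway and connector schemas, the following sketch targets an entity type that a gateway schema defines with the identifier ET5. It assumes that the schema's short name is "External Data", which becomes "external-data" after conversion to lower case and replacement of the space with a hyphen. The short name, id, displayName, and description are hypothetical placeholders, and "..." stands for the rule's conditions, as in the examples above.

```xml
<!-- Hypothetical: a gateway schema with the short name "External Data"
     defines an entity type with the identifier ET5, so the rule
     refers to it as ET5-external-data. -->
<matchRule
  id="00000000-0000-0000-0000-000000000001"
  itemTypeId="ET5-external-data"
  displayName="Match external records"
  description="Hypothetical rule for records from a gateway schema."
  active="true">
  <matchAll>
    ...
  </matchAll>
  <matchAny />
</matchRule>
```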

matchAll and matchAny

To specify the behavior of a match rule, you can use the following children of the <matchRule> element:

matchAll

The <matchAll> element specifies that all of the conditions within it must be met.

matchAny

The <matchAny> element specifies that at least one of the conditions within it must be met.

A <matchRule> element must have both the <matchAll> and <matchAny> elements. For example:

<matchRule ... >
  <matchAll>
    ...
  </matchAll>
  <matchAny />
</matchRule>

condition

All match rules must contain the <matchAll> and <matchAny> elements, although both can be empty for link type rules. It is valid to create a rule that makes all link records of the same type between the same pair of entity records match each other, regardless of any other considerations.
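A minimal sketch of such a link type rule, assuming the LAS1 link type from the earlier example: both <matchAll> and <matchAny> are empty, and linkDirectionOperator is set to ANY so that the links do not need to have the same direction. The id, displayName, and description values are hypothetical placeholders.

```xml
<!-- Hypothetical link type rule with no conditions: LAS1 link records
     between the same pair of entity records match each other.
     ANY means that the links do not need the same direction. -->
<matchRule
  id="00000000-0000-0000-0000-000000000002"
  itemTypeId="LAS1"
  displayName="Match parallel links"
  description="Match LAS1 links between the same pair of entity records."
  active="true"
  linkDirectionOperator="ANY">
  <matchAll />
  <matchAny />
</matchRule>
```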

All entity type rules must contain at least one condition. Many link type rules contain conditions too. Each condition defines a comparison that takes place between values in different records, and specifies when those values are considered to match. Conditions can be refined by using operators, values, and normalizations.

To specify the conditions of a match rule, you use the <condition> element that can be a child of the <matchAll> and <matchAny> elements. Each <condition> element has a mandatory propertyTypeId attribute, which is the identifier for the property type to which the condition applies, as defined in the i2 Analyze schema.

For example:

<condition propertyTypeId="VEH2">
...
</condition>

All conditions contain an <operator> element, most of them contain a <value> element, and many contain a <normalizations> element.

operator

The <operator> element defines the type of comparison between the property values in different records, or between the property value and a static value specified within the rule. The possible operators are:

Operator             Description
EXACT_MATCH          The values that are compared must match each other exactly.
EXACT_MATCH_START    A specified number of characters at the start of string values must match each other exactly.
EXACT_MATCH_END      A specified number of characters at the end of string values must match each other exactly.
EQUAL_TO             The property values must match each other and the specified <value>.

For example:

<condition propertyTypeId="VEH2">
  <operator>EXACT_MATCH</operator>
  ...
</condition>

value

The contents of the <value> element affect the behavior of the <operator> of the condition. Different operators require different value types.

  • If the operator is EXACT_MATCH_START or EXACT_MATCH_END, the value is an integer that specifies the number of characters to compare at the start or end of the property value:

    <operator>EXACT_MATCH_START</operator>
    <value xsi:type="xsd:int">3</value>
  • If the operator is EQUAL_TO, the value is a string to compare with the property value:

    <operator>EQUAL_TO</operator>
    <value xsi:type="xsd:string">red</value>
  • If the operator is EXACT_MATCH, it is not valid to specify a <value> element.

    <operator>EXACT_MATCH</operator>
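The following sketch places the two <value> forms inside complete conditions. It assumes that VEH2 (the license plate property in the full example at the end of this section) is a string property, and VEH5 is a hypothetical string property type that holds a vehicle color. The xsi and xsd prefixes must be declared, as they are on the <matchRules> root element shown earlier.

```xml
<matchAll>
  <!-- The first three characters of the VEH2 values must be identical. -->
  <condition propertyTypeId="VEH2">
    <operator>EXACT_MATCH_START</operator>
    <value xsi:type="xsd:int">3</value>
  </condition>
  <!-- Hypothetical property type VEH5: both records must have
       the value "red". -->
  <condition propertyTypeId="VEH5">
    <operator>EQUAL_TO</operator>
    <value xsi:type="xsd:string">red</value>
  </condition>
</matchAll>
```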

normalizations

The <normalizations> element contains child <normalization> elements that define how property values are compared with each other (and sometimes with the contents of the <value> element). The possible values for the <normalization> element are:

Normalization               Description
IGNORE_CASE                 Ignores case during the comparison ('a' matches 'A').
IGNORE_DIACRITICS           Ignores diacritic marks on characters ('Ã' matches 'A').
IGNORE_WHITESPACE_BETWEEN   Ignores whitespace between characters ('a a' matches 'aa').
IGNORE_WHITESPACE_AROUND    Ignores whitespace around a string (' a ' matches 'a').
IGNORE_NUMERIC              Ignores numeric characters ('a50' matches 'a').
IGNORE_ALPHABETIC           Ignores alphabetic characters ('a50' matches '50').
IGNORE_NONALPHANUMERIC      Ignores non-alphanumeric characters ('a-a' matches 'aa').
SIMPLIFY_LIGATURES          Simplifies ligatures ('æ' matches 'ae').

For example, you might have the following normalizations for an EXACT_MATCH operator:

<condition ... >
  <operator>EXACT_MATCH</operator>
  <normalizations>
    <normalization>IGNORE_CASE</normalization>
    <normalization>IGNORE_NONALPHANUMERIC</normalization>
    <normalization>IGNORE_WHITESPACE_BETWEEN</normalization>
  </normalizations>
</condition>

In this example, the values "b m w xdrive" and "BMW x-drive" are considered a match.

The operators and normalizations that you can specify for a condition depend on the logical type of the property type to which the condition applies. The following table shows the operators and normalizations that you can use for each logical type:

Schema logical type    Operators      Normalizations
SINGLE_LINE_STRING     All            All
SELECTED_FROM          All            All
SUGGESTED_FROM         All            All
BOOLEAN                EXACT_MATCH    None
INTEGER                EXACT_MATCH    None
DECIMAL                EXACT_MATCH    None
DOUBLE                 EXACT_MATCH    None
DATE_AND_TIME          EXACT_MATCH    None
DATE                   EXACT_MATCH    None
TIME                   EXACT_MATCH    None
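For the non-string logical types in the table, a condition can therefore contain only the EXACT_MATCH operator, with no <value> element and no normalizations. A minimal sketch, assuming a hypothetical property type PER9 whose logical type is DATE:

```xml
<!-- Hypothetical property type PER9 with the logical type DATE:
     the dates in both records must be identical. -->
<condition propertyTypeId="PER9">
  <operator>EXACT_MATCH</operator>
</condition>
```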

Property types that have the following logical types cannot be used in match rules:

  • GEOSPATIAL

  • MULTIPLE_LINE_STRING

The following XML is an example of a match rules file that contains a single entity match rule. The rule matches vehicle records that have the same values for the license plate property, and the same values for either the state or region properties.

For example, two vehicle records with the license plates "1233 DC 33" and "1233DC33" from the regions "Ile-de-France" and "Île De France" are identified as a match by the following rule:

<tns:matchRules 
    xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation=
      "http://www.i2group.com/Schemas/2019-05-14/MatchRules MatchRules.xsd"
    version="2"
    xmlns:tns="http://www.i2group.com/Schemas/2019-05-14/MatchRules">
  <matchRule id="4a2b9baa-e3c4-4840-a9fd-d204711af50e"
             itemTypeId="ET3"
             displayName="Match vehicles"
             description="Match vehicles with the same license plate,
                          when either the registered state or
                          region are the same."
             active="true">
    <matchAll>
      <condition propertyTypeId="VEH2">
        <operator>EXACT_MATCH</operator>
        <normalizations>
          <normalization>IGNORE_WHITESPACE_BETWEEN</normalization>
        </normalizations>
      </condition>
    </matchAll>
    <matchAny>
      <condition propertyTypeId="VEH16">
        <operator>EXACT_MATCH</operator>
        <normalizations>
          <normalization>IGNORE_CASE</normalization>
          <normalization>IGNORE_DIACRITICS</normalization>
          <normalization>IGNORE_WHITESPACE_BETWEEN</normalization>
          <normalization>IGNORE_NONALPHANUMERIC</normalization>
        </normalizations>
      </condition>
      <condition propertyTypeId="VEH15">
        <operator>EXACT_MATCH</operator>
        <normalizations>
          <normalization>IGNORE_CASE</normalization>
          <normalization>IGNORE_DIACRITICS</normalization>
          <normalization>IGNORE_WHITESPACE_BETWEEN</normalization>
          <normalization>IGNORE_NONALPHANUMERIC</normalization>
        </normalizations>
      </condition>
    </matchAny>
  </matchRule>
</tns:matchRules>
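Link type rules follow the same structure. The following sketch is a match rules file with a single link rule that matches LAS1 links in the same direction between the same pair of entity records when a property matches after normalization. It assumes that LAS5 is a hypothetical SINGLE_LINE_STRING property type on the LAS1 link type; the id, displayName, and description values are placeholders.

```xml
<tns:matchRules
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation=
      "http://www.i2group.com/Schemas/2019-05-14/MatchRules MatchRules.xsd"
    version="2"
    xmlns:tns="http://www.i2group.com/Schemas/2019-05-14/MatchRules">
  <matchRule id="00000000-0000-0000-0000-000000000003"
             itemTypeId="LAS1"
             displayName="Match duplicate links"
             description="Match same-direction links with the same LAS5 value."
             active="true"
             linkDirectionOperator="EXACT_MATCH">
    <matchAll>
      <!-- Hypothetical property type LAS5: values match when they are
           equal after ignoring case and surrounding whitespace. -->
      <condition propertyTypeId="LAS5">
        <operator>EXACT_MATCH</operator>
        <normalizations>
          <normalization>IGNORE_CASE</normalization>
          <normalization>IGNORE_WHITESPACE_AROUND</normalization>
        </normalizations>
      </condition>
    </matchAll>
    <matchAny />
  </matchRule>
</tns:matchRules>
```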