Match rules syntax

A match rules file is an XML document whose structure is validated when i2 Analyze starts. The match rules syntax is the same for both system match and Find Matching Records match rules files.

Root element: matchRules

The root element of a match rules file is a <matchRules> element from the defined namespace. For example:

<tns:matchRules
  xmlns:tns="http://www.i2group.com/Schemas/2019-05-14/MatchRules"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xsi:schemaLocation=
    "http://www.i2group.com/Schemas/2019-05-14/MatchRules MatchRules.xsd"
  version="2"
  enableSourceIdentifierMatching="true">
  ...
</tns:matchRules>

The <matchRules> element has the following customizable attributes:

Attribute Description
enableSourceIdentifierMatching For a deployment that contains the Information Store and the i2 Connect gateway, controls whether system matching uses source identifiers to determine whether records match each other, regardless of whether they have property matches.

When this attribute is false or absent, different users can cause duplication by retrieving the same records from external sources, editing them, and uploading them to the Information Store. When it is true, such records are detected by system matching.

version The version of the match rules file, which must be 2 at this release.

matchRule

Inside the root element, each <matchRule> element defines a match rule for records of a particular entity or link type. The <matchRule> element has the following attributes:

Attribute Description
id An identifier for the match rule, which must be unique within the match rules file.
itemTypeId The identifier for the entity or link type to which the rule applies, as defined in the i2 Analyze schema.
Important: For item types that are defined in gateway or connector schemas, i2 Analyze appends the schema short name to the item type identifier. For example, if the gateway schema defines an item type with the identifier ET5, then the identifier to use here might be ET5-external.

In the modified type identifier, the short name is always in lower-case letters and separated from the original identifier with a hyphen. Any whitespace or non-alphanumeric characters in the short name are converted to single hyphens.

When you create or edit match rules through Analyst's Notebook Premium, the application handles these modifications to item type identifiers for you. When you edit the XML file yourself, you are responsible for specifying item type identifiers correctly.

displayName The name of the rule, which is displayed to analysts in Analyst's Notebook Premium in Find Matching Records.
description The description of the rule, which is displayed to analysts in Analyst's Notebook Premium in Find Matching Records.
active Defines whether the rule is active. A value of true means that the rule is active; a value of false means that the rule is not active.
linkDirectionOperator Determines whether two links must have the same direction in order to match. Mandatory for link type rules, where it must have the value EXACT_MATCH or ANY. Must be absent or null for entity type rules.
version In earlier versions, the version attribute was mandatory on <matchRule> elements. At this release, the per-rule version is optional, and any value is ignored.

For example, an entity type rule:

<matchRule
  id="4a2b9baa-e3c4-4840-a9fd-d204711af50e"
  itemTypeId="ET3"
  displayName="Match vehicles"
  description="Match vehicles with the same license plate when
               either the registered state or region are the same."
  active="true">
  ...
</matchRule>

And a link type rule:

<matchRule
  id="8aa5f6f4-a1a8-41de-b5c5-8701e44bcde7"
  itemTypeId="LAS1"
  displayName="Match duplicate links"
  description="Match link records between the same pair of entity
               records when the links are in the same direction."
  active="true"
  linkDirectionOperator="EXACT_MATCH">
  ...
</matchRule>
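
As a sketch of the item type identifier modification described in the attribute table, suppose that a gateway schema with the short name external defines an entity type with the identifier ET5. The id, displayName, and description values here are hypothetical and only illustrate the shape of the rule:

<!-- Hypothetical rule: ET5 comes from a gateway schema whose short name is "external". -->
<matchRule
  id="0d6c1f3e-5b7a-4c2d-9e8f-1a2b3c4d5e6f"
  itemTypeId="ET5-external"
  displayName="Match gateway records"
  description="Illustrative rule for an item type that a gateway schema defines."
  active="true">
  ...
</matchRule>

By the same conversion, a hypothetical short name such as My Connector would be expected to produce the identifier ET5-my-connector.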

matchAll and matchAny

To specify the behavior of a match rule, you can use the following children of the <matchRule> element:
matchAll
The <matchAll> element specifies that all of the conditions within it must be met.
matchAny
The <matchAny> element specifies that at least one of the conditions within it must be met.
A <matchRule> element must have both the <matchAll> and <matchAny> elements. For example:
<matchRule ... >
  <matchAll>
    ...
  </matchAll>
  <matchAny />
</matchRule>

condition

All match rules must contain the <matchAll> and <matchAny> elements, although both can be empty for link type rules. It is valid to create a rule that makes all link records of the same type between the same pair of entity records match each other, regardless of any other considerations.
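
For instance, the following sketch shows a link type rule that has no conditions. It reuses the LAS1 link type from the earlier example; the id, displayName, and description values are hypothetical. With linkDirectionOperator set to ANY, link records of this type between the same pair of entity records would match each other whatever their direction:

<!-- Hypothetical rule: both child elements are present but empty. -->
<matchRule
  id="c1f0a6d2-3b4e-4f5a-8c7d-9e0f1a2b3c4d"
  itemTypeId="LAS1"
  displayName="Match all links of this type"
  description="Match link records of the same type between the same pair of entity records."
  active="true"
  linkDirectionOperator="ANY">
  <matchAll />
  <matchAny />
</matchRule>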

All entity type rules must contain at least one condition. Many link type rules contain conditions too. Each condition defines a comparison that takes place between values in different records, and specifies when those values are considered to match. Conditions can be refined by using operators, values, and normalizations.

To specify the conditions of a match rule, you use the <condition> element that can be a child of the <matchAll> and <matchAny> elements. Each <condition> element has a mandatory propertyTypeId attribute, which is the identifier for the property type to which the condition applies, as defined in the i2 Analyze schema.

For example:
<condition propertyTypeId="VEH2">
...
</condition>

All conditions contain an <operator> element, most of them contain a <value> element, and many contain a <normalizations> element.

operator
The <operator> element defines the type of comparison between the property values in different records, or between the property value and a static value specified within the rule. The possible operators are:
Operator Description
EXACT_MATCH The values that are compared must match each other exactly.
EXACT_MATCH_START A specified number of characters at the start of string values must match each other exactly.
EXACT_MATCH_END A specified number of characters at the end of string values must match each other exactly.
EQUAL_TO The property values must match each other, and must also match the specified <value>.
For example:
<condition propertyTypeId="VEH2">
  <operator>EXACT_MATCH</operator>
  ...
</condition>

For more information about the operators that you can use, depending on the logical type of the property, see Table 1.

value
The contents of the <value> element affect the behavior of the <operator> of the condition. Different operators require different value types.
  • If the operator is EXACT_MATCH_START or EXACT_MATCH_END, the value is an integer that specifies the number of characters to compare at the start or end of the property value:
    <operator>EXACT_MATCH_START</operator>
    <value xsi:type="xsd:int">3</value>
  • If the operator is EQUAL_TO, the value is a string to compare with the property value:
    <operator>EQUAL_TO</operator>
    <value xsi:type="xsd:string">red</value>
  • If the operator is EXACT_MATCH, it is not valid to specify a <value> element, so the <operator> element appears without one:
    <operator>EXACT_MATCH</operator>
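
Put together, complete conditions that use a <value> might look like the following sketch. VEH2 is the license plate property type from the examples in this topic, while VEH5 (imagined here as a vehicle color) is a hypothetical property type identifier. The xsi and xsd prefixes on the <value> element must be bound to namespaces, as they are on the root element in the earlier example:

<matchAll>
  <!-- The last four characters of the license plate values must match. -->
  <condition propertyTypeId="VEH2">
    <operator>EXACT_MATCH_END</operator>
    <value xsi:type="xsd:int">4</value>
  </condition>
  <!-- VEH5 is hypothetical: the values must match each other and the string "red". -->
  <condition propertyTypeId="VEH5">
    <operator>EQUAL_TO</operator>
    <value xsi:type="xsd:string">red</value>
  </condition>
</matchAll>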
normalizations
The <normalizations> element contains child <normalization> elements that define how property values are compared with each other (and sometimes with the contents of the <value> element). The possible values for the <normalization> element are:
Normalization Description
IGNORE_CASE Ignores case during the comparison ('a' matches 'A')
IGNORE_DIACRITICS Ignores diacritic marks on characters ('Ã' matches 'A')
IGNORE_WHITESPACE_BETWEEN Ignores whitespace between characters ('a a' matches 'aa')
IGNORE_WHITESPACE_AROUND Ignores whitespace around a string (' a ' matches 'a')
IGNORE_NUMERIC Ignores numeric characters ('a50' matches 'a')
IGNORE_ALPHABETIC Ignores alphabetic characters ('a50' matches '50')
IGNORE_NONALPHANUMERIC Ignores non-alphanumeric characters ('a-a' matches 'aa')
SIMPLIFY_LIGATURES Simplifies ligatures ('æ' matches 'ae')
For example, you might have the following normalizations for an EXACT_MATCH operator:
<condition ... >
  <operator>EXACT_MATCH</operator>
  <normalizations>
    <normalization>IGNORE_CASE</normalization>
    <normalization>IGNORE_NONALPHANUMERIC</normalization>
    <normalization>IGNORE_WHITESPACE_BETWEEN</normalization>
  </normalizations>
</condition>
In this example, the values "b m w xdrive" and "BMW x-drive" are considered a match.
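Normalizations can also accompany an operator that takes a <value>. The following sketch assumes a hypothetical property type PER6 that holds a family name. It also assumes that <normalizations> follows <value> inside the condition, which this topic does not state explicitly:

<!-- PER6 is a hypothetical property type identifier. -->
<condition propertyTypeId="PER6">
  <operator>EXACT_MATCH_START</operator>
  <value xsi:type="xsd:int">3</value>
  <normalizations>
    <normalization>IGNORE_CASE</normalization>
  </normalizations>
</condition>

Under those assumptions, the values "Johnson" and "JOHANSSON" would be considered a match, because their first three characters differ only in case.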
The operators and normalizations that you can specify for a condition depend on the logical type of the property type to which the condition applies. The following table shows the operators and normalizations that you can use for each logical type:
Table 1. Operators and normalizations for each schema logical type
Schema logical type    Operators      Normalizations
SINGLE_LINE_STRING     All            All
SELECTED_FROM          All            All
SUGGESTED_FROM         All            All
BOOLEAN                EXACT_MATCH    None
INTEGER                EXACT_MATCH    None
DECIMAL                EXACT_MATCH    None
DOUBLE                 EXACT_MATCH    None
DATE_AND_TIME          EXACT_MATCH    None
DATE                   EXACT_MATCH    None
TIME                   EXACT_MATCH    None
Property types that have the following logical types cannot be used in match rules:
  • GEOSPATIAL
  • MULTIPLE_LINE_STRING
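
As an illustration of Table 1, the following sketch pairs a condition on a string property with a condition on a date property. The property type identifiers PER4 (assumed to be SINGLE_LINE_STRING) and PER9 (assumed to be DATE) are hypothetical. The date condition uses EXACT_MATCH with no <value> and no <normalizations>, which is the only combination that the table allows for the DATE logical type:

<matchAll>
  <!-- PER4 is hypothetical: a string property can use any operator and normalization. -->
  <condition propertyTypeId="PER4">
    <operator>EXACT_MATCH</operator>
    <normalizations>
      <normalization>IGNORE_CASE</normalization>
      <normalization>IGNORE_WHITESPACE_AROUND</normalization>
    </normalizations>
  </condition>
  <!-- PER9 is hypothetical: a date property supports EXACT_MATCH only. -->
  <condition propertyTypeId="PER9">
    <operator>EXACT_MATCH</operator>
  </condition>
</matchAll>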

The following XML is an example of a match rules file that contains a single entity match rule. The rule matches vehicle records that have the same values for the license plate property, and the same values for either the state or region properties.

For example, two vehicle records with the license plates "1233 DC 33" and "1233DC33" from the regions "Ile-de-France" and "Île De France" are identified as a match by the following rule:
<tns:matchRules 
    xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation=
      "http://www.i2group.com/Schemas/2019-05-14/MatchRules MatchRules.xsd"
    version="2"
    xmlns:tns="http://www.i2group.com/Schemas/2019-05-14/MatchRules">
  <matchRule id="4a2b9baa-e3c4-4840-a9fd-d204711af50e"
             itemTypeId="ET3"
             displayName="Match vehicles"
             description="Match vehicles with the same license plate,
                          when either the registered state or
                          region are the same."
             active="true">
    <matchAll>
      <condition propertyTypeId="VEH2">
        <operator>EXACT_MATCH</operator>
        <normalizations>
          <normalization>IGNORE_WHITESPACE_BETWEEN</normalization>
        </normalizations>
      </condition>
    </matchAll>
    <matchAny>
      <condition propertyTypeId="VEH16">
        <operator>EXACT_MATCH</operator>
        <normalizations>
          <normalization>IGNORE_CASE</normalization>
          <normalization>IGNORE_DIACRITICS</normalization>
          <normalization>IGNORE_WHITESPACE_BETWEEN</normalization>
          <normalization>IGNORE_NONALPHANUMERIC</normalization>
        </normalizations>
      </condition>
      <condition propertyTypeId="VEH15">
        <operator>EXACT_MATCH</operator>
        <normalizations>
          <normalization>IGNORE_CASE</normalization>
          <normalization>IGNORE_DIACRITICS</normalization>
          <normalization>IGNORE_WHITESPACE_BETWEEN</normalization>
          <normalization>IGNORE_NONALPHANUMERIC</normalization>
        </normalizations>
      </condition>
    </matchAny>
  </matchRule>
</tns:matchRules>