Match rules syntax
A match rules file is an XML document whose structure is validated when i2 Analyze starts. The match rules syntax is the same for both system match and Find Matching Records match rules files.
Root element: matchRules
The root element of a match rules file is a <matchRules> element from the http://www.i2group.com/Schemas/2019-05-14/MatchRules namespace. For example:
<tns:matchRules
xmlns:tns="http://www.i2group.com/Schemas/2019-05-14/MatchRules"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xsi:schemaLocation=
"http://www.i2group.com/Schemas/2019-05-14/MatchRules MatchRules.xsd"
version="2"
enableSourceIdentifierMatching="true">
...
</tns:matchRules>
The <matchRules> element has the following customizable attributes:
enableSourceIdentifierMatching
For a deployment that contains the Information Store and the i2 Connect gateway, controls whether system matching uses source identifiers to determine whether records match each other, regardless of whether their property values match.
When this attribute is false or absent, different users can cause duplication by retrieving the same records from external sources, editing them, and uploading them to the Information Store. When it is true, such records are detected by system matching.
version
The version of the match rules file, which must be 2 at this release.
matchRule
Inside the root element, each <matchRule> element defines a match rule for records of a particular entity or link type. The <matchRule> element has the following attributes:
id
An identifier for the match rule, which must be unique within the match rules file.
itemTypeId
The identifier for the entity or link type to which the rule applies, as defined in the i2 Analyze schema.
Important: For item types that are defined in gateway or connector schemas, i2 Analyze appends the schema short name to the item type identifier. For example, if the gateway schema defines an item type with the identifier ET5, then the identifier to use here might be ET5-external.
In the modified type identifier, the short name is always in lower-case letters and separated from the original identifier with a hyphen. Any whitespace or non-alphanumeric characters in the short name are converted to single hyphens.
When you create or edit match rules through Analyst's Notebook, the application handles these modifications to item type identifiers for you. When you edit the XML file yourself, you are responsible for specifying item type identifiers correctly.
displayName
The name of the rule, which Analyst's Notebook displays to analysts in Find Matching Records.
description
The description of the rule, which Analyst's Notebook displays to analysts in Find Matching Records.
active
Defines whether the rule is active. A value of true means that the rule is active; a value of false means that the rule is not active.
linkDirectionOperator
Determines whether two links must have the same direction in order to match. Mandatory for link type rules, where it must have the value EXACT_MATCH or ANY. Must be absent or null for entity type rules.
version
In earlier versions, the version attribute was mandatory on <matchRule> elements. At this release, the per-rule version is optional, and any value is ignored.
For example, an entity type rule:
<matchRule
id="4a2b9baa-e3c4-4840-a9fd-d204711af50e"
itemTypeId="ET3"
displayName="Match vehicles"
description="Match vehicles with the same license plate when
either the registered state or region are the same."
active="true">
...
</matchRule>
And a link type rule:
<matchRule
id="8aa5f6f4-a1a8-41de-b5c5-8701e44bcde7"
itemTypeId="LAS1"
displayName="Match duplicate links"
description="Match link records between the same pair of entity
records when the links are in the same direction."
active="true"
linkDirectionOperator="EXACT_MATCH">
...
</matchRule>
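Neither rule above applies to an item type from a gateway or connector schema. The following sketch shows the shape of such a rule under two hypothetical assumptions: a connector schema with the short name My Connector, and an entity type in that schema with the identifier ET5. Under the conversion described for itemTypeId, the short name becomes my-connector, so the rule would refer to ET5-my-connector. The rule id is also a placeholder; check the real identifiers in your deployment.

```xml
<!-- Hypothetical rule for an entity type from a connector schema.
     Assumes the schema short name "My Connector" and the type ID "ET5",
     which combine into the item type identifier "ET5-my-connector". -->
<matchRule
  id="00000000-0000-0000-0000-000000000001"
  itemTypeId="ET5-my-connector"
  displayName="Match connector entities"
  description="Hypothetical rule for an entity type from a connector schema."
  active="true">
  ...
</matchRule>
```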
matchAll and matchAny
To specify the behavior of a match rule, you can use the following children of the <matchRule> element:
matchAll
The <matchAll> element specifies that all of the conditions within it must be met.
matchAny
The <matchAny> element specifies that at least one of the conditions within it must be met.
A <matchRule> element must have both the <matchAll> and <matchAny> elements. For example:
<matchRule ... >
<matchAll>
...
</matchAll>
<matchAny />
</matchRule>
condition
All match rules must contain the <matchAll> and <matchAny> elements, although both can be empty for link type rules. It is valid to create a rule that makes all link records of the same type between the same pair of entity records match each other, regardless of any other considerations.
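A rule of that kind might look like the following sketch. It reuses the link type LAS1 from the earlier link rule example for illustration, with a placeholder id. Because both elements are empty and linkDirectionOperator is ANY, every LAS1 link record between the same pair of entity records matches every other, whatever its direction.

```xml
<!-- Hypothetical link type rule with no conditions.
     With linkDirectionOperator="ANY", all LAS1 link records between the
     same pair of entity records match, regardless of direction. -->
<matchRule
  id="00000000-0000-0000-0000-000000000002"
  itemTypeId="LAS1"
  displayName="Match all parallel links"
  description="Match all link records between the same pair of entity records."
  active="true"
  linkDirectionOperator="ANY">
  <matchAll />
  <matchAny />
</matchRule>
```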
All entity type rules must contain at least one condition. Many link type rules contain conditions too. Each condition defines a comparison that takes place between values in different records, and specifies when those values are considered to match. Conditions can be refined by using operators, values, and normalizations.
To specify the conditions of a match rule, you use <condition> elements, which can be children of the <matchAll> and <matchAny> elements. Each <condition> element has a mandatory propertyTypeId attribute, which is the identifier for the property type to which the condition applies, as defined in the i2 Analyze schema.
For example:
<condition propertyTypeId="VEH2">
...
</condition>
All conditions contain an <operator> element, most of them contain a <value> element, and many contain <normalizations>.
operator
The <operator> element defines the type of comparison between the property values in different records, or between the property value and a static value specified within the rule. The possible operators are:
Operator | Description |
---|---|
EXACT_MATCH | The values that are compared must match each other exactly. |
EXACT_MATCH_START | A specified number of characters at the start of string values must match each other exactly. |
EXACT_MATCH_END | A specified number of characters at the end of string values must match each other exactly. |
EQUAL_TO | The property values must match each other and the specified <value>. |
For example:
<condition propertyTypeId="VEH2">
<operator>EXACT_MATCH</operator>
...
</condition>
value
The contents of the <value> element affect the behavior of the <operator> of the condition. Different operators require different value types.
If the operator is EXACT_MATCH_START or EXACT_MATCH_END, the value is an integer that specifies the number of characters to compare at the start or end of the property value:
<operator>EXACT_MATCH_START</operator> <value xsi:type="xsd:int">3</value>
If the operator is EQUAL_TO, the value is a string to compare with the property value:
<operator>EQUAL_TO</operator> <value xsi:type="xsd:string">red</value>
If the operator is EXACT_MATCH, it is not valid to specify a <value> element.
<operator>EXACT_MATCH</operator>
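In context, a complete condition that compares only the first three characters of a string property might look like the following sketch. The property type identifier LOC4 is hypothetical. Because the <value> element uses xsi:type="xsd:int", both the xsi and xsd prefixes must be in scope, as they are when declared on the <tns:matchRules> root element as in the examples in this section.

```xml
<!-- Hypothetical condition: values match when their first three characters
     are identical, for example "SW1A 1AA" and "SW1P 3BU" (both start "SW1"). -->
<condition propertyTypeId="LOC4">
  <operator>EXACT_MATCH_START</operator>
  <value xsi:type="xsd:int">3</value>
</condition>
```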
normalizations
The <normalizations> element contains child <normalization> elements that define how property values are compared with each other (and sometimes with the contents of the <value> element). The possible values for the <normalization> element are:
Normalization | Description |
---|---|
IGNORE_CASE | Ignores case during the comparison ('a' matches 'A') |
IGNORE_DIACRITICS | Ignores diacritic marks on characters ('Ã' matches 'A') |
IGNORE_WHITESPACE_BETWEEN | Ignores whitespace between characters ('a a' matches 'aa') |
IGNORE_WHITESPACE_AROUND | Ignores whitespace around a string (' a ' matches 'a') |
IGNORE_NUMERIC | Ignores numeric characters ('a50' matches 'a') |
IGNORE_ALPHABETIC | Ignores alphabetic characters ('a50' matches '50') |
IGNORE_NONALPHANUMERIC | Ignores non-alphanumeric characters ('a-a' matches 'aa') |
SIMPLIFY_LIGATURES | Simplifies ligatures ('æ' matches 'ae') |
For example, you might have the following normalizations for an EXACT_MATCH operator:
<condition ... >
<operator>EXACT_MATCH</operator>
<normalizations>
<normalization>IGNORE_CASE</normalization>
<normalization>IGNORE_NONALPHANUMERIC</normalization>
<normalization>IGNORE_WHITESPACE_BETWEEN</normalization>
</normalizations>
</condition>
In this example, the values "b m w xdrive" and "BMW x-drive" are considered a match.
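As a further sketch, a condition on a hypothetical surname property (PER4 is an assumed identifier) might combine the normalizations that handle accented characters, ligatures, and surrounding whitespace. Assuming the normalizations combine as in the previous example, "Cæsar" and "caesar" would match, as would "Gómez " and "gomez".

```xml
<!-- Hypothetical condition on a surname property (PER4 is an assumed ID).
     "Cæsar" matches "caesar" (ligature and case), and
     "Gómez " matches "gomez" (diacritic, trailing space, and case). -->
<condition propertyTypeId="PER4">
  <operator>EXACT_MATCH</operator>
  <normalizations>
    <normalization>IGNORE_CASE</normalization>
    <normalization>IGNORE_DIACRITICS</normalization>
    <normalization>SIMPLIFY_LIGATURES</normalization>
    <normalization>IGNORE_WHITESPACE_AROUND</normalization>
  </normalizations>
</condition>
```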
The operators and normalizations that you can specify for a condition depend on the logical type of the property type to which the condition applies. The following table shows the operators and normalizations that you can use for each logical type:
Schema logical type | Operators | Normalizations |
---|---|---|
SINGLE_LINE_STRING | All | All |
SELECTED_FROM | All | All |
SUGGESTED_FROM | All | All |
BOOLEAN | EXACT_MATCH | None |
INTEGER | EXACT_MATCH | None |
DECIMAL | EXACT_MATCH | None |
DOUBLE | EXACT_MATCH | None |
DATE_AND_TIME | EXACT_MATCH | None |
DATE | EXACT_MATCH | None |
TIME | EXACT_MATCH | None |
Property types that have the following logical types cannot be used in match rules:
GEOSPATIAL
MULTIPLE_LINE_STRING
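For the non-string logical types in the table, a condition can use only EXACT_MATCH and no normalizations. A condition on a date of birth property might therefore look like the following sketch, where PER9 is a hypothetical property type identifier with the DATE logical type.

```xml
<!-- Hypothetical condition on a DATE property (PER9 is an assumed ID).
     Only EXACT_MATCH is allowed, so no <value> or normalizations appear. -->
<condition propertyTypeId="PER9">
  <operator>EXACT_MATCH</operator>
</condition>
```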
The following XML is an example of a match rules file that contains a single entity match rule. The rule matches vehicle records that have the same values for the license plate property, and the same values for either the state or region properties.
For example, two vehicle records with the license plates "1233 DC 33" and "1233DC33" from the regions "Ile-de-France" and "Île De France" are identified as a match by the following rule:
<tns:matchRules
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation=
"http://www.i2group.com/Schemas/2019-05-14/MatchRules MatchRules.xsd"
version="2"
xmlns:tns="http://www.i2group.com/Schemas/2019-05-14/MatchRules">
<matchRule id="4a2b9baa-e3c4-4840-a9fd-d204711af50e"
itemTypeId="ET3"
displayName="Match vehicles"
description="Match vehicles with the same license plate,
when either the registered state or
region are the same."
active="true">
<matchAll>
<condition propertyTypeId="VEH2">
<operator>EXACT_MATCH</operator>
<normalizations>
<normalization>IGNORE_WHITESPACE_BETWEEN</normalization>
</normalizations>
</condition>
</matchAll>
<matchAny>
<condition propertyTypeId="VEH16">
<operator>EXACT_MATCH</operator>
<normalizations>
<normalization>IGNORE_CASE</normalization>
<normalization>IGNORE_DIACRITICS</normalization>
<normalization>IGNORE_WHITESPACE_BETWEEN</normalization>
<normalization>IGNORE_NONALPHANUMERIC</normalization>
</normalizations>
</condition>
<condition propertyTypeId="VEH15">
<operator>EXACT_MATCH</operator>
<normalizations>
<normalization>IGNORE_CASE</normalization>
<normalization>IGNORE_DIACRITICS</normalization>
<normalization>IGNORE_WHITESPACE_BETWEEN</normalization>
<normalization>IGNORE_NONALPHANUMERIC</normalization>
</normalizations>
</condition>
</matchAny>
</matchRule>
</tns:matchRules>