Smart matching
Search for entities on the charting surface that might represent the same real-world object based on their entity types and criteria such as attribute instances and database property values.
Smart matching uses a set of predefined rules to decide whether two entities might represent the same real-world object. It examines entities and their associated links, and compares their property values to produce an overall score. A score of 1 indicates a weak match and a score of 9 indicates a strong match. Smart matching reports the search results as sets of potential matches, which are known as matched sets.
You can choose the strength of match that you want smart matching to report by selecting a threshold. Depending on the threshold that you choose, you might get fewer but stronger matches, or more but weaker matches. Typically, you first search your chart with a threshold of 9 to find the strongest matches, and then repeat the search with progressively lower thresholds.
By changing the threshold, you can split sets of matched entities into more accurate matched sets. For example, you might have five entities on your chart: two with a Gender attribute set to Male and three with a Gender attribute set to Female. All five entities have the same label, but in slightly different letter case. Searching with a threshold of 1 and then with a threshold of 6 matches the same entities in both searches, but produces different matched sets, as shown in the table.
Entities in chart | Threshold | Matched sets |
---|---|---|
Sam Steele (with Gender attribute: Male); SAM STEELE (with Gender attribute: Female); sam steele (with Gender attribute: Female); Sam STEELE (with Gender attribute: Male); Sam steele (with Gender attribute: Female) | 1 | Sam Steele (5 Entities) |
Sam Steele (with Gender attribute: Male); SAM STEELE (with Gender attribute: Female); sam steele (with Gender attribute: Female); Sam STEELE (with Gender attribute: Male); Sam steele (with Gender attribute: Female) | 6 | SAM STEELE (3 Entities); Sam Steele (2 Entities) |
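The effect of the threshold in the table above can be reproduced with a small sketch. It is not the product's scoring algorithm: the `toy_score` function, its weights (3 and 7), and the grouping by connected components are assumptions chosen only so that the five Sam Steele entities behave as the table describes.

```python
from itertools import combinations

# Hypothetical chart entities as (label, gender) pairs, taken from the table.
ENTITIES = [
    ("Sam Steele", "Male"),
    ("SAM STEELE", "Female"),
    ("sam steele", "Female"),
    ("Sam STEELE", "Male"),
    ("Sam steele", "Female"),
]


def toy_score(a, b):
    """Illustrative score on the 1-9 scale; the weights are invented."""
    if a[0].lower() != b[0].lower():
        return 0  # labels differ even when case is ignored: no match
    # Same label ignoring case: weak match, stronger when Gender agrees.
    return 7 if a[1] == b[1] else 3


def matched_sets(entities, threshold):
    """Group entities linked by any pair whose score meets the threshold."""
    parent = list(range(len(entities)))

    def find(i):
        # Follow parent links to the representative of the group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(entities)), 2):
        if toy_score(entities[i], entities[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i, entity in enumerate(entities):
        groups.setdefault(find(i), []).append(entity[0])
    # Only groups with two or more members are potential matches.
    sets = [g for g in groups.values() if len(g) > 1]
    return sorted(sets, key=len, reverse=True)


for threshold in (1, 6):
    sizes = [len(s) for s in matched_sets(ENTITIES, threshold)]
    print(f"threshold {threshold}: set sizes {sizes}")
```

With these assumed weights, a threshold of 1 yields one set of five entities, and a threshold of 6 yields one set of three and one set of two, because only pairs whose Gender attribute agrees score high enough to be grouped.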
Rules for smart matching
Smart matching first determines whether entities are eligible to be matched by checking that they meet the following criteria:
- Comparisons are made between entities that share semantic type behavior. For example, a comparison can be made between the Organization and Law Enforcement Agency semantic types because Law Enforcement Agency is a specialization of the Organization semantic type. A comparison cannot be made between Organization and Person semantic types.
- Comparisons are made between entities if they have some attributes that have the same property semantic type, or data record properties that have the same property semantic type.
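The two eligibility criteria can be sketched as a simple check. The data structures here are hypothetical: `PARENT` is an assumed child-to-parent map of semantic types (only the Organization and Law Enforcement Agency relationship comes from the text), and "share semantic type behavior" is interpreted, as an assumption, to mean that one type is the same as or a specialization of the other.

```python
# Hypothetical semantic type hierarchy: specialization -> generalization.
PARENT = {
    "Law Enforcement Agency": "Organization",
}


def ancestors(semantic_type):
    """Return the type followed by each of its generalizations."""
    chain = [semantic_type]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def types_compatible(type_a, type_b):
    """True when one type is the same as, or a specialization of, the other."""
    return type_a in ancestors(type_b) or type_b in ancestors(type_a)


def eligible(entity_a, entity_b):
    """Apply both criteria: compatible types and a shared property semantic type."""
    shared = set(entity_a["property_types"]) & set(entity_b["property_types"])
    return types_compatible(entity_a["type"], entity_b["type"]) and bool(shared)


# Illustrative entities; the property semantic type names are made up.
agency = {"type": "Law Enforcement Agency", "property_types": ["Organization Name"]}
org = {"type": "Organization", "property_types": ["Organization Name", "Phone Number"]}
person = {"type": "Person", "property_types": ["Phone Number"]}

print(eligible(agency, org))   # compatible types, shared property type
print(eligible(org, person))   # shared property type, incompatible types
```

In this sketch, an Organization and a Person are never compared even though both carry a phone number, because the entity type check fails first.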
Where eligible pairs of entities are identified, their property values are compared and scored according to how closely they match. The scoring takes account of:
- Typographical errors; for example, Michael and Micheal.
- Common synonyms; for example, Richard and Dick.
- Phonetics; for example, Michael and Mikel.
- Titles; for example, Mr, Mrs, and Dr.
- Suffixes; for example, OBE and PhD.
- Numeric properties; for example, phone numbers are matched from the right, to account for area codes that might be formatted differently.
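Two of the comparisons in the list above lend themselves to a short sketch: matching phone numbers from the right, and tolerating typographical errors. The function names are hypothetical, and the use of Python's `difflib.SequenceMatcher` for typo tolerance is a stand-in technique, not a description of how smart matching is implemented.

```python
import re
from difflib import SequenceMatcher


def digits_only(value):
    """Strip spaces, brackets, plus signs, and any other non-digit characters."""
    return re.sub(r"\D", "", value)


def phone_match_length(a, b):
    """Count how many trailing digits two phone numbers have in common."""
    count = 0
    for x, y in zip(reversed(digits_only(a)), reversed(digits_only(b))):
        if x != y:
            break
        count += 1
    return count


def name_similarity(a, b):
    """Similarity ratio between 0 and 1, ignoring letter case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# The same number written with and without an international prefix.
print(phone_match_length("+44 (0)1223 728600", "01223 728600"))
# A transposition typo still leaves most of the name aligned.
print(round(name_similarity("Michael", "Micheal"), 2))
```

Comparing from the right means that a differently formatted or missing prefix affects only the leading digits, so the shared trailing digits still count toward the match.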
Some entity types and property types have semantic behavior. For these types, higher scores are given for matching identifying property values, such as a car license plate. Less significant facts, such as car color, contribute less strongly. Lower scores are also given where one piece of information contradicts another; for example, Mr D KENT contradicts Mrs D KENT.
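The weighting described above can be illustrated with a minimal sketch. The `RULES` table, its weights, and the penalty value are invented for illustration and mix vehicle and person properties in one table for brevity; the actual rules and scores that smart matching uses are not stated in this topic.

```python
# Hypothetical rules: property -> (weight when values agree, penalty when they differ).
RULES = {
    "license_plate": (6, 0),  # identifying value: strong evidence when it agrees
    "color": (1, 0),          # less significant fact: weak evidence
    "name": (5, 0),
    "title": (1, 3),          # differing titles (Mr, Mrs) count as a contradiction
}


def score_pair(a, b):
    """Add weights for agreeing properties and subtract penalties for contradictions."""
    score = 0
    for prop, (weight, penalty) in RULES.items():
        if prop not in a or prop not in b:
            continue  # a missing value neither supports nor contradicts a match
        if a[prop] == b[prop]:
            score += weight
        else:
            score -= penalty
    return max(0, min(9, score))  # keep the result between 0 (no match) and 9


same_plate = score_pair(
    {"license_plate": "ABC 123", "color": "red"},
    {"license_plate": "ABC 123", "color": "blue"},
)
same_color = score_pair(
    {"license_plate": "ABC 123", "color": "red"},
    {"license_plate": "XYZ 999", "color": "red"},
)
contradiction = score_pair(
    {"name": "D KENT", "title": "Mr"},
    {"name": "D KENT", "title": "Mrs"},
)
print(same_plate, same_color, contradiction)
```

Under these assumed weights, a shared license plate outweighs a shared color, and the contradicting titles pull the D KENT pair below the score it would get from the name alone.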
Matches with database clashes
When you are using Analyst's Notebook with a database application such as iBase, entities that are extracted from the database have an associated database identity.
Matches involving database clashes can occur when two matching entities originate from a single database that does not support multiple database keys. These entities are included in a matched set, and the word Clash is appended to the set name; for example, <set name>: Clash (4 entities). You can merge these clashing entities, but vital information might be lost. The information that is retained after clashing entities are merged depends on whether the entities have unique database keys and whether the database supports multiple database keys.