Checking for duplicates

You can search for records that contain duplicate values. The results of this type of search are groups of records that have duplicate values in the fields that are used in the search.

About this task

When you use the Duplicate Records Checker:
  • A duplicate entity record is one that has specific values in common with other records for the entity type. You choose which fields are checked.
  • A duplicate link record is one where the link end entities, the direction, and the strength of the link are identical. Optionally, you can also specify one or more fields to be checked. You can also turn off Same Link Ends, in which case only the field values need to match (the link end entities, direction, and strength are ignored).
For example, if you search for vehicles by color and model, you obtain groups of records that are divided by color and model. Select the color and model combination you are interested in and then browse the records in the group. Color and model combinations that are unique to a single record are not shown in the results.
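The grouping in the vehicle example can be pictured with a short Python sketch. This only illustrates the idea and is not how the Duplicate Records Checker is implemented; the function name `find_duplicate_groups`, the record layout, and the sample data are all hypothetical.

```python
from collections import defaultdict

def find_duplicate_groups(records, fields):
    """Group records by their values in `fields`; keep groups of two or more."""
    groups = defaultdict(list)
    for record in records:
        # The combination of checked field values identifies the group.
        key = tuple(record[f] for f in fields)
        groups[key].append(record)
    # Combinations that occur in only one record are not duplicates.
    return {key: members for key, members in groups.items() if len(members) > 1}

# Hypothetical sample data.
vehicles = [
    {"id": 1, "color": "red", "model": "Sedan"},
    {"id": 2, "color": "red", "model": "Sedan"},
    {"id": 3, "color": "blue", "model": "Coupe"},
    {"id": 4, "color": "blue", "model": "Sedan"},
    {"id": 5, "color": "blue", "model": "Sedan"},
]

duplicate_groups = find_duplicate_groups(vehicles, ["color", "model"])
for key, members in duplicate_groups.items():
    print(key, [m["id"] for m in members])
```

In this sample, the blue Coupe does not appear in the output because its color and model combination belongs to a single record.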
There are some similarities between Duplicate Records Checker and Matching Records because you use both of these features to discover which records in the database share common values. However, you use:
  • Matching Records - to work with a single record.
  • Duplicate Records Checker - to work with a set, a query, or even the whole database.
In both cases, the comparison is made against all the other records in the database. That is, the values in the single record, or in the records of the set or query, are compared against the whole database.
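One way to picture checking a set against the whole database is sketched below in Python. It assumes that a duplicate group is reported only when it contains at least one record from the source set; that reading, the function name `duplicates_of_source`, and the sample data are assumptions made for illustration, not a description of the product's internals.

```python
from collections import defaultdict

def duplicates_of_source(source_ids, all_records, fields):
    """Return duplicate groups that include at least one source record.

    Every record in the database takes part in the comparison, but a group
    is kept only if one of its members belongs to the source (assumption).
    """
    index = defaultdict(list)
    for record in all_records:
        index[tuple(record[f] for f in fields)].append(record["id"])
    return {
        key: ids
        for key, ids in index.items()
        if len(ids) > 1 and any(i in source_ids for i in ids)
    }

# Hypothetical database and source set.
database = [
    {"id": 1, "color": "red"},
    {"id": 2, "color": "red"},
    {"id": 3, "color": "blue"},
    {"id": 4, "color": "blue"},
    {"id": 5, "color": "green"},
]
found = duplicates_of_source({1, 5}, database, ["color"])
print(found)
```

Here record 1 is matched with record 2 even though record 2 is outside the source set, while the blue group is skipped because neither of its records is in the source.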

Procedure

  1. Select Analysis > Duplicate Records Checker.
    If the Duplicate Records Checker option is not shown, activate the plug-in (Tools > Plug-in Manager).
  2. In the Duplicate Records Checker, select the entity type or link type.
  3. In the Source area, specify the records that you want to check the selected entity or link type against:
    • All records - check against all records in the database.
    • Query - check against the records included in the results of a specified query.
    • Set - check against the records included in a specified set.
  4. In the Fields area, turn on the fields that you want to use in the comparison.
    You must select at least one field. Initially, the discriminator fields are selected, but you can turn them off. Your selection is remembered for the next time you use the Duplicate Records Checker.
  5. If you are working on links, you can:
    • Turn on Same Link Ends to search for links where the link end entities, the direction, and the strength of the link are identical.
    • Turn off Same Link Ends to search for links where only the field values in the link match (the link end entities, direction, and strength are ignored).
  6. Click Find.
    The results of the check are shown on the right, in a list of duplicate groups and a list of the records with duplicate values. The number of groups depends on the number of combinations of duplicate values found. You can sort both the duplicate groups and the records by clicking the column headings.
  7. Review the records within each duplicate group by selecting it in the top list.
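
The effect of the Same Link Ends option in step 5 can be sketched in Python as a choice of comparison key. This is a conceptual illustration only: the names `link_key` and `find_duplicate_links`, the link layout (`end1`, `end2`, `direction`, `strength`), and the sample links are hypothetical, and the sketch treats the order of the link ends as significant.

```python
from collections import defaultdict

def link_key(link, fields, same_link_ends=True):
    """Build the comparison key for a link record."""
    field_values = tuple(link[f] for f in fields)
    if same_link_ends:
        # Ends, direction, and strength must match as well as the fields.
        ends = (link["end1"], link["end2"], link["direction"], link["strength"])
        return ends + field_values
    # Same Link Ends off: only the selected field values are compared.
    return field_values

def find_duplicate_links(links, fields, same_link_ends=True):
    groups = defaultdict(list)
    for link in links:
        groups[link_key(link, fields, same_link_ends)].append(link["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical sample links.
links = [
    {"id": "L1", "end1": "A", "end2": "B", "direction": "forward",
     "strength": "confirmed", "method": "phone"},
    {"id": "L2", "end1": "A", "end2": "B", "direction": "forward",
     "strength": "confirmed", "method": "phone"},
    {"id": "L3", "end1": "C", "end2": "D", "direction": "forward",
     "strength": "confirmed", "method": "phone"},
]

print(find_duplicate_links(links, ["method"], same_link_ends=True))
print(find_duplicate_links(links, ["method"], same_link_ends=False))
```

With the option on, only the two links between A and B form a group. With it off, all three links fall into one group because they share the same field value.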