Running a bulk import

Bulk import is a useful way to manage large volumes of data, and can only be run on a SQL Server database.

To mark an import specification as a bulk import, turn on the Bulk Import checkbox on Step 1 of the Import Wizard. The rest of the import specification is the same as for a standard import, with some minor exceptions that are described below. For general information on import specifications, see the iBase help for details.

Import specifications that are not suitable for bulk import

If the current import specification is not suitable for bulk importing, a message is displayed when you select the Bulk Import checkbox to warn you that you need to make some changes in your specification first. This might be for the following reasons:

  • Picture or document fields are assigned in the Assign Fields page of the Import Wizard. You must reverse the assignment of these fields before you can use the specification for a bulk import.

  • The Confirm action option is selected in the Check for Existing Records page of the Import Wizard. You must turn off this option before you can use the specification for a bulk import.

Using all available identifiers

When you are setting up a bulk import, you should try to ensure that the specification imports as much identifying detail as possible for each record. If the data you import is too vague, it might result in large numbers of matches with records that are already in your database.

In a bulk import, you cannot choose what action to perform for each of these possible matches. This could result in a lot of records being incorrectly updated. In the case of link imports, it might result in a lot of records being incorrectly added.

For this reason, use as many as possible of the available identifier fields when setting up the specification. This minimizes the possibility of records being matched that are not the same, as in the following examples:

  • Using only the Last Name field to match existing Person entities in the databases ("Smith") would result in many matches and a lot of records being updated.

  • Using the Last Name field and the First Name field (for example, "Smith, John") would result in fewer matches, but still a number of records for several different John Smiths might all be updated.

  • Using the Last Name field, the First Name field, and the Date of Birth field would result in matches where there already happened to be a John Smith with the same date of birth in the database. This is less likely, but still possible in a large database.

  • Using the Last Name, First Name, Date of Birth, Place of Birth, and Social Security Number field for each record would almost certainly guarantee that any record that matched one already in the database relates to the same real-world person and therefore should be updated.

If you have not specified all of the available identifier fields in a specification for bulk import, then a message is displayed:

WARNING: This specification does not use all available discriminators as identifiers for record matching. Usually all discriminators should be used to ensure that the correct target data is updated.

An import that would result in a high proportion of records being updated might indicate that there is insufficient identifying detail for each record. This concern will be reported as part of the safety check.

Running the bulk import specification

On the last page of the Import Wizard, click Run. If you are importing into a case-enabled database, at this point you will need to specify the case into which the records will be imported.

The import will run and all errors will be reported as they are encountered. Unlike a normal import, the process is not interactive.

Note: A bulk import will stop if the number of errors reaches a threshold that is set to 10,000 errors by default. The following error types count against this threshold:

  • Value not defined in list for field (for 'selected from code list' fields)

  • Mandatory field value not supplied

  • Text field value too long

  • Invalid direction

  • Invalid strength

  • Invalid date

  • Invalid time

  • Invalid numbers

  • Incorrect number of columns

Verifying a bulk import specification

On the last page of the Import Wizard, click Verify. No data will be imported, but any errors that would result from importing the data are listed. This allows you to check the import source data and correct any errors in the source file or the import specification before importing the data.

Performing a safety check

As a safety check, the source file is analyzed to determine the number of records currently in the database that would be updated if the data is imported. Errors are raised based on the ratio of the number of records to be updated to the number of records to be created.

Safety checking is performed automatically when verifying the import specification, unless the Do not perform safety checks before importing checkbox is selected.