Building the Word Search index

The Word Search Index is a list of the words that can be searched when using Word Search in iBase. It contains the words extracted from specific (text-type) fields for selected entity and link types.

About this task

Configuration consists of:
Selecting the fields to index.
You can include or exclude the values of individual fields.
Excluding specific words from the index, such as words that appear in almost all records.
You may wish to exclude words because:
  • The words occur in all or nearly all records and contribute little to the searchability of the database, while adding greatly to the size of the search index.
  • The words relate to administrative or other uses that you do not wish to be visible to all users of text search.

The excluded word list must be an unformatted text file containing one word per line with no blank lines. The only characters not taken to be part of a word are the space character and the double quotation mark ( " ). You cannot exclude words that are longer than 20 characters.

Double quotation marks are optional, but if they appear they must do so in pairs, one each at the start and end of a line. Paired double quotation marks are ignored and only the words between them are imported.

The words can be in any order, can use uppercase, lowercase, or mixed case, and can be duplicated within the file. When the file is imported, duplicates are removed and the list is sorted; the case of each word is unchanged.

If you see the message ERRORS IN FILE when importing, the most likely causes are:
  • A space character anywhere in the file, possibly before or after one of the words.
  • A completely blank line.
  • A mismatched or badly placed double quotation mark.
Note: iBase is supplied with an example file of common words, formatted for importing into the Word Search system. The file is in the following location: C:\Documents and Settings\All Users\Application Data\i2\i2 iBase 8\en-US\Configuration

Inspect the file to see whether it is suitable for your database. You can import it as a starting point and then modify the list.
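Before importing, it can help to check a word list against the format rules described above. The following Python sketch is illustrative only and is not part of iBase: the function name parse_excluded_words is hypothetical, and it assumes that duplicates are exact matches, that sorting ignores case, and that a word longer than 20 characters is reported as an error. The text does not confirm how iBase handles these three points.

```python
MAX_EXCLUDED_WORD_LENGTH = 20  # limit stated in the text


def parse_excluded_words(text):
    """Check excluded-word file contents; return (words, errors).

    Hypothetical helper for illustration, not an iBase function.
    """
    words, errors = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        if line == "":
            errors.append(f"line {number}: blank line")
            continue
        if " " in line:
            errors.append(f"line {number}: space character")
            continue
        word = line
        if '"' in line:
            # Quotation marks are accepted only as one pair wrapping the line.
            wrapped = len(line) >= 2 and line[0] == '"' and line[-1] == '"'
            if not wrapped or '"' in line[1:-1]:
                errors.append(
                    f"line {number}: mismatched or badly placed quotation mark")
                continue
            word = line[1:-1]  # paired marks are ignored
        if word == "":
            errors.append(f"line {number}: no word between quotation marks")
            continue
        if len(word) > MAX_EXCLUDED_WORD_LENGTH:
            errors.append(
                f"line {number}: word longer than "
                f"{MAX_EXCLUDED_WORD_LENGTH} characters")
            continue
        words.append(word)
    # Assumption: duplicates are exact matches and sorting ignores case.
    unique = sorted(set(words), key=lambda w: (w.casefold(), w))
    return unique, errors


words, errors = parse_excluded_words('the\n"And"\nthe\nof\n')
print(words)   # ['And', 'of', 'the']
print(errors)  # []
```

An empty error list suggests that the file meets the rules listed here; the preview in the Import dialog remains the authoritative check.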

Setting a limit on the length of the words that you can search for, in order to create a smaller index that returns search results more quickly.
You can set the maximum length of entries in the index. The default setting of 10 is usually satisfactory. Longer words are trimmed to this length when placed in the index. For example, if the limit is 10 and the word shoplifting is indexed, the index lists only its first 10 characters, shopliftin. A user can find this word by searching for either shoplifting or shopliftin.

If you set a limit of less than 10, a smaller index is created and search results are returned more quickly. However, a search may return unexpected records. For example, it finds words that are different but have the same first few characters. You can check for false results by inspecting the found records.
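The effect of the length limit can be sketched in Python. This is an illustration under stated assumptions, not the iBase implementation: the names index_entry and finds are hypothetical, and the sketch assumes that a search term is trimmed in the same way as an indexed word, which is consistent with the shoplifting example above.

```python
DEFAULT_INDEX_LENGTH = 10  # default setting named in the text


def index_entry(word, max_length=DEFAULT_INDEX_LENGTH):
    """Trim a word to the number of characters kept in the index."""
    return word[:max_length]


def finds(search_term, indexed_word, max_length=DEFAULT_INDEX_LENGTH):
    """Assumed matching rule: the search term is trimmed the same way."""
    return (index_entry(search_term, max_length)
            == index_entry(indexed_word, max_length))


print(index_entry("shoplifting"))           # shopliftin
print(finds("shopliftin", "shoplifting"))   # True
print(finds("shopping", "shoplifting"))     # False
print(finds("shopping", "shoplifting", 4))  # True, both trim to "shop"
```

The last line shows the kind of false result that a short limit can produce: two different words share one index entry.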

Deciding whether to index words that contain only numerical characters.
By default, entirely numeric words are excluded from the index, but you can opt to include these entries.

For example, the text 320 in the entry BMW 320 is not indexed if numerics are excluded. The text 320i in the entry BMW 320i is indexed because 320i is not a completely numeric value.

A more complex example concerns the hyphen or minus character ( - ), where the character’s position affects the interpretation. Numerals surrounding a hyphen character are considered non-numeric. For example, a banking reference for an account in the format 0012-3963 is treated as a single non-numeric word, not two numerics. Conversely, a number written with a leading minus character, in the style -3, is treated as a numeric (as is the positive equivalent, +3).
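The numeric rules above can be expressed as a short Python sketch. It is illustrative only: the names is_numeric_word and words_to_index are hypothetical, and the pattern assumes that a numeric word is an optional leading plus or minus character followed by digits only. The text does not say how other forms, such as decimal values, are treated, so the sketch does not cover them.

```python
import re

# Assumption: optional leading + or -, then digits only.
_NUMERIC_WORD = re.compile(r"[+-]?[0-9]+")


def is_numeric_word(word):
    """Return True if the whole word matches the assumed numeric pattern."""
    return _NUMERIC_WORD.fullmatch(word) is not None


def words_to_index(words, exclude_numerics=True):
    """Drop entirely numeric words unless numerics are included."""
    if not exclude_numerics:
        return list(words)
    return [w for w in words if not is_numeric_word(w)]


print(words_to_index(["BMW", "320", "320i", "0012-3963", "-3", "+3"]))
# ['BMW', '320i', '0012-3963']
```

With exclude_numerics=False, all six words are kept.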

If you need to change the configuration of the index, then you must first delete the current index by clicking Delete. Once you have built the index (by clicking Full Build), update it on a regular basis in order to include data from new records. You can use an incremental build for this (click Increment).
Note: iBase users can review how the index was built, in order to understand what can and cannot be searched for, but they cannot update it themselves.
Note: Words do not have to contain only letters; for example, they can contain, or consist entirely of, numbers. You can, however, choose to exclude words that consist entirely of numbers. In addition, punctuation characters are not included in indexed words, because these characters are used to determine where words start and end. For example, the space in 'first word' or the underscore in 'first_word' means that the index contains the words 'first' and 'word'.
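The word-splitting behavior in the last note can be sketched as follows. This Python example is an illustration, not the iBase algorithm: the function name split_into_words is hypothetical, and it treats only the two separators named in the note (space and underscore) as word boundaries. The full set of characters that iBase treats as punctuation is not listed here, so the separators are left as a parameter.

```python
import re


def split_into_words(text, separators=" _"):
    """Split text wherever one or more separator characters occur."""
    pattern = "[" + re.escape(separators) + "]+"
    return [word for word in re.split(pattern, text) if word]


print(split_into_words("first word"))  # ['first', 'word']
print(split_into_words("first_word"))  # ['first', 'word']
```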

Procedure

  1. Select Tools > Search > Word Search Indexing.
  2. In the Word Search Index Build dialog, click the Fields tab to display the Fields page.
  3. Click Delete to delete the current index.
  4. To configure the indexed fields, turn on or off the checkbox next to the entity type, link type, or field. If you include an entity or link type, all of its fields are initially included; you can then turn off the checkbox next to each field that you want to exclude.
  5. In the Excluded Words page of the Word Search Index Build dialog, you can exclude words from the index. You can then review which words you have excluded, either on screen or by exporting those words to a text file:
    Option: Description
    Exclude Word: Enter a word in the upper text box of the dialog, then click this button to transfer the word to the list in the lower list box, where it appears in alphabetical order.
    Remove Selected: Select one or more words in the list box, then click this button to remove all selected words from the list.
    Import Words: Click this button to import a text file of the words you have chosen to exclude. In the Import dialog:

    1. Enter a file name, or find a file using the Browse button and the Select Import File dialog.
    2. Inspect the preview in the Import dialog to check that the file contains the words you want to exclude and that the message ERRORS IN FILE does not appear.
      Note: If you see this message, edit the file so that it meets the required format described in About this task, save the file, and click Refresh to reread the file.
    3. Turn on the Merge with existing data checkbox if you want to keep the existing list.
    4. When you have identified a suitable file, click Import to read the list from this file.
    Export Words: Click this button to create a text file of the words you have chosen to exclude. In the Export dialog, enter a file name, or find an existing file using the Browse button and the Specify Export File dialog. Once you have identified a file, click Export to write the list into this file.
    You cannot exclude words that are longer than 20 characters.

    When you have finished modifying the list of excluded words, click Apply to confirm your changes.

  6. In the advanced index options, select the Number of Characters to index and whether to Exclude Numerics.
  7. Click Full Build to generate a new index.
    Note: Once the build is completed, you can use the Fields tab to view the information that has been included or excluded from the index.