Building the Word Search index
The Word Search index is a list of the words that can be searched for when using Word Search in iBase. It contains the words extracted from specific text-type fields for selected entity and link types.
About this task
- Selecting the fields which can be indexed.
- You can include or exclude the values of individual fields.
- Excluding specific words from the index, such as words that appear in almost all records.
You may wish to exclude words because:
- The words occur in all or nearly all records and contribute little to the searchability of the database, while adding greatly to the size of the search index.
- The words relate to administrative or other uses that you do not wish to be visible to all users of text search.
The excluded word list must be an unformatted text file containing one word per line with no blank lines. The only characters not taken to be part of a word are the space character and the double quotation mark ( " ). You cannot exclude words that are longer than 20 characters.
Double quotation marks are optional, but if they appear they must do so in pairs, one each at the start and end of a line. Paired double quotation marks are ignored and only the words between them are imported.
The words can be in any order, use uppercase, lowercase or any mixture of case, and may be duplicated within the file. Duplicates are removed, and the list is sorted when importing; the case remains unchanged.
If you see the message ERRORS IN FILE when importing, the most likely causes are:
- A space character anywhere in the file, possibly before or after one of the words.
- A completely blank line.
- A mismatched or badly placed double quotation mark.
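The file rules above can be checked before importing. The following is a minimal sketch, not the iBase importer: the function name `check_excluded_words` is hypothetical, and it assumes that duplicates are compared case-sensitively, that the sort ignores case, and that a word over 20 characters is reported as a problem. The source does not state any of these three behaviors.

```python
MAX_EXCLUDED_WORD_LENGTH = 20  # limit stated in the text above


def check_excluded_words(lines):
    """Return (words, problems) for an excluded word list given as lines of text.

    Hypothetical helper: it mirrors the documented file rules, not iBase's code.
    """
    words = set()
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line == "":
            problems.append((number, "blank line"))
            continue
        if " " in line:
            problems.append((number, "space character"))
            continue
        word = line
        if '"' in line:
            # Quotation marks are valid only as a pair wrapping the whole line.
            paired = len(line) >= 2 and line[0] == '"' and line[-1] == '"'
            word = line[1:-1] if paired else ""
            if not paired or word == "" or '"' in word:
                problems.append((number, "mismatched or badly placed quotation mark"))
                continue
        if len(word) > MAX_EXCLUDED_WORD_LENGTH:
            # Assumption: over-long words are reported rather than silently skipped.
            problems.append((number, "word longer than 20 characters"))
            continue
        words.add(word)  # assumption: duplicates are exact (case-sensitive) matches
    # Assumption: sort ignores case; the case of each word is left unchanged.
    return sorted(words, key=lambda w: (w.lower(), w)), problems
```

To check a file, pass its lines to the function, for example `check_excluded_words(open("excluded.txt"))`, where `excluded.txt` is a placeholder name. Any entry in `problems` gives the line number and the likely cause.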
Note: iBase is supplied with an example file of common words. The file is formatted for importing into the Word Search system. Inspect the file to see if it is suitable for your database: C:\Documents and Settings\All Users\Application Data\i2\i2 iBase 8\en-US\Configuration
You can import this file to provide a starting point, which you can then modify.
- Setting a limit on the length of the words that can be searched for, to create a smaller index that returns search results more quickly.
- You can set the maximum length of entries in the index. The default setting of 10 is usually satisfactory. Longer words are trimmed to this length when placed in the index. For example, if the first 10 characters are indexed and the word shoplifting is indexed, the index lists only the first 10 characters, shopliftin. A user can search for and find this word with either of the terms shoplifting or shopliftin.
If you set a limit of less than 10, a smaller index is created and search results are returned more quickly. However, the results may include unexpected matches: a search also finds words that are different but share the same first few characters. You can check for false results by inspecting the found records.
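The effect of the length limit can be illustrated with a short sketch. It assumes that a search term is trimmed to the same length as the index entries before comparison, which is consistent with the shoplifting example but is not confirmed as the internal behavior of iBase. The names `index_entry` and `matches` are hypothetical.

```python
def index_entry(word, max_length=10):
    """Trim a word to the maximum index entry length (default 10, as in the text)."""
    return word[:max_length]


def matches(term, indexed_word, max_length=10):
    """Assumption: the search term is trimmed the same way as the indexed word."""
    return index_entry(term, max_length) == index_entry(indexed_word, max_length)


# Both search terms find the indexed word when the limit is 10.
found_full = matches("shoplifting", "shoplifting")
found_trimmed = matches("shopliftin", "shoplifting")

# With a limit of 4, a different word with the same first characters also matches.
false_match = matches("shopkeeper", "shoplifting", max_length=4)
```

With the default limit, shopkeeper and shoplifting remain distinct entries. With a limit of 4, both are stored as shop, which is the kind of false result that inspecting the found records would reveal.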
- Deciding whether to index words that contain only numerical characters.
- By default, entirely numeric words are excluded from the index, but you can opt to include these entries.
For example, the text 320 in the entry BMW 320 is not indexed if numerics are excluded. The text 320i in the entry BMW 320i is indexed because 320i is not a completely numeric value.
A more complex example concerns the hyphen or minus character ( - ), where the character’s position affects the interpretation. Numerals surrounding a hyphen character are considered non-numeric. For example, a banking reference for an account in the format 0012-3963 is treated as a single non-numeric word, not two numerics. Conversely, a number written with a leading minus character, in the style -3, is treated as a numeric (as is the positive equivalent, +3).
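The numeric rules described above can be approximated with a single pattern: an optional leading plus or minus sign followed only by digits. This is a sketch that reproduces the documented examples, not the rule iBase itself applies. The names `is_numeric_word` and `indexable_words` are hypothetical, and values such as decimals are not covered because the source does not describe them.

```python
import re

# Optional leading sign, then digits only. A hyphen between digits does not match,
# so 0012-3963 is treated as a single non-numeric word.
NUMERIC_WORD = re.compile(r"[+-]?[0-9]+")


def is_numeric_word(word):
    """Return True if the word counts as entirely numeric under the rules above."""
    return NUMERIC_WORD.fullmatch(word) is not None


def indexable_words(words, include_numerics=False):
    """Drop entirely numeric words unless numerics are included (default: excluded)."""
    return [w for w in words if include_numerics or not is_numeric_word(w)]


kept = indexable_words(["BMW", "320", "320i", "0012-3963", "-3", "+3"])
```

Here `kept` contains BMW, 320i and 0012-3963, while 320, -3 and +3 are dropped as numerics.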