TextChart Entity Types Reference

TextChart entity types are defined in the default token definition files (TokenDefs.xml and TokenDefs_core.xml) under rsk-api-core/LxBase/conf/. This reference reflects those default settings; a customized LxBase may define or emit a different set.

Output Entity Types

These entity types are defined in the entities section of TokenDefs.xml and are emitted by default:

  • PERSON - Named person entities.

  • ORG - Named organizations.

  • PLACE - Geographic places.

  • FACILITY - Facilities (buildings, installations, infrastructure).

  • ADDRESS - Postal addresses.

  • GEOCOORDINATE - Coordinates for geospatial mapping.

  • CONVEYANCE - Vehicles and other means of conveyance.

  • CRIME - Criminal offenses.

  • TIMESTAMP - Units of time greater than 24 hours (dates, date-like references).

  • TIMESPAN - Ranges of time or date ranges.

  • PRODUCT - Commercial product names.

  • IDNUM - Identification numbers (serial, SIM, ICCID, IBAN, etc.).

  • EVENT - Named or generic events.

  • PUBLICATION - Publications, film titles, awards.

  • WEAPON - Weapons.

  • DRUG - Drugs (legal and illicit).

  • EMAIL - Email addresses and social media account names.

  • SOCIAL - Social media usernames (e.g., handles).

  • PHONE - Phone numbers.

  • URL - Web addresses.

  • MONEY - Numeric amounts of money.

  • IMPLEMENT - Implements and instruments.

  • PUNITIVE_MEASURE - Punitive measures taken against an entity.

  • KEYWORD - Specific keywords or coded speech.

  • TRAUMA - Injuries or trauma descriptions.

  • POI - Person of interest (used in short or informal text sources).

Non-Output Entity Types

These entity types are defined in the NoOutputEntities section of TokenDefs.xml and are not emitted by default. They can still be used for internal matching, and they can be enabled for output through rule customization:

  • AWARD

  • BIOMETRIC

  • CONTRACT_TYPE

  • CITATION

  • DNA

  • GENERIC

  • ALERT_TYPE

  • SalientPhrase

  • CONTROL

  • DISEASE

  • FILE_10.1.2Current

  • HASHTAG

  • IDEOLOGY

  • CHEMICAL

  • NON_SALIENT_WEB_CONTENT

  • MEASURE

  • MEDICAL_PROCEDURE

  • MISC

  • SCORE

  • PERCENT

  • NATIONALITY

  • TRANSIT

  • PROGRAM

  • PROFESSION

  • POLITICAL_AFFILIATION

  • GENE

  • RATING

  • QUOTE

  • USER_AGENT

  • FINANCIAL_INDEX

  • TICKER_SYMBOL

  • INFRASTRUCTURE

  • ANATOMICAL_TERM

  • FUNDS

  • CRYPTO

  • CLASSIFICATION_LEVEL

Attributes and Subtypes

Attributes and subtypes are defined per-entity in TokenDefs.xml. The list is extensive and varies by entity type. For the authoritative set of attributes and descriptions, consult:

  • rsk-api-core/LxBase/conf/TokenDefs.xml

  • rsk-api-core/LxBase/conf/TokenDefs_core.xml
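
When consulting these files, a small script can help list which entity types fall in each section. The following is a minimal sketch only: the section names entities and NoOutputEntities come from this reference, but the root element, the entity child element, and the name attribute are assumptions made for illustration, and the real TokenDefs.xml schema may differ. Inspect the actual file and adjust the names before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment for illustration. Only the section names
# "entities" and "NoOutputEntities" are taken from this reference;
# the <TokenDefs> root, the <entity> children, and the "name"
# attribute are assumptions and may not match the real file.
SAMPLE = """\
<TokenDefs>
  <entities>
    <entity name="PERSON"/>
    <entity name="ORG"/>
  </entities>
  <NoOutputEntities>
    <entity name="HASHTAG"/>
    <entity name="PERCENT"/>
  </NoOutputEntities>
</TokenDefs>
"""


def list_entity_types(root, section):
    """Return the 'name' attribute of each child of the given section.

    Returns an empty list when the section is not present.
    """
    node = root.find(section)
    if node is None:
        return []
    return [child.get("name") for child in node if child.get("name")]


root = ET.fromstring(SAMPLE)
output_types = list_entity_types(root, "entities")
non_output_types = list_entity_types(root, "NoOutputEntities")

print("output:", output_types)
print("non-output:", non_output_types)
```

To run the same check against a real file, replace ET.fromstring(SAMPLE) with ET.parse(path).getroot(), where path points at the TokenDefs.xml you want to inspect, and update the element and attribute names to match what the file actually uses.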

Customizing Entity Extraction

You can customize which entity types are extracted and how they are identified by:

  1. Modifying the LxBase: Edit the linguistic rules to add or remove extraction of specific entity types.

  2. Creating Custom Rules: Define custom patterns specific to your domain.

See sdk-lxbase.md for instructions on customizing entity extraction.

Entity Type Relationships

Relationship extraction depends on rule configuration and is not hard-coded to specific entity pairs. If you need specific relationships (e.g., PERSON → ORG), add or adjust rules in the LxBase.