TextChart SDK

Overview

The TextChart SDK (Software Development Kit) is a Java toolkit for adding text analytics and entity extraction to your applications. With it, you can process documents programmatically, extract entities together with their types and relationships, and call TextChart's analysis engine from your own code.

The TextChart SDK is built on the Rosoka Text Analytics Platform and provides an API for:

  • Document Processing: Process files and strings containing unstructured or semi-structured text

  • Entity Extraction: Automatically identify and classify entities (people, organizations, locations, etc.)

  • Relationship Discovery: Find connections and relationships between extracted entities

  • Multi-Language Support: Process text in over 100 languages

  • Customizable Rules: Extend and customize entity extraction using rule-based configurations

  • Multiple Output Formats: Get results as XML, as JSON, or as Java objects

Key Components

The TextChart SDK architecture consists of several key components:

Core Engine

The Core Engine is the central processing component that coordinates all text analysis operations. It manages the initialization of the system, license validation, and orchestrates the various analysis engines.

Extraction Engine

The extraction engine processes documents to identify and extract entities from text. It applies linguistic rules defined in the LxBase (lexicon) to identify entities and their types.

LxBase (Lexicon Database)

The LxBase is a collection of linguistic rules and patterns that define how entities are identified. It includes:

  • Entity type definitions

  • Extraction rules (exact match, regex patterns, etc.)

  • Rule precedence and conflict resolution

  • Support for custom rules and extensions
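To make the rule concepts above concrete, here is a minimal, self-contained sketch of how exact-match and regex rules carrying a precedence value might resolve overlapping matches. It is not the LxBase file format and not Rosoka's actual resolution algorithm: the Rule and Match types, the precedence numbers, the sample patterns, and the "highest precedence first, then longest span" policy are all hypothetical assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: NOT the LxBase format or the Rosoka API.
final class RuleSketch {

    // A rule maps a pattern to an entity type; higher precedence wins conflicts.
    record Rule(String entityType, Pattern pattern, int precedence) {
        // Exact-match rule: the literal is quoted so regex metacharacters are ignored.
        static Rule exact(String type, String literal, int precedence) {
            return new Rule(type, Pattern.compile(Pattern.quote(literal)), precedence);
        }

        static Rule regex(String type, String regex, int precedence) {
            return new Rule(type, Pattern.compile(regex), precedence);
        }
    }

    // A candidate entity: matched text, its type, character offsets, and rule precedence.
    record Match(String text, String entityType, int start, int end, int precedence) {
        boolean overlaps(Match other) {
            return start < other.end && other.start < end;
        }
    }

    static List<Match> extract(String text, List<Rule> rules) {
        // 1. Collect every candidate match produced by every rule.
        List<Match> candidates = new ArrayList<>();
        for (Rule rule : rules) {
            Matcher m = rule.pattern().matcher(text);
            while (m.find()) {
                candidates.add(new Match(m.group(), rule.entityType(), m.start(), m.end(), rule.precedence()));
            }
        }
        // 2. Order candidates: higher precedence first, then longer span first.
        candidates.sort(Comparator.comparingInt(Match::precedence).reversed()
                .thenComparing(Comparator.comparingInt((Match x) -> x.end() - x.start()).reversed()));
        // 3. Greedily accept candidates that do not overlap an already accepted match.
        List<Match> accepted = new ArrayList<>();
        for (Match c : candidates) {
            if (accepted.stream().noneMatch(c::overlaps)) {
                accepted.add(c);
            }
        }
        // 4. Return the surviving entities in document order.
        accepted.sort(Comparator.comparingInt(Match::start));
        return accepted;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                Rule.exact("ORGANIZATION", "Acme Bank", 10),              // exact match, high precedence
                Rule.regex("PERSON", "\\b[A-Z][a-z]+ [A-Z][a-z]+\\b", 5), // naive two-capitalized-words pattern
                Rule.regex("DATE", "\\b\\d{4}\\b", 5));                   // four-digit year
        for (Match m : extract("Jane Smith joined Acme Bank in 2020.", rules)) {
            System.out.println(m.entityType() + ": " + m.text());
        }
        // PERSON: Jane Smith
        // ORGANIZATION: Acme Bank
        // DATE: 2020
    }
}
```

In this sketch the naive PERSON pattern also matches "Acme Bank", but the higher-precedence exact-match rule claims that span first, so the conflicting PERSON candidate is discarded.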

Output Objects

Results are returned as JAXB-mapped Java objects or XML structures containing:

  • Entity lists with types and attributes

  • Relationship information

  • Confidence scores and metadata

  • Document metadata
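The list above describes what a result contains, not the exact class names. The sketch below models those four pieces (entities with types, relationships, confidence scores, document metadata) as plain Java records and shows one way to consume them: grouping entity values by type above a confidence threshold. The Entity, Relationship, and DocumentResult records, the "EMPLOYED_BY" label, and the sample values are hypothetical stand-ins, not the SDK's JAXB-generated classes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical shapes for illustration; NOT the SDK's JAXB-generated classes.
final class ResultSketch {
    record Entity(String type, String value, double confidence) {}
    record Relationship(Entity source, String label, Entity target) {}
    record DocumentResult(Map<String, String> metadata, List<Entity> entities, List<Relationship> relationships) {}

    // Groups entity values by type (sorted by type name), keeping only entities
    // whose confidence is at or above minConfidence.
    static Map<String, List<String>> valuesByType(DocumentResult result, double minConfidence) {
        return result.entities().stream()
                .filter(e -> e.confidence() >= minConfidence)
                .collect(Collectors.groupingBy(Entity::type, TreeMap::new,
                        Collectors.mapping(Entity::value, Collectors.toList())));
    }

    public static void main(String[] args) {
        Entity jane = new Entity("PERSON", "Jane Smith", 0.92);
        Entity acme = new Entity("ORGANIZATION", "Acme Bank", 0.88);
        Entity city = new Entity("LOCATION", "Springfield", 0.41);
        DocumentResult result = new DocumentResult(
                Map.of("source", "example.txt"),
                List.of(jane, acme, city),
                List.of(new Relationship(jane, "EMPLOYED_BY", acme)));

        System.out.println(valuesByType(result, 0.5)); // {ORGANIZATION=[Acme Bank], PERSON=[Jane Smith]}
        for (Relationship r : result.relationships()) {
            System.out.println(r.source().value() + " -" + r.label() + "-> " + r.target().value());
        }
    }
}
```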

Supported File Formats

The TextChart SDK can process documents in over 30 file formats, including:

  • Office Documents: .docx, .xlsx, .pptx, .doc, .xls, .ppt

  • PDF: .pdf

  • Web: .html, .htm, .xml

  • Text: .txt, .csv, .tsv

  • Archives: .zip, .tar, .gz

  • Other: .rtf, .odt, and more

File parsing is handled by the integrated Apache Tika library, which detects each file's format and extracts its text.

System Requirements

  • Java: Java 21 (OpenJDK or Oracle JDK recommended)

  • Memory: 4 GB RAM minimum (8 GB recommended for production)

  • Disk Space: 500 MB for the SDK installation, plus additional space for the LxBase and configuration files

  • Operating Systems: Linux, macOS, Windows

  • Build Tool (for development): Maven 3.6 or higher
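Before installing, you may want to confirm that the machine and JVM meet the Java and memory requirements above. The following is a hypothetical preflight check, not a TextChart SDK utility. It assumes Java 21 is a minimum rather than an exact pin, reads physical memory through the HotSpot-specific com.sun.management.OperatingSystemMXBean (available on OpenJDK and Oracle JDK, reported as unknown elsewhere), and prints the JVM's maximum heap separately, because the heap limit is set with -Xmx and is usually well below physical RAM.

```java
import java.lang.management.ManagementFactory;

// Hypothetical preflight check; NOT part of the TextChart SDK.
// Thresholds mirror the System Requirements list: Java 21 and 4 GB of RAM.
final class RequirementsCheck {
    static final int MIN_JAVA_FEATURE = 21;                    // assumes 21 is a floor, not an exact pin
    static final long MIN_RAM_BYTES = 4L * 1024 * 1024 * 1024; // 4 GB

    static boolean javaVersionOk(int featureVersion) {
        return featureVersion >= MIN_JAVA_FEATURE;
    }

    static boolean ramOk(long totalBytes) {
        return totalBytes >= MIN_RAM_BYTES;
    }

    // Total physical memory via the HotSpot-specific MXBean; returns -1 if unavailable.
    static long totalPhysicalMemory() {
        var os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean hotspot) {
            return hotspot.getTotalMemorySize();
        }
        return -1;
    }

    public static void main(String[] args) {
        int feature = Runtime.version().feature();
        long ram = totalPhysicalMemory();
        long maxHeap = Runtime.getRuntime().maxMemory();

        System.out.println("Java feature version: " + feature
                + (javaVersionOk(feature) ? " (ok)" : " (Java 21 required)"));
        if (ram < 0) {
            System.out.println("Physical memory: unknown on this JVM");
        } else {
            System.out.println("Physical memory: " + (ram >> 20) + " MB"
                    + (ramOk(ram) ? " (ok)" : " (below 4 GB minimum)"));
        }
        // The JVM heap is a separate limit, controlled with -Xmx.
        System.out.println("Max JVM heap: " + (maxHeap >> 20) + " MB");
    }
}
```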

Quick Start Summary

  1. Install the TextChart SDK

  2. Set the ROSOKA_HOME environment variable

  3. Configure license keys

  4. Initialize the Rosoka instance in your code

  5. Process documents or strings

  6. Retrieve and analyze results
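A common setup mistake is an unset or mistyped ROSOKA_HOME (step 2). The hypothetical helper below, which is not part of the TextChart SDK, checks only that the variable is set and points at an existing directory. Which files the SDK expects inside ROSOKA_HOME is not covered in this overview, so the sketch deliberately does not check the directory's contents.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper; NOT part of the TextChart SDK. It only verifies that
// ROSOKA_HOME is set and names an existing directory.
final class RosokaHomeCheck {

    // Returns the normalized directory if the value is non-blank and is a directory.
    static Optional<Path> resolve(String value) {
        if (value == null || value.isBlank()) {
            return Optional.empty();
        }
        Path home = Path.of(value.trim()).toAbsolutePath().normalize();
        return Files.isDirectory(home) ? Optional.of(home) : Optional.empty();
    }

    public static void main(String[] args) {
        String raw = System.getenv("ROSOKA_HOME");
        resolve(raw).ifPresentOrElse(
                home -> System.out.println("ROSOKA_HOME -> " + home),
                () -> System.out.println("ROSOKA_HOME is unset or is not a directory: " + raw));
    }
}
```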

See SDK Installation and Configuration for detailed setup instructions, the SDK Usage Guide for development examples, and the RosokaProperties Configuration Reference for all available configuration options.