TextChart SDK
Overview
The TextChart SDK (Software Development Kit) is a Java-based toolkit for integrating text analytics and entity extraction into your applications. With it you can programmatically process documents, extract entities along with their types and relationships, and embed TextChart's analysis engine in custom applications.
The TextChart SDK builds upon the Rosoka Text Analytics Platform and provides a well-documented API for:
Document Processing: Process files and strings containing unstructured or semi-structured text
Entity Extraction: Automatically identify and classify entities (people, organizations, locations, etc.)
Relationship Discovery: Find connections and relationships between extracted entities
Multi-Language Support: Process text in over 100 languages
Customizable Rules: Extend and customize entity extraction using rule-based configurations
Multiple Output Formats: Get results in XML, JSON, or direct Java object access
Key Components
The TextChart SDK architecture consists of several key components:
Core Engine
The Core Engine is the central processing component that coordinates all text analysis operations. It manages system initialization and license validation, and it orchestrates the analysis engines.
Extraction Engine
The Extraction Engine processes documents to identify and extract entities. It applies the linguistic rules defined in the LxBase (lexicon database) to determine each entity and its type.
LxBase (Lexicon Database)
The LxBase is a collection of linguistic rules and patterns that define how entities are identified. It includes:
Entity type definitions
Extraction rules (exact match, regex patterns, etc.)
Rule precedence and conflict resolution
Support for custom rules and extensions
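To make the idea of exact-match rules, regex rules, and rule precedence concrete, here is a minimal sketch in plain Java. It is not the LxBase rule format or any TextChart API: the Rule and Match types, the priority field, and the "higher priority, then longer match wins" conflict policy are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: none of these types belong to the TextChart SDK or the LxBase.
class RuleSketch {

    /** A rule maps a pattern to an entity type; priority is an assumed precedence mechanism. */
    record Rule(String entityType, Pattern pattern, int priority) {
        static Rule exact(String entityType, String phrase, int priority) {
            // Pattern.quote makes the phrase match literally, i.e. an exact-match rule.
            return new Rule(entityType, Pattern.compile(Pattern.quote(phrase)), priority);
        }
        static Rule regex(String entityType, String regex, int priority) {
            return new Rule(entityType, Pattern.compile(regex), priority);
        }
    }

    record Match(String entityType, String text, int start, int end, int priority) {
        int length() { return end - start; }
    }

    static List<Match> extract(String input, List<Rule> rules) {
        // Collect every candidate match from every rule.
        List<Match> candidates = new ArrayList<>();
        for (Rule rule : rules) {
            Matcher m = rule.pattern().matcher(input);
            while (m.find()) {
                candidates.add(new Match(rule.entityType(), m.group(), m.start(), m.end(), rule.priority()));
            }
        }
        // Assumed conflict policy: higher priority first, then longer span, then earlier start.
        candidates.sort(Comparator.comparingInt(Match::priority).reversed()
                .thenComparing(Comparator.comparingInt(Match::length).reversed())
                .thenComparingInt(Match::start));
        // Accept a candidate only if it does not overlap an already accepted match.
        List<Match> accepted = new ArrayList<>();
        for (Match c : candidates) {
            boolean overlaps = accepted.stream().anyMatch(a -> c.start() < a.end() && a.start() < c.end());
            if (!overlaps) {
                accepted.add(c);
            }
        }
        accepted.sort(Comparator.comparingInt(Match::start));
        return accepted;
    }

    static List<Rule> demoRules() {
        return List.of(
                Rule.exact("ORGANIZATION", "Acme Corp", 10),
                Rule.regex("EMAIL", "[\\w.]+@[\\w.]+", 5),
                Rule.regex("CAPITALIZED", "\\b[A-Z][a-z]+\\b", 1));
    }

    public static void main(String[] args) {
        for (Match m : extract("Email Acme Corp at info@acme.example", demoRules())) {
            System.out.println(m.entityType() + ": " + m.text());
        }
    }
}
```

In this sketch the low-priority CAPITALIZED rule also matches "Acme" and "Corp", but both overlap the higher-priority ORGANIZATION match and are discarded. How the LxBase actually resolves such conflicts is not described in this section.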
Output Objects
Results are returned as JAXB-mapped Java objects or XML structures containing:
Entity lists with types and attributes
Relationship information
Confidence scores and metadata
Document metadata
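As a rough mental model of the result contents listed above, the following sketch uses plain Java records. The record names, fields, and the entitiesAbove helper are hypothetical; the SDK's actual JAXB-mapped classes are not shown in this section and will differ.

```java
import java.util.List;
import java.util.Map;

// Hypothetical shapes for illustration; these are not the SDK's JAXB-mapped classes.
class ResultSketch {

    record Entity(String text, String type, double confidence, Map<String, String> attributes) {}

    record Relationship(Entity source, String kind, Entity target) {}

    record AnalysisResult(Map<String, String> documentMetadata,
                          List<Entity> entities,
                          List<Relationship> relationships) {

        /** Keeps only entities whose confidence is at or above the threshold. */
        List<Entity> entitiesAbove(double threshold) {
            return entities.stream().filter(e -> e.confidence() >= threshold).toList();
        }
    }

    /** Builds a small made-up result so the model can be exercised without the SDK. */
    static AnalysisResult sample() {
        Entity person = new Entity("Ada Lovelace", "PERSON", 0.95, Map.of());
        Entity org = new Entity("Analytical Engines Ltd", "ORGANIZATION", 0.85, Map.of());
        Entity place = new Entity("London", "LOCATION", 0.40, Map.of());
        return new AnalysisResult(
                Map.of("fileName", "example.txt"),
                List.of(person, org, place),
                List.of(new Relationship(person, "AFFILIATED_WITH", org)));
    }

    public static void main(String[] args) {
        AnalysisResult result = sample();
        for (Entity e : result.entitiesAbove(0.8)) {
            System.out.println(e.type() + ": " + e.text() + " (" + e.confidence() + ")");
        }
    }
}
```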
Supported File Formats
The TextChart SDK can process documents in over 30 file formats including:
Office Documents: .docx, .xlsx, .pptx, .doc, .xls, .ppt
PDF: .pdf
Web: .html, .htm, .xml
Text: .txt, .csv, .tsv
Archives: .zip, .tar, .gz
Other: .rtf, .odt, and more
File parsing is handled through the SDK's Apache Tika integration.
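An application that accepts uploads may want to group incoming files by the categories listed above before handing them to the SDK. The sketch below is an assumption-laden helper, not part of the SDK: the class and method names are hypothetical, and the table contains only the extensions named in this section. Because the SDK supports more formats than are listed here, an empty result means "not listed in this overview", not "unsupported".

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; it only mirrors the extension list in this overview.
class FormatSketch {

    private static final Map<String, List<String>> LISTED = new LinkedHashMap<>();
    static {
        LISTED.put("Office Documents", List.of(".docx", ".xlsx", ".pptx", ".doc", ".xls", ".ppt"));
        LISTED.put("PDF", List.of(".pdf"));
        LISTED.put("Web", List.of(".html", ".htm", ".xml"));
        LISTED.put("Text", List.of(".txt", ".csv", ".tsv"));
        LISTED.put("Archives", List.of(".zip", ".tar", ".gz"));
        LISTED.put("Other", List.of(".rtf", ".odt"));
    }

    /** Returns the category for a file name, or empty if its extension is not listed here. */
    static Optional<String> listedCategory(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty(); // no extension at all
        }
        // Compare case-insensitively so "Report.PDF" and "report.pdf" behave the same.
        String ext = fileName.substring(dot).toLowerCase(Locale.ROOT);
        return LISTED.entrySet().stream()
                .filter(e -> e.getValue().contains(ext))
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        for (String name : List.of("Report.PDF", "data.csv", "notes.md")) {
            System.out.println(name + " -> " + listedCategory(name).orElse("(not listed)"));
        }
    }
}
```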
System Requirements
Java: Java 21 (OpenJDK or Oracle JDK recommended)
Memory: 4GB RAM minimum (8GB recommended for production)
Disk Space: 500MB for SDK installation plus additional space for LxBase and configuration files
Operating Systems: Linux, macOS, Windows
Build Tool (for development): Maven 3.6 or higher
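The requirements above can be checked at startup with a small preflight routine. The following is a sketch using only JDK classes. It assumes that Java 21 is a minimum rather than an exact version, that gigabytes and megabytes can be read as binary units, and that the com.sun.management extension available in OpenJDK and Oracle JDK can report physical memory. The class and method names are hypothetical.

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical preflight check; thresholds are taken from the requirements list above.
class PreflightSketch {

    static final int MIN_JAVA_FEATURE = 21;        // assumption: 21 is treated as a minimum
    static final long MIN_RAM_BYTES = 4L << 30;    // 4 GB, read as binary units
    static final long MIN_DISK_BYTES = 500L << 20; // 500 MB, read as binary units

    /** Returns a description of each unmet requirement; a negative ramBytes means "unknown". */
    static List<String> problems(int javaFeature, long ramBytes, long freeDiskBytes) {
        List<String> out = new ArrayList<>();
        if (javaFeature < MIN_JAVA_FEATURE) {
            out.add("Java " + MIN_JAVA_FEATURE + " required, found " + javaFeature);
        }
        if (ramBytes >= 0 && ramBytes < MIN_RAM_BYTES) {
            out.add("At least 4GB RAM required, found " + (ramBytes >> 20) + "MB");
        }
        if (freeDiskBytes < MIN_DISK_BYTES) {
            out.add("At least 500MB free disk space required, found " + (freeDiskBytes >> 20) + "MB");
        }
        return out;
    }

    /** Physical memory in bytes, or -1 when the JVM does not expose it. */
    static long physicalRamBytes() {
        if (ManagementFactory.getOperatingSystemMXBean()
                instanceof com.sun.management.OperatingSystemMXBean os) {
            return os.getTotalMemorySize();
        }
        return -1;
    }

    public static void main(String[] args) {
        // The install directory is passed as the first argument; defaults to the working directory.
        File installDir = new File(args.length > 0 ? args[0] : ".");
        List<String> found = problems(
                Runtime.version().feature(), physicalRamBytes(), installDir.getUsableSpace());
        if (found.isEmpty()) {
            System.out.println("All checked requirements are met.");
        } else {
            found.forEach(System.out::println);
        }
    }
}
```

The sketch does not check the operating system or the Maven version, and it does not account for the additional disk space needed for the LxBase and configuration files.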
Quick Start Summary
Install the TextChart SDK
Set the ROSOKA_HOME environment variable
Configure license keys
Initialize the Rosoka instance in your code
Process documents or strings
Retrieve and analyze results
See SDK Installation and Configuration for detailed setup instructions, SDK Usage Guide for development examples, and RosokaProperties Configuration Reference for all available configuration options.
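Before initializing the SDK, an application can verify that ROSOKA_HOME is set and points to a directory. The sketch below covers only that step, using JDK classes. The class and method names are hypothetical, and the SDK's own initialization and licensing calls are deliberately omitted because they are documented in the guides referenced above rather than in this overview.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; it checks the environment only and does not call the SDK.
class RosokaHomeSketch {

    /** Returns the ROSOKA_HOME directory if the variable is set and names an existing directory. */
    static Optional<Path> resolveHome(Map<String, String> env) {
        String value = env.get("ROSOKA_HOME");
        if (value == null || value.isBlank()) {
            return Optional.empty();
        }
        Path home = Path.of(value);
        return Files.isDirectory(home) ? Optional.of(home) : Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> home = resolveHome(System.getenv());
        if (home.isPresent()) {
            System.out.println("ROSOKA_HOME resolved to " + home.get().toAbsolutePath());
        } else {
            System.out.println("ROSOKA_HOME is not set or is not a directory.");
        }
    }
}
```

Taking the environment as a map parameter, rather than reading System.getenv() inside the method, keeps the check easy to exercise in unit tests.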