Skip to main content
January 13

API Playground UI

New interactive web playground for testing document extraction:
  • Output format selection - Choose from Markdown, JSON, CSV/Excel, or HTML output
  • Schema Builder - Visual JSON schema editor with support for nested objects, arrays, and enums up to 10 levels deep
  • Field List mode - Quick extraction with simple field name arrays
  • Metadata options - Enable confidence scores and bounding boxes per field
December 15

Streaming Extraction Endpoint

New /v1/extract/stream endpoint for real-time extraction via Server-Sent Events (SSE):
  • Streaming mode - Content delivered in small chunks as it’s generated
  • Batch mode - Content sent all at once when extraction completes
December 8

v1 Extraction API

New /v1/extract endpoints with a cleaner, more consistent interface:
  • Sync extraction (/v1/extract/sync) - Process documents synchronously with immediate results
  • Async extraction (/v1/extract/async) - Queue documents for background processing
  • Batch extraction (/v1/extract/batch) - Process up to 50 files in a single request
  • Fetch result (/v1/extract/results/<record_id>) - Fetch the results for a single record_id
  • Results endpoints (/v1/extract/results) - List and retrieve extraction results with pagination
All endpoints support file upload, URL, and base64 input methods.
November 27

Multi-Page JSON Extraction with Confidence Scoring

Enhanced JSON extraction now processes multi-page documents and returns responses based on the best confidence score, improving accuracy for complex documents.

Bounding Box Extraction API

New /extract-with-bounding-boxes endpoint that returns extracted data with precise coordinate information for each field, enabling document annotation and validation workflows.

Response Dimensions

API responses now include dimension metadata (width/height) for processed documents, useful for coordinate calculations and rendering.
November 21

Streaming & Partial Results

  • New streaming extraction endpoint at /v1/extract/stream
  • Partial results API to retrieve in-progress extractions
  • Improved delimiter handling for chunked responses

Billing & Usage APIs

  • Credit usage reporting integration with Stripe
  • Subscription status tracking
  • Document processing limits per plan

IP Blocking & Rate Limiting

Enhanced security with IP-based access control middleware and improved rate limiting mechanisms.
November 14

On-Premise License APIs

New license management APIs for enterprise on-premise deployments, including activation and validation endpoints.

Repetition Detection & Retry

Improved extraction reliability with automatic retry using nucleus sampling when repetition patterns are detected in model outputs.

Excel & DOCX Processing

Fixed file processing for Excel spreadsheets and Word documents with improved error handling.
October 17

OpenAI-Compatible Chat Completions API

Full OpenAI-compatible /v1/chat/completions endpoint supporting:
  • PDF and document uploads directly in requests
  • All major file types (PDF, Excel, Word, images)
  • Drop-in replacement for OpenAI SDK integrations

Hierarchy Extraction API

New API for extracting document hierarchies and structure, including parent-child relationships and table of contents with linked IDs.
October 10

Custom Prompt Instructions

Support for custom prompt instructions in markdown format, allowing fine-tuned extraction behavior for specific use cases.

Expanded File Type Support

Extended support for additional file formats in chat completions:
  • PDF documents
  • Excel spreadsheets (.xlsx, .xls)
  • Word documents (.docx)
  • All major image formats

OpenAI SDK Upgrade

Updated to latest OpenAI SDK version for improved compatibility and performance.

Coming Soon

Batch Processing

Process multiple documents in a single API call with consolidated results.

Webhooks

Real-time notifications for async extraction completion events.