Output Formats

The API supports four output formats: Markdown, HTML, JSON, and CSV. You can request one or more formats in a single API call by comma-separating them (e.g., output_format=markdown,json).
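As a minimal sketch of the comma-separated form, the Python helper below joins a list of format names into a single output_format value and rejects names that are not among the four listed on this page. The helper name build_output_format is hypothetical and not part of the API; lowercasing the input is also an assumption, made only because the documented names are lowercase.

```python
# The four formats documented on this page.
SUPPORTED_FORMATS = ("markdown", "html", "json", "csv")


def build_output_format(formats):
    """Join format names into a comma-separated output_format value.

    Hypothetical helper for illustration; it is not part of the API.
    """
    if not formats:
        raise ValueError("at least one output format is required")
    seen = []
    for name in formats:
        normalized = name.strip().lower()  # documented names are lowercase
        if normalized not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported output format: {name!r}")
        if normalized not in seen:  # drop duplicates, keep first-seen order
            seen.append(normalized)
    return ",".join(seen)


print(build_output_format(["markdown", "json"]))  # markdown,json
```

The returned string can be sent as the output_format form field, in the same way as the -F flags shown on this page.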

Markdown

Clean, formatted Markdown text preserving document structure including headings, tables, lists, and emphasis.

-F "output_format=markdown"

Markdown Options (markdown_options):

Option	Description
`financial-docs`	Optimized for financial documents with enhanced table and number formatting

-F "output_format=markdown" -F "markdown_options=financial-docs"

HTML

Full HTML representation of the document with semantic tags and structure.

-F "output_format=html"

JSON

Structured JSON extraction. The json_options parameter selects one of five modes:

Mode	`json_options` value	Description
Flat key-value	(omit json_options)	Auto-detected field extraction (default)
Specified fields	`["field1", "field2"]`	Extract specific fields you define
Custom schema	`{"type": "object", "properties": {...}}`	Define exact JSON output structure
Hierarchy output	`hierarchy_output`	Tree-structured nested data from document
Table of contents	`table-of-contents`	Extract document heading structure

Use specified fields when you know exactly what data you need. Use custom schema when you need precise control over the output structure, types, and nesting.

Example: Specified Fields

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "output_format=json" \
  -F 'json_options=["invoice_number", "date", "total_amount", "vendor"]'

Example: Custom Schema

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "output_format=json" \
  -F 'json_options={"type": "object", "properties": {"invoice_number": {"type": "string"}, "total_amount": {"type": "number"}}}'
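The json_options values in the two examples above are JSON text passed as a form field, so they can be generated rather than typed by hand. The sketch below uses Python's standard json module; the helper names fields_option and schema_option are hypothetical, and the code only builds the form fields without sending a request.

```python
import json


def fields_option(field_names):
    """Serialize field names for the 'specified fields' mode."""
    return json.dumps(list(field_names))


def schema_option(properties):
    """Wrap a property map in an object schema for the 'custom schema' mode."""
    return json.dumps({"type": "object", "properties": properties})


# Form fields equivalent to the custom schema curl example above.
form_fields = {
    "output_format": "json",
    "json_options": schema_option({
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    }),
}

print(fields_option(["invoice_number", "date", "total_amount", "vendor"]))
print(form_fields["json_options"])
```

Serializing with json.dumps avoids the shell-quoting mistakes that are easy to make when a schema grows beyond a couple of properties.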

CSV

Tabular data extraction for documents containing tables.

-F "output_format=csv"

CSV Options (csv_options):

Option	Description
`table`	Extract structured table data from the document

-F "output_format=csv" -F "csv_options=table"

Custom Instructions

Guide the extraction with natural language instructions using the custom_instructions parameter:

-F "custom_instructions=Format all dates as YYYY-MM-DD. Extract amounts without currency symbols."

The prompt_mode parameter controls how instructions are applied:

Mode	Description
`append`	Add your instructions to the base prompt (default)
`replace`	Use only your custom instructions
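A small sketch of how the two parameters might be assembled in Python follows. The helper instruction_fields is hypothetical; it only checks prompt_mode against the two documented values and joins several instruction sentences into one custom_instructions string.

```python
# The two modes documented in the table above.
PROMPT_MODES = ("append", "replace")


def instruction_fields(instructions, prompt_mode="append"):
    """Build the custom_instructions and prompt_mode form fields.

    Hypothetical helper for illustration; it is not part of the API.
    """
    if prompt_mode not in PROMPT_MODES:
        raise ValueError(f"prompt_mode must be one of {PROMPT_MODES}")
    # Skip blank entries and join the rest into a single instruction string.
    text = " ".join(s.strip() for s in instructions if s.strip())
    if not text:
        raise ValueError("at least one non-empty instruction is required")
    return {"custom_instructions": text, "prompt_mode": prompt_mode}


fields = instruction_fields([
    "Format all dates as YYYY-MM-DD.",
    "Extract amounts without currency symbols.",
])
print(fields["custom_instructions"])
print(fields["prompt_mode"])  # append
```

Defaulting to append mirrors the documented default, so replace has to be requested explicitly.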

Metadata Options

Include additional metadata, such as bounding boxes or confidence scores, by passing a comma-separated list of values in the include_metadata parameter:

-F "include_metadata=bounding_boxes,confidence_score"

Value	Description
`bounding_boxes`	Block-level bounding boxes per paragraph/region from layout detection
`bounding_boxes_word`	Word-level bounding boxes per word using advanced OCR
`confidence_score`	Overall confidence (0-100) for all formats, plus granular per-field/per-cell scores for JSON and CSV

See the Confidence Scoring page for details on how scores are calculated.
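The sketch below shows one way to build the include_metadata value and to act on the overall confidence score in Python. Both helper names are hypothetical, the review threshold of 80 is an arbitrary assumption rather than a documented cutoff, and the score is passed in as a plain number because the response layout is not described on this page.

```python
# Values documented in the metadata table above.
METADATA_VALUES = ("bounding_boxes", "bounding_boxes_word", "confidence_score")


def build_include_metadata(values):
    """Join metadata names into a comma-separated include_metadata value."""
    unknown = [v for v in values if v not in METADATA_VALUES]
    if unknown:
        raise ValueError(f"unsupported metadata value(s): {unknown}")
    return ",".join(values)


def needs_review(overall_confidence, threshold=80):
    """Flag a result whose overall confidence (0-100) is below the threshold.

    The default of 80 is an arbitrary assumption, not a documented cutoff.
    """
    if not 0 <= overall_confidence <= 100:
        raise ValueError("confidence is documented as a 0-100 score")
    return overall_confidence < threshold


print(build_include_metadata(["bounding_boxes", "confidence_score"]))
print(needs_review(72))  # True
```

A threshold like this is typically tuned per document type after inspecting a sample of results.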

Language Support

The API supports multilingual extraction across 29+ languages. The model detects the language automatically, so no configuration is required.

Supported Scripts

Script	Languages
Latin	English, French, Spanish, Portuguese, German, Italian, Dutch, Polish, Czech, Romanian
Cyrillic	Russian, Ukrainian
Chinese Characters	Simplified Chinese, Traditional Chinese
Japanese (Kanji, Hiragana, Katakana)	Japanese
Korean Hangul	Korean
Arabic Script	Arabic, Persian, Urdu
Devanagari	Hindi, Sanskrit
Bengali	Bengali

Language Tiers

Tier	Languages	Performance
Tier 1	Chinese (Simplified & Traditional), English, Japanese, Korean	Exceptional
Tier 2	Spanish, French, German, Italian, Portuguese, Russian, Arabic, Hindi, Thai, Vietnamese	Strong
Tier 3	Indonesian, Malaysian, Turkish, Polish, Dutch, Czech, Romanian, Ukrainian, Greek, Hebrew, Swahili	Good
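For pipelines that route documents by expected quality, the tier table can be turned into a simple lookup. The Python sketch below copies the table as written; the function name language_tier is hypothetical, and "Chinese" stands in for the "Chinese (Simplified & Traditional)" entry.

```python
# Tier membership copied from the table above.
TIERS = {
    1: ("Chinese", "English", "Japanese", "Korean"),
    2: ("Spanish", "French", "German", "Italian", "Portuguese",
        "Russian", "Arabic", "Hindi", "Thai", "Vietnamese"),
    3: ("Indonesian", "Malaysian", "Turkish", "Polish", "Dutch", "Czech",
        "Romanian", "Ukrainian", "Greek", "Hebrew", "Swahili"),
}
PERFORMANCE = {1: "Exceptional", 2: "Strong", 3: "Good"}

# Reverse index: case-insensitive language name -> tier number.
TIER_BY_LANGUAGE = {
    name.casefold(): tier for tier, names in TIERS.items() for name in names
}


def language_tier(language):
    """Return (tier, performance), or None if the language is not in the table."""
    tier = TIER_BY_LANGUAGE.get(language.strip().casefold())
    return None if tier is None else (tier, PERFORMANCE[tier])


print(language_tier("Hindi"))    # (2, 'Strong')
print(language_tier("Bengali"))  # None
```

Languages that appear only in the Supported Scripts table, such as Persian, Urdu, Bengali, and Sanskrit, return None here because the tier table does not place them.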
