The API supports four output formats: Markdown, HTML, JSON, and CSV. You can request one or more formats in a single API call by comma-separating them (e.g., output_format=markdown,json).
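As a rough illustration of the comma-separated convention, the sketch below builds an output_format value from a list of format names. The helper name output_format_field and the client-side validation are assumptions made for this example; they are not part of the API.

```python
# Format names as they appear in the output_format examples on this page.
ALLOWED_FORMATS = ("markdown", "html", "json", "csv")


def output_format_field(*formats: str) -> str:
    """Join one or more format names into a comma-separated output_format value.

    Hypothetical helper: the API only needs the final string.
    """
    if not formats:
        raise ValueError("at least one output format is required")
    unknown = [f for f in formats if f not in ALLOWED_FORMATS]
    if unknown:
        raise ValueError(f"unsupported output format(s): {unknown}")
    # Drop duplicates while preserving the caller's order.
    return ",".join(dict.fromkeys(formats))


print(output_format_field("markdown", "json"))  # markdown,json
```

The resulting string can be passed as the output_format form field, the same way the curl fragments on this page do with -F.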

Markdown

Clean, formatted Markdown text preserving document structure including headings, tables, lists, and emphasis.
-F "output_format=markdown"
Markdown Options (markdown_options):
Option | Description
financial-docs | Optimized for financial documents with enhanced table and number formatting
-F "output_format=markdown" -F "markdown_options=financial-docs"

HTML

Full HTML representation of the document with semantic tags and structure.
-F "output_format=html"

JSON

Structured JSON extraction with multiple modes:
Mode | json_options value | Description
Flat key-value | (omit json_options) | Auto-detected field extraction (default)
Specified fields | ["field1", "field2"] | Extract specific fields you define
Custom schema | {"type": "object", "properties": {...}} | Define exact JSON output structure
Hierarchy output | hierarchy_output | Tree-structured nested data from document
Table of contents | table-of-contents | Extract document heading structure
Use specified fields when you know exactly what data you need. Use custom schema when you need precise control over the output structure, types, and nesting.

Example: Specified Fields

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "output_format=json" \
  -F 'json_options=["invoice_number", "date", "total_amount", "vendor"]'

Example: Custom Schema

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "output_format=json" \
  -F 'json_options={"type": "object", "properties": {"invoice_number": {"type": "string"}, "total_amount": {"type": "number"}}}'
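Hand-writing JSON inside a shell-quoted form field is easy to get wrong, so it can help to serialize the json_options value programmatically. The following is a minimal sketch, assuming the field accepts the same JSON text shown in the curl examples above; json_options_field and its keyword arguments are hypothetical names introduced only for illustration.

```python
import json


def json_options_field(fields=None, schema=None, preset=None):
    """Return a json_options form value, or None to omit the field.

    Hypothetical helper covering the modes listed in the table:
      fields -> specified fields, serialized as a JSON array
      schema -> custom schema, serialized as a JSON object
      preset -> a literal value such as "hierarchy_output" or "table-of-contents"
      none   -> default flat key-value mode (json_options is omitted)
    """
    chosen = [value for value in (fields, schema, preset) if value is not None]
    if len(chosen) > 1:
        raise ValueError("pass only one of fields, schema, or preset")
    if fields is not None:
        return json.dumps(list(fields))
    if schema is not None:
        return json.dumps(schema)
    return preset


# Mirrors the two curl examples above.
print(json_options_field(fields=["invoice_number", "date", "total_amount", "vendor"]))
print(json_options_field(schema={
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
}))
```

Serializing with json.dumps guarantees valid JSON (double-quoted keys, escaped characters), which is the main thing that tends to break when the value is typed by hand.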

CSV

Tabular data extraction for documents containing tables.
-F "output_format=csv"
CSV Options (csv_options):
Option | Description
table | Extract structured table data from the document
-F "output_format=csv" -F "csv_options=table"
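Once CSV output has been retrieved, it can be loaded into rows with a standard CSV parser. The sketch below assumes you already hold the CSV text as a string and that its first row is a header; how the text is located in the API response is not covered on this page, and the sample data is made up for illustration.

```python
import csv
import io


def parse_csv_output(csv_text: str) -> list:
    """Parse CSV text into a list of dicts keyed by the header row.

    Assumption: the first row of the extracted table is a header.
    """
    return list(csv.DictReader(io.StringIO(csv_text)))


# Illustrative sample only, not real API output.
sample = "item,qty\nWidget,2\nGadget,5\n"
for row in parse_csv_output(sample):
    print(row["item"], row["qty"])
```

All values come back as strings; converting numeric columns is left to the caller.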

Custom Instructions

Guide the extraction with natural language instructions using the custom_instructions parameter:
-F "custom_instructions=Format all dates as YYYY-MM-DD. Extract amounts without currency symbols."
The prompt_mode parameter controls how instructions are applied:
Mode | Description
append | Add your instructions to the base prompt (default)
replace | Use only your custom instructions
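To make the interaction between custom_instructions and prompt_mode concrete, here is a small sketch that assembles the two form fields. The function name instruction_fields is hypothetical, and leaving prompt_mode out when it equals the documented default is a choice made for this example, not a requirement stated by the API.

```python
PROMPT_MODES = ("append", "replace")


def instruction_fields(instructions: str, prompt_mode: str = "append") -> dict:
    """Build the form fields for custom instructions (hypothetical helper)."""
    if prompt_mode not in PROMPT_MODES:
        raise ValueError(f"prompt_mode must be one of {PROMPT_MODES}")
    text = instructions.strip()
    if not text:
        raise ValueError("custom_instructions must not be empty")
    fields = {"custom_instructions": text}
    # "append" is the documented default, so it is only sent when overridden.
    if prompt_mode != "append":
        fields["prompt_mode"] = prompt_mode
    return fields


print(instruction_fields("Format all dates as YYYY-MM-DD.", prompt_mode="replace"))
```

With replace, the base prompt is not used at all, so the instructions should fully describe the extraction you expect.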

Metadata Options

Include additional metadata like bounding boxes or confidence scores:
-F "include_metadata=bounding_boxes,confidence_score"
Value | Description
bounding_boxes | Block-level bounding boxes per paragraph/region from layout detection
bounding_boxes_word | Word-level bounding boxes per word using advanced OCR
confidence_score | Overall confidence (0-100) for all formats, plus granular per-field/per-cell scores for JSON and CSV
See the Confidence Scoring page for details on how scores are calculated.
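A common use of per-field confidence is routing uncertain values to manual review. The sketch below assumes the per-field scores have already been collected into a plain dict of field name to score on the 0-100 scale; the actual response layout is not shown on this page, and the threshold of 80 is an arbitrary value chosen for illustration.

```python
def fields_needing_review(scores: dict, threshold: float = 80.0) -> list:
    """Return field names scoring below threshold, lowest confidence first.

    Assumption: scores maps field name -> confidence on the 0-100 scale.
    """
    for name, score in scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"score for {name!r} is outside 0-100: {score}")
    low = [name for name, score in scores.items() if score < threshold]
    return sorted(low, key=lambda name: scores[name])


# Illustrative scores only, not real API output.
print(fields_needing_review({"invoice_number": 98, "date": 72, "total_amount": 55}))
```

A suitable threshold depends on the document type and on how costly an extraction error is, so it is worth tuning against a sample of reviewed documents.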

Language Support

The API supports multilingual extraction across 29+ languages. The model detects the language automatically; no configuration is required.

Supported Scripts

Script | Languages
Latin | English, French, Spanish, Portuguese, German, Italian, Dutch, Polish, Czech, Romanian
Cyrillic | Russian, Ukrainian
Chinese Characters | Simplified Chinese, Traditional Chinese
Japanese | Kanji, Hiragana, Katakana
Korean Hangul | Korean
Arabic Script | Arabic, Persian, Urdu
Devanagari and Bengali | Hindi, Sanskrit, Bengali

Language Tiers

Tier | Languages | Performance
Tier 1 | Chinese (Simplified & Traditional), English, Japanese, Korean | Exceptional
Tier 2 | Spanish, French, German, Italian, Portuguese, Russian, Arabic, Hindi, Thai, Vietnamese | Strong
Tier 3 | Indonesian, Malaysian, Turkish, Polish, Dutch, Czech, Romanian, Ukrainian, Greek, Hebrew, Swahili | Good