Introduction

Overview

The Nanonets Document Extraction API uses advanced AI models to extract structured content from documents. Convert PDFs, images, Word documents, Excel spreadsheets, and more into clean Markdown, HTML, JSON, or CSV formats.

Key Features

Multiple Output Formats

Extract content as Markdown, HTML, JSON, or CSV. Request multiple formats in a single API call.

Real-Time Streaming

Stream extraction results via SSE for real-time UI updates as content is generated.

Batch Processing

Process up to 50 documents in a single batch request with shared extraction options.

Custom Instructions

Guide the extraction with custom instructions for formatting, field focus, and output structure.
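The 50-document limit described under Batch Processing means larger jobs have to be split on the client side before they are submitted. A minimal sketch of that split follows; `MAX_BATCH_SIZE` and `chunk_documents` are illustrative names rather than part of the API, and the sketch assumes the limit applies per batch request as stated above.

```python
from typing import Iterator, List, Sequence

# Limit taken from the Batch Processing description above.
MAX_BATCH_SIZE = 50


def chunk_documents(paths: Sequence[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield successive groups of at most `size` document paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


if __name__ == "__main__":
    # Hypothetical file names, used only to show the grouping.
    docs = [f"invoice_{i}.pdf" for i in range(120)]
    print([len(batch) for batch in chunk_documents(docs)])  # [50, 50, 20]
```

Each yielded group can then be sent as one batch request with its shared extraction options.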

Quick Start

Get your API key

Sign in to docstrange.nanonets.com and grab your API key from the top-right menu. All requests use Bearer token authentication:

Authorization: Bearer YOUR_API_KEY

Extract a document

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@document.pdf" \
  -F "output_format=markdown"
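For readers calling the API from Python rather than curl, the same request can be assembled with the standard library alone. This is a sketch, not an official client: `build_extract_request` is a hypothetical helper, the `application/octet-stream` part type is an assumption, and the response is left unparsed because its shape is not described on this page.

```python
import urllib.request
import uuid

API_URL = "https://extraction-api.nanonets.com/api/v1/extract/sync"


def build_extract_request(api_key: str, filename: str, content: bytes,
                          output_format: str = "markdown") -> urllib.request.Request:
    """Build (but do not send) a multipart POST equivalent to the curl call."""
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="output_format"',
        b"",
        output_format.encode(),
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",  # assumed generic part type
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder bytes stand in for a real PDF read from disk.
    req = build_extract_request("YOUR_API_KEY", "document.pdf", b"%PDF-1.4 placeholder")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req).read()
```

Building the request separately from sending it keeps the example inspectable without an API key or network access.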

For a complete walkthrough including streaming and async processing, see the Quickstart guide.

API Endpoints

Document Extraction

Endpoint	Method	Description
`/api/v1/extract/sync`	POST	Synchronous extraction - returns results immediately
`/api/v1/extract/async`	POST	Asynchronous extraction - returns job ID for polling
`/api/v1/extract/stream`	POST	Streaming extraction - real-time results via SSE
`/api/v1/extract/batch`	POST	Batch processing - process multiple files at once
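The streaming endpoint delivers results as Server-Sent Events, so a client needs to split the response into events before using them. The sketch below parses the generic SSE wire format (`data:` lines terminated by a blank line); it makes no assumption about what the stream endpoint puts inside each event, and the sample strings are made up for illustration.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional leading space.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    if buffer:
        # Lenient: flush a final event even if the stream ended without
        # a trailing blank line (a strict SSE parser would discard it).
        yield "\n".join(buffer)


if __name__ == "__main__":
    sample = ["data: first chunk\n", "\n", ": keep-alive\n", "data: second chunk\n", "\n"]
    for payload in iter_sse_data(sample):
        print(payload)
```

In practice the `lines` argument would be the decoded lines of the HTTP response body, read incrementally so the UI can update as each event arrives.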

Document Classification

Endpoint	Method	Description
`/api/v1/classify/sync`	POST	Synchronous document classification
`/api/v1/classify/batch`	POST	Batch classification - process multiple files at once
`/api/v1/classify/results/{record_id}`	GET	Get classification result by record ID

Results

Endpoint	Method	Description
`/api/v1/extract/results/{record_id}`	GET	Get extraction result by job ID
`/api/v1/extract/results`	GET	List all extraction results (paginated)
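Asynchronous extraction returns an ID that is later exchanged for a result, which implies a polling loop on the client. A minimal sketch of such a loop follows, with loud assumptions: `poll_until` and `result_url` are hypothetical helpers, the ID returned by the async endpoint is assumed to be usable as `{record_id}`, and the `status` field in the usage example is invented because the response schema is not shown on this page.

```python
import time
from typing import Any, Callable, Dict
from urllib.parse import quote

BASE_URL = "https://extraction-api.nanonets.com"


def result_url(record_id: str) -> str:
    """URL of the extraction result endpoint for one record."""
    return f"{BASE_URL}/api/v1/extract/results/{quote(record_id, safe='')}"


def poll_until(fetch: Callable[[], Dict[str, Any]],
               is_done: Callable[[Dict[str, Any]], bool],
               interval: float = 2.0,
               max_attempts: int = 30,
               sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Call `fetch` until `is_done(result)` is true or attempts run out."""
    for attempt in range(max_attempts):
        result = fetch()
        if is_done(result):
            return result
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next attempt
    raise TimeoutError(f"not finished after {max_attempts} attempts")


if __name__ == "__main__":
    # Fake responses stand in for GET requests to result_url(...);
    # the "status" key is an assumed field name.
    fake = iter([{"status": "processing"}, {"status": "completed"}])
    done = poll_until(lambda: next(fake),
                      lambda r: r.get("status") == "completed",
                      interval=0.0)
    print(done)
```

Passing `fetch` and `is_done` as callables keeps the loop independent of the real response format, so only those two functions need adjusting once the actual schema is known.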

Input Methods

Provide your document using one of these methods:

Parameter	Description
`file`	Direct file upload (multipart/form-data)
`file_url`	URL to download the file from
`file_base64`	Base64-encoded file content

Supported file types: PDF, Word (.docx), Excel (.xlsx, .xls), PowerPoint (.pptx), Images (PNG, JPG, TIFF, WebP)
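The choice between `file_url` and `file_base64` can be made programmatically. The sketch below picks the field from the shape of the input and rejects extensions that are not in the supported list; `input_field` and `SUPPORTED_EXTENSIONS` are illustrative names, the set contains only the extensions implied by the list above (variants such as `.jpeg` or `.tif` are not assumed), and it is an assumption that `file_base64` takes plain base64 without a data-URI prefix.

```python
import base64
import os
from typing import Dict
from urllib.parse import urlparse

# Extensions implied by the supported file types listed above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".xlsx", ".xls", ".pptx",
    ".png", ".jpg", ".tiff", ".webp",
}


def input_field(source: str, content: bytes = b"") -> Dict[str, str]:
    """Return the request field for a document given as a URL or as raw bytes."""
    if urlparse(source).scheme in ("http", "https"):
        return {"file_url": source}
    suffix = os.path.splitext(source)[1].lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {suffix or source}")
    # Assumed encoding: plain base64 text, no data-URI prefix.
    return {"file_base64": base64.b64encode(content).decode("ascii")}


if __name__ == "__main__":
    print(input_field("https://example.com/report.pdf"))
    print(input_field("scan.png", b"raw image bytes"))
```

Direct upload through the `file` parameter remains the simplest route when the document is already on disk; the base64 path is mainly useful when the request body has to be a single text payload.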

Benchmarks

Nanonets OCR ranks #1 on the IDP Leaderboard, an open benchmark that compares document AI models across OCR, table extraction, key information extraction, and visual QA.

#	Model	Overall	OlmOCR	OmniDoc	IDP
1	Nanonets OCR2+	81.8	82.2	89.5	73.8
2	Gemini-3-Pro	81.4	73.5	88.8	81.8
3	Claude Sonnet 4.6	80.8	74.4	86.9	81.2
4	Claude Opus 4.6	80.3	73.9	85.9	81.1
5	Gemini-3-Flash	79.9	69.2	90.1	80.5
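The page does not say how the Overall column is computed, but the figures in the table are consistent with an unweighted mean of the three benchmark columns rounded to one decimal place. The short check below verifies that reading against the rows above; it is an observation about these five rows, not a statement of the leaderboard's official scoring method.

```python
# Rows copied from the table above: (Overall, OlmOCR, OmniDoc, IDP).
SCORES = {
    "Nanonets OCR2+": (81.8, 82.2, 89.5, 73.8),
    "Gemini-3-Pro": (81.4, 73.5, 88.8, 81.8),
    "Claude Sonnet 4.6": (80.8, 74.4, 86.9, 81.2),
    "Claude Opus 4.6": (80.3, 73.9, 85.9, 81.1),
    "Gemini-3-Flash": (79.9, 69.2, 90.1, 80.5),
}


def mean_score(parts):
    """Unweighted mean of the benchmark scores, rounded to one decimal."""
    return round(sum(parts) / len(parts), 1)


if __name__ == "__main__":
    for model, (overall, *parts) in SCORES.items():
        print(f"{model}: listed {overall}, mean {mean_score(parts)}")
```

Under this reading, a model can lead overall without leading every column: the top row has the highest OlmOCR score but the lowest IDP score of the five.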

View Full Leaderboard

See all models compared across OCR, table extraction, key information extraction, and visual QA benchmarks.

Next Steps

Quickstart

Step-by-step guide to your first extraction, streaming, and async processing.

Output Formats

Markdown, HTML, JSON, CSV options and language support details.

Authentication

API keys, rate limits, and auth error handling.

API Reference

Interactive API docs with request/response examples.
