
Overview

The Nanonets Document Extraction API uses advanced AI models to extract structured content from documents. Convert PDFs, images, Word documents, Excel spreadsheets, and more into clean Markdown, HTML, JSON, or CSV formats.

Key Features

Multiple Output Formats

Extract content as Markdown, HTML, JSON, or CSV. Request multiple formats in a single API call.

Real-Time Streaming

Stream extraction results via SSE for real-time UI updates as content is generated.

Batch Processing

Process up to 50 documents in a single batch request with shared extraction options.
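
Because a batch request is capped at 50 documents, a larger set has to be split before submission. The following is a minimal sketch of that split; `chunk_documents` and `MAX_BATCH_SIZE` are hypothetical names introduced here for illustration and are not part of the API.

```python
from typing import Iterator, List, Sequence

# Limit taken from the feature description above; the constant name is ours.
MAX_BATCH_SIZE = 50


def chunk_documents(
    paths: Sequence[str], size: int = MAX_BATCH_SIZE
) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` document paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])
```

Each yielded group can then be sent as one batch request with the same extraction options.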

Custom Instructions

Guide the extraction with custom instructions for formatting, field focus, and output structure.

Quick Start

1. Get your API key

Sign in to docstrange.nanonets.com and grab your API key from the top-right menu. All requests use Bearer token authentication:

Authorization: Bearer YOUR_API_KEY

2. Extract a document

curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@document.pdf" \
  -F "output_format=markdown"
For a complete walkthrough including streaming and async processing, see the Quickstart guide.
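
For readers calling the API from code rather than curl, the same request can be sketched in Python using only the standard library. This is an illustrative sketch, not an official client: `build_extract_request` is a hypothetical helper, the multipart body is assembled by hand, and the `application/octet-stream` part type is an assumption. The function only builds the request; it does not send it.

```python
import uuid
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://extraction-api.nanonets.com/api/v1/extract/sync"


def build_extract_request(api_key, filename, file_bytes, output_format="markdown"):
    """Build (without sending) a multipart POST equivalent to the curl command."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Part 1: the uploaded document, sent under the "file" field.
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",  # assumed part type
        file_bytes,
        b"\r\n",
        # Part 2: the requested output format.
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="output_format"\r\n\r\n',
        output_format.encode(),
        b"\r\n",
        # Closing boundary.
        f"--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
```

To send it, pass the returned object to `urllib.request.urlopen` and read the response body. The response schema is not described in this overview, so inspect the raw payload before relying on any particular field.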

API Endpoints

Document Extraction

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/v1/extract/sync | POST | Synchronous extraction - returns results immediately |
| /api/v1/extract/async | POST | Asynchronous extraction - returns job ID for polling |
| /api/v1/extract/stream | POST | Streaming extraction - real-time results via SSE |
| /api/v1/extract/batch | POST | Batch processing - process multiple files at once |
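
The streaming endpoint delivers results as Server-Sent Events, so a client needs to split the response into events. Below is a minimal sketch of a parser for the generic SSE wire format (`event:` and `data:` fields, blank line as event terminator). The function name `iter_sse_events` is hypothetical, and the event names and payloads the API actually emits are not described on this page, so treat the sample values in the usage note as placeholders.

```python
from typing import Dict, Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Group SSE text lines into events; a blank line ends each event."""
    event_type = "message"  # default event type in the SSE format
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, if it carried data.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        # Other fields (id, retry) are ignored in this sketch.
```

For example, feeding it the lines `event: chunk`, `data: # Title`, and an empty line yields one event whose `data` is `# Title`. In practice the lines would come from iterating over the decoded HTTP response body.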

Document Classification

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/v1/classify/sync | POST | Synchronous document classification |
| /api/v1/classify/batch | POST | Batch classification - process multiple files at once |
| /api/v1/classify/results/{record_id} | GET | Get classification result by record ID |

Results

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/v1/extract/results/{record_id} | GET | Get extraction result by job ID |
| /api/v1/extract/results | GET | List all extraction results (paginated) |
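
Asynchronous extraction returns an ID that is then used to poll the results endpoint. The loop below is a sketch of that pattern under stated assumptions: the results path is assumed to live on the same host as the sync endpoint, and because the response schema is not given on this page, the caller supplies both the HTTP call (`fetch`) and the completion check (`is_done`). `poll_result` and all of its parameter names are hypothetical.

```python
import time
from typing import Any, Callable

# Path from the table above; the host is assumed to match the sync endpoint.
RESULTS_URL = "https://extraction-api.nanonets.com/api/v1/extract/results/{record_id}"


def poll_result(
    fetch: Callable[[str], Any],
    record_id: str,
    is_done: Callable[[Any], bool],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Call `fetch` on the results URL until `is_done` accepts the response."""
    deadline = clock() + timeout
    url = RESULTS_URL.format(record_id=record_id)
    while True:
        result = fetch(url)
        if is_done(result):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"no finished result for {record_id} after {timeout}s")
        sleep(interval)
```

A caller might pass an `is_done` such as `lambda r: r.get("status") == "completed"`, but that field name and value are guesses; check the API Reference for the actual response shape before using them.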

Input Methods

Provide your document using one of these methods:

| Parameter | Description |
| --- | --- |
| file | Direct file upload (multipart/form-data) |
| file_url | URL to download the file from |
| file_base64 | Base64-encoded file content |

Supported file types: PDF, Word (.docx), Excel (.xlsx, .xls), PowerPoint (.pptx), Images (PNG, JPG, TIFF, WebP)
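
As an illustration of the non-upload input methods, the sketch below checks a filename against the listed file types and prepares the `file_url` or `file_base64` field. The helper names are hypothetical, the extension set contains only the types named above (variants such as `.jpeg` or `.tif` are left out because this page does not mention them), and whether `file_base64` expects plain Base64 or a data URI is not stated here; plain Base64 is assumed.

```python
import base64
import os
from typing import Dict

# Extensions derived from the "Supported file types" line above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".xlsx", ".xls", ".pptx",
    ".png", ".jpg", ".tiff", ".webp",
}


def is_supported(filename: str) -> bool:
    """Return True if the filename's extension is in the listed set."""
    return os.path.splitext(filename)[1].lower() in SUPPORTED_EXTENSIONS


def url_field(file_url: str) -> Dict[str, str]:
    """Form field for a document the API should download itself."""
    return {"file_url": file_url}


def base64_field(file_bytes: bytes) -> Dict[str, str]:
    """Form field carrying the document as plain Base64 text (assumed encoding)."""
    return {"file_base64": base64.b64encode(file_bytes).decode("ascii")}
```

Only one of `file`, `file_url`, or `file_base64` should be supplied per request, so a caller would pick one helper based on where the document lives.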

Benchmarks

Nanonets OCR ranks #1 on the IDP Leaderboard, an open benchmark comparing document AI models across OCR, table extraction, key information extraction, and visual QA.

| # | Model | Overall | OlmOCR | OmniDoc | IDP |
| --- | --- | --- | --- | --- | --- |
| 1 | Nanonets OCR2+ | 81.8 | 82.2 | 89.5 | 73.8 |
| 2 | Gemini-3-Pro | 81.4 | 73.5 | 88.8 | 81.8 |
| 3 | Claude Sonnet 4.6 | 80.8 | 74.4 | 86.9 | 81.2 |
| 4 | Claude Opus 4.6 | 80.3 | 73.9 | 85.9 | 81.1 |
| 5 | Gemini-3-Flash | 79.9 | 69.2 | 90.1 | 80.5 |

View Full Leaderboard

See all models compared across OCR, table extraction, key information extraction, and visual QA benchmarks.

Next Steps

Quickstart

Step-by-step guide to your first extraction, streaming, and async processing.

Output Formats

Markdown, HTML, JSON, CSV options and language support details.

Authentication

API keys, rate limits, and auth error handling.

API Reference

Interactive API docs with request/response examples.