Follow this guide to make your first API call, explore streaming, and try async processing.

Prerequisites

  • An API key from docstrange.nanonets.com (sign in, then find the key in the top-right menu)
  • A document to extract (PDF, image, Word, Excel, or PowerPoint)
1. Make your first extraction

Send a document to the synchronous endpoint and get back structured Markdown:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@document.pdf" \
  -F "output_format=markdown"
You should receive a JSON response with "success": true and your extracted content in result.markdown.content.
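The same request can be issued from Python. The sketch below is illustrative rather than official: it assumes the third-party `requests` library, and the helper names `extract_sync` and `markdown_from_response` are hypothetical. The endpoint, form fields, and the `success` / `result.markdown.content` path come from the step above; the timeout value and the error handling are assumptions.

```python
API_URL = "https://extraction-api.nanonets.com/api/v1/extract/sync"


def markdown_from_response(payload: dict) -> str:
    """Return result.markdown.content, or raise if the call did not succeed."""
    if payload.get("success") is not True:
        raise RuntimeError(f"Extraction failed: {payload}")
    return payload["result"]["markdown"]["content"]


def extract_sync(path: str, api_key: str) -> str:
    """Upload one file to the sync endpoint and return the extracted Markdown."""
    # Third-party dependency, imported lazily so the parser above stays stdlib-only.
    import requests

    with open(path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": f},
            data={"output_format": "markdown"},
            timeout=120,  # assumed value; adjust for your documents
        )
    response.raise_for_status()  # surface HTTP error status codes
    return markdown_from_response(response.json())
```

A call such as `extract_sync("document.pdf", "YOUR_API_KEY")` would then return the Markdown string directly.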
2. Try multiple output formats

Request Markdown and JSON in a single call:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "output_format=markdown,json"
The response includes both result.markdown and result.json.
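When several formats are requested, it can help to confirm that each one actually came back. The helper below is a minimal sketch: `pick_formats` is a hypothetical name, and only the top-level `result.markdown` and `result.json` keys come from the text above. Each entry is returned untouched because its inner shape is not described here.

```python
def pick_formats(payload: dict, output_format: str) -> dict:
    """Map each requested format name to its entry under payload["result"].

    output_format is the same comma-separated string sent in the request,
    for example "markdown,json".
    """
    requested = [name.strip() for name in output_format.split(",") if name.strip()]
    result = payload.get("result", {})
    missing = [name for name in requested if name not in result]
    if missing:
        raise KeyError(f"Formats missing from response: {missing}")
    return {name: result[name] for name in requested}
```

Passing the parsed JSON response together with the string `"markdown,json"` would return a dictionary keyed by `markdown` and `json`.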
3. Stream results in real time

Use the streaming endpoint for real-time content delivery via Server-Sent Events:
Python
import json

import requests

# Upload the document and keep the connection open for Server-Sent Events.
with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://extraction-api.nanonets.com/api/v1/extract/stream",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
        data={"output_format": "markdown", "enable_streaming": "true"},
        stream=True,
    )

# Stop early if the server answered with an HTTP error status.
response.raise_for_status()

# Each payload line starts with "data: " followed by a JSON-encoded event.
for line in response.iter_lines():
    if line:
        line = line.decode("utf-8")
        if line.startswith("data: "):
            event = json.loads(line[6:])
            if event["type"] == "content":
                # Print content chunks as they arrive.
                print(event["data"], end="", flush=True)
            elif event["type"] == "done":
                print(f"\n\nCompleted in {event['processing_time']:.2f}s")
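If you would rather collect the streamed text than print it, the parsing can be separated from the network call. The following is a sketch, not part of any SDK: `iter_sse_events` and `collect_markdown` are hypothetical helper names, and the event fields (`type`, `data`) are the ones used in the example above.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield the JSON payload of every "data: " line in an SSE byte stream."""
    prefix = "data: "
    for raw in lines:
        if not raw:
            continue  # blank lines separate events
        text = raw.decode("utf-8")
        if text.startswith(prefix):
            yield json.loads(text[len(prefix):])
        # Any other line (for example an SSE comment) is skipped.


def collect_markdown(lines: Iterable[bytes]) -> str:
    """Join all "content" chunks, stopping at the first "done" event."""
    chunks = []
    for event in iter_sse_events(lines):
        if event["type"] == "content":
            chunks.append(event["data"])
        elif event["type"] == "done":
            break
    return "".join(chunks)
```

With a streaming `requests` response, this would be called as `collect_markdown(response.iter_lines())`. Keeping the parser free of network code also makes it easy to test against canned byte lines.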
4. Process large documents asynchronously

For documents over 5 pages, use async processing:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/async" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@large-report.pdf" \
  -F "output_format=markdown"
Then poll for results using the record_id from the response (12345 below is a placeholder):
curl -X GET "https://extraction-api.nanonets.com/api/v1/extract/results/12345" \
  -H "Authorization: Bearer YOUR_API_KEY"
The results endpoint returns "status": "processing" until the job completes, then "status": "completed" along with the full content.
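Polling by hand gets tedious, so a small loop is a natural next step. The sketch below is one possible approach under stated assumptions: `wait_for_result` and `fetch_record` are hypothetical names, the interval and timeout defaults are arbitrary, and `requests` is assumed for the HTTP call. Because only the "processing" and "completed" statuses are described here, any status other than "processing" is handed back to the caller to inspect.

```python
import time
from typing import Callable


def wait_for_result(
    fetch: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until status is no longer "processing" or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        payload = fetch()
        if payload.get("status") != "processing":
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError("Job still processing after timeout")
        sleep(interval_s)


def fetch_record(record_id: str, api_key: str) -> dict:
    """GET the results endpoint for one record_id."""
    import requests  # third-party dependency, imported lazily

    response = requests.get(
        f"https://extraction-api.nanonets.com/api/v1/extract/results/{record_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,  # assumed value
    )
    response.raise_for_status()
    return response.json()
```

A typical call would be `wait_for_result(lambda: fetch_record("12345", "YOUR_API_KEY"))`, where `12345` stands in for the record_id returned by the async request.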

What’s Next

Output Formats

Learn about all output options: Markdown, HTML, JSON schemas, CSV, and more.

Code Examples

Complete examples for every endpoint, including batch processing and React integration.

Python SDK

Type-safe Python client with async support and streaming.

TypeScript SDK

Type-safe TypeScript client for Node.js and browser.