MCP Server

Overview

The Nanonets MCP server lets AI assistants (Claude, Cursor, VS Code Copilot, and other MCP-compatible clients) extract and explore documents directly through natural language. Instead of writing API calls, you describe what you need and the assistant handles the extraction.

15 Extraction Tools

Full-featured tools for sync, async, and streaming extraction across all output formats.

Document Exploration

Navigate large documents with TOC, page-level reading, search, and section extraction.

OAuth 2.1 Authentication

Secure Auth0-based login — no API key configuration needed for end users.

Workflow Recipes

Built-in skill recipes that teach AI assistants optimal extraction patterns.

Quick Start

Connect with Claude Desktop

Add this to your Claude Desktop configuration file (claude_desktop_config.json):

{
  "mcpServers": {
    "nanonets": {
      "url": "https://mcp.nanonets.com/mcp"
    }
  }
}

Claude Desktop will open a browser for OAuth login on first use. No API key needed.
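
If the configuration file already lists other servers, the entry has to be merged rather than pasted over the file. The following is a minimal sketch of that merge. The helper names `add_mcp_server` and `update_config_file` are hypothetical, and the path of `claude_desktop_config.json` varies by operating system, so it is passed in rather than guessed.

```python
import json
from pathlib import Path


def add_mcp_server(config: dict, name: str, url: str) -> dict:
    """Return a copy of `config` with an MCP server entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"url": url}  # same shape as the snippet above
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, name: str, url: str) -> None:
    """Read the JSON config at `path` (if it exists), add the entry, write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_mcp_server(existing, name, url), indent=2))


if __name__ == "__main__":
    # "other" stands in for a server that was already configured.
    current = {"mcpServers": {"other": {"command": "example"}}}
    updated = add_mcp_server(current, "nanonets", "https://mcp.nanonets.com/mcp")
    print(json.dumps(updated, indent=2))
```

Existing entries under `mcpServers` are preserved; only the `nanonets` key is added or overwritten.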

Connect with Claude Code

claude mcp add nanonets --transport http https://mcp.nanonets.com/mcp

Local Development

For local testing without OAuth, set your API key and run the server directly:

export DOCSTRANGE_API_KEY="your-api-key"
cd mcp/
fastmcp run server/main.py:mcp --transport http --port 8080

Then connect your MCP client to http://localhost:8080/mcp.

Authentication

The MCP server supports two authentication modes:

Mode	When	How
OAuth 2.1 (Auth0)	Production / hosted server	Users log in via browser redirect. The server resolves their Nanonets API key from their email.
API key fallback	Local development	Set `DOCSTRANGE_API_KEY` env var. No OAuth flow needed.

OAuth is enabled automatically when AUTH0_MCP_CLIENT_ID is set. When that variable is not set, OAuth is disabled and the server falls back to the DOCSTRANGE_API_KEY environment variable.
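
The selection rule can be pictured as a small function over the environment. This is an illustrative sketch, not the server's actual code: the function name `select_auth_mode` and the returned labels are hypothetical, and raising an error when neither variable is set is an assumption.

```python
import os
from typing import Mapping


def select_auth_mode(env: Mapping[str, str]) -> str:
    """Pick an auth mode following the rule described above."""
    if env.get("AUTH0_MCP_CLIENT_ID"):
        return "oauth"  # OAuth 2.1 via Auth0
    if env.get("DOCSTRANGE_API_KEY"):
        return "api_key"  # local-development fallback
    # Assumption: the page does not say what happens when neither is set.
    raise RuntimeError("Set AUTH0_MCP_CLIENT_ID or DOCSTRANGE_API_KEY")


if __name__ == "__main__":
    try:
        print(select_auth_mode(os.environ))
    except RuntimeError as exc:
        print(f"No auth configured: {exc}")
```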

Tools

The server exposes 15 tools organized into three categories.

Core Extraction

These tools call the Nanonets extraction API to process documents.

Tool	Description	Best For
`extract_sync`	Synchronous extraction with full options	Small documents (≤5 pages)
`extract_async`	Queue for async processing, returns `record_id`	Large documents (>5 pages)
`extract_stream`	Real-time SSE streaming of extraction results	UI integrations, progress display
`get_result`	Get status and result of a previous extraction	Polling async jobs
`list_results`	List all extraction jobs (paginated)	Browsing extraction history
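
The sync/async split and the polling pattern from the table can be sketched as follows. The 5-page threshold comes from the table; everything else is an assumption for illustration. In particular, the `status` key and the values `"completed"` and `"failed"` are hypothetical names, and `get_result` is passed in as a callable so the loop does not depend on any particular client library.

```python
import time
from typing import Callable


def choose_extraction_tool(page_count: int) -> str:
    """Pick a tool using the 5-page guideline from the table above."""
    return "extract_sync" if page_count <= 5 else "extract_async"


def poll_until_done(
    get_result: Callable[[str], dict],
    record_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 30,
) -> dict:
    """Call `get_result(record_id)` until the job reaches a terminal state."""
    for _ in range(max_attempts):
        result = get_result(record_id)
        # Assumed field name and values; check the real response shape.
        if result.get("status") in ("completed", "failed"):
            return result
        time.sleep(interval_s)
    raise TimeoutError(f"record {record_id} not finished after {max_attempts} polls")


if __name__ == "__main__":
    # Fake responses standing in for real get_result calls.
    fake = iter([{"status": "processing"}, {"status": "completed"}])
    print(choose_extraction_tool(12))
    print(poll_until_done(lambda rid: next(fake), "abc-123", interval_s=0))
```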

Convenience Extraction

Pre-configured tools for common document types.

Tool	Description	Output
`extract_to_markdown`	Simple document-to-markdown conversion	Markdown
`extract_invoice`	Structured invoice data extraction	JSON (invoice fields)
`extract_contract_clauses`	Key clause extraction from contracts	JSON (clause fields)
`extract_bank_statement`	Tabular bank statement extraction	CSV
`extract_hierarchy`	Hierarchical document structure	JSON (nested sections)
`extract_with_bounding_boxes`	Content with spatial coordinates	Markdown + bounding box metadata

Document Exploration

Navigate large documents without loading everything into context.

Tool	Description	Key Inputs
`get_toc`	Structural overview (table of contents)	`file_url` or `record_id`
`get_result_pages`	Read specific pages	`record_id` + `pages` (e.g., `"1-3,5"`)
`search_document`	Find text or regex in document	`record_id` + `query`
`get_sections`	Get content of named sections	`section_titles` + `file_url` or `record_id`

Document Exploration Workflow

For large documents (50+ pages), use the exploration tools to avoid flooding the AI context window:

Step 1: Get the Table of Contents

get_toc(file_url="/path/to/annual-report.pdf")

Returns section titles, hierarchy levels, page numbers, and a record_id for follow-up calls.
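
Normally the assistant issues this call for you. For readers scripting against the server, MCP clients send tool invocations as JSON-RPC 2.0 requests with the method `tools/call`. The sketch below only builds that request body; the helper name `tool_call_request` is hypothetical, and the argument names are the ones shown on this page.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call_request(tool: str, **arguments) -> dict:
    """Build the JSON-RPC 2.0 body for an MCP `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


if __name__ == "__main__":
    request = tool_call_request("get_toc", file_url="/path/to/annual-report.pdf")
    print(json.dumps(request, indent=2))
```

Sending the request also requires the MCP initialization handshake and, on the hosted server, an OAuth token, both of which an MCP client library handles.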

Step 2: Read Specific Pages

get_result_pages(record_id="abc-123", pages="1-3,15,20-22")

Returns only the requested pages. Reports total_pages and any missing_pages.
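
To make the `pages` syntax concrete, here is a small sketch of how a spec such as `"1-3,15,20-22"` expands into page numbers, and how requested pages could be split into found and missing ones. The function names are hypothetical and the server's own parser may treat edge cases differently.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a spec like "1-3,5" into a sorted list of unique page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


def split_available(requested: list[int], total_pages: int) -> tuple[list[int], list[int]]:
    """Separate requested pages into those that exist and those that do not."""
    found = [p for p in requested if 1 <= p <= total_pages]
    missing = [p for p in requested if not 1 <= p <= total_pages]
    return found, missing


if __name__ == "__main__":
    requested = parse_pages("1-3,15,20-22")
    print(requested)
    print(split_available(requested, total_pages=20))
```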

Step 3: Search for Content

search_document(record_id="abc-123", query="revenue growth")

Returns matching lines with page numbers, line numbers, and surrounding context (up to 50 matches).
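
The shape of that result can be illustrated with a local sketch that searches per-page text for a literal string or a regex and stops at 50 matches. This is not the server's implementation: the `Match` fields, the `regex` flag, the amount of context, and case-insensitive matching are all assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Match:
    page: int            # 1-based page number
    line: int            # 1-based line number within the page
    text: str            # the matching line
    context: list[str]   # the matching line plus its neighbours


def search_pages(
    pages: list[str],
    query: str,
    *,
    regex: bool = False,
    context_lines: int = 1,
    max_matches: int = 50,
) -> list[Match]:
    """Search page texts line by line, returning at most `max_matches` hits."""
    # Literal queries are escaped so they are not interpreted as regex syntax.
    pattern = re.compile(query if regex else re.escape(query), re.IGNORECASE)
    matches: list[Match] = []
    for page_no, page_text in enumerate(pages, start=1):
        lines = page_text.splitlines()
        for i, line in enumerate(lines):
            if pattern.search(line):
                lo, hi = max(0, i - context_lines), i + context_lines + 1
                matches.append(Match(page_no, i + 1, line, lines[lo:hi]))
                if len(matches) >= max_matches:
                    return matches
    return matches


if __name__ == "__main__":
    doc = ["Intro\nRevenue growth was 12%\nEnd", "Costs\nrevenue growth slowed"]
    for hit in search_pages(doc, "revenue growth"):
        print(hit.page, hit.line, hit.text)
```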

Step 4: Get Named Sections

get_sections(record_id="abc-123", section_titles="Executive Summary, Financial Results")

Returns the full content of matching sections. Supports partial, case-insensitive title matching.
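
Partial, case-insensitive matching can be pictured as a substring test on lowercased titles. The sketch below assumes `section_titles` is split on commas, as the example call suggests, and that "partial" means substring containment; the server may use a different rule. The section titles in the demo are made up.

```python
def match_sections(available: list[str], section_titles: str) -> list[str]:
    """Return the available titles that contain any requested title (case-insensitive)."""
    wanted = [t.strip().lower() for t in section_titles.split(",") if t.strip()]
    return [title for title in available if any(w in title.lower() for w in wanted)]


if __name__ == "__main__":
    toc = ["1. Executive Summary", "2. Financial Results 2023", "3. Risk Factors"]
    print(match_sections(toc, "executive summary, Financial Results"))
```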

All exploration tools work with the record_id from any prior extraction. Once a document is extracted, you can explore it as many times as needed without re-processing.

Input Methods

All extraction tools accept documents via:

Parameter	Description	Recommendation
`file_url`	Local file path or HTTPS URL	Preferred — saves context tokens
`file_base64`	Base64-encoded file content	Avoid when possible — uses lots of tokens

Local file paths (e.g., /Users/name/doc.pdf, ~/Downloads/report.pdf) are automatically detected and uploaded. Always prefer file paths over base64 encoding.

Supported file types: PDF, Word (.docx), Excel (.xlsx, .xls), PowerPoint (.pptx), Images (PNG, JPG, TIFF, WebP)
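
A client-side pre-check can catch unsupported inputs before a tool call is made. The following sketch classifies a `file_url` value as a local path or an HTTPS URL and checks its extension against the list above. The function names are hypothetical, the local-path heuristic (leading `/` or `~`) is an assumption based on the examples on this page, and only the extensions named above are included.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions taken from the supported file types listed above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".xlsx", ".xls", ".pptx", ".png", ".jpg", ".tiff", ".webp",
}


def classify_source(file_url: str) -> str:
    """Return "remote" for HTTPS URLs, "local" for /... or ~... paths, else "unknown"."""
    if urlparse(file_url).scheme.lower() == "https":
        return "remote"
    if file_url.startswith(("/", "~")):
        return "local"
    return "unknown"


def is_supported(file_url: str) -> bool:
    """Check the file extension, ignoring any URL query string."""
    path = urlparse(file_url).path if classify_source(file_url) == "remote" else file_url
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for source in ("~/Downloads/report.pdf", "https://example.com/a.PDF?x=1", "/tmp/notes.txt"):
        print(source, classify_source(source), is_supported(source))
```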

Output Formats

Format	Parameter	Use Case
Markdown	`output_format="markdown"`	General text extraction, summaries
JSON	`output_format="json"`	Structured data, specific fields, schemas
CSV	`output_format="csv"`	Table extraction, financial data
HTML	`output_format="html"`	Preserving rich formatting

JSON Options

Field list: json_options='["field1", "field2"]' — extract specific fields
Custom schema: json_options='{"type": "object", ...}' — define exact output structure
Hierarchy: json_options="hierarchy_output" — nested document structure
Table of contents: json_options="table-of-contents" — structural overview only
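
Because `json_options` is passed as a string, building it with `json.dumps` avoids quoting mistakes. A minimal sketch follows; the helper names and the field names in the demo are hypothetical, and the use of a `properties` key assumes the custom schema follows JSON Schema conventions, which this page implies but does not state.

```python
import json


def field_list(*fields: str) -> str:
    """Serialize field names as a JSON array string for json_options."""
    return json.dumps(list(fields))


def custom_schema(properties: dict) -> str:
    """Serialize an object schema as a JSON string for json_options."""
    # Assumption: JSON Schema style, with field definitions under "properties".
    return json.dumps({"type": "object", "properties": properties})


if __name__ == "__main__":
    print(field_list("invoice_number", "total"))
    print(custom_schema({"total": {"type": "number"}}))
```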

Example Conversations

Extract an invoice

User: Extract the invoice at ~/Downloads/invoice.pdf

Assistant calls extract_invoice(file_url="~/Downloads/invoice.pdf") and returns structured fields: invoice number, vendor, line items, totals.

Explore a large report

User: I have a 200-page annual report. What’s in it?

Assistant calls get_toc(file_url="~/Documents/report.pdf") to get the structure, then uses get_result_pages and search_document to read specific sections.

Search within a contract

User: Does this contract mention indemnification?

Assistant calls search_document(record_id="...", query="indemnification") to find all mentions with page numbers and context.

Environment Variables

For self-hosted deployments, configure the server with these environment variables:

Variable	Description	Default
`MCP_HOST`	Host to bind to	`0.0.0.0`
`MCP_PORT`	Port to bind to	`8080`
`AUTH0_MCP_CLIENT_ID`	Auth0 application Client ID	(none — OAuth disabled)
`AUTH0_MCP_CLIENT_SECRET`	Auth0 application Client Secret	—
`AUTH0_MCP_AUDIENCE`	Auth0 API audience	`https://mcp.nanonets.com`
`MCP_BASE_URL`	Public URL of the MCP server	`https://mcp.nanonets.com`
`DATABASE_URL`	PostgreSQL connection string	—
`DOCSTRANGE_API_KEY`	API key for local dev (no OAuth)	—
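
For a self-hosted wrapper or deployment script, the table can be mirrored in a small settings loader. This is a sketch only: `ServerConfig` and `load_config` are hypothetical names, not part of the server, and the defaults are copied from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerConfig:
    host: str
    port: int
    auth0_client_id: Optional[str]
    auth0_client_secret: Optional[str]
    auth0_audience: str
    base_url: str
    database_url: Optional[str]
    api_key: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Read settings from `env`, applying the defaults listed in the table."""
    return ServerConfig(
        host=env.get("MCP_HOST", "0.0.0.0"),
        port=int(env.get("MCP_PORT", "8080")),
        auth0_client_id=env.get("AUTH0_MCP_CLIENT_ID"),
        auth0_client_secret=env.get("AUTH0_MCP_CLIENT_SECRET"),
        auth0_audience=env.get("AUTH0_MCP_AUDIENCE", "https://mcp.nanonets.com"),
        base_url=env.get("MCP_BASE_URL", "https://mcp.nanonets.com"),
        database_url=env.get("DATABASE_URL"),
        api_key=env.get("DOCSTRANGE_API_KEY"),
    )


if __name__ == "__main__":
    print(load_config({"MCP_PORT": "9000"}))
```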
