TypeScript SDK

Overview

The docstrange TypeScript SDK provides a type-safe, ergonomic interface for the Nanonets Document Extraction API. It includes full TypeScript types, native async/await, streaming, and automatic pagination.

Key Features

Type Safety

Full TypeScript types for all requests and responses with IntelliSense support.

Async Support

All methods return Promises with native async/await support.

Streaming

Consume SSE streams with for await...of and typed event objects.

File Upload

Upload files from ReadStreams, Buffers, Blobs, or File objects.

Error Handling

Typed exceptions with status codes, messages, and request IDs.

Pagination

Automatic pagination helpers for list endpoints.

Installation

npm install docstrange-api

Requires Node.js 18+

Authentication

The SDK reads your API key from the DOCSTRANGE_API_KEY environment variable by default.

export DOCSTRANGE_API_KEY="your-api-key"

You can also pass it explicitly when creating the client:

import Docstrange from "docstrange-api";

// Uses DOCSTRANGE_API_KEY env var
const client = new Docstrange();

// Or pass explicitly (named differently so both declarations can coexist in one file)
const explicitClient = new Docstrange({ apiKey: "your-api-key" });

Get your API key from the top right menu on docstrange.nanonets.com.

Core Methods

Synchronous Extraction

Extract content from a document and get results immediately. Best for files with 5 pages or fewer.

import Docstrange from "docstrange-api";
import fs from "fs";

const client = new Docstrange();

const result = await client.extract.sync({
  file: fs.createReadStream("invoice.pdf"),
  output_format: "markdown",
  custom_instructions: "Format all dates as YYYY-MM-DD",
  include_metadata: "bounding_boxes,confidence_score",
});

console.log(result.result.markdown.content);

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `file` | `Uploadable` | * | File to upload (PDF, Word, Excel, PowerPoint, images). Provide exactly one of `file`, `file_url`, or `file_base64`. |
| `file_url` | `string` | * | URL to download the file from. |
| `file_base64` | `string` | * | Base64-encoded file content. |
| `output_format` | `string` | Yes | Output format(s): `markdown`, `html`, `json`, `csv`. Comma-separate for multiple (e.g., `"markdown,json"`). |
| `custom_instructions` | `string` | No | Custom extraction instructions (max 8,000 chars). E.g., `"Format dates as YYYY-MM-DD"`. |
| `prompt_mode` | `string` | No | `"append"` (default) adds to base prompt, `"replace"` uses only your custom instructions. |
| `json_options` | `string` | No | JSON extraction mode. Values: `"hierarchy_output"`, `"table-of-contents"`, field list `'["field1", "field2"]'`, or JSON schema `'{...}'`. |
| `csv_options` | `string` | No | CSV extraction options. E.g., `"table"`. |
| `include_metadata` | `string` | No | Comma-separated metadata types: `bounding_boxes`, `bounding_boxes_word`, `confidence_score`. |

Provide exactly one file input: file, file_url, or file_base64. The file parameter accepts fs.createReadStream(), Buffer, Blob, or File objects.
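
The "exactly one file input" rule can also be checked locally before a request is sent. The following is a minimal sketch, not part of the SDK: `FileInput` and `assertSingleFileInput` are hypothetical names, and `file` is typed as `unknown` as a stand-in for the SDK's `Uploadable` type.

```typescript
// Hypothetical shape mirroring the three file-input parameters.
interface FileInput {
  file?: unknown; // stand-in for the SDK's Uploadable type (assumption)
  file_url?: string;
  file_base64?: string;
}

// Throws unless exactly one of the three inputs is set.
function assertSingleFileInput(input: FileInput): void {
  const provided = [input.file, input.file_url, input.file_base64].filter(
    (value) => value !== undefined && value !== null
  );
  if (provided.length !== 1) {
    throw new Error(
      `Expected exactly one of file, file_url, or file_base64, got ${provided.length}`
    );
  }
}
```

Calling `assertSingleFileInput({ file_url: "https://example.com/a.pdf" })` passes silently, while an empty object or one with two inputs set throws.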

Asynchronous Extraction

Queue a document for background processing. The call returns a record_id that you use to poll for results. Recommended for large documents (more than 5 pages).

const response = await client.extract.async({
  file: fs.createReadStream("large-report.pdf"),
  output_format: "json",
  json_options: '["invoice_number", "date", "total_amount"]',
});

console.log(`Queued with record_id: ${response.record_id}`);

// Check the job status once; repeat this call until processing finishes
const result = await client.extract.results.retrieve({
  record_id: response.record_id,
});
console.log(result.status);

Parameters: Same as Synchronous Extraction.
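
The example above checks the status only once. A polling loop could be sketched as follows. `pollUntilDone`, `JobStatus`, and `PollOptions` are hypothetical helpers, not SDK exports; the interval and attempt limit are arbitrary defaults. This page only names the `completed` and `processing` statuses, so the sketch assumes every status other than `processing` is terminal.

```typescript
// Assumed minimal shape of a retrieve() response; only `status` is used here.
interface JobStatus {
  status: string;
}

interface PollOptions {
  intervalMs?: number; // wait between checks (arbitrary default)
  maxAttempts?: number; // give up after this many checks (arbitrary default)
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `fetchStatus` until the job is no longer "processing" or attempts run out.
async function pollUntilDone<T extends JobStatus>(
  fetchStatus: () => Promise<T>,
  { intervalMs = 2000, maxAttempts = 30 }: PollOptions = {}
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await fetchStatus();
    if (job.status !== "processing") {
      return job; // "completed", or any other status treated as terminal
    }
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`Job still processing after ${maxAttempts} attempts`);
}
```

With the client from the example above, `fetchStatus` could be `() => client.extract.results.retrieve({ record_id: response.record_id })`.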

Streaming Extraction

Stream extraction results in real time via Server-Sent Events.

const stream = await client.extract.stream({
  file: fs.createReadStream("document.pdf"),
  output_format: "markdown",
  enable_streaming: true,
});

for await (const event of stream) {
  if (event.type === "content") {
    process.stdout.write(event.data);
  } else if (event.type === "done") {
    console.log(`\nCompleted in ${event.processing_time}s`);
  } else if (event.type === "async_queued") {
    console.log(`Large file queued: ${event.record_id}`);
  }
}

Parameters: All Synchronous Extraction parameters, plus:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `enable_streaming` | `boolean` | No | `true` (default) for real-time incremental chunks. `false` for batch mode (complete content in a single SSE event). |

SSE Event Types:

| Event Type | Description |
| --- | --- |
| `content` | Incremental content chunk (streaming mode) |
| `complete` | Full content at once (batch mode, when `enable_streaming` is `false`) |
| `done` | Final event with `record_id` and `processing_time` |
| `error` | Error information |
| `async_queued` | Large files automatically queued for async processing |
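
The event types can be modelled as a discriminated union so that a `switch` narrows each case. The sketch below is illustrative only: `ExtractEvent`, `StreamSummary`, and `collectStream` are hypothetical names, the fields of `content`, `done`, and `async_queued` follow the example and table above, and the payload fields of `complete` and `error` are guesses.

```typescript
// Assumed event shapes; the SDK's real types may differ.
type ExtractEvent =
  | { type: "content"; data: string }
  | { type: "complete"; data: string } // payload field name is a guess
  | { type: "done"; record_id: string; processing_time: number }
  | { type: "error"; message: string } // payload field name is a guess
  | { type: "async_queued"; record_id: string };

interface StreamSummary {
  content: string;
  recordId?: string;
  queued: boolean;
}

// Consumes a stream of events and returns the accumulated content.
async function collectStream(
  events: AsyncIterable<ExtractEvent>
): Promise<StreamSummary> {
  const summary: StreamSummary = { content: "", queued: false };
  for await (const event of events) {
    switch (event.type) {
      case "content":
        summary.content += event.data; // incremental chunk
        break;
      case "complete":
        summary.content = event.data; // whole document in one event
        break;
      case "done":
        summary.recordId = event.record_id;
        break;
      case "async_queued":
        summary.recordId = event.record_id;
        summary.queued = true; // result must be fetched later by record_id
        break;
      case "error":
        throw new Error(event.message);
    }
  }
  return summary;
}
```

Because the function accepts any `AsyncIterable`, it can be exercised with a hand-written async generator in tests, without calling the API.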

Batch Extraction

Process multiple documents in a single request (max 50 files). All files share the same extraction options.

const response = await client.extract.batch({
  files: [
    fs.createReadStream("invoice1.pdf"),
    fs.createReadStream("invoice2.pdf"),
    fs.createReadStream("invoice3.pdf"),
  ],
  output_format: "json",
  json_options: '["invoice_number", "date", "total_amount"]',
  custom_instructions: "Extract amounts without currency symbols",
});

console.log(`Batch ${response.batch_id}: ${response.accepted_files} files queued`);

for (const record of response.records) {
  console.log(`  ${record.filename}: ${record.record_id}`);
}

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `files` | `Uploadable[]` | Yes | List of files to process (max 50). |
| `output_format` | `string` | Yes | Output format(s) applied to all files. |
| `custom_instructions` | `string` | No | Custom extraction instructions (max 8,000 chars). |
| `prompt_mode` | `string` | No | `"append"` (default) or `"replace"`. |
| `json_options` | `string` | No | JSON extraction mode. |
| `csv_options` | `string` | No | CSV extraction options. |
| `include_metadata` | `string` | No | Comma-separated metadata types. |
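
When there are more than 50 files, one option is to split them into groups and send one batch request per group. A minimal sketch of the splitting step, where `chunk` and `MAX_BATCH_FILES` are hypothetical names rather than SDK exports:

```typescript
// Splits `items` into consecutive groups of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const groups: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    groups.push(items.slice(start, start + size));
  }
  return groups;
}

const MAX_BATCH_FILES = 50; // per-request limit stated in this section

// Example: 120 file paths become three groups of 50, 50, and 20.
const examplePaths = Array.from({ length: 120 }, (_, i) => `invoice${i + 1}.pdf`);
const exampleGroups = chunk(examplePaths, MAX_BATCH_FILES);
```

Each group could then be mapped to read streams and passed as `files` in its own `client.extract.batch` call.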

Document Classification

Classify a document into predefined categories. Each page is classified individually with a category, confidence score (0-100), and reasoning.

const result = await client.classify.sync({
  file: fs.createReadStream("document.pdf"),
  categories:
    '[{"name": "Invoice", "description": "Bills and invoices"}, {"name": "Contract", "description": "Legal agreements"}, {"name": "Receipt"}]',
});

for (const page of result.result.pages) {
  console.log(
    `Page ${page.page_number}: ${page.category} (${page.confidence}%) - ${page.reasoning}`
  );
}

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `file` | `Uploadable` | Yes | File to classify (PDF, PNG, JPG, JPEG, TIFF, BMP, WebP). |
| `categories` | `string` | Yes | JSON array of category objects: `[{"name": "Category", "description": "Optional description"}]`. |
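
Because `categories` is a JSON string, writing it by hand is error-prone. One way to avoid quoting mistakes is to build it from typed objects with `JSON.stringify`. In this sketch `Category` and `buildCategories` are hypothetical names, and the empty-list and blank-name checks are this example's own sanity checks, not documented API rules.

```typescript
// Shape taken from the parameter description: `name` required, `description` optional.
interface Category {
  name: string;
  description?: string;
}

// Serializes categories into the JSON string the `categories` parameter expects.
function buildCategories(categories: Category[]): string {
  if (categories.length === 0) {
    throw new Error("At least one category is required"); // local check (assumption)
  }
  for (const category of categories) {
    if (category.name.trim() === "") {
      throw new Error("Category names must not be blank"); // local check (assumption)
    }
  }
  return JSON.stringify(categories);
}

const categoriesJson = buildCategories([
  { name: "Invoice", description: "Bills and invoices" },
  { name: "Receipt" },
]);
// categoriesJson is
// '[{"name":"Invoice","description":"Bills and invoices"},{"name":"Receipt"}]'
```

The resulting string can be passed directly as the `categories` argument.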

Batch Classification

Classify multiple documents at once (max 50 files).

const response = await client.classify.batch({
  files: [
    fs.createReadStream("doc1.pdf"),
    fs.createReadStream("doc2.pdf"),
  ],
  categories:
    '[{"name": "Invoice"}, {"name": "Receipt"}, {"name": "Contract"}]',
});

console.log(`Batch ${response.batch_id}: ${response.successful_files} classified`);

for (const result of response.results) {
  console.log(`  ${result.filename}: ${result.pages[0].category}`);
}

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `files` | `Uploadable[]` | Yes | Files to classify (max 50). |
| `categories` | `string` | Yes | JSON array of category objects (max 50 categories). |

Retrieve Results

Get the status and result of a previous extraction by record ID.

const result = await client.extract.results.retrieve({
  record_id: "12345",
  include_content: true,
});

if (result.status === "completed") {
  console.log(result.result.markdown.content);
} else if (result.status === "processing") {
  console.log("Still processing...");
}

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `record_id` | `string` | Yes | Extraction job ID (numeric string returned by the API). |
| `include_content` | `boolean` | No | Include full extracted content (default: `true`). Set to `false` to retrieve only status and metadata. |

List Results

List all extraction results for the authenticated user (paginated).

const page = await client.extract.results.list({
  page: 1,
  page_size: 10,
  sort_by: "created_at",
  sort_order: "desc",
});

for (const record of page.results) {
  console.log(`${record.record_id}: ${record.status} (${record.filename})`);
}

console.log(`Page ${page.pagination.page} of ${page.pagination.total_pages}`);
console.log(`Total records: ${page.pagination.total_count}`);

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `page` | `number` | No | Page number (default: `1`, minimum: `1`). |
| `page_size` | `number` | No | Results per page (default: `20`, range: `1`-`100`). |
| `sort_by` | `string` | No | Sort field. One of: `created_at` (default), `updated_at`, `original_filename`, `file_size`, `processing_status`. |
| `sort_order` | `string` | No | Sort direction: `"desc"` (default) or `"asc"`. |
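
The overview mentions automatic pagination helpers, but their API is not shown on this page. A manual loop over the documented `page` parameter and `pagination.total_pages` field could be sketched as below. `Page` and `iterateAll` are hypothetical names, and the page shape is inferred from the fields read in the example above.

```typescript
// Assumed page shape, based on the fields used in the List Results example.
interface Page<T> {
  results: T[];
  pagination: { page: number; total_pages: number };
}

// Yields every record across all pages, requesting one page at a time.
async function* iterateAll<T>(
  fetchPage: (page: number) => Promise<Page<T>>
): AsyncGenerator<T> {
  let pageNumber = 1;
  while (true) {
    const page = await fetchPage(pageNumber);
    for (const record of page.results) {
      yield record;
    }
    if (pageNumber >= page.pagination.total_pages) {
      return; // last page reached (also covers total_pages === 0)
    }
    pageNumber++;
  }
}
```

With a real client, `fetchPage` could be `(page) => client.extract.results.list({ page, page_size: 100 })`, consumed with `for await (const record of iterateAll(fetchPage))`.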

Error Handling

The SDK throws typed exceptions for API errors.

import Docstrange, {
  APIStatusError,
  APIConnectionError,
  APITimeoutError,
} from "docstrange-api";
import fs from "fs";

const client = new Docstrange();

try {
  const result = await client.extract.sync({
    file: fs.createReadStream("document.pdf"),
    output_format: "markdown",
  });
} catch (err) {
  // Check the timeout error first in case it is a subclass of APIConnectionError
  if (err instanceof APITimeoutError) {
    console.log("Request timed out");
  } else if (err instanceof APIConnectionError) {
    console.log("Failed to connect to the API");
  } else if (err instanceof APIStatusError) {
    console.log(`API error ${err.status}: ${err.message}`);
    console.log(`Request ID: ${err.headers?.["x-request-id"]}`);
  }
}

| Exception | Description |
| --- | --- |
| `APIConnectionError` | Network connectivity issues |
| `APITimeoutError` | Request exceeded timeout |
| `APIStatusError` | API returned an error status code |
| `AuthenticationError` | Invalid or missing API key (401) |
| `PermissionDeniedError` | Insufficient permissions (403) |
| `NotFoundError` | Resource not found (404) |
| `RateLimitError` | Too many requests (429) |
| `InternalServerError` | Server error (500+) |
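
The status-code column of the table can be written out as a plain lookup, which may be handy for log or metric labels. This is only a restatement of the table: `errorNameForStatus` is a hypothetical helper, and the SDK may map other status codes to classes not listed here.

```typescript
// Returns the exception name the table associates with an HTTP status code.
function errorNameForStatus(status: number): string {
  if (status === 401) return "AuthenticationError";
  if (status === 403) return "PermissionDeniedError";
  if (status === 404) return "NotFoundError";
  if (status === 429) return "RateLimitError";
  if (status >= 500) return "InternalServerError";
  return "APIStatusError"; // generic fallback for any other error status
}
```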

Configuration

Custom Base URL

For on-premise deployments, point the client to your own instance:

const client = new Docstrange({
  apiKey: "your-api-key",
  baseURL: "https://your-instance.example.com",
});

Timeouts

Configure request timeouts (in milliseconds):

const client = new Docstrange({
  timeout: 60000, // 60 seconds
});

Retries

The SDK automatically retries failed requests with exponential backoff. Configure the maximum number of retries:

const client = new Docstrange({ maxRetries: 3 }); // default is 2
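
For readers unfamiliar with exponential backoff, the general technique can be sketched as follows. This is not the SDK's implementation: `backoffDelay` and `withRetries` are hypothetical names, the base delay and cap are made-up values, and a real client would typically also add jitter and retry only on selected errors.

```typescript
// Delay before retry number `attempt` (1-based): baseMs * 2^(attempt - 1), capped at maxMs.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}

// Runs `operation`, retrying up to `maxRetries` times with growing delays.
async function withRetries<T>(
  operation: () => Promise<T>,
  maxRetries = 2,
  baseMs = 500
): Promise<T> {
  let retries = 0;
  while (true) {
    try {
      return await operation();
    } catch (err) {
      if (retries >= maxRetries) {
        throw err; // retries exhausted: surface the last error
      }
      retries++;
      const delay = backoffDelay(retries, baseMs);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

With `maxRetries` set to 2, the operation runs at most three times in total: the initial attempt plus two retries.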

Next Steps

API Reference

Examples
