Overview

The docstrange TypeScript SDK provides a type-safe, ergonomic interface for the Nanonets Document Extraction API. It includes full TypeScript types, native async/await, streaming, and automatic pagination.

Key Features

Type Safety

Full TypeScript types for all requests and responses with IntelliSense support.

Async Support

All methods return Promises with native async/await support.

Streaming

Consume SSE streams with for await...of and typed event objects.

File Upload

Upload files from ReadStreams, Buffers, Blobs, or File objects.

Error Handling

Typed exceptions with status codes, messages, and request IDs.

Pagination

Automatic pagination helpers for list endpoints.

Installation

npm install docstrange-api
Requires Node.js 18+

Authentication

The SDK reads your API key from the DOCSTRANGE_API_KEY environment variable by default.
export DOCSTRANGE_API_KEY="your-api-key"
You can also pass it explicitly when creating the client:
import Docstrange from "docstrange-api";

// Uses the DOCSTRANGE_API_KEY environment variable
const client = new Docstrange();

// Or pass the key explicitly
const explicitClient = new Docstrange({ apiKey: "your-api-key" });
Get your API key from the top right menu on docstrange.nanonets.com.
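
If you would rather fail at startup than on the first request, the key lookup can be made explicit. The following is a minimal sketch, not part of the SDK: resolveApiKey is a hypothetical helper, and the assumption that an explicit key takes precedence over the environment variable mirrors the behaviour described above but is not confirmed by it.

```typescript
// Hypothetical helper (not an SDK export): resolves the API key up front.
// Assumption: an explicitly passed key wins over the environment variable.
function resolveApiKey(
  explicit: string | undefined,
  env: Record<string, string | undefined>,
): string {
  const key = explicit ?? env["DOCSTRANGE_API_KEY"];
  if (!key) {
    throw new Error("Missing API key: set DOCSTRANGE_API_KEY or pass apiKey");
  }
  return key;
}
```

In a Node.js program you would typically call resolveApiKey(undefined, process.env) and pass the result as apiKey when creating the client.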

Core Methods

Synchronous Extraction

Extract content from a document and get results immediately. Best for files of 5 pages or fewer.
import Docstrange from "docstrange-api";
import fs from "fs";

const client = new Docstrange();

const result = await client.extract.sync({
  file: fs.createReadStream("invoice.pdf"),
  output_format: "markdown",
  custom_instructions: "Format all dates as YYYY-MM-DD",
  include_metadata: "bounding_boxes,confidence_score",
});

console.log(result.result.markdown.content);
Parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file | Uploadable | * | File to upload (PDF, Word, Excel, PowerPoint, images). Provide exactly one of file, file_url, or file_base64. |
| file_url | string | * | URL to download the file from. |
| file_base64 | string | * | Base64-encoded file content. |
| output_format | string | Yes | Output format(s): markdown, html, json, csv. Comma-separate for multiple (e.g., "markdown,json"). |
| custom_instructions | string | No | Custom extraction instructions (max 8,000 chars). E.g., "Format dates as YYYY-MM-DD". |
| prompt_mode | string | No | "append" (default) adds to base prompt, "replace" uses only your custom instructions. |
| json_options | string | No | JSON extraction mode. Values: "hierarchy_output", "table-of-contents", field list '["field1", "field2"]', or JSON schema '{...}'. |
| csv_options | string | No | CSV extraction options. E.g., "table". |
| include_metadata | string | No | Comma-separated metadata types: bounding_boxes, bounding_boxes_word, confidence_score. |

* Provide exactly one file input: file, file_url, or file_base64. The file parameter accepts fs.createReadStream(), Buffer, Blob, or File objects.
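
Several of these parameters are strings that encode structured values (comma-separated lists, JSON arrays). The sketch below shows one way to build them from typed values and to check the "exactly one file input" rule before sending a request. All three helper names are hypothetical and not part of the SDK, which may well perform its own validation.

```typescript
// Hypothetical helpers (not SDK exports) for building string-typed options.
type OutputFormat = "markdown" | "html" | "json" | "csv";

interface FileInput {
  file?: unknown; // e.g. a ReadStream, Buffer, Blob, or File
  file_url?: string;
  file_base64?: string;
}

// Throws unless exactly one of file, file_url, or file_base64 is set.
function assertSingleFileInput(input: FileInput): void {
  const provided = [input.file, input.file_url, input.file_base64].filter(
    (value) => value !== undefined,
  );
  if (provided.length !== 1) {
    throw new Error(
      `Expected exactly one of file, file_url, file_base64; got ${provided.length}`,
    );
  }
}

// output_format is a comma-separated string, e.g. "markdown,json".
function joinFormats(formats: OutputFormat[]): string {
  return formats.join(",");
}

// json_options accepts a field list as a JSON-encoded string.
function fieldList(fields: string[]): string {
  return JSON.stringify(fields);
}
```

Using JSON.stringify for the field list avoids quoting mistakes that are easy to make when writing the JSON string by hand.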

Asynchronous Extraction

Queue a document for background processing. The call returns a record_id that you use to poll for results. Recommended for large documents (more than 5 pages).
const response = await client.extract.async({
  file: fs.createReadStream("large-report.pdf"),
  output_format: "json",
  json_options: '["invoice_number", "date", "total_amount"]',
});

console.log(`Queued with record_id: ${response.record_id}`);

// Check the job status; repeat this call until it is no longer "processing"
const result = await client.extract.results.retrieve({
  record_id: response.record_id,
});
console.log(result.status);
Parameters: Same as Synchronous Extraction.
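
The retrieve call above checks the status once. A polling loop might look like the following sketch. pollUntilDone is a hypothetical helper, not an SDK method, and it assumes that "processing" is the only non-terminal status; if the API reports other in-progress states, pass them in pendingStatuses. The delays are arbitrary illustrative values.

```typescript
// Minimal shape of a retrieved record; the real SDK type likely has more fields.
interface ExtractionRecord {
  status: string;
}

// Hypothetical helper: calls `retrieve` until the status is no longer pending,
// doubling the wait between attempts up to maxDelayMs.
async function pollUntilDone<T extends ExtractionRecord>(
  retrieve: () => Promise<T>,
  {
    initialDelayMs = 1000,
    maxDelayMs = 15000,
    maxAttempts = 20,
    pendingStatuses = ["processing"], // assumption: other statuses are terminal
  } = {},
): Promise<T> {
  let delayMs = initialDelayMs;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const record = await retrieve();
    if (!pendingStatuses.includes(record.status)) return record;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    delayMs = Math.min(delayMs * 2, maxDelayMs);
  }
  throw new Error(`Still pending after ${maxAttempts} attempts`);
}
```

With the client from the example above, this would be called as pollUntilDone(() => client.extract.results.retrieve({ record_id: response.record_id })).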

Streaming Extraction

Stream extraction results in real-time via Server-Sent Events.
const stream = await client.extract.stream({
  file: fs.createReadStream("document.pdf"),
  output_format: "markdown",
  enable_streaming: true,
});

for await (const event of stream) {
  if (event.type === "content") {
    process.stdout.write(event.data);
  } else if (event.type === "done") {
    console.log(`\nCompleted in ${event.processing_time}s`);
  } else if (event.type === "async_queued") {
    console.log(`Large file queued: ${event.record_id}`);
  }
}
Parameters: All Synchronous Extraction parameters, plus:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| enable_streaming | boolean | No | true (default) for real-time incremental chunks. false for batch mode (complete content in a single SSE event). |

SSE Event Types:

| Event Type | Description |
| --- | --- |
| content | Incremental content chunk (streaming mode) |
| complete | Full content at once (batch mode, when enable_streaming is false) |
| done | Final event with record_id and processing_time |
| error | Error information |
| async_queued | Large files automatically queued for async processing |
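
To handle every event type in one place, the events can be modelled as a discriminated union and folded into a single result. This is a sketch under assumptions: the ExtractEvent type is written by hand from the table above, the data field on "complete" and the message field on "error" are guesses, and collectStream is a hypothetical helper rather than an SDK function.

```typescript
// Hand-written event shapes; fields on "complete" and "error" are assumptions.
type ExtractEvent =
  | { type: "content"; data: string }
  | { type: "complete"; data: string }
  | { type: "done"; record_id: string; processing_time: number }
  | { type: "error"; message: string }
  | { type: "async_queued"; record_id: string };

interface StreamOutcome {
  content: string;
  recordId?: string;
  queued: boolean; // true when the file was handed off to async processing
}

// Hypothetical helper: consumes the stream and returns the assembled content.
async function collectStream(
  events: AsyncIterable<ExtractEvent>,
): Promise<StreamOutcome> {
  const outcome: StreamOutcome = { content: "", queued: false };
  for await (const event of events) {
    switch (event.type) {
      case "content":
        outcome.content += event.data; // incremental chunk
        break;
      case "complete":
        outcome.content = event.data; // full content in batch mode
        break;
      case "async_queued":
        outcome.queued = true;
        outcome.recordId = event.record_id;
        break;
      case "done":
        outcome.recordId = event.record_id;
        break;
      case "error":
        throw new Error(`Extraction failed: ${event.message}`);
    }
  }
  return outcome;
}
```

When queued is true, the content will be empty and the record_id can be used with the results endpoint instead.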

Batch Extraction

Process multiple documents in a single request (max 50 files). All files share the same extraction options.
const response = await client.extract.batch({
  files: [
    fs.createReadStream("invoice1.pdf"),
    fs.createReadStream("invoice2.pdf"),
    fs.createReadStream("invoice3.pdf"),
  ],
  output_format: "json",
  json_options: '["invoice_number", "date", "total_amount"]',
  custom_instructions: "Extract amounts without currency symbols",
});

console.log(`Batch ${response.batch_id}: ${response.accepted_files} files queued`);

for (const record of response.records) {
  console.log(`  ${record.filename}: ${record.record_id}`);
}
Parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| files | Uploadable[] | Yes | List of files to process (max 50). |
| output_format | string | Yes | Output format(s) applied to all files. |
| custom_instructions | string | No | Custom extraction instructions (max 8,000 chars). |
| prompt_mode | string | No | "append" (default) or "replace". |
| json_options | string | No | JSON extraction mode. |
| csv_options | string | No | CSV extraction options. |
| include_metadata | string | No | Comma-separated metadata types. |
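
Because a batch accepts at most 50 files, a larger set has to be split across several requests. A minimal sketch of that split follows; chunk is a hypothetical utility, not part of the SDK.

```typescript
const MAX_BATCH_FILES = 50; // per-request limit stated above

// Hypothetical utility: splits `items` into consecutive groups of at most `size`.
function chunk<T>(items: readonly T[], size: number = MAX_BATCH_FILES): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const groups: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    groups.push(items.slice(start, start + size));
  }
  return groups;
}
```

Each group can then be passed as the files argument of a separate batch call, with the same extraction options repeated for every call.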

Document Classification

Classify a document into predefined categories. Each page is classified individually with a category, confidence score (0-100), and reasoning.
const result = await client.classify.sync({
  file: fs.createReadStream("document.pdf"),
  categories:
    '[{"name": "Invoice", "description": "Bills and invoices"}, {"name": "Contract", "description": "Legal agreements"}, {"name": "Receipt"}]',
});

for (const page of result.result.pages) {
  console.log(
    `Page ${page.page_number}: ${page.category} (${page.confidence}%) - ${page.reasoning}`
  );
}
Parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file | Uploadable | Yes | File to classify (PDF, PNG, JPG, JPEG, TIFF, BMP, WebP). |
| categories | string | Yes | JSON array of category objects: [{"name": "Category", "description": "Optional description"}]. |
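
Since categories is a JSON string, it is less error-prone to build it from typed objects than to write it by hand. The sketch below also flags pages whose confidence falls under a threshold. Both helpers are hypothetical, the page shape is inferred from the example above, and the default threshold of 80 is an arbitrary illustrative value.

```typescript
interface Category {
  name: string;
  description?: string;
}

// Page shape inferred from the example above; the SDK type may differ.
interface PageClassification {
  page_number: number;
  category: string;
  confidence: number; // 0-100
}

// Hypothetical helper: serializes categories for the `categories` parameter.
function encodeCategories(categories: Category[]): string {
  return JSON.stringify(categories);
}

// Hypothetical helper: page numbers whose confidence is below the threshold.
function pagesNeedingReview(
  pages: PageClassification[],
  threshold = 80, // arbitrary cut-off for illustration
): number[] {
  return pages
    .filter((page) => page.confidence < threshold)
    .map((page) => page.page_number);
}
```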

Batch Classification

Classify multiple documents at once (max 50 files).
const response = await client.classify.batch({
  files: [
    fs.createReadStream("doc1.pdf"),
    fs.createReadStream("doc2.pdf"),
  ],
  categories:
    '[{"name": "Invoice"}, {"name": "Receipt"}, {"name": "Contract"}]',
});

console.log(`Batch ${response.batch_id}: ${response.successful_files} classified`);

for (const result of response.results) {
  console.log(`  ${result.filename}: ${result.pages[0].category}`);
}
Parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| files | Uploadable[] | Yes | Files to classify (max 50). |
| categories | string | Yes | JSON array of category objects (max 50 categories). |
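
The example above reads only the first page's category. For multi-page documents, one option is to take the most frequent category across all pages. The following is a sketch of that idea, with a hypothetical dominantCategory helper and a page shape inferred from the examples; ties go to the category seen first.

```typescript
// Page shape inferred from the examples above; the SDK type may differ.
interface ClassifiedPage {
  category: string;
}

// Hypothetical helper: the most frequent category across pages,
// or undefined for an empty list. Ties go to the category seen first.
function dominantCategory(pages: ClassifiedPage[]): string | undefined {
  const counts = new Map<string, number>();
  for (const page of pages) {
    counts.set(page.category, (counts.get(page.category) ?? 0) + 1);
  }
  let best: string | undefined;
  let bestCount = 0;
  for (const [category, count] of counts) {
    if (count > bestCount) {
      best = category;
      bestCount = count;
    }
  }
  return best;
}
```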

Retrieve Results

Get the status and result of a previous extraction by record ID.
const result = await client.extract.results.retrieve({
  record_id: "12345",
  include_content: true,
});

if (result.status === "completed") {
  console.log(result.result.markdown.content);
} else if (result.status === "processing") {
  console.log("Still processing...");
}
Parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| record_id | string | Yes | Extraction job ID (numeric string returned by the API). |
| include_content | boolean | No | Include full extracted content (default: true). Set to false to retrieve only status and metadata. |

List Results

List all extraction results for the authenticated user (paginated).
const page = await client.extract.results.list({
  page: 1,
  page_size: 10,
  sort_by: "created_at",
  sort_order: "desc",
});

for (const record of page.results) {
  console.log(`${record.record_id}: ${record.status} (${record.filename})`);
}

console.log(`Page ${page.pagination.page} of ${page.pagination.total_pages}`);
console.log(`Total records: ${page.pagination.total_count}`);
Parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| page | number | No | Page number (default: 1, minimum: 1). |
| page_size | number | No | Results per page (default: 20, range: 1-100). |
| sort_by | string | No | Sort field. One of: created_at (default), updated_at, original_filename, file_size, processing_status. |
| sort_order | string | No | Sort direction: "desc" (default) or "asc". |
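
To walk every page manually, you can loop until pagination.total_pages is reached. The sketch below does this with an async generator. listAll is a hypothetical helper written against the response shape shown above; the SDK's own pagination helpers, whose exact API is not shown on this page, may make it unnecessary.

```typescript
// Response shape taken from the example above; only the fields used here.
interface Page<T> {
  results: T[];
  pagination: { page: number; total_pages: number };
}

// Hypothetical helper: yields every record across all pages, fetching lazily.
async function* listAll<T>(
  fetchPage: (page: number) => Promise<Page<T>>,
): AsyncGenerator<T> {
  let page = 1;
  let totalPages = 1;
  do {
    const current = await fetchPage(page);
    yield* current.results;
    totalPages = current.pagination.total_pages;
    page += 1;
  } while (page <= totalPages);
}
```

With the client from the example above, fetchPage could be (page) => client.extract.results.list({ page, page_size: 100 }), consumed with for await...of.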

Error Handling

The SDK throws typed exceptions for API errors.
import Docstrange, {
  APIStatusError,
  APIConnectionError,
  APITimeoutError,
} from "docstrange-api";
import fs from "fs";

const client = new Docstrange();

try {
  const result = await client.extract.sync({
    file: fs.createReadStream("document.pdf"),
    output_format: "markdown",
  });
} catch (err) {
  if (err instanceof APITimeoutError) {
    // Checked first in case APITimeoutError is a subclass of APIConnectionError
    console.log("Request timed out");
  } else if (err instanceof APIConnectionError) {
    console.log("Failed to connect to the API");
  } else if (err instanceof APIStatusError) {
    console.log(`API error ${err.status}: ${err.message}`);
    console.log(`Request ID: ${err.headers?.["x-request-id"]}`);
  }
}
| Exception | Description |
| --- | --- |
| APIConnectionError | Network connectivity issues |
| APITimeoutError | Request exceeded timeout |
| APIStatusError | API returned an error status code |
| AuthenticationError | Invalid or missing API key (401) |
| PermissionDeniedError | Insufficient permissions (403) |
| NotFoundError | Resource not found (404) |
| RateLimitError | Too many requests (429) |
| InternalServerError | Server error (500+) |
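
The status-code rows of this table can be read as a simple lookup. The sketch below writes that mapping out as a standalone function, which can be handy in log messages or in tests that stub HTTP responses. errorNameForStatus is a hypothetical helper; treating any other status as a plain APIStatusError is an assumption.

```typescript
// Hypothetical helper: the exception name the table above associates with a status.
// Assumption: any other error status surfaces as a plain APIStatusError.
function errorNameForStatus(status: number): string {
  if (status === 401) return "AuthenticationError";
  if (status === 403) return "PermissionDeniedError";
  if (status === 404) return "NotFoundError";
  if (status === 429) return "RateLimitError";
  if (status >= 500) return "InternalServerError";
  return "APIStatusError";
}
```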

Configuration

Custom Base URL

For on-premise deployments, point the client to your own instance:
const client = new Docstrange({
  apiKey: "your-api-key",
  baseURL: "https://your-instance.example.com",
});

Timeouts

Configure request timeouts (in milliseconds):
const client = new Docstrange({
  timeout: 60000, // 60 seconds
});

Retries

The SDK automatically retries failed requests with exponential backoff. Configure the maximum number of retries:
const client = new Docstrange({ maxRetries: 3 }); // default is 2
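
For intuition about what exponential backoff means, the sketch below computes a delay that doubles with each attempt, is capped, and is randomized ("full jitter"). It is purely illustrative: the base delay, cap, and jitter strategy are assumptions and do not describe the SDK's actual retry schedule.

```typescript
// Illustrative only: baseMs, capMs, and the jitter strategy are assumptions,
// not the SDK's real values.
function backoffDelayMs(
  attempt: number, // 0 for the first retry
  baseMs = 500,
  capMs = 8000,
  random: () => number = Math.random,
): number {
  const exponential = Math.min(capMs, baseMs * Math.pow(2, attempt));
  // Full jitter: pick a delay uniformly in [0, exponential).
  return Math.floor(random() * exponential);
}
```

Injecting random makes the function deterministic in tests; in normal use the default Math.random spreads retries from many clients over time.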

Next Steps

API Reference

Explore the complete API documentation with interactive examples.

Examples

See full code examples for every endpoint and output format.