Overview
Nanonets Document Extraction API is designed for high availability, horizontal scalability, and enterprise-grade security. The architecture supports both cloud-hosted (SaaS) and on-premise deployment models to meet diverse organizational requirements.
Cloud API
Fully managed, auto-scaling infrastructure hosted on the Nanonets cloud.
On-Premise
Self-hosted deployment within your own infrastructure for maximum data control.
Cloud API Architecture
The cloud-hosted API runs on Kubernetes-based infrastructure with automatic scaling, load balancing, and high availability built in.
Architecture Diagram

Components
Client Layer
- Client Applications: Your applications connect to the API via HTTPS
- Load Balancer: Distributes incoming requests across API instances for optimal performance
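From the client side, the flow above amounts to an authenticated HTTPS request. A minimal sketch, assuming a hypothetical endpoint URL and a bearer-token header (check the Nanonets API reference for the actual path, header name, and payload format):

```python
import urllib.request

# Hypothetical values -- illustrative only, not the documented endpoint.
API_URL = "https://extraction-api.nanonets.com/api/v1/extract"
API_KEY = "your-api-key"

def build_extract_request(pdf_bytes: bytes) -> urllib.request.Request:
    """Build the HTTPS request carrying the document and the API key header."""
    return urllib.request.Request(
        API_URL,
        data=pdf_bytes,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/pdf",
        },
        method="POST",
    )

req = build_extract_request(b"%PDF-1.7 ...")
# urllib.request.urlopen(req) would send it; omitted here to stay offline.
```

The load balancer terminates TLS and forwards the request to one of the API pods; the client never addresses an individual instance.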
Application Layer
| Component | Description |
|---|---|
| API Service | FastAPI-based REST service handling synchronous and asynchronous extraction requests |
| Worker Service | Background processor for async jobs, polling from the task queue |
| Autoscaling | Horizontal Pod Autoscaler (HPA) for API pods based on CPU/memory; queue-based scaling for workers |
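The worker service's poll-and-ack loop can be sketched with Python's standard-library queue standing in for the managed message queue; the job shape and the stubbed extraction step are assumptions for illustration:

```python
import queue

def run_worker(task_queue: queue.Queue, results: list, max_jobs: int) -> None:
    """Drain up to max_jobs pending extraction jobs from the queue.

    The real worker would poll the managed message queue and call the
    OCR cluster; a dict job and an in-place result stand in for both.
    """
    processed = 0
    while processed < max_jobs:
        try:
            job = task_queue.get(timeout=0.1)  # poll with a short timeout
        except queue.Empty:
            break  # queue drained; worker goes back to idle polling
        results.append({"job_id": job["job_id"], "status": "done"})
        task_queue.task_done()  # ack only after processing completes
        processed += 1

# Enqueue two async jobs and drain them, as the worker deployment would.
q = queue.Queue()
for i in range(2):
    q.put({"job_id": i, "document": f"doc-{i}.pdf"})
done = []
run_worker(q, done, max_jobs=10)
```

Acknowledging only after the result is recorded is what lets a crashed worker's jobs be redelivered to a healthy one.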
AI Infrastructure
| Component | Description |
|---|---|
| OCR Cluster | GPU-accelerated servers running vision-language models for document understanding |
| Layout Detection | Dedicated service for document layout analysis and region detection |
| Load Balancer | Distributes OCR requests across GPU nodes using least-connections routing |
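Least-connections routing simply prefers the GPU node with the fewest in-flight requests. A minimal sketch (node names and connection counts are invented):

```python
def pick_gpu_node(active_connections: dict) -> str:
    """Least-connections routing: choose the GPU node currently
    handling the fewest in-flight OCR requests."""
    return min(active_connections, key=active_connections.get)

# gpu-1 has the fewest active requests, so it receives the next one.
nodes = {"gpu-0": 4, "gpu-1": 1, "gpu-2": 3}
target = pick_gpu_node(nodes)
```

This policy suits OCR workloads better than round-robin because request durations vary widely with document size.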
Managed Services
| Service | Purpose |
|---|---|
| Database | Stores extraction records, job metadata, and audit logs |
| File Storage | Secure object storage for uploaded documents and results |
| Task Queue | Message queue for async job processing with guaranteed delivery |
Scaling Behavior
- API Pods: Scale based on CPU and memory utilization
- Worker Pods: Scale based on queue depth (number of pending jobs)
- GPU Nodes: Pre-provisioned capacity with burst scaling for peak loads
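The queue-based worker scaling above reduces to a simple replica calculation. A sketch with illustrative thresholds (the actual jobs-per-worker target and replica bounds are deployment-specific, not documented values):

```python
import math

def desired_worker_replicas(queue_depth: int, jobs_per_worker: int = 10,
                            min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Queue-based scaling: one worker per `jobs_per_worker` pending jobs,
    clamped to the configured replica bounds."""
    wanted = math.ceil(queue_depth / jobs_per_worker)
    return max(min_replicas, min(max_replicas, wanted))
```

For example, 95 pending jobs at 10 jobs per worker yields 10 replicas, while an empty queue keeps the minimum of 1 running.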
On-Premise Architecture
For organizations requiring full data sovereignty, the on-premise deployment runs entirely within your infrastructure.
Architecture Diagram

Components
Client Layer
- Applications: Internal applications connect via your private network
- Load Balancer: Your choice of load balancer (Nginx, HAProxy, etc.)
Application Layer
| Component | Description |
|---|---|
| API Deployment | Containerized FastAPI service for extraction requests |
| Worker Deployment | Optional async task processor for background jobs |
| Autoscaling | Optional HPA for API; queue-based scaling for workers |
AI Infrastructure
| Component | Description |
|---|---|
| OCR Cluster | Self-hosted GPU nodes running Nanonets vision models |
| Layout Detection | Containerized layout analysis service |
GPU requirements vary based on throughput needs. Contact Nanonets for sizing guidance.
Optional Services
| Service | Purpose | Alternatives |
|---|---|---|
| Task Queue | Async job processing | Redis, RabbitMQ, or cloud-managed queues |
| Database | Extraction records storage | PostgreSQL, MySQL |
| File Storage | Document storage | Local filesystem, S3-compatible storage, NFS |
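Because the File Storage role accepts several backends, it helps to code against a small interface and swap implementations per deployment. A sketch of a local-filesystem backend (class and method names are illustrative, not part of the Nanonets API):

```python
import pathlib
import tempfile

class LocalFileStorage:
    """Local-filesystem backend for the File Storage role; an
    S3-compatible or NFS backend would expose the same two methods."""

    def __init__(self, root: str) -> None:
        self.root = pathlib.Path(root)

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

# Round-trip an uploaded document through the backend.
store = LocalFileStorage(tempfile.mkdtemp())
store.put("uploads/invoice.pdf", b"%PDF-1.7")
```

Keeping the API and workers behind such an interface is what makes the filesystem/S3/NFS choice a configuration decision rather than a code change.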
Deployment Options
Docker Compose
Simple deployment for development and small-scale production use.
Kubernetes
Production-grade deployment with full orchestration and scaling capabilities.
Air-Gapped
Fully isolated deployment with no external network dependencies.
Hybrid
On-premise API combined with cloud-based AI infrastructure, avoiding the cost of self-hosted GPU capacity.
Security
Both deployment models include enterprise security features:
Encryption
TLS 1.3 for data in transit; AES-256 for data at rest
Authentication
API key authentication with optional OAuth 2.0 / SAML integration
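On the validation side, API key checks should use a constant-time comparison rather than `==`. A minimal sketch (how keys are stored and looked up is an assumption; only the comparison is the point):

```python
import hmac

def api_key_valid(presented: str, expected: str) -> bool:
    """Constant-time API key comparison; hmac.compare_digest avoids
    the timing side channel that short-circuiting '==' would leak."""
    return hmac.compare_digest(presented.encode(), expected.encode())
```

With `==`, the comparison returns as soon as a byte differs, letting an attacker infer key prefixes from response timing; `compare_digest` takes time independent of where the mismatch occurs.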
Audit Logging
Comprehensive logging of all API requests and document processing events
Data Isolation
Tenant-level data isolation with configurable retention policies