
Overview

The Nanonets Document Extraction API is designed for high availability, horizontal scalability, and enterprise-grade security. The architecture supports both cloud-hosted (SaaS) and on-premise deployment models to meet diverse organizational requirements.

Cloud API Architecture

The cloud-hosted API runs on a Kubernetes-based infrastructure with automatic scaling, load balancing, and high availability built in.

Architecture Diagram

Cloud API Architecture Diagram

Components

Client Layer

  • Client Applications: Your applications connect to the API via HTTPS
  • Load Balancer: Distributes incoming requests across API instances for optimal performance
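
As a minimal sketch of this client flow, the snippet below sends a document to the API over HTTPS with API-key authentication. The base URL, endpoint path, and request fields are illustrative assumptions, not the documented interface; see the API reference for the exact contract.

```python
# Minimal client sketch: upload a document over HTTPS with API-key auth.
# NOTE: the base URL, /extract path, and field names are assumptions for
# illustration only; refer to the API reference for the real interface.
import requests

BASE_URL = "https://extraction-api.example.com"     # hypothetical endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # API key auth (see Security)

with open("invoice.pdf", "rb") as f:
    response = requests.post(f"{BASE_URL}/extract", headers=HEADERS, files={"file": f})

response.raise_for_status()
print(response.json())
```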

Application Layer

  • API Service: FastAPI-based REST service handling synchronous and asynchronous extraction requests
  • Worker Service: Background processor for async jobs, polling from the task queue
  • Autoscaling: Horizontal Pod Autoscaler (HPA) for API pods based on CPU/memory; queue-based scaling for workers
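
To make the synchronous/asynchronous split concrete, here is a hedged sketch of the async path: the API service accepts the job, a worker picks it up from the task queue, and the client polls for completion. Endpoint paths, field names, and job states below are assumptions for illustration.

```python
# Sketch of the asynchronous flow: submit a job, then poll until a worker
# has processed it. Paths, fields, and states are illustrative assumptions.
import time
import requests

BASE_URL = "https://extraction-api.example.com"     # hypothetical endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

with open("invoice.pdf", "rb") as f:
    job = requests.post(f"{BASE_URL}/extract/async", headers=HEADERS,
                        files={"file": f}).json()

while True:
    status = requests.get(f"{BASE_URL}/jobs/{job['job_id']}", headers=HEADERS).json()
    if status["state"] in ("completed", "failed"):
        break
    time.sleep(2)   # the job waits in the task queue until a worker picks it up

print(status)
```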

AI Infrastructure

  • OCR Cluster: GPU-accelerated servers running vision-language models for document understanding
  • Layout Detection: Dedicated service for document layout analysis and region detection
  • Load Balancer: Distributes OCR requests across GPU nodes using least-connections routing
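
Least-connections routing simply sends each new OCR request to the GPU node with the fewest in-flight requests. The toy sketch below illustrates the selection rule only; it is not the production load balancer.

```python
# Toy illustration of least-connections routing across GPU nodes.
# Not the production load balancer; it only demonstrates the selection rule.
from dataclasses import dataclass

@dataclass
class GpuNode:
    name: str
    in_flight: int = 0

def pick_node(nodes: list[GpuNode]) -> GpuNode:
    # Choose the node currently serving the fewest requests.
    return min(nodes, key=lambda n: n.in_flight)

nodes = [GpuNode("ocr-gpu-0", in_flight=2), GpuNode("ocr-gpu-1", in_flight=0)]
target = pick_node(nodes)
target.in_flight += 1
print(f"route next OCR request to {target.name}")
```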

Managed Services

  • Database: Stores extraction records, job metadata, and audit logs
  • File Storage: Secure object storage for uploaded documents and results
  • Task Queue: Message queue for async job processing with guaranteed delivery

Scaling Behavior

  • API Pods: Scale based on CPU and memory utilization
  • Worker Pods: Scale based on queue depth (number of pending jobs)
  • GPU Nodes: Pre-provisioned capacity with burst scaling for peak loads
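
The queue-based worker scaling above amounts to keeping the number of pending jobs per worker roughly constant. The sketch below shows that rule; the target jobs-per-worker and replica bounds are illustrative assumptions, not production autoscaler settings.

```python
# Sketch of queue-depth-based worker scaling. The target jobs-per-worker and
# replica bounds are illustrative assumptions, not production settings.
import math

def desired_worker_replicas(pending_jobs: int,
                            jobs_per_worker: int = 10,
                            min_replicas: int = 1,
                            max_replicas: int = 20) -> int:
    desired = math.ceil(pending_jobs / jobs_per_worker) if pending_jobs else min_replicas
    return max(min_replicas, min(max_replicas, desired))

print(desired_worker_replicas(45))   # 45 pending jobs -> 5 workers
print(desired_worker_replicas(0))    # empty queue     -> scale down to 1
```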

On-Premise Architecture

For organizations requiring full data sovereignty, the on-premise deployment runs entirely within your infrastructure.

Architecture Diagram

On-Premise Architecture Diagram

Components

Client Layer

  • Applications: Internal applications connect via your private network
  • Load Balancer: Your choice of load balancer (Nginx, HAProxy, etc.)

Application Layer

  • API Deployment: Containerized FastAPI service for extraction requests
  • Worker Deployment: Optional async task processor for background jobs
  • Autoscaling: Optional HPA for API; queue-based scaling for workers

AI Infrastructure

  • OCR Cluster: Self-hosted GPU nodes running Nanonets vision models
  • Layout Detection: Containerized layout analysis service

GPU requirements vary based on throughput needs. Contact Nanonets for sizing guidance.

Optional Services

  • Task Queue: Async job processing. Alternatives: Redis, RabbitMQ, or cloud-managed queues
  • Database: Extraction records storage. Alternatives: PostgreSQL, MySQL
  • File Storage: Document storage. Alternatives: local filesystem, S3-compatible storage, NFS
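
As an example of how the optional pieces fit together, the sketch below uses Redis (one of the listed queue alternatives) to hand async jobs from the API deployment to a worker. The queue name and payload format are assumptions, not the actual on-premise wire format.

```python
# Sketch of async job handoff through a Redis-backed task queue (Redis is one
# of the listed alternatives). Queue name and payload shape are assumptions.
import json
import redis

r = redis.Redis(host="localhost", port=6379)
QUEUE = "extraction:jobs"   # hypothetical queue name

def enqueue_job(document_path: str) -> None:
    """API deployment side: push a pending extraction job onto the queue."""
    r.rpush(QUEUE, json.dumps({"document": document_path}))

def worker_loop() -> None:
    """Worker deployment side: block for the next job, then process it."""
    while True:
        _, raw = r.blpop(QUEUE)                 # blocking pop
        job = json.loads(raw)
        print(f"processing {job['document']}")  # hand off to the OCR cluster here

enqueue_job("/data/invoices/inv-001.pdf")
```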

Deployment Options

Docker Compose

Simple deployment for development and small-scale production use.

Kubernetes

Production-grade deployment with full orchestration and scaling capabilities.

Air-Gapped

Fully isolated deployment with no external network dependencies.

Hybrid

On-premise API with cloud-based AI infrastructure for optimal cost efficiency.

Security

Both deployment models include enterprise security features:

Encryption

TLS 1.3 for data in transit; AES-256 for data at rest

Authentication

API key authentication with optional OAuth 2.0 / SAML integration

Audit Logging

Comprehensive logging of all API requests and document processing events

Data Isolation

Tenant-level data isolation with configurable retention policies

Next Steps