| Rank | Model | Overall | OlmOCR | OmniDoc | IDP | Size |
|---|---|---|---|---|---|---|
| 1 | Nanonets OCR2+ | 81.8 | 82.2 | 89.5 | 73.8 | – |
| 2 | Gemini-3-Pro | 81.4 | 73.5 | 88.8 | 81.8 | – |
| 3 | Claude Sonnet 4.6 | 80.8 | 74.4 | 86.9 | 81.2 | – |
| 4 | Claude Opus 4.6 | 80.3 | 73.9 | 85.9 | 81.1 | – |
| 5 | Gemini-3-Flash | 79.9 | 69.2 | 90.1 | 80.5 | – |
| 6 | GPT-5.2 | 79.2 | 72.2 | 88.0 | 77.4 | – |
| 7 | GPT-5-Mini | 70.8 | 56.7 | 82.5 | 73.3 | – |
| 8 | GPT-4.1 | 70.0 | 55.5 | 79.9 | 74.7 | – |
| 9 | Claude Haiku 4.5 | 69.6 | 56.2 | 79.6 | 72.9 | – |
| 10 | Ministral-8B | 68.0 | 57.8 | 78.3 | 67.9 | 8B |
## What the benchmarks measure

- **OlmOCR**: Optical character recognition accuracy across diverse document types and layouts.
- **OmniDoc**: End-to-end document understanding covering structure, tables, and formatting.
- **IDP**: Extraction of structured fields from invoices, receipts, and forms.
- **Overall**: The mean of the three benchmark scores (see the formula below), giving a single measure of document AI capability.
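Assuming the three benchmarks are weighted equally, which is consistent with the published rows, the overall score reduces to:

$$\text{Overall} = \tfrac{1}{3}\left(\text{OlmOCR} + \text{OmniDoc} + \text{IDP}\right)$$

For example, for Nanonets OCR2+: $(82.2 + 89.5 + 73.8)/3 \approx 81.8$, matching the table.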
## Methodology

The overall score is the mean of all individual benchmark scores. The full benchmark is open source and fully reproducible.
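As a quick sanity check on that aggregation, here is a minimal Python sketch that recomputes the Overall column from the per-benchmark scores in the table above. The unweighted mean and one-decimal rounding are assumptions, but they reproduce the published figures for the rows checked here.

```python
# Recompute the Overall column as the unweighted mean of the three
# benchmark scores (OlmOCR, OmniDoc, IDP), rounded to one decimal.
# Values are copied from the leaderboard table above.
SCORES = {
    "Nanonets OCR2+":    (82.2, 89.5, 73.8),
    "Gemini-3-Pro":      (73.5, 88.8, 81.8),
    "Claude Sonnet 4.6": (74.4, 86.9, 81.2),
}

for model, (olmocr, omnidoc, idp) in SCORES.items():
    overall = round((olmocr + omnidoc + idp) / 3, 1)
    print(f"{model}: {overall}")
# Prints 81.8, 81.4, and 80.8, matching the Overall column.
```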
See all models, detailed breakdowns, and methodology on the IDP Leaderboard.