Model fiche
DeepSeek-OCR
By DeepSeek · China
vision
chat
small
Overview
DeepSeek's 3B MIT-licensed OCR specialist built on DeepEncoder, notable for its 'optical compression' approach. Despite its small size, it performs strongly on document text, LaTeX formulas, and tables.
When to pick this model
- High-volume document OCR pipelines
- Extracting LaTeX formulas from scientific papers
- Parsing tables from PDFs, scans, and receipts
- Edge deployments needing OCR in ~2 GB VRAM
- MIT-licensed alternative to closed OCR APIs
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 2 GB |
| Q5_K_M | 2.5 GB |
| Q8_0 | 4 GB |
| FP16 (no quantization) | 6 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
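As a rough illustration of how the table above might be used when sizing a deployment, the sketch below picks the highest-precision quantization that fits a given VRAM budget. The function name `pick_quantization` and the `headroom_gb` parameter are hypothetical helpers introduced here; the figures are copied from the table and already include the 8k KV cache and runtime overhead.

```python
# VRAM requirements in GB, copied from the table above.
VRAM_GB = {
    "Q4_K_M": 2.0,
    "Q5_K_M": 2.5,
    "Q8_0": 4.0,
    "FP16": 6.0,
}


def pick_quantization(available_gb, headroom_gb=0.0):
    """Return the largest quantization that fits, or None if none does.

    headroom_gb is a hypothetical safety margin, e.g. for longer contexts.
    """
    budget = available_gb - headroom_gb
    # Pair each requirement with its name so max() selects by VRAM need.
    fitting = [(need, name) for name, need in VRAM_GB.items() if need <= budget]
    if not fitting:
        return None
    return max(fitting)[1]


if __name__ == "__main__":
    for gb in (1.5, 3.0, 8.0):
        print(gb, "GB ->", pick_quantization(gb))
```

Passing a non-zero `headroom_gb` is one way to account for the extra memory that context lengths beyond 8k would need.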
Strengths
- Best-in-class OCR quality at only 3B parameters
- Handles LaTeX formulas and table structure cleanly
- Runs in ~2 GB VRAM at Q4, small enough for edge deployments
- MIT license, no commercial restrictions
- Optical-compression approach reduces token usage on long documents
Limitations
- 8k context limits multi-page document handling
- OCR-only — not a general-purpose VLM
- Limited reasoning capability beyond extraction
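One way to work within the 8k context limit on multi-page documents is to split the pages into batches that each stay under a token budget. The sketch below is a minimal greedy grouping under stated assumptions: "8k" is taken to mean 8192 tokens, the per-page token counts are estimates supplied by the caller, and `batch_pages` and `reserve_for_output` are hypothetical names, not part of any DeepSeek or Ollama API.

```python
def batch_pages(page_tokens, context_limit=8192, reserve_for_output=2048):
    """Group consecutive page indices so each batch fits the input budget.

    page_tokens: estimated input tokens per page (caller-supplied estimates).
    reserve_for_output: tokens held back for the extracted text (assumed value).
    """
    budget = context_limit - reserve_for_output
    batches, current, used = [], [], 0
    for index, tokens in enumerate(page_tokens):
        if tokens > budget:
            raise ValueError(f"page {index} alone exceeds the budget of {budget} tokens")
        # Start a new batch when adding this page would overflow the current one.
        if current and used + tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += tokens
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    print(batch_pages([3000, 3000, 3000, 1000]))
```

Processing one page per request is the simplest special case and avoids the need for token estimates altogether.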
Architecture & training
Architecture: Dense vision · 3B · DeepSeek-OCR · specialized in document reading
Training: DeepSeek — massive OCR fine-tuning on scanned documents, receipts, and LaTeX formulas.
Verdict
Drop-in MIT OCR engine that beats far larger general VLMs at extraction tasks.
Quick start
ollama run deepseek-ocr:3b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
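For scripted pipelines, the model can also be reached through Ollama's local HTTP API rather than the interactive CLI. The sketch below assumes an Ollama server on its default address (`http://localhost:11434`) with the `deepseek-ocr:3b` tag already pulled. The prompt text, the file name `scan.png`, and the helper names `build_ocr_request` and `run_ocr` are illustrative assumptions; check the model card for the prompt format the model expects.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_ocr_request(image_bytes, prompt="Extract the text from this image.",
                      model="deepseek-ocr:3b"):
    """Build the JSON payload for Ollama's /api/generate endpoint.

    The default prompt is an assumption, not a documented DeepSeek-OCR prompt.
    """
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one JSON object instead of a stream
    }


def run_ocr(image_path, url=OLLAMA_URL):
    """Send one image to a running Ollama server and return the model's text."""
    with open(image_path, "rb") as handle:
        payload = build_ocr_request(handle.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    # "scan.png" is a placeholder path; requires a running Ollama server.
    print(run_ocr("scan.png"))
```

Keeping payload construction separate from the network call makes it possible to check the request shape without a server running.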