Model fiche
HunyuanOCR 1B
By Tencent · China
vision
chat
small
Overview
Tencent's 1B-parameter, end-to-end OCR model, which outperforms 235B-parameter general-purpose VLMs on document tasks. It is engineered for edge and mobile deployment.
When to pick this model
- On-device or mobile OCR with strict memory budgets
- High-throughput batch OCR where latency matters
- Receipt, invoice, and form processing at scale
- Embedded systems and edge gateways
- Cost-sensitive OCR pipelines replacing cloud APIs
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 0.8 GB |
| Q5_K_M | 1 GB |
| Q8_0 | 1.5 GB |
| FP16 (no quantization) | 2 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
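You can cross-check these figures with simple arithmetic. The sketch below assumes a model of about 1B parameters and uses approximate bits-per-weight averages for llama.cpp quantization types. The Q4_K_M and Q5_K_M values are rough assumptions. The Q8_0 value of 8.5 follows from its 34-byte blocks of 32 weights. The KV-cache helper needs the layer count, KV-head count, and head dimension. This fiche does not list them, so read them from the GGUF metadata rather than guessing.

```python
# Back-of-envelope VRAM estimate for llama.cpp-style GGUF quantizations.
# Bits-per-weight (bpw) values are approximate averages (assumption), not exact file sizes.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,  # approx.: mostly Q4_K blocks with some higher-precision tensors
    "Q5_K_M": 5.7,   # approx.
    "Q8_0": 8.5,     # 32 int8 weights + one fp16 scale = 34 bytes per 32 weights
    "F16": 16.0,     # 2 bytes per weight
}

GB = 1e9  # decimal gigabytes


def weight_gb(n_params: float, quant: str) -> float:
    """Memory for the model weights alone, in GB."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / GB


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                n_ctx: int, bytes_per_elem: int = 2) -> float:
    """K and V tensors for every layer, sized for n_ctx tokens (f16 cache by default)."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem / GB


def total_gb(weights: float, kv: float, overhead_gb: float = 0.6) -> float:
    """Weights + KV cache + runtime overhead (default ~600 MB, the baseline quoted above)."""
    return weights + kv + overhead_gb
```

For example, `weight_gb(1e9, "F16")` returns 2.0 for the weights of a 1B-parameter model alone. Leave headroom on top of that when running unquantized.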
Strengths
- Runs in under 1 GB VRAM at Q4
- Beats 200B+ general VLMs on document benchmarks
- End-to-end model — no separate detection/recognition stages
- Latency low enough for real-time mobile use
Limitations
- The 1B parameter budget shows its limits on noisy images and complex layouts
- The 8k-token context window limits multi-page workflows
- Tencent Hunyuan License is custom — review before commercial use
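A common way to work within the 8k context on multi-page documents is to send each page as its own request and stitch the outputs together. This is a minimal sketch under two assumptions. First, the caller supplies an `ocr_page` function (hypothetical), for example a wrapper around a local Ollama endpoint. Second, a rough 4-characters-per-token heuristic stands in for the model's real tokenizer.

```python
from typing import Callable, Iterable, List, Tuple

CONTEXT_TOKENS = 8192  # the 8k context window noted above


def rough_tokens(text: str) -> int:
    # Crude heuristic (assumption): ~4 characters per token for Latin-script text.
    return len(text) // 4


def ocr_document(
    pages: Iterable[bytes],
    ocr_page: Callable[[bytes], str],  # hypothetical: one page image in, extracted text out
    context_tokens: int = CONTEXT_TOKENS,
    separator: str = "\n\n",
) -> Tuple[str, List[int]]:
    """OCR each page in its own request and join the results.

    Returns the combined text and the 1-based numbers of pages whose output
    alone is large enough to suggest the context window was exhausted.
    Image and prompt tokens share the same window, so this check only
    catches the most obvious overflows.
    """
    texts: List[str] = []
    suspect: List[int] = []
    for page_no, image in enumerate(pages, start=1):
        text = ocr_page(image)  # independent call per page, so context starts fresh each time
        if rough_tokens(text) >= context_tokens:
            suspect.append(page_no)
        texts.append(text.strip())
    return separator.join(texts), suspect
```

Per-page calls also make retries cheap: you can re-run a flagged page on its own without reprocessing the whole document.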
Architecture & training
Architecture: Dense vision · 1B · Tencent Hunyuan OCR ultra-compact
Training: by Tencent, for text extraction from scanned documents and images; ultra-compact version.
Verdict
The OCR model to pick when every megabyte counts; for messy real-world documents, step up to DeepSeek-OCR.
Quick start
ollama pull hf.co/tencent/Hunyuan-OCR-1B-GGUF
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
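To call the model from code rather than from a chat client, use Ollama's local REST API. Its `/api/generate` endpoint accepts base64-encoded images in the `images` field. This minimal stdlib-only sketch rests on three assumptions:
- Ollama is running on its default port.
- The pulled GGUF includes the vision components Ollama needs for image input.
- The prompt text works for this model. It is an assumption, so check the model card for recommended prompts.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "hf.co/tencent/Hunyuan-OCR-1B-GGUF"          # name used by the `ollama pull` command above


def build_request(image_bytes: bytes,
                  prompt: str = "Extract all text from this image.",  # assumed prompt wording
                  model: str = MODEL) -> dict:
    """Build an /api/generate payload with the image attached as base64."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one JSON object instead of streamed chunks
    }


def ocr_image(path: str, url: str = OLLAMA_URL) -> str:
    """Send one image file to the local Ollama server and return the model's text."""
    with open(path, "rb") as f:
        payload = build_request(f.read())
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["response"]  # generated text field of a non-streaming reply
```

With Ollama running, `ocr_image("receipt.jpg")` returns the model's extracted text as a single string.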