Model fiche

DeepSeek-OCR

By DeepSeek · China

Tags: vision · chat · small
Parameters: 3B
License: MIT
Context: 8k
VRAM (Q4): 2 GB
Released: October 2025

Overview

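DeepSeek's 3B-parameter, MIT-licensed OCR specialist, built on the DeepEncoder vision encoder. It is notable for its 'optical compression' approach, which encodes a page image into a compact set of vision tokens. For its size, it performs strongly on documents, LaTeX, and tables.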

When to pick this model

  • High-volume document OCR pipelines
  • Extracting LaTeX formulas from scientific papers
  • Parsing tables from PDFs, scans, and receipts
  • Edge deployments needing OCR in ~2 GB VRAM
  • MIT-licensed alternative to closed OCR APIs

VRAM requirements by quantization

Quantization            VRAM required
Q4_K_M (recommended)    2 GB
Q5_K_M                  2.5 GB
Q8_0                    4 GB
FP16 (no quantization)  6 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
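
As a rough illustration of how the figures above can guide a choice, the sketch below picks the highest-precision quantization that fits a given GPU. The VRAM values are copied from the table; the function name `pick_quantization` and the default 0.5 GB headroom are assumptions made for this example, not measured values.

```python
# VRAM figures (GB) copied from the table above. Per the note, they already
# include weights, a typical 8k KV cache, and ~600 MB of runtime overhead.
VRAM_GB = {
    "Q4_K_M": 2.0,
    "Q5_K_M": 2.5,
    "Q8_0": 4.0,
    "FP16": 6.0,
}


def pick_quantization(available_gb, headroom_gb=0.5):
    """Return the highest-precision quantization that fits, or None.

    headroom_gb is an assumed safety margin for longer contexts or other
    processes sharing the GPU; tune it for your own setup.
    """
    budget = available_gb - headroom_gb
    fitting = [(gb, name) for name, gb in VRAM_GB.items() if gb <= budget]
    if not fitting:
        return None
    # The largest footprint that still fits is the highest-precision option.
    return max(fitting)[1]


if __name__ == "__main__":
    for gpu_gb in (2, 4, 8):
        print(gpu_gb, "GB ->", pick_quantization(gpu_gb))
```

With the default headroom, a 2 GB card returns no match, so on very small GPUs the margin has to be set to zero and context kept short.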

Strengths

  • Best-in-class OCR quality at only 3B parameters
  • Handles LaTeX formulas and table structure cleanly
  • Runs in ~2 GB VRAM at Q4, so it fits on most consumer GPUs and many edge devices
  • MIT license, no commercial restrictions
  • Optical-compression approach reduces token usage on long documents

Limitations

  • 8k context limits multi-page document handling
  • OCR-only — not a general-purpose VLM
  • Limited reasoning capability beyond extraction

Architecture & training

Architecture: DeepEncoder vision encoder + 3B mixture-of-experts decoder · DeepSeek-OCR · specialized in document reading

Training: large-scale OCR training by DeepSeek on scanned documents, receipts, and LaTeX formulas.

Verdict

Drop-in MIT OCR engine that beats far larger general VLMs at extraction tasks.

Quick start

ollama run deepseek-ocr:3b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
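
For scripted pipelines, the model can also be called through Ollama's local REST API instead of the interactive command. The following is a minimal sketch, assuming an Ollama server is running on its default port and the `deepseek-ocr:3b` tag has been pulled. The prompt wording and the helper names `build_payload` and `ocr_image` are assumptions for illustration; check the model card for the recommended prompt.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "deepseek-ocr:3b"  # tag from the quick-start command


def build_payload(image_bytes, prompt="Extract the text from this image."):
    """Build a non-streaming /api/generate request body carrying one image."""
    return {
        "model": MODEL_TAG,
        "prompt": prompt,  # assumed wording, not an official prompt
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ocr_image(path):
    """Send one image file to the local Ollama server and return its text output."""
    with open(path, "rb") as f:
        payload = build_payload(f.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `ocr_image("page.png")` on a scanned page (a hypothetical file name) would return the extracted text as a string. Keeping payload construction separate from the network call makes the request easy to inspect or batch.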
