MiniCPM-V 2.6 8B
By OpenBMB · China
Overview
OpenBMB's 8B vision-language model pairs a SigLIP vision encoder with a Qwen2 language model. It scores 65.2 on OpenCompass and, as a sub-25B model, beats GPT-4o on OCRBench.
When to pick this model
- OCR and document extraction at high resolution
- Multi-image and video understanding on a single GPU
- VLM workloads needing 32k context
- Replacing GPT-4V for screenshot and form parsing
- Mobile and consumer-grade inference of multimodal apps
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 5.5 GB |
| Q5_K_M | 7 GB |
| Q8_0 | 10 GB |
| FP16 (no quantization) | 18 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
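As a rough illustration of how the table and the headroom note combine, the sketch below picks the highest-precision quantization that fits a given VRAM budget. The table values are copied from above. The per-token KV cache size is an assumption: it uses Qwen2-7B's commonly published attention shape (28 layers, 4 KV heads, head dimension 128) with FP16 cache entries, none of which is stated on this page, and the function names are illustrative. It also treats GB and GiB as interchangeable, so read the result as an estimate.

```python
# VRAM figures from the table above (GB), each including an 8k KV cache
# and ~600 MB of runtime overhead.
VRAM_GB = {
    "Q4_K_M": 5.5,
    "Q5_K_M": 7.0,
    "Q8_0": 10.0,
    "FP16": 18.0,
}

# ASSUMPTION (not from this page): Qwen2-7B attention shape of 28 layers,
# 4 KV heads, head dimension 128, with 2-byte (FP16) cache entries.
# The leading factor 2 counts keys and values.
KV_BYTES_PER_TOKEN = 2 * 28 * 4 * 128 * 2


def extra_kv_gb(context_tokens, baseline_tokens=8192):
    """Estimate KV cache needed beyond the 8k baseline already in the table."""
    extra_tokens = max(0, context_tokens - baseline_tokens)
    return extra_tokens * KV_BYTES_PER_TOKEN / 1024**3


def pick_quantization(free_vram_gb, context_tokens=8192):
    """Return the highest-precision quantization that fits, or None."""
    headroom = extra_kv_gb(context_tokens)
    best = None
    # Dict order runs from smallest to largest footprint, so the last
    # entry that fits is the highest-precision option.
    for name, base_gb in VRAM_GB.items():
        if base_gb + headroom <= free_vram_gb:
            best = name
    return best
```

Under these assumptions, an 8 GB card fits Q5_K_M at the default 8k context but drops to Q4_K_M at 32k, since the longer context adds roughly 1.3 GB of cache.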
Published benchmark scores
| Benchmark | Score |
|---|---|
| OpenCompass | 65.2 |
Scores are published by the model author or aggregated from public leaderboards, and are re-measured monthly by our editorial team.
Strengths
- Beats GPT-4o on OCRBench in the sub-25B class
- OpenCompass 65.2 matches much larger VLMs
- Handles 1.8MP inputs without aggressive downsampling
- Native multi-image and video reasoning
- Free aspect-ratio handling avoids letterboxing artifacts
Limitations
- MiniCPM Model License requires registration for commercial use
- Smaller community than Qwen2-VL or Llama-class VLMs
- Tooling support varies across inference backends
Architecture & training
Architecture: VLM 8B · SigLIP-400M + Qwen2-7B
Training: Multi-image, video, free aspect ratio.
The OCR champion among compact open VLMs: the right call when document fidelity beats pure chat quality.
Quick start
ollama run minicpm-v:8b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
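For scripted use, a model pulled with Ollama can also be queried through Ollama's local HTTP API. The sketch below sends one image and a prompt to the `/api/generate` endpoint using only the Python standard library. It assumes Ollama is running on its default port (11434) and that the `minicpm-v:8b` tag is already pulled; the helper names, the default prompt, and the `invoice.png` path are hypothetical.

```python
import base64
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes, prompt, model="minicpm-v:8b"):
    """Build the JSON body for a single-image, non-streaming request."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path, prompt="Extract all text from this image."):
    """Send one image file to the local Ollama server and return its reply."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Example (hypothetical file name, requires a running Ollama server):
# print(describe_image("invoice.png"))
```

Keeping the payload construction separate from the network call makes it easy to inspect or log the request before sending it.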