MiniCPM-o 2.6 8B
By OpenBMB · China
Overview
OpenBMB's omni-modal 8B model. It adds audio and full-duplex speech streaming on top of vision, scores 70.2 on OpenCompass, and beats GPT-4o on single-image tasks.
When to pick this model
- Voice assistants needing on-prem omni-modal capability
- Real-time speech-to-speech demos and prototypes
- Multimodal chat combining vision, audio, and text in one model
- Replacing GPT-4o omni for privacy-sensitive deployments
- Streaming applications that benefit from full-duplex inference
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 5.5 GB |
| Q5_K_M | 7 GB |
| Q8_0 | 10 GB |
| FP16 (no quantization) | 18 GB |
VRAM figures include model weights, a KV cache for a typical 8k-token context, and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer contexts.
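As a rough way to read the table above, the sketch below picks the highest-precision quantization that fits a given VRAM budget. The figures are copied from the table; the function name `pick_quantization` and the `extra_headroom_gb` parameter are illustrative assumptions, not part of any official tooling.

```python
# VRAM figures copied from the table above (GB, 8k-token context included).
VRAM_GB = {
    "Q4_K_M": 5.5,
    "Q5_K_M": 7.0,
    "Q8_0": 10.0,
    "FP16": 18.0,
}


def pick_quantization(available_gb, extra_headroom_gb=0.0):
    """Return the largest quantization that fits in available_gb, or None.

    extra_headroom_gb is a caller-supplied allowance (hypothetical
    parameter) for contexts longer than the 8k baseline.
    """
    fitting = [
        (need, name)
        for name, need in VRAM_GB.items()
        if need + extra_headroom_gb <= available_gb
    ]
    if not fitting:
        return None
    # In this table a larger footprint means higher precision,
    # so the largest fitting entry is the best available choice.
    return max(fitting)[1]


if __name__ == "__main__":
    for gb in (4, 6, 8, 12, 24):
        print(gb, "GB ->", pick_quantization(gb))
```

How much extra headroom a longer context needs depends on the runtime and its KV cache settings, so that value is left to the caller rather than estimated here.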
Published benchmark scores
| Benchmark | Score |
|---|---|
| OpenCompass | 70.2 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- End-to-end full-duplex speech streaming
- OpenCompass 70.2 across vision-language tasks
- Beats GPT-4o on single-image evaluations
- Unified omni-modal architecture in 8B
Limitations
- Ollama integration is image-only — audio needs native inference
- Speech and audio paths require the official runtime
- Subject to the same license registration requirements as other MiniCPM models
Architecture & training
Architecture: Omni 8B · SigLIP + Whisper-medium + ChatTTS + Qwen2.5-7B
Training: End-to-end streaming speech.
The closest open answer to GPT-4o omni — pick it when you need streaming voice and vision in a single self-hosted 8B model.
Quick start
ollama run openbmb/minicpm-o2.6
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
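For scripted use, a model pulled through Ollama can also be queried over Ollama's local HTTP API. The sketch below sends one image plus a text prompt, in line with the image-only limitation noted above. It assumes Ollama is running on its default local port and reuses the model tag from the command above; the helper names `build_payload` and `describe_image` are illustrative.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Model tag taken from the Quick start command.
MODEL = "openbmb/minicpm-o2.6"


def build_payload(prompt, image_bytes):
    """Build the JSON body for a single non-streaming image request."""
    return {
        "model": MODEL,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path, prompt="Describe this image."):
    """Send one image to the local Ollama server and return its reply text."""
    with open(path, "rb") as f:
        payload = build_payload(prompt, f.read())
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `describe_image("photo.jpg")` (a hypothetical file name) returns the model's text answer. Audio input and speech output are not covered by this path and need the official runtime.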