LLaVA-OneVision 72B
By LMMs-Lab · Singapore
Overview
The 72B Apache-licensed flagship from LMMs-Lab, built on Qwen2-72B with strong English and Chinese vision performance. It was a state-of-the-art open VLM at its 2024 release.
When to pick this model
- High-fidelity image and video understanding on owned hardware
- Commercial deployments needing Apache-licensed VLMs at scale
- Multi-image reasoning over technical documents
- Bilingual EN/CN visual tasks
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 42 GB |
| Q5_K_M | 50 GB |
| Q8_0 | 78 GB |
| FP16 (no quantization) | 144 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
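To make "add headroom for higher context lengths" concrete, the sketch below estimates how much the KV cache grows past the 8k baseline and picks the largest quantization from the table that still fits. The layer count, KV-head count, and head dimension are assumptions based on the published Qwen2-72B configuration, not figures from this page. The FP16 cache and the reading of "8k" as 8192 tokens are also assumptions, and the function names are hypothetical.

```python
# VRAM figures from the table above (GB). Each already includes an 8k KV cache
# and ~600 MB of runtime overhead.
VRAM_GB = {"Q4_K_M": 42, "Q5_K_M": 50, "Q8_0": 78, "FP16": 144}

# ASSUMPTIONS (not stated on this page): Qwen2-72B backbone shape,
# an unquantized FP16 KV cache, and "8k" meaning 8192 tokens.
N_LAYERS = 80
N_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2
BASELINE_CONTEXT = 8192


def kv_cache_bytes(context_tokens: int) -> int:
    """Bytes of KV cache for a given number of tokens (keys plus values)."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * context_tokens


def extra_headroom_gb(context_tokens: int) -> float:
    """Extra VRAM (GB) needed beyond the 8k baseline baked into the table."""
    extra_tokens = max(0, context_tokens - BASELINE_CONTEXT)
    return kv_cache_bytes(extra_tokens) / 1e9


def best_quantization(available_gb: float, context_tokens: int = BASELINE_CONTEXT):
    """Largest table entry that fits in available_gb, or None if nothing fits."""
    headroom = extra_headroom_gb(context_tokens)
    fitting = [(gb, name) for name, gb in VRAM_GB.items()
               if gb + headroom <= available_gb]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for ctx in (8192, 16384, 32768):
        print(ctx, round(extra_headroom_gb(ctx), 2), best_quantization(48, ctx))
```

Under these assumptions, the full 32k context adds roughly 8 GB on top of the table figures, so a 48 GB budget that fits Q4_K_M at 8k no longer fits it at 32k. Runtimes that quantize the KV cache would need less.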
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMMU | 69.5 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- State-of-the-art open vision quality at 2024 release
- Robust multi-image and video reasoning
- Apache 2.0 license, which permits commercial use with no field-of-use restrictions
- Solid bilingual EN/CN coverage
Limitations
- Around 42 GB VRAM even at Q4_K_M, more than any single 24 GB consumer GPU provides
- 32k context limits long-document workflows
- Surpassed by Qwen3-VL 30B in 2025 benchmarks
Architecture & training
Architecture: Dense vision · 72B · LLaVA-OneVision · Qwen2 72B backbone
Training: by LMMs-Lab on the OneVision dataset, covering images, videos, documents, and multi-image inputs.
A heavyweight Apache VLM that still delivers, though Qwen3-VL has since taken the open-vision crown at lower cost.
Quick start
ollama run llava-onevision:72b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
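Once the model is running under Ollama, it can also be queried programmatically. The sketch below sends one image and a prompt to Ollama's local `/api/generate` endpoint using only the Python standard library. It assumes Ollama is listening on its default port (11434) and that the model tag from the quick-start command is available locally. The function names and the example file path are hypothetical.

```python
import base64
import json
import urllib.request

# ASSUMPTION: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Model tag taken from the quick-start command on this page.
MODEL_TAG = "llava-onevision:72b"


def build_payload(prompt: str, image_bytes: bytes) -> dict:
    """Build a non-streaming generate request with one base64-encoded image."""
    return {
        "model": MODEL_TAG,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path: str, prompt: str = "Describe this image.") -> str:
    """Send the image at `path` to the local Ollama server and return its reply."""
    with open(path, "rb") as f:
        payload = build_payload(prompt, f.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    # Hypothetical file name; replace with a real image on disk.
    print(describe_image("diagram.png", "Summarize this technical diagram."))
```

Setting `stream` to false returns the whole answer as a single JSON object, which keeps the example short. Interactive use would more likely stream tokens as they arrive.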