Model fiche

LLaVA-OneVision 72B

By LMMs-Lab · Singapore

Tags: vision, chat
Parameters: 72B
License: Apache 2.0
Context: 32k
VRAM (Q4): 42 GB
Released: September 2024

Overview

LLaVA-OneVision 72B is the Apache-licensed flagship from LMMs-Lab, built on Qwen2-72B, with strong English and Chinese vision performance. It was a state-of-the-art open VLM at its 2024 release.

When to pick this model

  • High-fidelity image and video understanding on owned hardware
  • Commercial deployments needing Apache-licensed VLMs at scale
  • Multi-image reasoning over technical documents
  • Bilingual EN/CN visual tasks

VRAM requirements by quantization

Quantization              VRAM required
Q4_K_M (recommended)      42 GB
Q5_K_M                    50 GB
Q8_0                      78 GB
FP16 (no quantization)    144 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
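
The note above describes a three-part sum: weights, KV cache, and runtime overhead. A minimal sketch of that arithmetic follows. The layer count, KV-head count, and head dimension are assumed values for a Qwen2-72B-class backbone, and the bits-per-weight figure is whatever the caller supplies for a given quantization. None of these come from this page, so the output is a rough estimate and is not expected to reproduce the table exactly.

```python
def kv_cache_gb(context_tokens, n_layers=80, n_kv_heads=8, head_dim=128,
                bytes_per_value=2):
    """Size of the key/value cache in GB (1 GB = 1e9 bytes).

    The architecture defaults are ASSUMPTIONS for a Qwen2-72B-class
    backbone, not values taken from this page.
    """
    # Factor 2: one key tensor and one value tensor per layer.
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * bytes_per_value * context_tokens)
    return total_bytes / 1e9


def estimate_vram_gb(params_billions, bits_per_weight, context_tokens=8192,
                     overhead_gb=0.6):
    """Weights + KV cache + fixed runtime overhead, in GB."""
    # 1e9 parameters at N bits each occupy N / 8 GB.
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb + kv_cache_gb(context_tokens) + overhead_gb


if __name__ == "__main__":
    # Bits-per-weight values here are illustrative assumptions only.
    for label, bpw in [("4-bit class", 4.5), ("8-bit class", 8.5), ("FP16", 16)]:
        print(f"{label}: ~{estimate_vram_gb(72, bpw):.1f} GB")
```

Because the KV cache term grows linearly with `context_tokens`, raising the context from 8k toward the model's 32k limit multiplies that term by up to four, which is the headroom the note refers to.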

Published benchmark scores

Benchmark    Score
MMMU         69.5

Scores are published by the model author or aggregated from public leaderboards, and are re-measured monthly by our editorial team.

Strengths

  • State-of-the-art open vision quality at 2024 release
  • Robust multi-image and video reasoning
  • Apache 2.0 with no field-of-use restrictions (attribution and notice terms still apply)
  • Solid bilingual EN/CN coverage

Limitations

  • Needs around 42 GB of VRAM at Q4, more than a single 24 GB or 32 GB consumer GPU provides
  • 32k context limits long-document workflows
  • Surpassed by Qwen3-VL 30B in 2025 benchmarks
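
To make the VRAM limitation concrete, here is a small sketch that picks the largest quantization fitting a given memory budget. The figures are copied from the quantization table on this page. The function name and the `headroom_gb` parameter are hypothetical, added only for illustration.

```python
# VRAM figures (GB) copied from the quantization table on this page,
# ordered from smallest to largest footprint.
VRAM_BY_QUANT = [
    ("Q4_K_M", 42),
    ("Q5_K_M", 50),
    ("Q8_0", 78),
    ("FP16", 144),
]


def pick_quantization(available_vram_gb, headroom_gb=0.0):
    """Return the largest listed quantization that fits, or None.

    headroom_gb is a hypothetical knob for reserving memory, for
    example when running with more than the 8k context the table assumes.
    """
    budget = available_vram_gb - headroom_gb
    best = None
    for name, required_gb in VRAM_BY_QUANT:
        if required_gb <= budget:
            best = name  # keep the largest variant seen so far that fits
    return best
```

For example, `pick_quantization(48)` selects `Q4_K_M`, while a 24 GB budget returns `None` because no listed variant fits.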

Architecture & training

Architecture: Dense vision · 72B · LLaVA-OneVision · Qwen2 72B backbone

Training: by LMMs-Lab on the OneVision dataset, covering images, videos, documents, and multi-image inputs.

Verdict

A heavyweight Apache VLM that still delivers, though Qwen3-VL has since taken the open-vision crown at lower cost.

Quick start

ollama run llava-onevision:72b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
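
For scripted use, a locally running Ollama server also accepts HTTP requests. The sketch below sends one image and a prompt to Ollama's `/api/generate` endpoint using only the Python standard library. It assumes the server listens on its default port 11434 and that the `llava-onevision:72b` tag from the command above is available locally. The helper names are hypothetical.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default endpoint
MODEL_TAG = "llava-onevision:72b"  # tag taken from the quick-start command


def build_generate_payload(prompt, image_bytes, model=MODEL_TAG):
    """Build the JSON body for a single non-streaming generate request."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path, prompt="Describe this image."):
    """Send one image file to the local server and return the text reply."""
    with open(path, "rb") as f:
        payload = build_generate_payload(prompt, f.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `describe_image("chart.png", "Summarize this chart.")` would return the model's reply as a string, provided the server is running and the model has been pulled.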
