Qwen 2.5 72B Instruct
By Alibaba · China
Overview
Alibaba's flagship Qwen 2.5 dense at 72B, with MMLU 86.1 and HumanEval 86.6. Strong across the board but under the custom Qwen License with a 100M MAU threshold.
When to pick this model
- Top-tier dense chat under 100M MAU
- Math-heavy workloads needing MATH 83.1
- Code generation where HumanEval 86.6 matters
- Multi-GPU deployments wanting near-frontier quality
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 42 GB |
| Q5_K_M | 50 GB |
| Q8_0 | 78 GB |
| FP16 (no quantization) | 144 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU | 86.1 |
| HumanEval | 86.6 |
| MATH | 83.1 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- MMLU 86.1 — close to much larger models
- HumanEval 86.6 strong for a general-purpose model
- MATH 83.1
- 131k context with solid long-context behavior
Limitations
- Custom Qwen License with the 100M MAU clause
- ~42GB at Q4 — dual-GPU territory
- Slower than MoE alternatives like Qwen 3 30B-A3B for similar quality
Architecture & training
Architecture: Dense 72B · GQA · 131k ctx via YaRN
Training: Qwen 2.5 dense flagship.
The strongest open dense 72B you can self-host — just check the license before scaling past 100M MAU.
Quick start
ollama run qwen2.5:72bOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.