Model fiche

Qwen 2.5 72B Instruct

By Alibaba · China

Tags: chat · general · reasoning · multilingual
Parameters: 72B
License: Qwen License
Context: 128k (131,072 tokens)
VRAM (Q4): 42 GB
Released: September 2024

Overview

Alibaba's flagship dense model in the Qwen 2.5 family, at 72B parameters, with MMLU 86.1 and HumanEval 86.6. Strong across the board, but released under the custom Qwen License, which sets a 100M MAU threshold.

When to pick this model

  • Top-tier dense chat under 100M MAU
  • Math-heavy workloads needing MATH 83.1
  • Code generation where HumanEval 86.6 matters
  • Multi-GPU deployments wanting near-frontier quality

VRAM requirements by quantization

Quantization             VRAM required
Q4_K_M (recommended)     42 GB
Q5_K_M                   50 GB
Q8_0                     78 GB
FP16 (no quantization)   144 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
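To get a rough sense of how much headroom a longer context needs, the KV cache can be estimated from the attention layout. The sketch below is an approximation under stated assumptions: the layer count (80), KV head count (8), and head dimension (128) are assumed values for this model and do not come from this page, and the cache is assumed to be stored in FP16. Runtimes that quantize the KV cache will use less.

```python
# Rough KV cache estimate for a GQA model.
# ASSUMPTIONS (not from this page): 80 layers, 8 KV heads, head_dim 128,
# FP16 cache entries (2 bytes each). Check the model config before relying on them.

def kv_cache_bytes(context_tokens, n_layers=80, n_kv_heads=8,
                   head_dim=128, bytes_per_value=2):
    """Bytes needed to cache keys and values for `context_tokens` tokens."""
    # Leading 2 = one key tensor plus one value tensor per layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


def extra_vram_gb(target_context, baseline_context=8192):
    """Additional GB (10^9 bytes) over the 8k-context baseline in the table."""
    extra = kv_cache_bytes(target_context) - kv_cache_bytes(baseline_context)
    return max(extra, 0) / 1e9


if __name__ == "__main__":
    for ctx in (8192, 32768, 131072):
        print(f"{ctx:>7} tokens: +{extra_vram_gb(ctx):.1f} GB over the 8k baseline")
```

Under these assumptions each token costs 320 KiB of cache, so moving from 8k to 32k context adds roughly 8 GB on top of the table figures.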

Published benchmark scores

Benchmark    Score
MMLU         86.1
HumanEval    86.6
MATH         83.1

Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.

Strengths

  • MMLU 86.1 — close to much larger models
  • HumanEval 86.6 strong for a general-purpose model
  • MATH 83.1
  • 131k context with solid long-context behavior

Limitations

  • Custom Qwen License with the 100M MAU clause
  • ~42 GB at Q4, which is dual-GPU territory
  • Slower than MoE alternatives like Qwen 3 30B-A3B for similar quality
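The VRAM constraint can be turned into a quick fit check. The sketch below uses only the figures from the quantization table on this page and picks the highest-precision option that fits a given VRAM budget. The function name `best_quant` is hypothetical, and treating multiple GPUs as one pooled budget is a simplifying assumption: real multi-GPU splits lose some memory to per-device overhead.

```python
# VRAM figures (GB) copied from the quantization table on this page,
# ordered from highest to lowest precision.
QUANT_VRAM_GB = [
    ("FP16", 144),
    ("Q8_0", 78),
    ("Q5_K_M", 50),
    ("Q4_K_M", 42),
]


def best_quant(total_vram_gb):
    """Return the highest-precision quantization that fits, or None."""
    for name, required_gb in QUANT_VRAM_GB:
        if required_gb <= total_vram_gb:
            return name
    return None


if __name__ == "__main__":
    # ASSUMPTION: VRAM across GPUs is pooled with no per-device overhead.
    for label, vram in [("1 x 24 GB", 24), ("2 x 24 GB", 48), ("1 x 80 GB", 80)]:
        print(f"{label}: {best_quant(vram)}")
```

A single 24 GB card returns `None` here, which matches the dual-GPU point above: two 24 GB cards are the smallest common setup that clears the 42 GB Q4_K_M figure.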

Architecture & training

Architecture: Dense 72B · GQA · 131k ctx via YaRN

Training: Qwen 2.5 dense flagship.

Verdict

The strongest open dense 72B you can self-host — just check the license before scaling past 100M MAU.

Quick start

ollama run qwen2.5:72b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
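Once the model has been pulled with the command above, it can also be queried programmatically through Ollama's local HTTP API. The following is a minimal sketch using only the Python standard library. It assumes Ollama is serving on its default address (`http://localhost:11434`); the helper names `build_generate_request` and `ask` are hypothetical.

```python
import json
import urllib.request

# ASSUMPTION: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "qwen2.5:72b"  # same tag as the quick-start command


def build_generate_request(prompt, model=MODEL_TAG, url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, timeout=600):
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `print(ask("Summarize GQA in one sentence."))` prints the model's reply. The generous timeout is deliberate, since a 72B model can take a while to load on first use.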
