Model fiche
Qwen 2.5 14B Instruct
By Alibaba · China
chat
general
multilingual
Overview
Alibaba's Apache 2.0 dense 14B hitting MMLU 79.7 and HumanEval 83.5 across 29+ languages. The pragmatic sweet spot for self-hosted general-purpose chat.
When to pick this model
- General-purpose chat on a single 16–24GB GPU
- Multilingual production workloads needing a permissive license
- RAG pipelines balancing quality and inference cost
- Replacing 7B models that hit a quality ceiling
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 9 GB |
| Q5_K_M | 11 GB |
| Q8_0 | 16 GB |
| FP16 (no quantization) | 28 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU | 79.7 |
| HumanEval | 83.5 |
| GSM8K | 83.1 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- Apache 2.0 — fully commercial-friendly
- MMLU 79.7 and HumanEval 83.5 at 14B scale
- Excellent VRAM-to-quality ratio
- 131k context via YaRN extension
Limitations
- Native context is 32k — 131k requires YaRN configuration
- Outscored on hard reasoning by 30B+ alternatives
- Vision not included — pick Qwen2.5-VL if you need it
Architecture & training
Architecture: Dense 14B · GQA · 131k ctx
Training: 29+ languages.
Verdict
The default Apache 2.0 dense model for self-hosted general chat — solid quality at a price most teams can run.
Quick start
ollama run qwen2.5:14bOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.