Qwen 2.5 Coder 14B Instruct
By Alibaba · China
Overview
Qwen 2.5 Coder 14B Instruct is Alibaba's 14B-parameter code model, released under Apache 2.0. It scores 89.6 on HumanEval and 37.1 on LiveCodeBench, and sits at the VRAM sweet spot for serious self-hosted code generation.
When to pick this model
- Self-hosted coding agents on a single 24GB GPU
- Repo-scale code generation needing 131k context
- Permissively licensed alternative to Codestral
- Multi-language production codebases
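For the repo-scale use case, a quick budget check can indicate whether a set of files is likely to fit in the context window. The sketch below is only a rough estimate under stated assumptions: "131k" is read as 131,072 tokens, and the 4-characters-per-token ratio is a generic heuristic, not the model's real tokenizer. The function names are hypothetical.

```python
import math
from typing import Iterable

CONTEXT_TOKENS = 131_072  # assumption: "131k" interpreted as 131,072 tokens
CHARS_PER_TOKEN = 4.0     # assumption: rough heuristic, not the real tokenizer


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token count derived from character length, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(
    texts: Iterable[str],
    reserve_tokens: int = 4_096,
    context_tokens: int = CONTEXT_TOKENS,
) -> bool:
    """True if the estimated prompt size leaves `reserve_tokens` free for output."""
    budget = context_tokens - reserve_tokens
    return sum(estimate_tokens(t) for t in texts) <= budget
```

For an accurate count, tokenize with the model's own tokenizer instead of the character heuristic.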
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 9 GB |
| Q5_K_M | 11 GB |
| Q8_0 | 16 GB |
| FP16 (no quantization) | 28 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
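As a minimal sketch, the table can be turned into a lookup that picks the largest quantization fitting a given GPU. The VRAM figures are copied from the table above; the function name and the `headroom_gb` parameter (extra memory set aside for longer contexts) are illustrative assumptions.

```python
from typing import Optional

# VRAM required in GB, copied from the table above (8k KV cache included).
VRAM_GB = {"Q4_K_M": 9, "Q5_K_M": 11, "Q8_0": 16, "FP16": 28}


def pick_quantization(gpu_vram_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the most memory-hungry quantization that still fits, or None.

    `headroom_gb` is a hypothetical knob: memory reserved for a larger
    KV cache when running beyond the 8k context assumed by the table.
    """
    usable = gpu_vram_gb - headroom_gb
    fitting = [(need, name) for name, need in VRAM_GB.items() if need <= usable]
    return max(fitting)[1] if fitting else None
```

On a 24 GB card with no extra headroom this selects `Q8_0`; reserving 9 GB for longer contexts drops the choice to `Q5_K_M`.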
Published benchmark scores
| Benchmark | Score |
|---|---|
| HumanEval | 89.6 |
| MBPP | 86.2 |
| LiveCodeBench | 37.1 |
Scores are published by the model author or aggregated from public leaderboards, and are re-measured monthly by our editorial team.
Strengths
- HumanEval 89.6 — competitive with much larger coders
- LiveCodeBench 37.1
- Apache 2.0 license
- 131k context for long-file work
Limitations
- Weaker than general 14B models on non-code chat
- No vision input
- Outscored by frontier closed APIs on the hardest benchmarks
Architecture & training
Architecture: Dense 14B-parameter code model with fill-in-the-middle (FIM) support
Training: 5.5T tokens of code.
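FIM means the model can complete code between a known prefix and suffix, which is how editor-style completion is usually driven. The sketch below builds such a prompt, assuming the FIM special tokens documented for the Qwen2.5-Coder family. Verify them against the tokenizer of the build you actually run, since FIM is typically exercised through a raw completion endpoint instead of the chat template. The helper name is hypothetical.

```python
# Assumption: FIM special tokens as documented for the Qwen2.5-Coder family.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a prefix-suffix-middle prompt; the model generates the middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Example: ask the model to fill in the body of a function.
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n\nprint(add(1, 2))\n",
)
```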
The pragmatic Apache 2.0 coder — strong benchmarks, 24GB VRAM, and no licensing landmines.
Quick start
ollama run qwen2.5-coder:14b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
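Once the model has been pulled with the `ollama run` command above, it can also be called programmatically. The following is a minimal sketch using only the Python standard library against Ollama's local `/api/generate` endpoint. It assumes the server is running on the default port 11434; the helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumption: default local port
MODEL = "qwen2.5-coder:14b"


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Write a Python function that reverses a string.")` returns the completion as a single string, because `stream` is set to `False`.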