Qwen 2.5 Coder 7B
By Alibaba · China
Overview
A 7B-parameter coding specialist from Alibaba that covers 92 programming languages and offers a 128k-token context window. At this size, its HumanEval score of 88.4 is competitive with proprietary models.
When to pick this model
- Local IDE autocomplete and inline code suggestions
- Code review and refactoring assistants on a consumer GPU
- Multi-language codebases needing broad language coverage
- Repo-scale Q&A using the 128k window
- Cheap, high-throughput code generation pipelines
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 5 GB |
| Q5_K_M | 6 GB |
| Q8_0 | 9 GB |
| FP16 (no quantization) | 16 GB |
VRAM figures include model weights, the KV cache for a typical 8k-token context, and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer context lengths, since the KV cache grows with the number of tokens held in context.
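To get a feel for how much headroom a longer context needs, here is a rough sketch that estimates KV cache growth beyond the 8k baseline. The layer count, KV head count, head dimension, and FP16 cache precision are assumptions that do not come from this page; check them against the model's published config before relying on the numbers. The function names are hypothetical.

```python
# Rough KV cache estimate. All architecture defaults below are ASSUMPTIONS
# (not stated on this page): 28 layers, 4 KV heads, head dim 128, FP16 cache.

def kv_cache_bytes(context_tokens, n_layers=28, n_kv_heads=4,
                   head_dim=128, bytes_per_value=2):
    """Bytes needed to hold keys and values for `context_tokens` tokens."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


def extra_vram_gib(context_tokens, baseline_tokens=8192):
    """Extra GiB needed beyond the 8k-token cache already in the table."""
    extra = kv_cache_bytes(context_tokens) - kv_cache_bytes(baseline_tokens)
    return max(extra, 0) / 2**30


for ctx in (8192, 32768, 131072):
    print(f"{ctx:>6} tokens: +{extra_vram_gib(ctx):.2f} GiB over the table figure")
```

Under these assumptions the cache costs 56 KiB per token, so the full 128k window would add several GiB on top of any row in the table. Runtimes that quantize the KV cache would need less.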
Published benchmark scores
| Benchmark | Score |
|---|---|
| HumanEval | 88.4 |
| MBPP | 83.5 |
Scores are published by the model author or aggregated from public leaderboards, and are re-measured monthly by our editorial team.
Strengths
- Strong HumanEval and code completion for a 7B
- 128k context for repo-scale prompts
- Coverage of 92 programming languages
- Apache 2.0 license
Limitations
- Beaten clearly by the 32B variant on complex code tasks
- Weaker than general 7Bs for non-coding chat
- Limited reasoning on multi-step debugging
Architecture & training
Architecture: Dense Transformer specialized for code · Qwen 2.5 Coder 7B
Training: Qwen 2.5 pre-training + 5.5T code tokens, 92 programming languages.
The right pick when you want a local code model that fits on a single 8GB-class GPU and still pulls its weight.
Quick start
ollama run qwen2.5-coder:7b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
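For scripted use, the model can also be called through Ollama's local HTTP API. The sketch below assumes an Ollama server is running on its default port (11434) and that the model has already been pulled. The helper names `build_request` and `generate` are hypothetical, and the `num_ctx` value is only an illustrative default.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed, adjust if needed).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="qwen2.5-coder:7b", num_ctx=8192):
    """Build a non-streaming generate request for the Ollama HTTP API."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # return one JSON object, not chunks
        "options": {"num_ctx": num_ctx},  # context length; raise only with spare VRAM
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `print(generate("Write a Python function that reverses a string."))` prints the completion. Only the standard library is used so the sketch has no dependencies.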