Qwen 2.5 Coder 1.5B Instruct
By Alibaba · China
Overview
One of Alibaba's smallest Qwen 2.5 Coder models: 1.5B parameters, Apache 2.0 licensed, covering 92 programming languages. A HumanEval score of 70.7 makes it a serious on-device completion model.
When to pick this model
- Local inline completion in IDE plugins
- Edge devices and laptops without dedicated GPUs
- Latency-critical code suggestions where 7B is too slow
- Fallback model when bigger coders are unavailable
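The fallback use case can be sketched as a simple preference list: try the larger coder first and drop down to the 1.5B model when it is not installed. This is a minimal illustration, not a prescribed setup. The `qwen2.5-coder:7b` tag and the `pick_model` helper are assumptions made for the example; only `qwen2.5-coder:1.5b` is named on this page.

```python
from typing import Iterable, Optional, Sequence

# Preference order, largest first. The 7b tag is an assumed example;
# only qwen2.5-coder:1.5b is named on this page.
PREFERRED_MODELS = ("qwen2.5-coder:7b", "qwen2.5-coder:1.5b")


def pick_model(
    installed: Iterable[str],
    preferred: Sequence[str] = PREFERRED_MODELS,
) -> Optional[str]:
    """Return the first preferred model that is installed, or None."""
    available = set(installed)
    for name in preferred:
        if name in available:
            return name
    return None
```

With only the 1.5B model installed, `pick_model(["qwen2.5-coder:1.5b"])` returns the 1.5B tag; with nothing installed it returns `None`, which the caller has to handle.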
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 1 GB |
| Q5_K_M | 1.2 GB |
| Q8_0 | 2 GB |
| FP16 (no quantization) | 3 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
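As a rough illustration of how the table might be used, the sketch below picks the highest-precision quantization that fits a given VRAM budget. The figures are copied from the table above; the `pick_quantization` helper and its `headroom_gb` parameter are hypothetical, and how much headroom a longer context needs is left for the caller to estimate.

```python
from typing import Optional

# VRAM figures in GB, copied from the table above, ordered smallest to largest.
VRAM_GB = [
    ("Q4_K_M", 1.0),
    ("Q5_K_M", 1.2),
    ("Q8_0", 2.0),
    ("FP16", 3.0),
]


def pick_quantization(free_vram_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the highest-precision quantization that fits, or None if none does.

    headroom_gb is extra memory reserved by the caller, for example for a
    context longer than the 8k assumed in the table.
    """
    budget = free_vram_gb - headroom_gb
    best = None
    for name, required in VRAM_GB:
        if required <= budget:
            best = name  # later entries are higher precision
    return best
```

For example, with 2.5 GB free the helper selects `Q8_0`; reserving 1 GB of headroom on the same card drops the choice to `Q5_K_M`.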
Strengths
- Around 1 GB VRAM at Q4, so it runs nearly anywhere
- Strong inline completion for a 1.5B model
- Apache 2.0 license
- 92 programming languages covered
Limitations
- 1.5B caps code quality — not for complex generation
- 32k context only
- Outclassed on harder tasks by 7B+ coders
Architecture & training
Architecture: Dense · 1.5B · Qwen 2.5 Coder · compact code-specialized
Training: 1.5B params, code corpus across 92 languages, ideal for lightweight completion.
An impressively capable 1.5B coder — keep it for on-device completion, not for whole-feature generation.
Quick start
ollama run qwen2.5-coder:1.5b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
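For programmatic use, a running Ollama instance also exposes a local HTTP API. The sketch below assumes Ollama is listening on its default address (`http://localhost:11434`) and that the model has already been pulled; the `build_request` and `complete` helpers are illustrative names, and the `num_ctx` default of 8192 simply mirrors the 8k context assumed in the VRAM table.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen2.5-coder:1.5b"


def build_request(prompt: str, num_ctx: int = 8192) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream
        "options": {"num_ctx": num_ctx},  # context window size in tokens
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `complete("Write a Python function that reverses a string.")` returns the model's reply as a string. Raising `num_ctx` increases memory use, so it should stay within the headroom available on the device.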