Qwen 2.5 3B Instruct
By Alibaba · China
Overview
Alibaba's compact 3B-parameter Qwen 2.5 instruct model scores 65.6 on MMLU and 74.4 on HumanEval, both strong results for its size. It is licensed for non-commercial use only under the Qwen Research License.
When to pick this model
- Edge and on-device inference where the VRAM budget is about 2 GB
- Multilingual prototypes before scaling up to a larger model
- Research and personal projects under the Qwen Research License
- Latency-critical paths where larger models are too slow
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 2 GB |
| Q5_K_M | 2.5 GB |
| Q8_0 | 4 GB |
| FP16 (no quantization) | 6 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
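The table and the headroom note can be turned into a quick sizing check. The sketch below is illustrative only: `pick_quantization` and `extra_kv_cache_gb` are hypothetical helper names, the VRAM values are copied from the table above, and the KV-cache defaults (36 layers, 2 KV heads, head dimension 128, FP16 cache) are assumptions based on the publicly listed Qwen 2.5 3B configuration rather than anything stated on this page.

```python
# VRAM figures copied from the table above (GB, including an ~8k KV cache).
VRAM_GB = {
    "Q4_K_M": 2.0,
    "Q5_K_M": 2.5,
    "Q8_0": 4.0,
    "FP16": 6.0,
}


def pick_quantization(budget_gb, headroom_gb=0.0):
    """Return the highest-precision quantization that fits the budget, or None."""
    fitting = [
        (need, name)
        for name, need in VRAM_GB.items()
        if need + headroom_gb <= budget_gb
    ]
    # In this table, a larger footprint always means higher precision.
    return max(fitting)[1] if fitting else None


def extra_kv_cache_gb(context_tokens, baseline_tokens=8192,
                      layers=36, kv_heads=2, head_dim=128, bytes_per_value=2):
    """Rough extra KV-cache memory (GiB) beyond the 8k baseline.

    The architecture defaults are assumptions, not values from this page.
    """
    # Keys and values are both cached, hence the leading factor of 2.
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    extra_tokens = max(0, context_tokens - baseline_tokens)
    return extra_tokens * per_token_bytes / 1024**3
```

For example, `pick_quantization(3.0, extra_kv_cache_gb(32768))` budgets for the full 32k context on a 3 GB device. Treat the result as a starting estimate and confirm it against actual memory use in your runtime.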
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU | 65.6 |
| HumanEval | 74.4 |
Scores are published by the model author or aggregated from public leaderboards. Our editorial team re-measures them monthly.
Strengths
- Around 2 GB VRAM at Q4_K_M, small enough for most consumer GPUs
- Multilingual coverage rare at this size
- MMLU 65.6 and HumanEval 74.4 punch above its weight
- 32k context out of the box
Limitations
- Qwen Research License blocks commercial use
- Quality gap vs 7B-and-up is meaningful for non-trivial tasks
- 32k context limits long-document work
Architecture & training
Architecture: Dense transformer · Qwen 2.5 3B · compact multilingual
Training: Trained on the full Qwen 2.5 corpus; this is the compact 3B variant of the family.
A strong 3B for research and edge prototyping, but the Qwen Research License rules it out of production.
Quick start
ollama run qwen2.5:3b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
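For scripted use, the model can also be reached through Ollama's local HTTP API once it has been pulled. The following is a minimal sketch using only the Python standard library. It assumes Ollama is running on its default port (11434) and that the `qwen2.5:3b` tag is installed; `build_request` and `generate` are illustrative helper names, not part of any Ollama client library.

```python
import json
import urllib.request

# Default local endpoint for Ollama's generate API (assumes a stock install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="qwen2.5:3b"):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="qwen2.5:3b", timeout=120):
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Translate 'good morning' into French.")` returns the completion as a string. With `"stream": False` the server replies with a single JSON object, which keeps the parsing simple at the cost of waiting for the full response.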