Qwen 3 14B
By Alibaba · China
Overview
A 14B dense model from Alibaba that matches Qwen 2.5 32B Base on STEM and code, with the same hybrid thinking system as the rest of the Qwen 3 family. The pragmatic sweet spot for a single 24GB GPU.
When to pick this model
- You have a single 24GB GPU and want the strongest dense Qwen 3 that fits
- You need solid STEM and coding performance without jumping to a 32B
- You want a toggleable thinking mode for harder problems
- You need long context for documents or codebases (up to 131K tokens; the native window is 32K, extended to 131K with YaRN scaling)
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 9 GB |
| Q5_K_M | 11 GB |
| Q8_0 | 16 GB |
| FP16 (no quantization) | 28 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
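As a rough illustration of how to read the table, the sketch below picks the highest-precision quantization that fits a given GPU once extra headroom is reserved for longer contexts. The VRAM figures are copied from the table above; the function name `pick_quantization` and the `extra_headroom_gb` parameter are hypothetical helpers for this example, not part of Ollama or llama.cpp.

```python
# VRAM figures (GB) copied from the table above, ordered from
# highest to lowest precision.
VRAM_GB = [
    ("FP16", 28.0),
    ("Q8_0", 16.0),
    ("Q5_K_M", 11.0),
    ("Q4_K_M", 9.0),
]


def pick_quantization(gpu_vram_gb, extra_headroom_gb=0.0):
    """Return the highest-precision quantization that fits, or None.

    extra_headroom_gb is a user-chosen allowance (an assumption of this
    sketch) for context lengths beyond the 8k baseline in the table.
    """
    for name, required_gb in VRAM_GB:
        if required_gb + extra_headroom_gb <= gpu_vram_gb:
            return name
    return None


print(pick_quantization(24))                         # single 24 GB GPU
print(pick_quantization(12, extra_headroom_gb=2.0))  # 12 GB GPU, longer context
```

Iterating from highest to lowest precision means the first match is the best-quality option that still fits; a return value of `None` signals that even Q4_K_M does not fit with the requested headroom.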
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU (base) | 81.05 |
| SuperGPQA | 34.27 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- Matches Qwen 2.5 32B Base on STEM and code at less than half the size
- Hybrid thinking mode for harder reasoning passes
- 131K context window
- Apache 2.0
Limitations
- Still trails dedicated reasoners like QwQ-32B on AIME-class problems
- Thinking mode output can balloon for simple prompts
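One way to keep thinking-mode output in check is to switch it off for simple prompts and to separate the reasoning from the final answer when it is on. The sketch below assumes the `/think` and `/no_think` soft switches and the `<think>...</think>` output tags described in the Qwen 3 model card; confirm that your runtime passes them through unchanged. The helper names `with_thinking` and `split_reasoning` are hypothetical.

```python
import re

# Assumption: the model wraps its reasoning in <think>...</think> tags.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def with_thinking(prompt, enabled):
    """Append the soft switch that turns thinking on or off for this turn."""
    switch = "/think" if enabled else "/no_think"
    return f"{prompt} {switch}"


def split_reasoning(output):
    """Split raw model output into (reasoning, answer).

    Returns an empty reasoning string when no think block is present.
    """
    match = THINK_BLOCK.search(output)
    if match is None:
        return "", output.strip()
    reasoning = match.group(0)[len("<think>"):-len("</think>")].strip()
    answer = (output[:match.start()] + output[match.end():]).strip()
    return reasoning, answer
```

For example, `with_thinking("What is 2 + 2?", enabled=False)` yields a prompt ending in `/no_think`, which is a reasonable default for short factual questions; reserve `enabled=True` for harder reasoning tasks.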
Architecture & training
Architecture: Dense · GQA · hybrid thinking
Training: pretrained on a corpus of roughly 36T tokens.
A strong dense 14B to run locally, well suited to a single high-end consumer GPU.
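To show why grouped-query attention (GQA) matters for memory, the sketch below estimates KV cache size per token and compares it with a hypothetical variant in which every query head has its own key/value head. The layer and head counts are assumptions for illustration only; check them against the model's `config.json` before relying on the numbers.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache stored per token.

    The factor 2 covers one key tensor and one value tensor per layer;
    bytes_per_value=2 corresponds to an FP16 cache.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


# ASSUMED architecture values (not taken from this page); verify against config.json.
NUM_LAYERS = 40
NUM_QUERY_HEADS = 40
NUM_KV_HEADS = 8
HEAD_DIM = 128


def kv_cache_gb(context_tokens, num_kv_heads):
    """KV cache size in GB (1 GB = 1e9 bytes) for a given context length."""
    per_token = kv_cache_bytes_per_token(NUM_LAYERS, num_kv_heads, HEAD_DIM)
    return context_tokens * per_token / 1e9


gqa = kv_cache_gb(8192, NUM_KV_HEADS)     # grouped-query attention
mha = kv_cache_gb(8192, NUM_QUERY_HEADS)  # hypothetical full multi-head variant
print(f"GQA: {gqa:.2f} GB, full MHA: {mha:.2f} GB at 8k context")
```

Under these assumptions the cache shrinks by the ratio of query heads to KV heads, and because it grows linearly with context length, the same function gives a first estimate of the headroom a longer context needs.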
Quick start
ollama run qwen3:14b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
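For scripted use, the model can also be queried through Ollama's local HTTP API. The following is a minimal sketch, assuming an Ollama server is running on its default port (11434) and the `qwen3:14b` model has already been pulled; the helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Default local Ollama endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="qwen3:14b"):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="qwen3:14b"):
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("Explain grouped-query attention in two sentences.")` returns the model's reply as a string. Setting `"stream": False` keeps the example simple by returning a single JSON object rather than a stream of chunks.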