Qwen 3 32B
By Alibaba · China
Overview
Alibaba's 32B dense flagship with thinking mode, scoring 65.5 on MMLU-Pro and 39.8 on SuperGPQA. It is the strongest general-purpose dense model in the Qwen 3 family; the next step up is the Qwen3-235B-A22B mixture-of-experts (MoE) model.
When to pick this model
- You want a single dense model for chat, code, and reasoning on a 48GB-class GPU
- You need multilingual coverage with strong reasoning headroom
- You want one Apache 2.0 model to standardize on for production
- You need 131K context for long-form work
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 19 GB |
| Q5_K_M | 23 GB |
| Q8_0 | 35 GB |
| FP16 (no quantization) | 64 GB |
VRAM figures include the model weights, an 8K-token KV cache, and about 600 MB of runtime overhead (Ollama / llama.cpp baseline). Longer contexts need additional headroom, because the KV cache grows linearly with context length.
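As a rough sketch of how much extra headroom a longer context needs, the KV-cache size can be estimated from the architecture. The layer count, KV-head count, and head dimension below are assumptions taken from the public Qwen3-32B configuration, and the cache is assumed to be stored at FP16 (2 bytes per element); a quantized KV cache shrinks the result proportionally.

```python
# Sketch: extra VRAM the KV cache needs beyond the table's 8K-token baseline.
# Architecture values are assumptions from the public Qwen3-32B config.
N_LAYERS = 64      # transformer layers (assumed)
N_KV_HEADS = 8     # key/value heads under GQA (assumed)
HEAD_DIM = 128     # dimension per head (assumed)
GIB = 1024 ** 3


def kv_cache_bytes(context_tokens: int, bytes_per_elem: int = 2) -> int:
    """Size of the K and V caches across all layers (FP16 = 2 bytes/element)."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_elem
    return per_token * context_tokens


def extra_headroom_gib(context_tokens: int, baseline_tokens: int = 8192) -> float:
    """Additional KV-cache memory relative to the baseline context, in GiB."""
    extra = kv_cache_bytes(context_tokens) - kv_cache_bytes(baseline_tokens)
    return max(extra, 0) / GIB


for ctx in (8192, 32768, 131072):
    print(f"{ctx:>7} tokens: +{extra_headroom_gib(ctx):.1f} GiB over the 8K baseline")
```

Under these assumptions each token costs 256 KiB of KV cache, so moving from the 8K baseline to a 32K context adds about 6 GiB, and the full 131K context adds about 30 GiB on top of the figures in the table.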
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU-Pro | 65.54 |
| SuperGPQA | 39.78 |
Scores are published by the model author or aggregated from public leaderboards, and are re-measured monthly by our editorial team.
Strengths
- Strong reasoning with thinking mode enabled
- Solid MMLU-Pro and SuperGPQA scores for its size
- 131K context window (32K native, extended with YaRN)
- Apache 2.0 license
Limitations
- QwQ-32B is sharper for pure reasoning tasks
- Verbose thinking traces inflate latency and cost
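Because the reasoning trace and the final answer arrive in the same completion, it can help to separate them before display or logging, and to see how much of the output the trace consumes. A minimal sketch, assuming the runtime returns raw text with the reasoning wrapped in `<think>...</think>` tags as Qwen 3's chat template produces; `split_thinking` and `rough_token_share` are hypothetical helpers, and whitespace-separated words are only a crude stand-in for real tokens.

```python
import re

# Assumes the raw completion wraps reasoning in <think>...</think>, as Qwen 3's
# chat template does; the answer is whatever remains once that block is removed.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer) with the thinking block removed from the answer."""
    reasoning = "\n".join(m.strip() for m in _THINK_RE.findall(completion))
    answer = _THINK_RE.sub("", completion).strip()
    return reasoning, answer


def rough_token_share(reasoning: str, answer: str) -> float:
    """Fraction of output spent on reasoning, using whitespace words as a token proxy."""
    r, a = len(reasoning.split()), len(answer.split())
    return r / (r + a) if (r + a) else 0.0


raw = "<think>\nThe user wants 2+2. That is 4.\n</think>\n\nThe answer is 4."
reasoning, answer = split_thinking(raw)
print(answer)  # The answer is 4.
print(f"{rough_token_share(reasoning, answer):.0%} of words were reasoning")
```

An empty `<think></think>` block (as emitted when thinking is switched off) yields an empty reasoning string, so the same helper works for both modes.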
Architecture & training
Architecture: Dense · GQA · hybrid thinking
Training: The same 36T-token pre-training corpus as the rest of the Qwen 3 family.
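Grouped-query attention (GQA) lets several query heads share one key/value head, which is what keeps the KV cache small. The pure-Python sketch below shows the grouping for a single query position; the 64-query-head / 8-KV-head split used as the default in `kv_cache_ratio` is an assumption taken from the public Qwen3-32B config, and the toy vectors are illustrative only.

```python
import math

# Minimal grouped-query attention (GQA) for a single query position.
# Default head counts are assumptions from the public Qwen3-32B config.


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gqa_attend(queries, keys, values):
    """queries: [n_q_heads][d]; keys/values: [n_kv_heads][seq_len][d].

    Each group of n_q_heads // n_kv_heads query heads reads the same KV head,
    so the KV cache stores n_kv_heads heads instead of n_q_heads.
    """
    n_q, n_kv = len(queries), len(keys)
    assert n_q % n_kv == 0, "query heads must divide evenly into KV groups"
    group = n_q // n_kv
    outputs = []
    for h, q in enumerate(queries):
        kv = h // group  # KV head shared by this query head's group
        scale = math.sqrt(len(q))
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys[kv]]
        weights = softmax(scores)
        dim_v = len(values[kv][0])
        out = [sum(w * v[j] for w, v in zip(weights, values[kv])) for j in range(dim_v)]
        outputs.append(out)
    return outputs


def kv_cache_ratio(n_q_heads: int = 64, n_kv_heads: int = 8) -> float:
    """How many times smaller the KV cache is than with full multi-head attention."""
    return n_q_heads / n_kv_heads


# Toy example: 4 query heads share 2 KV heads (head_dim 2, seq_len 2).
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
k = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
v = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
print(len(gqa_attend(q, k, v)))  # 4
print(kv_cache_ratio())          # 8.0
```

Only the K and V tensors are cached during generation, so sharing each KV head across a group of query heads shrinks the cache by the ratio of query heads to KV heads while leaving the number of query heads unchanged.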
The most versatile Apache-licensed 32B model available: pick this when you want one model for everything.
Quick start
ollama run qwen3:32b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
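The `ollama run` command opens an interactive session; for programmatic use, a local Ollama server also exposes an HTTP API. A minimal sketch, assuming Ollama is listening on its default port 11434 with `qwen3:32b` already pulled; `build_request` and `generate` are hypothetical helpers, and appending `/no_think` relies on Qwen 3's prompt-level soft switch for skipping the reasoning trace.

```python
import json
import urllib.request

# Sketch of calling a local Ollama server's /api/generate endpoint.
# Assumes the default port (11434) and that `qwen3:32b` has been pulled.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "qwen3:32b",
                  num_ctx: int = 8192, think: bool = True) -> urllib.request.Request:
    """Build a non-streaming generate request.

    With think=False, "/no_think" is appended (Qwen 3's soft switch) to skip reasoning.
    """
    if not think:
        prompt = f"{prompt} /no_think"
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # return one JSON object, not a stream
        "options": {"num_ctx": num_ctx},  # context window for this request
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the request and return the text from the response's "response" field."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["response"]


# Example (requires a running Ollama server):
# print(generate("Summarize grouped-query attention in one sentence.", think=False))
```

Raising `num_ctx` enlarges the KV cache, so check VRAM headroom before requesting long contexts.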