QwQ 32B
By Alibaba · China
Overview
Alibaba's dedicated 32B reasoning model, trained with reinforcement learning rather than distillation. It scores 79.5 on AIME 2024 and 90.6 on MATH-500, making it a direct Apache-licensed alternative to DeepSeek R1.
When to pick this model
- You need a frontier-class reasoner that runs on a single 24GB or 48GB GPU (19 GB at Q4_K_M, 35 GB at Q8_0)
- You're solving math, logic, or formal problems where chain-of-thought matters
- You want an Apache-licensed alternative to DeepSeek R1
- You need 131K context for long reasoning traces
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 19 GB |
| Q5_K_M | 23 GB |
| Q8_0 | 35 GB |
| FP16 (no quantization) | 64 GB |
VRAM figures include model weights plus a typical 8K-token KV cache and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer contexts.
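The KV-cache share of these figures can be estimated from the architecture listed on this page (64 layers, 8 KV heads). The sketch below does that arithmetic and adds the extra cache needed for longer contexts on top of the table values. It assumes a head dimension of 128 (a 5,120 hidden size over 40 query heads, per the published Qwen config as we understand it) and an unquantized FP16 cache; runtimes that quantize the cache will need less. Treat the output as a rough planning number, not a measurement.

```python
# Rough VRAM planner for QwQ 32B (a sketch, not a measurement).
# Layer and KV-head counts are from this page's architecture line;
# HEAD_DIM = 128 is an assumption (hidden size 5120 / 40 query heads).
NUM_LAYERS = 64
NUM_KV_HEADS = 8
HEAD_DIM = 128        # assumed, not stated on this page
BYTES_PER_ELEM = 2    # FP16 cache; a quantized KV cache would be smaller
BASELINE_CTX = 8_192  # context already included in the table figures

# Table values from this page, in GB (weights + 8K KV cache + overhead).
TABLE_GB = {"Q4_K_M": 19, "Q5_K_M": 23, "Q8_0": 35, "FP16": 64}


def kv_cache_bytes(context_tokens: int) -> int:
    """Bytes needed to cache keys and values for `context_tokens` tokens."""
    # Factor 2: one tensor for keys and one for values, in every layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM
    return per_token * context_tokens


def estimate_vram_gb(quant: str, context_tokens: int) -> float:
    """Table figure plus any KV cache beyond the 8K baseline, in decimal GB."""
    extra = max(0, kv_cache_bytes(context_tokens) - kv_cache_bytes(BASELINE_CTX))
    return TABLE_GB[quant] + extra / 1e9


if __name__ == "__main__":
    for ctx in (8_192, 32_768, 131_072):
        gib = kv_cache_bytes(ctx) / 2**30
        print(f"{ctx:>7} tokens: KV cache {gib:.1f} GiB, "
              f"Q4_K_M total ~{estimate_vram_gb('Q4_K_M', ctx):.1f} GB")
```

By this estimate the FP16 cache costs 256 KiB per token: about 2 GiB at 8K, 8 GiB at 32K and 32 GiB at the full 131,072-token window, so Q4_K_M at a 32K context lands around 25 GB.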
Published benchmark scores
| Benchmark | Score |
|---|---|
| AIME 2024 | 79.5 |
| MATH-500 | 90.6 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- Direct competitor to DeepSeek R1 at a fraction of the size (32B dense parameters vs. R1's 671B total)
- 131K context for long thinking traces
- Trained with RL, not just distilled
- Apache 2.0
Limitations
- Very verbose: long thinking traces drive up token costs and latency
- Requires YaRN rope scaling for prompts longer than 8K (8,192) tokens
- Overkill for non-reasoning chat workloads
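Qwen's QwQ-32B model card, as we understand it, enables YaRN by adding a `rope_scaling` entry to the checkpoint's `config.json` (`type: yarn`, `factor: 4.0`, `original_max_position_embeddings: 32768`; 4 × 32,768 = 131,072, matching the advertised window). The sketch below patches a local Hugging Face-style config that way. The helper names are hypothetical, and the values should be checked against the card for your checkpoint. This applies to runtimes that read `config.json` (for example Transformers or vLLM); GGUF runtimes such as llama.cpp configure rope scaling separately.

```python
import json

# rope_scaling values as given for QwQ-32B in Qwen's model card (verify
# against the card for your checkpoint). 4.0 * 32768 = 131072 tokens.
YARN_ROPE_SCALING = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}


def enable_yarn(config: dict) -> dict:
    """Return a copy of a Hugging Face config dict with YaRN scaling set."""
    patched = dict(config)  # shallow copy; the input dict is left untouched
    patched["rope_scaling"] = dict(YARN_ROPE_SCALING)
    return patched


def patch_config_file(path: str) -> None:
    """Rewrite a local config.json in place with YaRN enabled (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        config = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(enable_yarn(config), f, indent=2)
```

Static YaRN applies the same scaling to short prompts too, so keeping an unpatched copy of the config for short-context work is a reasonable choice.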
Architecture & training
Architecture: Dense · 64 layers · GQA (40Q/8KV) · RoPE · SwiGLU · trained with outcome-based RL
Training: RL on reasoning (not a simple distillation).
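The GQA (40Q/8KV) entry means the 40 query heads share 8 key/value heads, five query heads per KV head, which is also why the KV cache is five times smaller than it would be with standard multi-head attention. The sketch below shows that grouping. Mapping contiguous query heads to the same KV head follows the convention of common implementations such as Hugging Face Transformers' `repeat_kv`; treat it as an assumption about how any particular runtime lays out heads.

```python
# Grouped-query attention layout for QwQ 32B as listed above: 40 query heads
# share 8 key/value heads, so each KV head serves 40 // 8 = 5 query heads.
NUM_Q_HEADS = 40
NUM_KV_HEADS = 8
GROUP_SIZE = NUM_Q_HEADS // NUM_KV_HEADS  # query heads per KV head


def kv_head_for(q_head: int) -> int:
    """Index of the KV head whose keys/values query head `q_head` uses."""
    if not 0 <= q_head < NUM_Q_HEADS:
        raise ValueError(f"query head out of range: {q_head}")
    # Contiguous grouping (assumed): heads 0-4 -> KV 0, 5-9 -> KV 1, ...
    return q_head // GROUP_SIZE


# Versus standard multi-head attention (one KV head per query head),
# the KV cache shrinks by this factor.
KV_CACHE_REDUCTION = NUM_Q_HEADS / NUM_KV_HEADS

if __name__ == "__main__":
    groups: dict[int, list[int]] = {}
    for q in range(NUM_Q_HEADS):
        groups.setdefault(kv_head_for(q), []).append(q)
    for kv, qs in groups.items():
        print(f"KV head {kv}: query heads {qs}")
    print(f"KV cache is {KV_CACHE_REDUCTION:.0f}x smaller than full MHA")
```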
The best Apache-licensed reasoner you can run on a single GPU.
Quick start
ollama run qwq:32b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
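Beyond the interactive CLI, a running Ollama server also accepts HTTP requests on its default local port. The sketch below sends one non-streaming request to Ollama's `/api/generate` endpoint and, because QwQ's replies are very verbose, strips the reasoning trace so only the final answer is kept. It assumes the trace ends with a `</think>` tag (the opening tag may be supplied by the chat template, so the pattern does not require it); `ask` and `strip_thinking` are hypothetical helper names.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# Assumption: everything up to the last </think> tag is reasoning.
_THINK_RE = re.compile(r"^.*</think>\s*", re.DOTALL)


def strip_thinking(text: str) -> str:
    """Drop the reasoning trace, keeping only the final answer."""
    return _THINK_RE.sub("", text, count=1).strip()


def ask(prompt: str, model: str = "qwq:32b", num_ctx: int = 8192) -> str:
    """Send one non-streaming request to a local Ollama server."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,                  # return a single JSON object
        "options": {"num_ctx": num_ctx},  # context window for this request
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return strip_thinking(body["response"])
```

With `ollama serve` running and the model pulled, `print(ask("How many positive divisors does 360 have?"))` should print only the answer. Dropping the trace also keeps multi-turn histories short, in line with Qwen's usage notes for QwQ as we understand them.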