Model card
DeepSeek R2 32B
By DeepSeek · China
reasoning
Overview
DeepSeek's dense 32B reasoning model, released under the MIT license, scores 92.7% on AIME. At Q4_K_M it needs about 19 GB of VRAM, so it fits on a single 24 GB RTX 4090, and it is the best reasoning model available for consumer GPUs.
When to pick this model
- Math, competition, and STEM reasoning
- Single-GPU production reasoning workloads
- Chain-of-thought research on consumer hardware
- Commercial deployments under MIT
- Replacing closed reasoning APIs on a 4090
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 19 GB |
| Q5_K_M | 23 GB |
| Q8_0 | 35 GB |
| FP16 (no quantization) | 64 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
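As a rough illustration of how to read the table, the sketch below picks the highest-precision quantization that fits a given card. The VRAM numbers are copied from the table; the function name `pick_quantization` and the `headroom_gb` parameter are hypothetical and exist only for this example.

```python
# VRAM figures (GB) copied from the table above. They already include
# weights, an 8k KV cache, and ~600 MB of runtime overhead.
# Ordered from highest to lowest precision.
VRAM_GB = {
    "FP16": 64,
    "Q8_0": 35,
    "Q5_K_M": 23,
    "Q4_K_M": 19,
}


def pick_quantization(gpu_vram_gb, headroom_gb=0.0):
    """Return the highest-precision quantization that fits, or None.

    headroom_gb is an illustrative knob: VRAM held back for contexts
    longer than the 8k assumed by the table.
    """
    budget = gpu_vram_gb - headroom_gb
    for name, required in VRAM_GB.items():
        if required <= budget:
            return name
    return None


print(pick_quantization(24))        # 24 GB card, no extra headroom
print(pick_quantization(24, 2.0))   # same card, 2 GB held back
print(pick_quantization(16))        # nothing in the table fits
```

On a 24 GB card Q5_K_M technically fits with about 1 GB to spare, which is why holding back even a little headroom for longer contexts lands on the recommended Q4_K_M.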
Published benchmark scores
| Benchmark | Score |
|---|---|
| AIME | 92.7 |
Scores are published by the model author or aggregated from public leaderboards, and are re-measured monthly by our editorial team.
Strengths
- 92.7% on AIME, frontier-level math reasoning
- Runs on a single RTX 4090 in Q4
- MIT license with full commercial rights
- Best consumer-GPU reasoner of its generation
Limitations
- Verbose chain-of-thought inflates token costs
- Specialized for reasoning, less polished for chat
- Latency can spike on hard problems
Architecture & training
Architecture: Dense 32B · MIT · reasoner
Training: Successor to R1 and R1-Distill.
Verdict
The best open reasoning model that fits on a single consumer GPU.
Quick start
# Hugging Face: deepseek-ai/DeepSeek-R2 (no official Ollama tag yet)
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
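A minimal sketch of fetching the weights from the Hugging Face repository named above, assuming the `huggingface_hub` package is installed and the repository is publicly downloadable. The helper `fetch_weights` and the `./deepseek-r2` target directory are hypothetical names chosen for this example. The repository's file layout is not described here, and an unquantized download may be in the range of the FP16 figure from the VRAM table.

```python
REPO_ID = "deepseek-ai/DeepSeek-R2"  # repository name as given above


def fetch_weights(local_dir="./deepseek-r2"):
    """Download a snapshot of the repository and return its local path.

    local_dir is an illustrative default, not a required location.
    """
    # Imported lazily so this module loads even without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Example call (performs a large network download):
# path = fetch_weights()
```

Since there is no official Ollama tag yet, running a quantized build locally would additionally require a GGUF conversion, which this sketch does not cover.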