DeepSeek R1 Distill 32B
By DeepSeek · China
Overview
The 32B DeepSeek R1 distill: the best accessible open-weight reasoner we've tested. It produces explicit chain-of-thought, is MIT-licensed, and runs on a single 24GB GPU at Q4_K_M quantization.
When to pick this model
- Math, logic, and proof-style problems
- Code debugging where explicit reasoning helps
- Research workflows needing visible chain-of-thought
- Self-hosted alternatives to o1-mini-class APIs
- Commercial use under MIT license
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 19 GB |
| Q5_K_M | 23 GB |
| Q8_0 | 35 GB |
| FP16 (no quantization) | 64 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
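To gauge how much headroom a longer context needs, the KV cache can be estimated from the model's attention layout. The sketch below is a rough estimate, not a measurement. It assumes the Qwen2.5-32B base configuration (64 layers, 8 KV heads, head dimension 128), an FP16 cache, and that "8k" means 8,192 tokens. The function names are illustrative, and actual usage depends on the runtime and any KV-cache quantization.

```python
def kv_cache_bytes(context_tokens, n_layers=64, n_kv_heads=8,
                   head_dim=128, bytes_per_value=2):
    """Estimate KV cache size in bytes for a given context length.

    The defaults are assumptions based on the Qwen2.5-32B base layout
    with an FP16 cache; adjust them if your build differs.
    """
    # Factor of 2: one key tensor and one value tensor per layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


def extra_headroom_gib(target_context, baseline_context=8192):
    """Extra VRAM (GiB) needed beyond the baseline context already budgeted."""
    extra = kv_cache_bytes(target_context) - kv_cache_bytes(baseline_context)
    return max(extra, 0) / 2**30
```

Under these assumptions, `extra_headroom_gib(32768)` gives the additional memory to budget for the full 32k context on top of the table figures.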
Published benchmark scores
| Benchmark | Score (pass@1, %) |
|---|---|
| AIME 2024 | 72.6 |
| MATH-500 | 94.3 |
| GPQA Diamond | 62.1 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- Best open-weight reasoner that fits on one consumer GPU
- Excellent math and science performance
- Explicit step-by-step thinking
- MIT license
- 32k context
Limitations
- Heavy thinking-token output inflates latency and cost
- Slow time-to-first-useful-answer
- 32k context is shorter than most 2025 peers
- Overkill for simple chat
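Because the model emits its reasoning before the final answer, downstream code often needs to separate the two. The sketch below assumes the raw completion wraps the reasoning in `<think>...</think>` tags, which is how R1-family models typically format their output. Some runtimes strip or relocate this block, so treat the tag format as an assumption to verify against your own output. The helper name `split_reasoning` is illustrative.

```python
import re

# Matches the first <think>...</think> block, including newlines inside it.
_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw completion string."""
    match = _THINK_BLOCK.search(raw)
    if match is None:
        # No reasoning block found: treat the whole text as the answer.
        return "", raw.strip()
    reasoning = match.group(1).strip()
    # The answer is everything outside the matched block.
    answer = (raw[:match.start()] + raw[match.end():]).strip()
    return reasoning, answer
```

Comparing `len(reasoning)` with `len(answer)` gives a quick sense of how much of each response is thinking overhead.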
Architecture & training
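Architecture: Qwen2.5-32B base (dense transformer) · distilled R1 chain-of-thought
Training: Distilled from R1 671B · supervised fine-tuning on R1-generated reasoning samples (the RL stage was applied to the teacher model, not to this distill).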
The go-to local reasoning model for STEM and code — accept the verbosity, get the accuracy.
Quick start
ollama run deepseek-r1:32b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
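For scripted use, a running Ollama instance also exposes a local HTTP API. The sketch below is a minimal example using only the Python standard library. It assumes Ollama is listening on its default port (11434) and that the model has already been pulled. The helper names `build_request` and `ask` are illustrative, and the long default timeout is a guess meant to accommodate lengthy reasoning output.

```python
import json
import urllib.request

# Ollama's default local endpoint; change it if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "deepseek-r1:32b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, timeout: float = 600.0) -> str:
    """Send the prompt and return the model's full text response."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `print(ask("Is 221 prime? Explain briefly."))` returns the complete response in one piece once generation finishes.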