DeepSeek R1 Distill 7B
By DeepSeek · China
Overview
A 7B model from DeepSeek, distilled from the 671B R1 and trained to emit explicit chain-of-thought reasoning. It scores 55.5 on AIME 2024 and 92.8 on MATH-500, strong results for a model of its size.
When to pick this model
- Math, logic, and step-by-step problem solving
- Reasoning-heavy tasks on a single consumer GPU
- Experimenting with explicit chain-of-thought outputs
- MIT-licensed local reasoning assistants
- Tutoring and STEM Q&A
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 5 GB |
| Q5_K_M | 6 GB |
| Q8_0 | 9 GB |
| FP16 (no quantization) | 16 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
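As a rough illustration of how the table might be used, the sketch below picks the highest-fidelity quantization that fits a given VRAM budget. The figures are copied from the table above; the function name `pick_quantization` and the `headroom_gb` parameter are hypothetical, introduced here only for the example.

```python
# VRAM requirements in GB, copied from the table above,
# ordered from most compressed to least compressed.
VRAM_GB = [
    ("Q4_K_M", 5),
    ("Q5_K_M", 6),
    ("Q8_0", 9),
    ("FP16", 16),
]


def pick_quantization(available_gb, headroom_gb=0.0):
    """Return the highest-fidelity quantization that fits, or None.

    headroom_gb is a hypothetical knob for reserving extra memory,
    for example when running with a context longer than 8k.
    """
    budget = available_gb - headroom_gb
    best = None
    for name, required_gb in VRAM_GB:
        if required_gb <= budget:
            best = name  # later entries are higher fidelity
    return best


print(pick_quantization(8))                  # 8 GB card
print(pick_quantization(10, headroom_gb=2))  # 10 GB card, 2 GB reserved
```

On an 8 GB card this selects Q5_K_M; reserving headroom for longer contexts pushes the choice toward the smaller quantizations.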
Published benchmark scores
| Benchmark | Score |
|---|---|
| AIME 2024 | 55.5 |
| MATH-500 | 92.8 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- Explicit chain-of-thought reasoning at 7B scale
- Strong AIME and MATH scores for its size
- 32k context
- MIT license
Limitations
- Very verbose due to thinking tokens
- Trails the 32B distill on complex reasoning
- Higher token costs per response
- Weaker than general 7Bs on casual chat
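One way to manage the verbosity is to separate the reasoning from the final answer before showing a response to users. The sketch below assumes the model wraps its reasoning in `<think>...</think>` tags, which is how R1-family models commonly emit it; this page does not state the output format, and some runtimes may strip or relocate the block, so treat the tag name as an assumption. `split_reasoning` is a hypothetical helper name.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think> tags.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Split raw model output into (reasoning, answer).

    If no <think> block is present, the reasoning is an empty string
    and the whole text is treated as the answer.
    """
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Keep everything outside the first think block as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


raw = "<think>\n2 + 2 is 4\n</think>\n\nThe answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)
```

Keeping the reasoning available but hidden by default lets a tutoring or Q&A interface show the short answer first and reveal the steps on request.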
Architecture & training
Architecture: DeepSeek R1 distillation to Qwen 2.5 7B · explicit chain-of-thought
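Training: Distilled from R1 671B by supervised fine-tuning on reasoning traces generated by R1. The R1 teacher itself was trained with RL on reasoning problems (math, code, logic).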
A capable reasoning-specialist 7B, but bump up to the 32B distill if accuracy matters more than tokens.
Quick start
ollama run deepseek-r1:7b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
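Beyond the interactive `ollama run` command, a locally running Ollama instance can also be queried over HTTP. The following is a minimal sketch using only the Python standard library; it assumes Ollama is listening on its default local address (`http://localhost:11434`) and that the `deepseek-r1:7b` model has already been pulled. The helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumption: default local Ollama endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="deepseek-r1:7b"):
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="deepseek-r1:7b", timeout=300):
    """Send the prompt and return the generated text.

    Requires a running Ollama server; the generous timeout allows
    for long chain-of-thought outputs.
    """
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]


# Example (needs a running server):
# print(generate("What is the sum of the first 10 odd numbers?"))
```

Setting `"stream": False` returns the whole completion in one JSON object, which is simpler for scripts; interactive applications would likely prefer streaming so users can see the reasoning as it is produced.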