BestLLMfor EN Your hardware. Your LLM. Your call.
APIOpen data Find my LLM
Model fiche

DeepSeek R2 32B

By DeepSeek · China

reasoning
Parameters
32B
License
MIT
Context
125k
VRAM (Q4)
19 GB
Released
April 2026

Overview

DeepSeek's dense 32B reasoning model under MIT, scoring 92.7% on AIME. Fits on a single RTX 4090 in Q4 and is the best consumer-GPU reasoner available.

When to pick this model

  • Math, competition, and STEM reasoning
  • Single-GPU production reasoning workloads
  • Chain-of-thought research on consumer hardware
  • Commercial deployments under MIT
  • Replacing closed reasoning APIs on a 4090

VRAM requirements by quantization

QuantizationVRAM required
Q4_K_M (recommended)19 GB
Q5_K_M23 GB
Q8_035 GB
FP16 (no quantization)64 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.

Published benchmark scores

BenchmarkScore
AIME92.7

Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.

Strengths

  • 92.7% on AIME, frontier-level math reasoning
  • Runs on a single RTX 4090 in Q4
  • MIT license with full commercial rights
  • Best consumer-GPU reasoner of its generation

Limitations

  • Verbose chain-of-thought inflates token costs
  • Specialized for reasoning, less polished for chat
  • Latency can spike on hard problems

Architecture & training

Architecture: Dense 32B · MIT · reasoner

Training: Successor to R1 and R1-Distill.

Verdict

The best open reasoning model that fits on a single consumer GPU.

Quick start

# HuggingFace : deepseek-ai/DeepSeek-R2 (pas encore de tag Ollama officiel)

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.

Tools

Is DeepSeek R2 32B the right pick for you?

Compute self-hosted ROI → Back to catalog