BestLLMfor · Your hardware. Your LLM. Your call.
Model card

DeepSeek V3.2

By DeepSeek · China

Tags: chat · general · MoE
Parameters: 685B
License: MIT
Context: 125k
VRAM (Q4): 410 GB
Released: December 2025

Overview

DeepSeek's 685B-parameter mixture-of-experts (MoE) model, featuring DeepSeek Sparse Attention for lower memory use. It scores at IMO gold-medal level and ranks #2 by usage volume on OpenRouter.

When to pick this model

  • Frontier-class generalist tasks on a multi-GPU server
  • Competition-level math and reasoning
  • Replacing closed APIs with MIT-licensed weights
  • High-volume production inference
  • Long-context enterprise workloads

VRAM requirements by quantization

Quantization            VRAM required
Q4_K_M (recommended)    410 GB
Q5_K_M                  490 GB
Q8_0                    735 GB
FP16 (no quantization)  1370 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
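The figures in the table can be roughly sanity-checked with a weights-only estimate: parameter count times bits per weight, divided by 8. The sketch below is illustrative only. The function name and the bits-per-weight values are assumptions (approximate averages for llama.cpp-style formats), not numbers published on this page, and the estimate ignores KV cache and runtime overhead.

```python
# Rough weights-only memory estimate: parameters x bits per weight / 8.
# The bits-per-weight values are approximate assumptions for
# llama.cpp-style quantization formats, not figures from this page.
APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "FP16": 16.0,
}


def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Return the approximate size of the weights in GB (10^9 bytes)."""
    # billions of parameters x bits / 8 bits-per-byte = billions of bytes = GB
    return params_billion * bits_per_weight / 8


def estimate_all(params_billion: float) -> dict:
    """Estimate weight size for every format in APPROX_BITS_PER_WEIGHT."""
    return {
        name: estimate_weight_gb(params_billion, bits)
        for name, bits in APPROX_BITS_PER_WEIGHT.items()
    }
```

Calling `estimate_all(685)` gives values close to the table above. Any gap comes from the assumed bits-per-weight averages and from the cache and overhead the table's figures include.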

Strengths

  • IMO gold-medal reasoning quality
  • DeepSeek Sparse Attention reduces memory pressure
  • MIT license
  • #2 by usage volume on OpenRouter

Limitations

  • 410 GB+ at Q4 is far more than a single current GPU holds, so a multi-GPU server is required
  • Sparse attention adds inference engine complexity
  • Operational overhead is significant
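To make the hardware requirement concrete, a minimal sketch of the GPU-count arithmetic follows. The helper name `min_gpus` and the `usable_fraction` parameter are hypothetical, and the 80 GB card size in the usage note is an example, not a recommendation from this page. The calculation assumes the model splits evenly across identical cards, which real inference engines only approximate.

```python
import math


def min_gpus(required_gb: float, gpu_vram_gb: float, usable_fraction: float = 1.0) -> int:
    """Smallest number of identical GPUs whose usable VRAM covers required_gb.

    usable_fraction is a hypothetical knob for reserving part of each card
    (for example 0.9 to leave 10% free for buffers and fragmentation).
    """
    if required_gb <= 0 or gpu_vram_gb <= 0:
        raise ValueError("required_gb and gpu_vram_gb must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return math.ceil(required_gb / (gpu_vram_gb * usable_fraction))
```

For example, `min_gpus(410, 80)` returns 6 for the Q4_K_M figure on 80 GB cards, and `min_gpus(1370, 80)` returns 18 for FP16. Longer contexts need additional headroom on top of this.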

Architecture & training

Architecture: MoE, 685B total parameters with 37B active per token · DeepSeek Sparse Attention (DSA) · MIT license

Training: Successor to DeepSeek V3 that adds DeepSeek Sparse Attention (DSA) to reduce memory use.

Verdict

A frontier-grade MIT-licensed MoE if you can run a multi-GPU cluster.

Quick start

# Hugging Face: deepseek-ai/DeepSeek-V3.2 (local alternative: ollama run deepseek-v3:671b; note that this tag names the V3 671B build)
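Once a model is being served locally by Ollama, it can also be queried over Ollama's local HTTP API. The following is a minimal sketch, assuming Ollama is running on its default port (11434) and that the `deepseek-v3:671b` tag from the line above has already been pulled. The helper names `build_generate_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Ollama's default local endpoint; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "deepseek-v3:671b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "deepseek-v3:671b", timeout: float = 600.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]

# Usage (requires a running Ollama server with the model available):
#   print(generate("Summarize sparse attention in two sentences."))
```

The generous default timeout is a guess to allow for slow first-token latency on a model this large. Tune it for your hardware.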

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
