Model card
DeepSeek V3.2
By DeepSeek · China
chat
general
moe
Overview
DeepSeek's 685B-parameter MoE model, featuring DeepSeek Sparse Attention (DSA) for lower memory use. It posts a gold-medal-level score on International Mathematical Olympiad (IMO) problems and ranks #2 by usage volume on OpenRouter.
When to pick this model
- Frontier-class generalist tasks on a multi-GPU server
- Competition-level math and reasoning
- Replacing closed APIs with MIT-licensed weights
- High-volume production inference
- Long-context enterprise workloads
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 410 GB |
| Q5_K_M | 490 GB |
| Q8_0 | 735 GB |
| FP16 (no quantization) | 1370 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
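As a rough cross-check of the table above, the weight footprint can be approximated as total parameters × bits per weight ÷ 8. The sketch below assumes approximate effective bits-per-weight values for each quantization; these are illustrative assumptions, not figures from this fiche. The KV cache is left as a placeholder argument because its size depends on context length and the attention implementation.

```python
PARAMS_B = 685  # total parameters in billions, as stated in this fiche

# Approximate effective bits per weight. Assumed values for illustration only.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "FP16": 16.0,
}


def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Weight footprint in decimal GB: parameters (billions) * bits / 8."""
    return params_b * bits_per_weight / 8


def estimate_vram_gb(quant: str, kv_cache_gb: float = 0.0,
                     overhead_gb: float = 0.6) -> float:
    """Weights plus a caller-supplied KV cache and ~600 MB runtime overhead."""
    return weights_gb(PARAMS_B, BITS_PER_WEIGHT[quant]) + kv_cache_gb + overhead_gb


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        print(f"{quant}: ~{estimate_vram_gb(quant):.0f} GB")
```

Under these assumptions the estimates land within a few percent of the table. For longer contexts, pass a larger `kv_cache_gb` rather than relying on the default.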
Strengths
- IMO gold-medal reasoning quality
- DeepSeek Sparse Attention reduces memory pressure
- MIT license
- #2 by usage volume on OpenRouter
Limitations
- 410 GB+ at Q4_K_M needs a serious multi-GPU server
- Sparse attention adds inference engine complexity
- Operational overhead is significant
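To make the multi-GPU requirement concrete, here is a minimal sketch that turns a VRAM figure into a GPU count. The `usable_fraction` parameter is a hypothetical safety margin (an assumption, not a measured value), and the calculation ignores tensor-parallel layout constraints, which in practice often push the count up to a power of two.

```python
import math


def gpus_needed(model_vram_gb: float, gpu_vram_gb: float,
                usable_fraction: float = 0.9) -> int:
    """Smallest number of GPUs whose usable memory covers the model.

    usable_fraction is an assumed margin for fragmentation and
    per-device runtime buffers.
    """
    usable_per_gpu = gpu_vram_gb * usable_fraction
    return math.ceil(model_vram_gb / usable_per_gpu)


if __name__ == "__main__":
    q4_vram_gb = 410  # Q4_K_M figure from the table in this fiche
    for gpu_gb in (24, 80, 141):
        count = gpus_needed(q4_vram_gb, gpu_gb)
        print(f"{gpu_gb} GB GPUs: at least {count}")
```

Treat the result as a lower bound: longer contexts and engine-specific sharding rules typically require more headroom.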
Architecture & training
Architecture: MoE, 685B total parameters / 37B active per token · DeepSeek Sparse Attention (DSA) · MIT license
Training: Successor to DeepSeek V3, adding DSA to reduce memory use.
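The 685B/37B split means memory scales with the total parameter count while per-token compute scales with the active count. The sketch below works through that arithmetic; the "2 FLOPs per active parameter per token" figure is a common rule of thumb for a forward pass and is an assumption here, not a number from this fiche.

```python
TOTAL_PARAMS_B = 685   # total parameters in billions, from this fiche
ACTIVE_PARAMS_B = 37   # active parameters per token in billions, from this fiche


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the weights that participate in a single token's forward pass."""
    return active_b / total_b


def forward_gflops_per_token(active_b: float, flops_per_param: float = 2.0) -> float:
    """Rule-of-thumb forward-pass cost in GFLOPs per token (assumed 2 FLOPs/param)."""
    return active_b * flops_per_param


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"Active share per token: {frac:.1%}")
    print(f"Approx. forward cost: {forward_gflops_per_token(ACTIVE_PARAMS_B):.0f} GFLOPs/token")
```

All 685B parameters still have to be resident in memory, which is why the VRAM table is driven by the total count rather than the active one.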
Verdict
A frontier-grade MIT-licensed MoE if you can run a multi-GPU cluster.
Quick start
# Hugging Face: deepseek-ai/DeepSeek-V3.2 (local alternative: ollama run deepseek-v3:671b)
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.