Model card
DeepSeek V4 Flash 284B
By DeepSeek · China
Tags: chat, general, reasoning, moe, multilingual
Overview
DeepSeek V4's efficient sibling: 284B MoE with 13B active params, MIT-licensed, 1M context, and the same three-mode reasoning stack. Frontier-adjacent quality at a fraction of the inference cost.
When to pick this model
- Frontier-class reasoning at single-server scale
- Million-token context analysis without datacenter budgets
- An MIT-licensed alternative to V4 Pro
- Workloads that need a choice between Base and Instruct variants
- Cost-sensitive deployments needing three thinking modes
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 170 GB |
| Q5_K_M | 205 GB |
| Q8_0 | 305 GB |
| FP16 (no quantization) | 568 GB |
VRAM figures include model weights plus a typical 8k-token KV cache and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer context lengths.
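As a rough cross-check of the table, weight memory can be approximated as total parameters times bits per weight. The sketch below rests on assumptions: the bits-per-weight values are ballpark figures for llama.cpp-style quantizations, not numbers published on this page, and the KV-cache size is left as a caller-supplied input because it depends on the architecture and context length.

```python
# Rough VRAM estimate: weights + KV cache + runtime overhead.
# BITS_PER_WEIGHT values are approximate assumptions, not official figures.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "FP16": 16.0,
}

TOTAL_PARAMS_B = 284       # total parameters in billions (from the overview)
RUNTIME_OVERHEAD_GB = 0.6  # ~600 MB baseline quoted above


def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Weight memory in decimal GB: billions of params times bytes per param."""
    return params_b * bits_per_weight / 8


def estimate_vram_gb(quant: str, kv_cache_gb: float = 0.0) -> float:
    """Weights plus a caller-supplied KV-cache size plus fixed overhead."""
    weights = weights_gb(TOTAL_PARAMS_B, BITS_PER_WEIGHT[quant])
    return weights + kv_cache_gb + RUNTIME_OVERHEAD_GB


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        print(f"{quant}: ~{estimate_vram_gb(quant):.0f} GB before KV cache")
```

Under these assumed bit widths the weight term lands within a few GB of each table row, which suggests the weights dominate the budget at 8k context. Pass a larger `kv_cache_gb` to model long-context runs.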
Strengths
- MIT license
- 1M context window
- Only 13B active params — fast for its total size
- Three thinking modes inherited from V4 Pro
- Base and Instruct variants available
Limitations
- Around 170 GB VRAM in Q4 — still multi-GPU
- Community quantizations lagged behind the official release at launch
- Quality trails V4 Pro on the hardest reasoning tasks
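To make the "still multi-GPU" point concrete, the sketch below counts how many identical GPUs would be needed to hold each quantization from the VRAM table. It is a simplification: `usable_fraction` is an assumed safety margin rather than a measured value, and the calculation ignores parallelism constraints (for example, runtimes that prefer power-of-two GPU counts).

```python
import math

# VRAM requirements from the quantization table, in GB.
VRAM_GB = {"Q4_K_M": 170, "Q5_K_M": 205, "Q8_0": 305, "FP16": 568}


def gpus_needed(required_gb: float, gpu_gb: float, usable_fraction: float = 0.9) -> int:
    """Smallest GPU count whose usable memory covers required_gb.

    usable_fraction is an assumed headroom factor, not a measured value.
    """
    if gpu_gb <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("gpu_gb must be positive and usable_fraction in (0, 1]")
    return math.ceil(required_gb / (gpu_gb * usable_fraction))


if __name__ == "__main__":
    for quant, gb in VRAM_GB.items():
        print(f"{quant}: {gpus_needed(gb, gpu_gb=80)} x 80 GB GPUs")
```

With the default margin, the 170 GB Q4_K_M figure needs three 80 GB cards, which is why even the recommended quantization does not fit on a single accelerator.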
Architecture & training
Architecture: MoE 284B/13B active · CSA+HCA hybrid · mHC · Muon · mixed FP4+FP8
Training: Targets "efficient reasoning" at reduced cost vs V4 Pro.
Verdict
The efficient way into the V4 family — MIT, 1M context, and inference cost that won't bankrupt you.
Quick start
# Hugging Face: deepseek-ai/DeepSeek-V4-Flash (community GGUF in progress)
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
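For the Hugging Face route, a minimal sketch of inspecting the repository before committing to a multi-hundred-GB download might look like the following. It assumes the `huggingface_hub` package is installed and that the repo named above exposes JSON and Markdown metadata files; the helper name `fetch_metadata` and the local directory are hypothetical.

```python
# Assumes `pip install huggingface_hub`; the repo id is the one named in the quick start.
REPO_ID = "deepseek-ai/DeepSeek-V4-Flash"

# Fetch only small metadata files; the full weights are hundreds of GB.
METADATA_PATTERNS = ["*.json", "*.md"]


def fetch_metadata(local_dir: str = "deepseek-v4-flash-meta") -> str:
    """Download config and README files only; return the local directory path."""
    # Imported lazily so the module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=METADATA_PATTERNS,
        local_dir=local_dir,
    )
```

Calling `fetch_metadata()` requires network access. Dropping `allow_patterns` would pull the full weights, so check the VRAM table and available disk space first.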