Model card

DeepSeek V4 Flash 284B

By DeepSeek · China

Tags: chat, general, reasoning, moe, multilingual
Parameters: 284B
License: MIT
Context: 976k
VRAM (Q4): 170 GB
Released: April 2026

Overview

DeepSeek V4's efficient sibling: 284B MoE with 13B active params, MIT-licensed, 1M context, and the same three-mode reasoning stack. Frontier-adjacent quality at a fraction of the inference cost.

When to pick this model

  • Frontier-class reasoning at single-server scale
  • Million-token context analysis without datacenter budgets
  • An MIT-licensed alternative to V4 Pro
  • Workloads that need a choice between Base and Instruct variants
  • Cost-sensitive deployments needing three thinking modes

VRAM requirements by quantization

Quantization              VRAM required
Q4_K_M (recommended)      170 GB
Q5_K_M                    205 GB
Q8_0                      305 GB
FP16 (no quantization)    568 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
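
As a rough sanity check on the figures above, the sketch below converts each listed VRAM value into an implied number of bits per parameter, treating the whole figure as weights and using decimal gigabytes. The function names are illustrative only and come from no tool. Because the listed figures also cover the KV cache and runtime overhead, the results are slight overestimates of the true weight precision, although at this scale those extras are small next to the weights.

```python
# Values copied from the table above (decimal GB); 284B is the total parameter count.
TOTAL_PARAMS_B = 284
VRAM_GB = {"Q4_K_M": 170, "Q5_K_M": 205, "Q8_0": 305, "FP16": 568}


def implied_bits_per_param(vram_gb: float, params_b: float = TOTAL_PARAMS_B) -> float:
    """Bits per parameter if the entire VRAM figure were model weights."""
    # GB -> gigabits (x8), divided by billions of parameters.
    return vram_gb * 8 / params_b


def weights_gb(bits_per_param: float, params_b: float = TOTAL_PARAMS_B) -> float:
    """Weights-only footprint in GB for a given average precision."""
    return params_b * bits_per_param / 8


if __name__ == "__main__":
    for name, gb in VRAM_GB.items():
        print(f"{name}: {implied_bits_per_param(gb):.2f} bits/param")
```

The FP16 row works out to exactly 16 bits per parameter, while the quantized rows land a little above their nominal 4, 5, and 8 bits, which is expected for block-quantized formats that store scaling data alongside the weights.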

Strengths

  • MIT license
  • 1M context window
  • Only 13B active params — fast for its total size
  • Three thinking modes inherited from V4 Pro
  • Base and Instruct variants available
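
To put the "13B active" figure in perspective, here is a minimal sketch of the arithmetic. The compute estimate uses the common rule of thumb of about 2 FLOPs per active parameter per generated token, which ignores attention cost and is an assumption here, not a published figure for this model.

```python
TOTAL_PARAMS_B = 284   # total parameters (billions), from the model card
ACTIVE_PARAMS_B = 13   # parameters used per token (billions), from the model card

# Share of the network that actually runs for each token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Rule-of-thumb forward-pass cost: ~2 FLOPs per active parameter per token.
# This is an approximation and excludes attention over long contexts.
approx_gflops_per_token = 2 * ACTIVE_PARAMS_B

if __name__ == "__main__":
    print(f"Active fraction: {active_fraction:.1%}")
    print(f"Approx. compute: {approx_gflops_per_token} GFLOPs per token")
```

Only the per-token compute shrinks with the active count. All 284B parameters still have to be resident in memory, which is why the VRAM requirement follows the total size.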

Limitations

  • Around 170 GB VRAM in Q4 — still multi-GPU
  • Community quantizations (GGUF) were lagging at launch
  • Quality trails V4 Pro on the hardest reasoning tasks
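
The multi-GPU point can be made concrete with a small sketch that divides each VRAM figure by the memory of a single card. The 80 GB card size is a hypothetical example, and the result is a lower bound: it leaves no headroom for longer contexts or for uneven splitting across devices.

```python
import math

# VRAM figures from the quantization table (decimal GB).
VRAM_GB = {"Q4_K_M": 170, "Q5_K_M": 205, "Q8_0": 305, "FP16": 568}


def min_gpus(vram_gb: float, gpu_mem_gb: float) -> int:
    """Smallest number of identical GPUs whose combined memory covers vram_gb."""
    return math.ceil(vram_gb / gpu_mem_gb)


if __name__ == "__main__":
    gpu_mem_gb = 80  # hypothetical per-card memory; change to match your hardware
    for name, gb in VRAM_GB.items():
        print(f"{name}: at least {min_gpus(gb, gpu_mem_gb)} x {gpu_mem_gb} GB GPUs")
```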

Architecture & training

Architecture: MoE 284B/13B active · CSA+HCA hybrid · mHC · Muon · mixed FP4+FP8

Training: Targets "efficient reasoning" at reduced cost vs V4 Pro.

Verdict

The efficient way into the V4 family — MIT, 1M context, and inference cost that won't bankrupt you.

Quick start

# Hugging Face: deepseek-ai/DeepSeek-V4-Flash (community GGUF in progress)

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
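
As one possible way to inspect the Hugging Face repository named above before committing to a multi-hundred-gigabyte download, the sketch below fetches only small metadata files with `snapshot_download` from the `huggingface_hub` package. The file patterns and the local directory are assumptions, since the repository's contents are not described here.

```python
REPO_ID = "deepseek-ai/DeepSeek-V4-Flash"  # repository name from the quick start


def download_metadata_only(local_dir: str = "./deepseek-v4-flash") -> str:
    """Fetch only small config/readme files, not the weights.

    The patterns below are assumptions about what the repo contains.
    """
    # Imported lazily so this module loads even without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=["*.json", "*.md"],  # skip weight shards
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print("Files saved under:", download_metadata_only())
```

Removing `allow_patterns` would pull the full repository, so check free disk space against the VRAM table first.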
