BestLLMfor: Your hardware. Your LLM. Your call.
Model card

DeepSeek V3 671B

By DeepSeek · China

Tags: chat · general · MoE
Parameters: 671B
License: DeepSeek License
Context: 128k
VRAM (Q4): 400 GB
Released: December 2024

Overview

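DeepSeek V3 is DeepSeek's frontier open-weight Mixture-of-Experts (MoE) model: 671B total parameters, of which 37B are active per token, with Multi-head Latent Attention (MLA) and an auxiliary-loss-free load-balancing scheme. The V3.1-Terminus update is released under the MIT license.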

When to pick this model

  • You're running server-class inference and want frontier-open performance
  • You need a non-reasoning frontier model for general chat and code at scale
  • You want the MLA architecture's reduced KV-cache footprint
  • You can move to V3.1-Terminus for MIT licensing

VRAM requirements by quantization

Quantization             VRAM required
Q4_K_M (recommended)     400 GB
Q5_K_M                   480 GB
Q8_0                     720 GB
FP16 (no quantization)   1342 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
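
The arithmetic behind these figures can be sketched roughly as follows. This is a back-of-the-envelope illustration, not the site's actual calculation: the helper names are hypothetical, GB is taken as 10^9 bytes, and the 8 × 80 GB node in the usage lines is an assumed example configuration. Because the table figures also cover KV cache and runtime overhead, the "implied bits per weight" is an upper bound on the real quantized weight size.

```python
# Back-of-the-envelope VRAM arithmetic. All helper names are hypothetical.

PARAMS_BILLION = 671  # total parameters, as listed above

# VRAM figures copied from the table above, in GB
TABLE_GB = {"Q4_K_M": 400, "Q5_K_M": 480, "Q8_0": 720, "FP16": 1342}


def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the weights alone: parameters x bits per weight, in GB (1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def implied_bits_per_weight(total_gb: float, params_billion: float) -> float:
    """Effective bits per parameter implied by a table figure.

    An upper bound, because the table figure also includes KV cache and overhead.
    """
    return total_gb * 8 / params_billion


def fits(quant: str, gpus: int, gb_per_gpu: float) -> bool:
    """True if the table figure fits in pooled GPU memory (ignores sharding overhead)."""
    return TABLE_GB[quant] <= gpus * gb_per_gpu


for quant, total_gb in TABLE_GB.items():
    bits = implied_bits_per_weight(total_gb, PARAMS_BILLION)
    # Assumed example hardware: one node with 8 GPUs of 80 GB each.
    print(f"{quant}: ~{bits:.2f} bits/weight, fits 8 x 80 GB: {fits(quant, 8, 80)}")
```

At 16 bits per weight the weights alone come to 671 × 2 = 1342 GB, which matches the FP16 row. The pooled-memory check is deliberately optimistic: a real multi-GPU deployment needs extra headroom for activations, longer contexts, and per-device buffers.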

Strengths

  • Frontier-open performance in chat, code, and general tasks
  • MLA cuts KV memory significantly vs standard attention
  • V3.1-Terminus available under MIT
  • Pretrained on 14.8T tokens
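
To give a rough sense of the MLA saving mentioned above, the sketch below compares per-token KV-cache size for standard multi-head attention and for MLA's compressed latent cache. The layer count, head count, and dimensions are assumptions taken from the publicly released DeepSeek-V3 model configuration, not from this page, and should be verified before being relied on. Whether a given runtime actually stores the compressed form depends on its implementation.

```python
# Per-token KV-cache comparison: standard multi-head attention vs. MLA.
# All constants below are ASSUMED from the public model configuration.

LAYERS = 61           # transformer layers (assumed)
HEADS = 128           # attention heads (assumed)
HEAD_DIM = 128        # per-head key/value dimension (assumed)
KV_LATENT_DIM = 512   # compressed KV latent cached per token (assumed)
ROPE_DIM = 64         # decoupled rotary key cached per token (assumed)
BYTES_PER_ELEM = 2    # 16-bit cache entries (assumed)


def mha_kv_bytes_per_token() -> int:
    """Standard attention caches one key and one value vector per head, per layer."""
    return 2 * HEADS * HEAD_DIM * LAYERS * BYTES_PER_ELEM


def mla_kv_bytes_per_token() -> int:
    """MLA caches one shared latent plus a small rotary key per layer."""
    return (KV_LATENT_DIM + ROPE_DIM) * LAYERS * BYTES_PER_ELEM


def cache_gb(bytes_per_token: int, context_tokens: int) -> float:
    """Total cache size in GB (1e9 bytes) for a given context length."""
    return bytes_per_token * context_tokens / 1e9


ratio = mha_kv_bytes_per_token() / mla_kv_bytes_per_token()
print(f"MHA: {cache_gb(mha_kv_bytes_per_token(), 8192):.1f} GB at 8k context")
print(f"MLA: {cache_gb(mla_kv_bytes_per_token(), 8192):.2f} GB at 8k context")
print(f"Reduction factor: ~{ratio:.1f}x")
```

Under these assumptions the latent cache is a bit under 60 times smaller per token than a full per-head cache, which is why context length adds comparatively little on top of the weight memory.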

Limitations

  • Original V3 uses the restrictive DeepSeek License
  • 400 GB+ of VRAM at Q4: server-class hardware only
  • Overkill for most workloads under 10B requests/month

Architecture & training

Architecture: MoE with 256 routed experts, 8 active per token · MLA · auxiliary-loss-free load balancing · FP8 mixed-precision training

Training: pre-trained on 14.8T tokens. V3.1-Terminus (Sep 2025) is released under the MIT license.
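
The "256 experts, 8 active" routing and the auxiliary-loss-free balancing idea can be illustrated with a simplified sketch. It is not DeepSeek's implementation: the function names and the step size are hypothetical, and details such as shared experts and group-limited routing are omitted. It assumes sigmoid affinity scores and a per-expert bias that affects only which experts are selected, while the gate weights come from the unbiased scores.

```python
import math
import random

NUM_EXPERTS = 256  # experts, as listed above
TOP_K = 8          # experts active per token, as listed above


def route(logits, bias, top_k=TOP_K):
    """Pick top_k experts for one token; return {expert_index: gate_weight}."""
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]  # sigmoid affinity
    # The bias influences only which experts are selected.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i] + bias[i], reverse=True)
    chosen = ranked[:top_k]
    # Gate weights use the unbiased scores, normalized over the chosen experts.
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


def update_bias(bias, load, step=0.001):
    """Lower the bias of overloaded experts and raise it for underloaded ones.

    `step` is a hypothetical update speed; no auxiliary loss term is involved.
    """
    mean_load = sum(load) / len(load)
    updated = []
    for b, n in zip(bias, load):
        if n > mean_load:
            updated.append(b - step)
        elif n < mean_load:
            updated.append(b + step)
        else:
            updated.append(b)
    return updated


# Route a small batch of random tokens and adjust the bias from the observed load.
rng = random.Random(0)
bias = [0.0] * NUM_EXPERTS
load = [0] * NUM_EXPERTS
for _ in range(64):
    token_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    for expert in route(token_logits, bias):
        load[expert] += 1
bias = update_bias(bias, load)
```

Because only 8 of the 256 experts run for each token, compute per token tracks the 37B active parameters rather than the 671B total, although all experts still have to be resident in memory.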

Verdict

Frontier-open performance for teams with serious inference infrastructure — go straight to V3.1-Terminus for the MIT license.

Quick start

ollama run deepseek-v3:671b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
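
Once the model is running under Ollama, it can also be queried over Ollama's local HTTP API. The following is a minimal sketch, assuming a default local install listening on http://localhost:11434 and the `/api/generate` endpoint; the helper names `build_request` and `ask` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL = "deepseek-v3:671b"  # tag from the quick-start command


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )


def ask(prompt: str) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask("Summarize the MoE architecture in one sentence.")` should return the completion as a string, provided the server is up and the model has been pulled.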
