Mixtral 8x7B
By Mistral AI · France
Overview
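The Mistral AI mixture-of-experts (MoE) model that popularized open-weight sparse models. Each layer holds eight experts and routes every token through two of them, so you get output comparable to a 47B-class model at roughly 13B-class compute per token. The catch is that every expert must stay loaded, so you also pay 47B-class VRAM costs.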
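The gap between memory cost and per-token compute follows from simple parameter arithmetic. The sketch below counts parameters using values from the publicly released Mixtral config (hidden size 4096, 32 layers, 32 query heads and 8 key/value heads of size 128, expert width 14336, 32,000-token vocabulary). Those values come from the model release rather than from this page, and the helper names are illustrative.

```python
# Parameter count for Mixtral 8x7B from its released config values
# (assumed here; this page does not list them). The model has no biases;
# RMSNorm weights are included.
HIDDEN = 4096
LAYERS = 32
HEAD_DIM = 128
N_HEADS = 32
N_KV_HEADS = 8
FFN = 14336
N_EXPERTS = 8
TOP_K = 2
VOCAB = 32000


def attention_params() -> int:
    q = HIDDEN * N_HEADS * HEAD_DIM
    kv = 2 * HIDDEN * N_KV_HEADS * HEAD_DIM  # K and V (grouped-query attention)
    o = N_HEADS * HEAD_DIM * HIDDEN
    return q + kv + o


def expert_params() -> int:
    # SwiGLU feed-forward block: gate, up and down projections
    return 3 * HIDDEN * FFN


def count_params(experts_per_layer: int) -> int:
    router = HIDDEN * N_EXPERTS  # the router runs for every token
    norms = 2 * HIDDEN           # two RMSNorm weight vectors per layer
    per_layer = attention_params() + experts_per_layer * expert_params() + router + norms
    embeddings = 2 * VOCAB * HIDDEN  # input embedding + untied output head
    return LAYERS * per_layer + embeddings + HIDDEN  # + final norm


total = count_params(N_EXPERTS)  # all experts must sit in memory
active = count_params(TOP_K)     # only two experts run for each token
print(f"total  {total / 1e9:.1f}B")   # total  46.7B
print(f"active {active / 1e9:.1f}B")  # active 12.9B
```

Attention and embedding weights are shared and always active; only the expert feed-forward blocks are sparse, which is why the total is about 47B rather than 8 × 7B = 56B.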
When to pick this model
- Inference servers with ample VRAM, where throughput at a given quality level matters
- Workloads needing strong multilingual and coding performance
- Apache 2.0 commercial deployments
- A baseline for comparative benchmarks against newer dense models
- Fine-tuning research on a well-documented MoE architecture
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 26 GB |
| Q5_K_M | 32 GB |
| Q8_0 | 50 GB |
| FP16 (no quantization) | 94 GB |
VRAM figures include the model weights, a KV cache for a typical 8k-token context, and about 600 MB of runtime overhead (Ollama / llama.cpp baseline). Longer contexts need extra headroom: with an FP16 KV cache, each additional 8k tokens adds roughly 1 GiB.
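To put numbers on that headroom, here is a rough planner. It assumes each table figure includes exactly an 8,192-token KV cache stored in FP16 (the usual default), and it uses the attention shape from the released model config (32 layers, 8 key/value heads, 128-dimensional heads), which this page does not list. The helper names are hypothetical; treat the results as estimates, not guarantees.

```python
from typing import Optional

# Hypothetical helper, not part of Ollama or llama.cpp.
# Baselines from the table above, ordered from highest to lowest quality.
# Each is assumed to already include an 8,192-token FP16 KV cache and ~600 MB overhead.
TABLE_GB = {"FP16": 94, "Q8_0": 50, "Q5_K_M": 32, "Q4_K_M": 26}
BASELINE_CTX = 8192

# Attention shape from the released model config (assumption):
# 32 layers, 8 key/value heads (grouped-query attention), 128-dim heads.
LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128
BYTES_PER_VALUE = 2  # FP16 cache entries


def kv_cache_bytes(ctx: int) -> int:
    """Bytes needed to cache keys and values for `ctx` tokens across all layers."""
    return 2 * LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE * ctx


def estimate_vram_gb(quant: str, ctx: int = BASELINE_CTX) -> float:
    """Table figure, adjusted for a KV cache larger or smaller than the 8k baseline."""
    delta = kv_cache_bytes(ctx) - kv_cache_bytes(BASELINE_CTX)
    return TABLE_GB[quant] + delta / 1e9


def pick_quant(vram_gb: float, ctx: int = BASELINE_CTX) -> Optional[str]:
    """Highest-quality quantization whose estimate fits in `vram_gb`, or None."""
    for quant in TABLE_GB:  # dicts keep insertion order: best quality first
        if estimate_vram_gb(quant, ctx) <= vram_gb:
            return quant
    return None


print(kv_cache_bytes(BASELINE_CTX) / 2**30)         # 1.0 (GiB of KV cache at 8k tokens)
print(round(estimate_vram_gb("Q4_K_M", 32768), 1))  # 29.2 (GB)
print(pick_quant(48))                               # Q5_K_M
print(pick_quant(32, ctx=32768))                    # Q4_K_M
```

Under these assumptions, a 48 GB card fits Q5_K_M at the default context, while a 32 GB card running the full 32k window drops to Q4_K_M.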
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU | 70.6 |
| HumanEval | 40.2 |
| HellaSwag | 86.7 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- Quality well above dense models with a similar active-parameter count (~13B)
- Strong coding and multilingual performance
- Apache 2.0 license
- Battle-tested in production stacks
Limitations
- Roughly 26 GB of VRAM at Q4_K_M, the same footprint as a dense 47B model, because every expert must stay loaded
- Eclipsed by Qwen 3 and Llama 3.3 in 2025 benchmarks
- 32k-token context window is short by current standards
- Knowledge cutoff predates its December 2023 release, so it lacks knowledge of newer libraries and tools
Architecture & training
Architecture: sparse MoE (8×7B) · 32 layers, each with 8 experts, 2 active per token · 46.7B total / 12.9B active
Training: Mistral AI multilingual corpus. First popular large open-weight MoE.
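In Mixtral's published design, the router scores all eight experts for each token, keeps the top two, applies a softmax over just those two scores, and sums the two expert outputs weighted by the resulting gates. The pure-Python sketch below illustrates that top-2 gating on toy vectors. The "experts" are stand-in scaling functions rather than the real SwiGLU feed-forward blocks, and every name is hypothetical, so this is not Mistral's implementation.

```python
import math
from collections.abc import Callable, Sequence

Vector = list[float]


def top2_route(logits: Sequence[float]) -> list[tuple[int, float]]:
    """Keep the two highest-scoring experts and softmax over just those two."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(top, exps)]


def moe_layer(
    x: Vector,
    router_weights: Sequence[Vector],
    experts: Sequence[Callable[[Vector], Vector]],
) -> Vector:
    """Sparse MoE feed-forward step for a single token vector."""
    # Router: one logit per expert (dot product of the token with that expert's row).
    logits = [sum(xi * wi for xi, wi in zip(x, w)) for w in router_weights]
    out = [0.0] * len(x)
    for idx, gate in top2_route(logits):
        y = experts[idx](x)  # only the two selected experts are evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


# Toy demo: 8 hypothetical "experts" that scale their input and log each call.
calls: list[int] = []


def make_expert(k: int) -> Callable[[Vector], Vector]:
    def expert(v: Vector) -> Vector:
        calls.append(k)
        return [(k + 1) * vi for vi in v]
    return expert


experts = [make_expert(k) for k in range(8)]
router_weights = [[float(k), 0.0] for k in range(8)]  # logits become 0..7
out = moe_layer([1.0, 0.0], router_weights, experts)
print(calls)  # [7, 6]: only two of the eight experts ran
```

Compute scales with the two experts that actually run, while memory must still hold all eight, which is the trade-off described in the overview.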
A historically important MoE, now a second-tier choice: newer dense 24–32B models match its quality with less VRAM.
Quick start
ollama run mixtral:8x7b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any other MCP-compatible client.
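For programmatic access, Ollama also exposes a local HTTP API. The sketch below posts to its /api/generate endpoint using only the Python standard library. It assumes Ollama's default address (http://localhost:11434); the num_ctx value of 8192 is an illustrative choice matching the VRAM table's baseline, not a requirement, and the helper names build_request and generate are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "mixtral:8x7b", num_ctx: int = 8192) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for the given prompt."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # return a single JSON object
        "options": {"num_ctx": num_ctx},  # context length; larger values need more VRAM
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text from the "response" field."""
    with urllib.request.urlopen(build_request(prompt, **kwargs), timeout=300) as resp:
        return json.loads(resp.read())["response"]
```

With the server running and the model pulled, generate("Explain top-2 expert routing in one sentence.") returns the reply as a string; if nothing is listening on the port, urlopen raises urllib.error.URLError.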