Model card
Mistral Small 4
By Mistral AI · France
chat · general · code · vision · reasoning · multilingual · fr · moe
Overview
Mistral AI's 2026 flagship MoE with 119B total and 6.5B active parameters, unifying chat, reasoning, vision, and code in a single Apache 2.0 model.
When to pick this model
- Consolidating multiple Mistral deployments into one model
- Vision plus reasoning workloads on a prosumer rig
- Long-context analysis up to 256K tokens
- Deployments with European data-sovereignty requirements
- Apache-licensed commercial products
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 72 GB |
| Q5_K_M | 86 GB |
| Q8_0 | 128 GB |
| FP16 (no quantization) | 238 GB |
VRAM figures include model weights plus a typical 8K-token KV cache and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer contexts.
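The note above can be turned into a back-of-the-envelope estimate. Weights take total parameters × bits per weight / 8 bytes. All 119B parameters must be resident even though only 6.5B are active per token, so 119B at 16 bits gives 238 GB of weights. The KV cache then grows linearly with context length. This is a minimal sketch under stated assumptions: the bits-per-weight values are approximate GGUF averages, the layer and head dimensions are hypothetical placeholders (they are not listed on this card), and the KV formula assumes standard attention that caches every token. Its outputs will not match the table exactly.

```python
"""Rough VRAM estimate: quantized weights + KV cache + runtime overhead.

Bits-per-weight values and architecture dimensions are assumptions for
illustration, not published Mistral Small 4 specifications.
"""

GB = 1e9  # decimal gigabytes

# Approximate average bits per weight for common GGUF formats (assumed).
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q8_0": 8.5,
    "FP16": 16.0,
}


def weight_bytes(total_params, quant):
    """All experts stay in memory, so use total (not active) parameters."""
    return total_params * BITS_PER_WEIGHT[quant] / 8


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Keys and values (factor 2) for every layer, KV head, and cached token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def estimate_vram_gb(total_params, quant, n_layers, n_kv_heads, head_dim,
                     context_len, overhead_gb=0.6):
    total = (weight_bytes(total_params, quant)
             + kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len))
    return total / GB + overhead_gb


if __name__ == "__main__":
    # Hypothetical architecture dimensions; replace with the real model config.
    dims = dict(n_layers=36, n_kv_heads=8, head_dim=128)
    for quant in BITS_PER_WEIGHT:
        for ctx in (8_192, 262_144):  # 8K and 256K tokens
            gb = estimate_vram_gb(119e9, quant, context_len=ctx, **dims)
            print(f"{quant:7s} ctx={ctx:>7,}: ~{gb:6.1f} GB")
```

With these placeholder dimensions, the KV cache goes from about 1.2 GB at 8K tokens to about 39 GB at 256K. That jump is why long-context use needs extra headroom beyond the table.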
Strengths
- Unifies chat, reasoning, vision, and code in one model
- Only 6.5B active parameters for fast inference
- 256K context window
- Apache 2.0 license
- European lab with strong French and EU-language support
Limitations
- Needs 72 GB+ even at Q4_K_M, which means a prosumer multi-GPU setup
- Breaks continuity with the Small 3.x line
- As a newer release, it has a thinner ecosystem
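To turn the VRAM table into a hardware decision, a small helper can pick the highest-precision quantization whose listed figure fits the combined VRAM of a multi-GPU rig. This is a sketch under two assumptions: the model splits across cards with no per-card waste (layer-wise multi-GPU splitting only roughly allows this), and a hypothetical 4 GB safety margin is reserved. Real per-GPU overheads vary.

```python
# VRAM figures (GB, ~8K context) copied from the quantization table.
VRAM_GB = {
    "Q4_K_M": 72,
    "Q5_K_M": 86,
    "Q8_0": 128,
    "FP16": 238,
}


def pick_quant(gpu_vram_gb, headroom_gb=4.0):
    """Return the highest-precision quantization that fits, or None.

    Assumes the model can be split across all GPUs with no per-card waste;
    headroom_gb is a hypothetical safety margin, not a measured value.
    """
    budget = sum(gpu_vram_gb)
    fitting = [q for q, need in VRAM_GB.items() if need + headroom_gb <= budget]
    # In this table, a larger footprint means higher precision.
    return max(fitting, key=VRAM_GB.get, default=None)


if __name__ == "__main__":
    for rig in ([24, 24, 24], [48, 48], [80, 80]):
        print(rig, "->", pick_quant(rig))
```

For example, two 48 GB cards (96 GB) land on Q5_K_M. Three 24 GB cards (72 GB) fall just short of Q4_K_M once headroom is reserved.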
Architecture & training
Architecture: MoE 119B/6.5B active · 256K context · unifies instruct + reasoning + vision + code
Training: Replaces Small 3.x and Pixtral in a single model.
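The 119B/6.5B split means only about 6.5 / 119 ≈ 5.5% of the weights are used for any given token. A generic top-k mixture-of-experts layer shows why. A router scores every expert, only the k best-scoring experts run, and softmax gates mix their outputs. This is a textbook sketch, not Mistral's implementation. The expert count, k, and dimensions are hypothetical, each expert is reduced to a single matrix, and shared components such as attention, which are always active, are left out.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class TopKMoE:
    """Generic top-k mixture-of-experts layer (illustrative only)."""

    def __init__(self, d_model, n_experts, k, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.d_model, self.k = d_model, k
        self.router = rand_matrix(n_experts, d_model)  # one score row per expert
        # Hypothetical experts: a single d_model x d_model matrix each.
        self.experts = [rand_matrix(d_model, d_model) for _ in range(n_experts)]

    def forward(self, x):
        scores = matvec(self.router, x)
        # Keep only the k highest-scoring experts for this token.
        chosen = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[: self.k]
        gates = softmax([scores[i] for i in chosen])  # renormalised over chosen experts
        out = [0.0] * self.d_model
        for gate, i in zip(gates, chosen):
            y = matvec(self.experts[i], x)  # unchosen experts are never evaluated
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out, chosen

    def param_counts(self):
        """(total, active-per-token) parameter counts for this toy layer."""
        router = len(self.router) * self.d_model
        per_expert = self.d_model * self.d_model
        total = router + len(self.experts) * per_expert
        active = router + self.k * per_expert
        return total, active


if __name__ == "__main__":
    moe = TopKMoE(d_model=16, n_experts=32, k=2)
    out, chosen = moe.forward([0.1 * i for i in range(16)])
    total, active = moe.param_counts()
    print(f"experts used: {chosen}, active share: {active / total:.1%}")
```

Memory still scales with `total` because every expert must be loaded. Per-token compute scales with `active`. This is the trade-off behind the "fast inference" strength and the 72 GB+ limitation.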
Verdict
Mistral's most ambitious open release yet, ideal if you want one model covering four product lines.
Quick start
# HuggingFace: mistralai/Mistral-Small-4 (no official Ollama tag yet)
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.