Model fiche
Magistral Small 24B
By Mistral AI · France
reasoning
fr
Overview
Mistral AI's first open reasoning model, built on Small 3.1 with RL-trained chain-of-thought. Hits 70.7% on AIME24 under Apache 2.0.
When to pick this model
- Math, science, and competition-style problem solving on local hardware
- Transparent reasoning where visible CoT helps debugging
- Reasoning workloads requiring a permissively licensed alternative to DeepSeek R1
- Multi-step planning agents with reasoning budgets under 40k tokens
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 14 GB |
| Q5_K_M | 17 GB |
| Q8_0 | 26 GB |
| FP16 (no quantization) | 48 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Published benchmark scores
| Benchmark | Score |
|---|---|
| AIME 2024 | 70.7 |
| MATH-500 | 90 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- First open Mistral reasoner with a real RL training pipeline
- AIME24 70.7% — competitive with much larger reasoners
- Apache 2.0 license
- Runs on a single 24GB GPU at Q4
Limitations
- Highly verbose in thinking mode — token costs add up
- Recommended effective context capped around 40k
- Trails DeepSeek R1 distills on hardest math benchmarks
Architecture & training
Architecture: Dense 24B · CoT reasoning · Small 3.1 base
Training: RL on reasoning.
Verdict
Mistral's first credible reasoning model — solid math chops under Apache 2.0, if you can stomach the verbose CoT.
Quick start
ollama run magistral:24bOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.