Model fiche
Mistral Small 3.2 24B
By Mistral AI · France
chat
general
vision
multilingual
fr
Overview
Mistral AI's June 2025 refresh of Small 3.1: a 24B Apache 2.0 dense model with vision input, sharper function calling, and roughly half the rate of runaway generations seen in 3.1.
When to pick this model
- Self-hosted multilingual chat assistant on a single 24GB GPU
- Agentic workflows that need reliable tool calls without paying for a frontier model
- OCR and document Q&A pipelines combining text and screenshots
- European deployments needing strong French, German, and Spanish coverage
- Drop-in upgrade for existing Small 3.1 deployments
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 14 GB |
| Q5_K_M | 17 GB |
| Q8_0 | 26 GB |
| FP16 (no quantization) | 48 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Strengths
- Roughly 50% fewer infinite-generation loops than 3.1
- Notably improved function calling and structured output reliability
- Vision encoder included for multimodal tasks
- Apache 2.0 — unrestricted commercial use
- Fits comfortably on a single 24GB consumer GPU at Q4
Limitations
- Requires a recent Ollama build for full chat-template support
- Still trails frontier models on hard reasoning benchmarks
- Vision quality lags dedicated VLMs like Qwen2.5-VL
Architecture & training
Architecture: Dense 24B · vision · Tekken tokenizer
Training: Minor update to Small 3.1 (2503).
Verdict
The pragmatic choice for self-hosted multilingual chat and tool-using agents on a single GPU — and a no-brainer upgrade from Small 3.1.
Quick start
ollama run mistral-small3.2:24bOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.