Mistral Medium 3.5 128B
By Mistral AI · France
Overview
Mistral AI's first merged flagship: a dense 128B model with vision, a 256k context window, and configurable reasoning. It scores 77.6% on SWE-Bench Verified and consolidates Medium 3.1, Magistral, and Devstral 2 into one model.
When to pick this model
- Agentic coding workflows demanding state-of-the-art SWE-Bench performance
- Customer-support automation needing top τ³-Telecom scores
- Long-document analysis up to 256k tokens
- Multilingual vision tasks across 24 languages
- Production deployments wanting reasoning toggleable per request
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 74 GB |
| Q5_K_M | 91 GB |
| Q8_0 | 137 GB |
| FP16 (no quantization) | 256 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
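To make the table easier to apply, here is a minimal Python sketch that picks the largest quantization fitting a given VRAM budget and estimates a GPU count. The figures are copied from the table above; the function names (`best_quantization`, `gpus_needed`) and the `headroom_gb` parameter are illustrative assumptions, and the GPU count is a capacity-only estimate that ignores any per-GPU overhead from splitting the model.

```python
import math

# VRAM figures (GB) copied from the table above. Per the note, they include
# weights, a typical 8k KV cache, and ~600 MB of runtime overhead.
VRAM_GB = {
    "Q4_K_M": 74,
    "Q5_K_M": 91,
    "Q8_0": 137,
    "FP16": 256,
}


def best_quantization(available_gb, headroom_gb=0.0):
    """Return the largest quantization that fits the budget, or None.

    headroom_gb is a hypothetical knob for reserving extra memory,
    e.g. for context lengths beyond the 8k baseline.
    """
    budget = available_gb - headroom_gb
    fitting = [(gb, name) for name, gb in VRAM_GB.items() if gb <= budget]
    return max(fitting)[1] if fitting else None


def gpus_needed(quant, gpu_gb):
    """Minimum GPU count by total capacity alone (no split overhead)."""
    return math.ceil(VRAM_GB[quant] / gpu_gb)
```

For example, `best_quantization(96)` selects `Q5_K_M`, while reserving 10 GB of headroom drops the choice to `Q4_K_M`. With hypothetical 24 GB cards, `gpus_needed("Q4_K_M", 24)` gives 4.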
Published benchmark scores
| Benchmark | Score (%) |
|---|---|
| SWE-Bench Verified | 77.6 |
| τ³-Telecom | 91.4 |
Scores are published by the model author or aggregated from public leaderboards, and are re-measured monthly by our editorial team.
Strengths
- SWE-Bench Verified 77.6% — best-in-class for open weights
- τ³-Telecom 91.4% for tool-using agents
- 256k context with strong long-context retention
- Vision-enabled and multilingual across 24 languages
- Modified MIT — permissive for most commercial use
Limitations
- ~74 GB at Q4_K_M, so comfortable serving needs a 4-GPU box
- Revenue clause kicks in for large enterprises
- Single-model consolidation means no separate specialized variants
Architecture & training
Architecture: Dense 128B · vision encoder · 256k ctx · configurable reasoning · integrated EAGLE draft head
Training: First merged Mistral flagship: replaces Medium 3.1, Magistral and Devstral 2 in Le Chat / Vibe.
The first Mistral flagship that bundles coding, reasoning, and vision into one model — and it's competitive on every axis.
Quick start
ollama run mistral-medium-3.5:128b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
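Once the model has been pulled, it can also be queried programmatically. The sketch below assumes a local Ollama server on its default port (11434) and uses Ollama's `/api/generate` endpoint with streaming disabled. The model tag is taken from the command above; the helper names `build_request` and `generate` and the timeout value are illustrative, not part of any official client.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "mistral-medium-3.5:128b"  # tag from the quick-start command


def build_request(prompt, model=MODEL_TAG, url=OLLAMA_URL):
    """Build a non-streaming generate request for the Ollama HTTP API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, timeout=600):
    """Send the prompt and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example call (needs the server running):
# print(generate("Explain what a KV cache is in two sentences."))
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before anything is sent.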