MiMo V2.5 Pro
By Xiaomi · China
Overview
Xiaomi's MIT-licensed frontier agentic model: a 1.02T-parameter MoE with 42B active parameters, 57.2% on SWE-Bench Pro, a 1M-token context window, and 6:1 hybrid sliding-window/global attention. Released April 2026.
When to pick this model
- Frontier autonomous coding agents at MIT licensing
- Workflows chaining 1,000+ tool calls in a single session
- Million-token codebase reasoning
- Research on multi-teacher distillation outcomes
- Datacenter deployments seeking the open agentic ceiling
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 595 GB |
| Q5_K_M | 720 GB |
| Q8_0 | 1090 GB |
| FP16 (no quantization) | 2040 GB |
VRAM figures include model weights plus the KV cache for a typical 8k-token context and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer contexts.
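As a rough sanity check on the table, the sketch below converts each VRAM figure back into an implied number of bits per weight and a minimum device count. It assumes decimal gigabytes, treats the whole figure as weights (the 8k cache and overhead are small by comparison), and uses a hypothetical 80 GB accelerator; none of these assumptions come from the model author.

```python
import math

TOTAL_PARAMS = 1.02e12  # 1.02T total parameters, from the overview
BYTES_PER_GB = 1e9      # assumption: the table uses decimal gigabytes

# VRAM figures copied from the table above, in GB.
VRAM_GB = {"Q4_K_M": 595, "Q5_K_M": 720, "Q8_0": 1090, "FP16": 2040}


def implied_bits_per_weight(vram_gb: float) -> float:
    """Bits per parameter if the whole VRAM figure were model weights."""
    return vram_gb * BYTES_PER_GB * 8 / TOTAL_PARAMS


def devices_needed(vram_gb: float, device_gb: float = 80) -> int:
    """Lower bound on device count; device_gb=80 is a hypothetical card size."""
    return math.ceil(vram_gb / device_gb)


for name, gb in VRAM_GB.items():
    bits = implied_bits_per_weight(gb)
    print(f"{name}: ~{bits:.2f} bits/weight, at least {devices_needed(gb)} x 80 GB devices")
```

The device count is a floor only: it ignores activation memory, longer contexts, and any per-device fragmentation from sharding.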
Published benchmark scores
| Benchmark | Score |
|---|---|
| SWE-Bench Pro | 57.2 |
| Claw-Eval | 63.8 |
| τ3-Bench | 72.9 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- MIT license at frontier agentic scale
- 1M context window
- Supports 1,000+ tool calls per chain
- 57.2% on SWE-Bench Pro
- Hybrid 6:1 attention cuts the KV cache by roughly 7x vs. full attention at long context lengths
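The KV-cache saving from the 6:1 hybrid layout can be reasoned about with a small calculation. The sketch below assumes every layer stores the same number of bytes per cached token, that sliding-window layers keep at most `window` tokens, and that the window is 4,096 tokens; the window size is purely illustrative and is not stated on this page.

```python
def kv_cache_reduction(context_len: int, window: int = 4096, swa_per_global: int = 6) -> float:
    """Ratio of full-attention KV cache to hybrid KV cache for one layer group.

    A group is `swa_per_global` sliding-window layers plus one global layer.
    Assumes identical per-token KV size in every layer (an assumption).
    """
    group_size = swa_per_global + 1
    # Full attention: every layer in the group caches the whole context.
    full_tokens = group_size * context_len
    # Hybrid: sliding-window layers cache at most `window` tokens each.
    hybrid_tokens = swa_per_global * min(context_len, window) + context_len
    return full_tokens / hybrid_tokens


for n in (1_024, 32_768, 1_000_000):
    print(f"context {n:>9,}: {kv_cache_reduction(n):.2f}x smaller KV cache")
```

Under these assumptions the saving is nil for contexts shorter than the window and approaches 7x only as the context grows far beyond it, which is why the figure is best read as a long-context limit.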
Limitations
- Roughly 600 GB VRAM in Q4 — datacenter only
- No official Ollama quantization
- Multi-token prediction (MTP) support is uneven across inference engines
Architecture & training
Architecture: MoE 1.02T/42B active · 70 layers (1 dense + 69 MoE) · 384 experts top-8 · hybrid SWA/GA 6:1 · MTP 3 layers · FP8 E4M3
Training: Three-stage post-training: SFT → domain-specialized RL (math, safety, agentic) → Multi-Teacher On-Policy Distillation.
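To make the architecture figures above easier to compare, here is a minimal sketch that derives a few ratios from them. The class name `MiMoSpec` and its fields are hypothetical, and the global-layer count assumes the 6:1 pattern is applied evenly across all 70 layers, which this page does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiMoSpec:
    """Hypothetical container for the figures quoted in the architecture line."""
    total_params_b: float = 1020.0  # 1.02T total parameters, in billions
    active_params_b: float = 42.0   # active parameters per token, in billions
    num_layers: int = 70            # 1 dense + 69 MoE
    experts: int = 384
    top_k: int = 8
    swa_per_global: int = 6         # hybrid SWA/GA ratio of 6:1

    def active_param_fraction(self) -> float:
        """Share of all parameters used for a single token."""
        return self.active_params_b / self.total_params_b

    def routed_expert_fraction(self) -> float:
        """Share of experts selected per token in each MoE layer."""
        return self.top_k / self.experts

    def global_layers(self) -> int:
        """Global-attention layer count, assuming an evenly repeated 6:1 pattern."""
        return self.num_layers // (self.swa_per_global + 1)


spec = MiMoSpec()
print(f"active parameters: {spec.active_param_fraction():.1%} of total")
print(f"experts routed per token: {spec.routed_expert_fraction():.1%}")
print(f"global layers: {spec.global_layers()}, sliding-window layers: {spec.num_layers - spec.global_layers()}")
```

The point of the comparison is that compute per token tracks the 42B active parameters, while memory still has to hold all 1.02T, which is why the VRAM requirements stay at datacenter scale.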
The open agentic frontier — MIT, million-token, thousand-call — if you have the silicon to run it.
Quick start
Hugging Face: XiaomiMiMo/MiMo-V2.5-Pro
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
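For the Hugging Face route, a minimal sketch of fetching only the small metadata files before committing to a multi-hundred-gigabyte weight download might look like the following. It assumes the `huggingface_hub` package is installed and that the repository exposes JSON config files; the function name `fetch_metadata` and the local directory are illustrative.

```python
REPO_ID = "XiaomiMiMo/MiMo-V2.5-Pro"  # repository name as given above


def fetch_metadata(local_dir: str = "mimo-v2.5-pro-meta") -> str:
    """Download only *.json files (configs, tokenizer metadata) from the repo.

    The import is kept inside the function so the module loads even when
    huggingface_hub is not installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=["*.json"],  # skip the weight shards entirely
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print("metadata saved to", fetch_metadata())
```

Dropping the `allow_patterns` filter would pull the full checkpoint, so check available disk space against the VRAM table first.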