Model card

MiMo V2.5

By Xiaomi · China

Updated 2026-07-13

Tags: chat, vision, audio, moe, multilingual

Parameters: 310B
License: MIT
Context: 976k
VRAM (Q4): 180 GB
Released: 22 April 2026

Overview

Xiaomi's MIT-licensed omnimodal model: a 310B-parameter MoE with 15B active parameters that handles text, image, video, and audio. It scores 87.7 on Video-MME and supports a context window of roughly 1M tokens. Released April 2026.

When to pick this model

Video understanding pipelines (Video-MME 87.7)
Unified text, image, video, and audio workflows
Million-token multimodal context tasks
MIT-licensed alternative to closed omnimodal APIs
Document and chart reasoning (CharXiv RQ 81.0)

VRAM requirements by quantization

Quantization	VRAM required
Q4_K_M (recommended)	180 GB
Q5_K_M	220 GB
Q8_0	330 GB
FP16 (no quantization)	620 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
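As a rough sanity check on the table above, the sketch below backs out the average number of bits stored per parameter implied by each figure. It assumes the whole footprint is weights and that 1 GB means 10^9 bytes, so the small KV-cache and runtime overhead slightly inflate the results. The function name is illustrative only.

```python
# Average bits per parameter implied by each VRAM figure on this page.
# Assumption: the entire footprint is attributed to weights (1 GB = 1e9 bytes),
# so KV cache and runtime overhead make the result a slight overestimate.
PARAMS_BILLION = 310  # total parameter count listed on this page

VRAM_GB = {
    "Q4_K_M": 180,
    "Q5_K_M": 220,
    "Q8_0": 330,
    "FP16": 620,
}


def implied_bits_per_param(vram_gb: float, params_billion: float) -> float:
    """Convert GB to bits and divide by the parameter count (the 1e9 factors cancel)."""
    return vram_gb * 8 / params_billion


for name, gb in VRAM_GB.items():
    bits = implied_bits_per_param(gb, PARAMS_BILLION)
    print(f"{name:8s} {gb:4d} GB  ~{bits:.2f} bits/param")
```

The FP16 row works out to 16 bits per parameter, which suggests the table is dominated by weight size and that the 8k KV cache and runtime overhead are small by comparison.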

In practice, MiMo V2.5 is server-class even at Q4_K_M (180 GB). Stepping up to Q8_0 nearly doubles the footprint to 330 GB, and unquantized FP16 weights take 620 GB — plan your GPU around the Q4 or Q5 figure unless you specifically need the higher fidelity.

Without a GPU, MiMo V2.5 needs roughly 230 GB of system RAM to run on CPU via llama.cpp; Ollama is listed as unsupported under Limitations. CPU inference is workable for background jobs but far slower than GPU inference. Throughput estimates from our compatibility engine: around 1 token/sec on entry-level GPUs, on the order of 5 tokens/sec on a mid-range card, and up to 15 tokens/sec on high-end hardware, assuming the chosen quantization fully fits in VRAM.
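To make the throughput tiers quoted above more concrete, here is a small sketch that converts tokens per second into wall-clock time for one reply. The 500-token reply length is a hypothetical value chosen for illustration, and the calculation ignores prompt-processing time.

```python
# Throughput tiers quoted on this page, in tokens per second.
TIERS = {"entry-level": 1, "mid-range": 5, "high-end": 15}

REPLY_TOKENS = 500  # hypothetical reply length, not a figure from this page


def seconds_for(tokens: int, tokens_per_sec: float) -> float:
    """Generation time only; prompt processing is not included."""
    return tokens / tokens_per_sec


for tier, tps in TIERS.items():
    print(f"{tier:12s} {seconds_for(REPLY_TOKENS, tps):6.1f} s")
```

At the mid-range estimate, a reply of that length takes well over a minute, which is why these rates suit batch work better than interactive chat.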

What hardware do you need

The table below matches MiMo V2.5 to common GPU memory tiers, using the highest-fidelity quantization that fully fits each card class. Spilling layers to system RAM works but costs most of the speed, so size your card to the quantization you actually want to run.

GPU memory	Example cards	Best fit for MiMo V2.5
8 GB	RTX 5070 Laptop, RTX 5060, RTX 5060 Ti 8GB	Does not fit — needs 180 GB at Q4_K_M
12 GB	RTX 5070, RTX 5070 Ti Laptop, RTX 4080 Laptop	Does not fit — needs 180 GB at Q4_K_M
16 GB	RTX 5080, RTX 4080 Super, Radeon RX 9070 XT	Does not fit — needs 180 GB at Q4_K_M
24 GB	RTX 4090, Radeon RX 7900 XTX, RTX 5090 Laptop	Does not fit — needs 180 GB at Q4_K_M
32 GB	RTX 5090	Does not fit — needs 180 GB at Q4_K_M
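Since no single card in the table holds the model, a natural follow-up is how many cards of each size would be needed. The sketch below gives only a lower bound: it divides the Q4_K_M figure by the card size and rounds up, ignoring per-card runtime overhead, uneven layer splits, and interconnect cost. The function name is illustrative.

```python
import math

MODEL_VRAM_GB = 180  # Q4_K_M figure from this page


def min_cards(model_gb: float, card_gb: float) -> int:
    """Lower bound on card count; real deployments usually need extra headroom."""
    return math.ceil(model_gb / card_gb)


for card_gb in (8, 12, 16, 24, 32):
    print(f"{card_gb:2d} GB cards: at least {min_cards(MODEL_VRAM_GB, card_gb)}")
```

Even with 32 GB cards the lower bound is six, which is why the page treats this model as server-class.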

Which GPU should you buy to run MiMo V2.5?

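To run MiMo V2.5 locally at Q4, you need ~180 GB of memory available to the model. Our best-value pick is an Apple Mac Studio, but only in a configuration whose unified memory exceeds that figure; a 64 GB configuration cannot hold the model.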


Published benchmark scores

Benchmark	Score
Video-MME	87.7
CharXiv RQ	81.0
MMMU-Pro	77.9

Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.

Strengths

Omnimodal under MIT — text, image, video, audio
1M context window
87.7 Video-MME and 81.0 CharXiv RQ
Permissive MIT license at frontier scale
MoE design keeps active compute reasonable

Limitations

Around 180 GB VRAM in Q4
Video and audio inference pipelines are not yet standardized
No Ollama support

Typical workloads

In our catalog grid, MiMo V2.5 is filed under Local Omnimodal, Video+Audio, and Long Multimodal, the use cases where its size/quality trade-off makes the most sense. Its tags translate to concrete workloads: vision-language work (screenshots, charts, scanned documents), audio understanding, and multilingual tasks.

The 976k-token context window is large enough to hold an entire codebase or several long reports in a single prompt, which makes local RAG and document analysis practical. The MIT license permits commercial use, modification, and redistribution, so shipping the model inside a commercial product mainly requires keeping the copyright and license notice.

Architecture & training

Architecture: MoE 310B/15B active · 48 layers (1 dense + 47 MoE) · 256 experts top-8 · ViT 729M + Audio 261M · MTP 329M · FP8
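The "256 experts top-8" part of the architecture line means each token is sent to 8 of the 256 experts in a MoE layer. The sketch below shows generic top-k routing with softmax-normalised weights. It is not Xiaomi's router: the page does not say how MiMo V2.5 scores or normalises experts, and the random logits stand in for a learned gating network.

```python
import math
import random

NUM_EXPERTS = 256  # from the architecture line
TOP_K = 8          # from the architecture line


def route(logits: list[float], k: int = TOP_K) -> dict[int, float]:
    """Generic top-k gating: keep the k highest logits, softmax over those only."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


random.seed(0)
# Stand-in for the output of a learned router on one token (assumption).
router_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route(router_logits)

share = len(weights) / NUM_EXPERTS
print(f"experts used per token: {len(weights)} of {NUM_EXPERTS} ({share:.1%})")
```

Only a small share of experts runs per token, which is how a 310B-parameter model can keep its active parameter count near 15B. The active share of parameters is higher than the share of experts, presumably because components such as attention and the dense layer are not routed.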

Training: ≈48T tokens · pipeline text pre-train → projector warmup → multimodal pre-train → agentic SFT → RL+MOPD.

Verdict

The first MIT-licensed model that genuinely handles video alongside everything else.

Quick start

# HuggingFace : XiaomiMiMo/MiMo-V2.5
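One way to fetch the repository named above is the `huggingface_hub` client. This is a minimal sketch, assuming the package is installed and the repository is publicly accessible; the target directory and function name are hypothetical. Weights for a 310B-parameter model are very large, so check free disk space before starting.

```python
from pathlib import Path

REPO_ID = "XiaomiMiMo/MiMo-V2.5"  # repository name from the quick-start line


def download_mimo(target_dir: str = "models/MiMo-V2.5") -> Path:
    """Download the full repository snapshot into target_dir (hypothetical path)."""
    # Imported lazily so this file still loads where huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(repo_id=REPO_ID, local_dir=target_dir)
    return Path(local_path)


# download_mimo()  # uncomment to start the download
```

The call is left commented out so that running the file does not start a multi-hundred-gigabyte transfer by accident.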

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.

Similar models worth comparing

Model	Parameters	License	VRAM (Q4)	Compared with MiMo V2.5
MiMo V2 Flash	309B	MIT	185 GB	Same family; needs 5 GB more VRAM at Q4
Qwen 3 Omni 30B-A3B	30B	Apache 2.0	19 GB	Needs 161 GB less VRAM at Q4
Qwen 2.5 Omni 7B	7B	Apache 2.0	6 GB	Needs 174 GB less VRAM at Q4
MiMo V2.5 Pro	1020B	MIT	595 GB	Same family; needs 415 GB more VRAM at Q4
Grok-1 (base)	314B	Apache 2.0	188 GB	Needs 8 GB more VRAM at Q4

Frequently asked questions

How much VRAM does MiMo V2.5 need?

At the recommended Q4_K_M quantization, MiMo V2.5 needs about 180 GB of VRAM. Q8_0 takes 330 GB, and unquantized FP16 weights take 620 GB.

Can MiMo V2.5 run without a GPU?

Yes. With roughly 230 GB of system RAM it runs CPU-only through llama.cpp (Ollama is listed as unsupported under Limitations). Expect a fraction of GPU speed, which is fine for background or batch jobs but slow for interactive chat.

What context window does MiMo V2.5 support?

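MiMo V2.5 supports a context window of about 1,000,000 tokens. The 976k figure shown on this page appears to be the same value counted in units of 1,024 tokens (1,000,000 / 1,024 ≈ 976.6).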

Can I use MiMo V2.5 commercially?

Yes. MiMo V2.5 is released under MIT, a permissive open-source license that allows commercial use, modification and redistribution.

How fast is MiMo V2.5 on consumer hardware?

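Our compatibility engine estimates on the order of 5 tokens/sec on a mid-range GPU and up to 15 tokens/sec on high-end cards. Both figures assume the quantization fully fits in VRAM, which at 180 GB for Q4_K_M rules out every single consumer card in the hardware table above.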

Which quantization of MiMo V2.5 should I download first?

Start with Q4_K_M (180 GB) — the standard size/quality sweet spot. Step up to Q5_K_M or Q8_0 only if you have VRAM headroom. It does not fit a single 24 GB consumer card — plan for multi-GPU or server hardware.
