Model card
Kimi K2.5
By Moonshot AI · China
chat
general
moe
Overview
Moonshot AI's 1-trillion-parameter mixture-of-experts (MoE) model, with 32B parameters active per token and a multimodal agent-swarm mode. It takes around 595 GB on disk at Q4 and is aimed at serious home labs and small clusters.
When to pick this model
- Multi-agent orchestration with swarm-mode coordination
- Frontier-scale local inference on a home lab cluster
- Long-context multimodal workflows up to 256K tokens
- Research into trillion-parameter models
- Replacing closed APIs at the high end
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 600 GB |
| Q5_K_M | 720 GB |
| Q8_0 | 1080 GB |
| FP16 (no quantization) | 2000 GB |
VRAM figures include model weights, the KV cache for a typical 8K-token context, and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer contexts.
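As a rough sanity check on the table, the sketch below converts each VRAM figure back into an effective number of bits per parameter. It assumes decimal gigabytes (1 GB = 10^9 bytes) and treats the whole figure as weights, so the results slightly overstate the true bit width. The function names are illustrative and do not come from any tool.

```python
# Back-of-the-envelope check of the VRAM table, using decimal gigabytes.
TOTAL_PARAMS = 1e12  # 1 trillion parameters, from the overview

# VRAM figures copied from the table above (GB).
TABLE_GB = {"Q4_K_M": 600, "Q5_K_M": 720, "Q8_0": 1080, "FP16": 2000}


def implied_bits_per_param(vram_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Effective bits per parameter if all of vram_gb held weights."""
    return vram_gb * 1e9 * 8 / params


def weights_gb(bits_per_param: float, params: float = TOTAL_PARAMS) -> float:
    """Weight footprint in GB for a given average bit width."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for name, gb in TABLE_GB.items():
        print(f"{name}: {implied_bits_per_param(gb):.2f} bits/param")
```

With the table's numbers this works out to about 4.8, 5.8, 8.6 and 16 bits per parameter. That is in the range one would expect for these quantization levels once the KV cache and overhead are included.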
Strengths
- Genuine 1-trillion-parameter open-weight model
- Built-in agent swarm coordination mode
- 256K context with multimodal input
- Only 32B active parameters per token
Limitations
- ~600GB in Q4 demands a small cluster
- Modified MIT license needs legal review for commercial use
- Operational complexity is extreme
- Power and cooling budget rules out most home setups
Architecture & training
Architecture: MoE, 1T total / 32B active parameters · multimodal · 'agent swarm' mode · 256K context
Training: The largest practical open-weight model.
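To make the 1T total / 32B active split concrete, here is a minimal sketch of the arithmetic. The compute estimate uses the common rule of thumb of about 2 FLOPs per active parameter per token. That rule is an assumption here, not a figure from Moonshot AI, and it ignores attention cost at long context.

```python
TOTAL_PARAMS = 1e12   # total parameters (from this page)
ACTIVE_PARAMS = 32e9  # parameters active per token (from this page)


def active_fraction(active: float = ACTIVE_PARAMS, total: float = TOTAL_PARAMS) -> float:
    """Share of the model's parameters used for any single token."""
    return active / total


def approx_forward_flops_per_token(active: float = ACTIVE_PARAMS) -> float:
    """Rough forward-pass cost; assumes ~2 FLOPs per active parameter."""
    return 2 * active


if __name__ == "__main__":
    print(f"active share: {active_fraction():.1%}")
    print(f"approx. forward FLOPs per token: {approx_forward_flops_per_token():.2e}")
```

Only about 3.2% of the parameters are used for each token, so per-token compute is closer to that of a 32B dense model. Memory does not shrink in the same way: any expert may be selected for a given token, so the full set of weights generally still has to be resident.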
Verdict
The largest practical open-weight model in 2026, for teams that can host it.
Quick start
# HuggingFace: moonshotai/Kimi-K2.5
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
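A minimal sketch of fetching the weights from the Hugging Face repository named above, with a free-space check first. It assumes the `huggingface_hub` package is installed and that the target directory already exists. The 595 GB threshold is taken from the overview and may not match the actual size of the repository's files, so treat it as a placeholder. The helper names `has_room` and `download` are illustrative.

```python
import shutil

REPO_ID = "moonshotai/Kimi-K2.5"  # repository id from the quick start line
NEEDED_GB = 595                   # approximate on-disk size from the overview (assumption)


def has_room(path: str, needed_gb: float = NEEDED_GB) -> bool:
    """Return True if the filesystem holding `path` has at least needed_gb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_gb * 1e9


def download(target_dir: str) -> str:
    """Download the repository snapshot into target_dir and return its local path."""
    # Imported lazily so the disk check works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    if not has_room(target_dir):
        raise RuntimeError(f"Need about {NEEDED_GB} GB free under {target_dir}")
    return snapshot_download(repo_id=REPO_ID, local_dir=target_dir)
```

Calling `download("/path/to/models")` would start the transfer. With a model this large, it is worth running `has_room` on its own first.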