Model fiche
Kimi K2.6
By Moonshot AI · China
chat
general
vision
moe
Overview
Moonshot AI's April 2026 flagship: roughly 1T total parameters with 32B active, native multimodal, plus an agent-swarm mode coordinating up to 300 sub-agents.
When to pick this model
- Frontier-tier agentic workloads with parallel sub-agents
- Long-context analysis up to 256k tokens
- Multimodal pipelines needing top-end open quality
- API-driven applications where local hosting isn't required
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 600 GB |
| Q5_K_M | 720 GB |
| Q8_0 | 1080 GB |
| FP16 (no quantization) | 2000 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Strengths
- 1T total parameters with frontier-class performance
- Native 256k context window
- Unique 300-agent swarm coordination mode
- Multimodal across text and vision
Limitations
- Around 600 GB VRAM at Q4 — datacenter only
- API-first; local hosting is impractical for most teams
- Modified MIT terms need legal review
Architecture & training
Architecture: MoE · 1T total / ~32B active · Moonshot AI · 256k context
Training: Moonshot AI Kimi K2.6 — massive web corpus with long-context focus.
Verdict
A genuine frontier open-weight model, but you'll be consuming it via API unless you run a datacenter.
Quick start
# Multi-GPU data-center requis — API Moonshot recommandéeOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.