Model fiche

Kimi K2.5

By Moonshot AI · China

Tags: chat · general · MoE
Parameters: 1000B total (32B active)
License: Modified MIT
Context: 256k
VRAM (Q4): 600 GB
Released: January 2026

Overview

Kimi K2.5 is Moonshot AI's 1-trillion-parameter mixture-of-experts (MoE) model, with 32B parameters active per token, multimodal input, and an agent-swarm mode. The weights take around 595 GB on disk, so it is aimed at serious home labs and small clusters.

When to pick this model

  • Multi-agent orchestration with swarm-mode coordination
  • Frontier-scale local inference on a home lab cluster
  • Long-context multimodal workflows up to 256K tokens
  • Research into trillion-parameter models
  • Replacing closed APIs at the high end

VRAM requirements by quantization

Quantization             VRAM required
Q4_K_M (recommended)     600 GB
Q5_K_M                   720 GB
Q8_0                     1080 GB
FP16 (no quantization)   2000 GB

VRAM figures include model weights plus a typical 8k-token KV cache and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer contexts.
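
As a rough cross-check of these figures, weight memory scales approximately as parameters × bits per weight / 8. The sketch below assumes that relation and ignores the KV cache and the ~600 MB overhead, which are small next to the weights. The helper names are illustrative, and this is not necessarily how the site computes its numbers.

```python
# Rough cross-check of the quoted VRAM figures; an approximation only.
PARAMS_BILLION = 1000  # total parameters, from the fiche


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def implied_bits_per_weight(vram_gb: float, params_billion: float) -> float:
    """Effective bits per weight implied by a quoted VRAM figure."""
    return vram_gb * 8 / params_billion


# VRAM figures copied from the quantization table.
table = {"Q4_K_M": 600, "Q5_K_M": 720, "Q8_0": 1080, "FP16": 2000}
for name, gb in table.items():
    bits = implied_bits_per_weight(gb, PARAMS_BILLION)
    print(f"{name}: {bits:.2f} bits/weight")
```

Under this approximation the quoted figures correspond to roughly 4.8, 5.76, 8.64 and 16 bits per weight, so the FP16 row is simply 2 bytes for each of the 1000B parameters.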

Strengths

  • Genuine 1-trillion-parameter open-weight model
  • Built-in agent swarm coordination mode
  • 256K context with multimodal input
  • Only 32B active parameters per token

Limitations

  • ~600 GB at Q4 demands a small cluster
  • Modified MIT license needs legal review for commercial use
  • Operational complexity is extreme
  • Power and cooling budget rules out most home setups
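
To get a feel for what "a small cluster" means, the sketch below counts how many accelerators are needed to hold the ~600 GB Q4 footprint. The 80 GB and 24 GB card sizes and the `usable_fraction` parameter are illustrative assumptions, not figures from this fiche, and the estimate ignores interconnect and parallelism overhead.

```python
import math


def gpus_needed(model_gb: float, gpu_vram_gb: float,
                usable_fraction: float = 1.0) -> int:
    """Minimum number of GPUs whose combined usable VRAM holds the model.

    usable_fraction is a hypothetical knob for reserving headroom per card.
    """
    return math.ceil(model_gb / (gpu_vram_gb * usable_fraction))


Q4_VRAM_GB = 600  # Q4_K_M figure from the fiche

print(gpus_needed(Q4_VRAM_GB, 80))       # assumed 80 GB data-center cards
print(gpus_needed(Q4_VRAM_GB, 24))       # assumed 24 GB consumer cards
print(gpus_needed(Q4_VRAM_GB, 80, 0.9))  # same 80 GB cards, 10% headroom
```

With these assumed card sizes the count comes to 8, 25 and 9 respectively, which is why the footprint implies a multi-GPU or multi-node setup.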

Architecture & training

Architecture: MoE, 1T total / 32B active parameters · multimodal · 'agent swarm' mode · 256k context
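
The 1T/32B split can be made concrete with a little arithmetic. The sketch below uses the common rule of thumb of about 2 FLOPs per active parameter per generated token, which is an assumption here and not a number published in this fiche. Note that all 1T parameters still have to sit in memory; only the per-token compute scales with the active subset.

```python
TOTAL_PARAMS = 1_000e9  # 1T total parameters, from the fiche
ACTIVE_PARAMS = 32e9    # 32B active parameters per token, from the fiche

# Share of the weights that participate in each token's forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Rule-of-thumb estimate (assumption): ~2 FLOPs per active parameter per token.
flops_per_token = 2 * ACTIVE_PARAMS

print(f"active fraction: {active_fraction:.1%}")
print(f"approx. forward FLOPs per token: {flops_per_token:.1e}")
```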

Training: The largest practical open-weight model.

Verdict

The largest practical open-weight model in 2026, for teams that can host it.

Quick start

# Hugging Face: moonshotai/Kimi-K2.5

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
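
For the Hugging Face route, a minimal sketch of fetching the weights with the `huggingface_hub` package might look like the following. It assumes the package is installed and that the repo id listed above is reachable. The 595 GB threshold is the approximate on-disk size quoted in the overview, and the actual repository size and file format may differ. `has_room` and `download` are hypothetical helper names.

```python
import shutil

REPO_ID = "moonshotai/Kimi-K2.5"  # repo id as listed in this fiche
REQUIRED_GB = 595                 # approximate on-disk size from the overview


def has_room(path: str, required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem holding `path` has enough free space.

    `path` must already exist, otherwise shutil.disk_usage raises an error.
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


def download(target_dir: str) -> str:
    """Download the full repo snapshot into target_dir and return its path."""
    # Imported lazily so the disk check works without the package installed.
    from huggingface_hub import snapshot_download

    if not has_room(target_dir):
        raise RuntimeError(f"Need about {REQUIRED_GB} GB free in {target_dir}")
    return snapshot_download(repo_id=REPO_ID, local_dir=target_dir)
```

Checking free space first avoids discovering a full disk hours into a transfer of this size.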
