Model fiche
Hunyuan Large 2.0
By Tencent · China
chat
general
reasoning
moe
Overview
Tencent's 406B-parameter flagship MoE with 52B active parameters and a 256k context window. Strong in Chinese and English, but gated by the custom Tencent Hunyuan license.
When to pick this model
- Frontier Chinese-language production workloads
- Long-context RAG and document analysis up to 256k tokens
- Bilingual EN/CN enterprise deployments
- Use cases compatible with the Tencent Hunyuan license
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 245 GB |
| Q5_K_M | 290 GB |
| Q8_0 | 435 GB |
| FP16 (no quantization) | 810 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
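The table can be turned into a quick sizing check. The sketch below is illustrative only: it copies the four figures from the table and assumes a hypothetical per-GPU VRAM size (80 GB in the usage lines). The helper names `gpus_needed` and `best_fit` are invented for this example. It divides the total requirement evenly across GPUs, so the result is a naive lower bound that ignores per-device overhead and sharding constraints.

```python
import math
from typing import Optional

# VRAM figures in GB, copied from the table above (8k KV cache included).
VRAM_GB = {
    "Q4_K_M": 245,
    "Q5_K_M": 290,
    "Q8_0": 435,
    "FP16": 810,
}


def gpus_needed(quant: str, gpu_vram_gb: float, headroom_gb: float = 0.0) -> int:
    """Naive lower bound on GPU count: (table figure + headroom) / per-GPU VRAM."""
    total_gb = VRAM_GB[quant] + headroom_gb
    return math.ceil(total_gb / gpu_vram_gb)


def best_fit(budget_gb: float) -> Optional[str]:
    """Return the largest quantization whose table figure fits the budget, if any."""
    fitting = [q for q, gb in VRAM_GB.items() if gb <= budget_gb]
    return max(fitting, key=lambda q: VRAM_GB[q]) if fitting else None


if __name__ == "__main__":
    # Assumption: 80 GB per GPU; substitute the real figure for your hardware.
    for quant in VRAM_GB:
        print(quant, gpus_needed(quant, gpu_vram_gb=80))
    print(best_fit(8 * 80))
```

Pass `headroom_gb` to reserve extra memory when running with contexts longer than the 8k baseline; the right amount depends on the runtime and is not specified here.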
Strengths
- 256k (262,144-token) native context window
- Top-tier Chinese-language performance
- Efficient inference relative to 406B total size
- Strong on RAG and long-document tasks
Limitations
- Around 245 GB of VRAM at Q4_K_M, so heavy infrastructure is needed
- Custom Tencent license requires careful legal review
- Limited adoption outside Chinese-speaking markets
Architecture & training
Architecture: MoE · 406B total / 52B active · Tencent HunyuanLLM · 256k (262,144-token) ctx
Training: by Tencent; strong in Chinese and English, RAG, and long-document tasks.
Verdict
Frontier-class bilingual long-context performance, but licensing and infrastructure demands narrow its practical audience.
Quick start
# Tencent infrastructure: not available for standard local deployment
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.