Model fiche
Tencent Hy3 Preview 295B
By Tencent · China
chat
general
reasoning
moe
Overview
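Tencent's frontier preview model: a 295B-parameter mixture-of-experts (MoE) with 21B active parameters per token, plus a 3.8B multi-token prediction (MTP) module. It has 80 layers, routes each token to 8 of 192 experts, and fuses fast and slow thinking in a single model. Released in April 2026 under the custom Hunyuan license.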
When to pick this model
- Research on fused fast/slow-thinking architectures
- Long-context workloads up to 256k tokens
- Base or Instruct fine-tuning at frontier scale
- Deployments where Tencent's Hunyuan license is acceptable
- Comparing Chinese hyperscaler frontier weights
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 177 GB |
| Q5_K_M | 210 GB |
| Q8_0 | 315 GB |
| FP16 (no quantization) | 590 GB |
VRAM figures include the model weights plus a typical 8k-token KV cache and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer contexts.
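To sanity-check figures like these, weight memory is roughly parameters × bits per weight / 8, and the KV cache grows linearly with context length. The sketch below is a rough estimator under stated assumptions: the bits-per-weight values are approximate GGUF averages, `head_dim = 128` is a guess (it is not listed on this page), the KV cache is assumed to be 16-bit, and the MTP layer is ignored. Its output will therefore not reproduce the table exactly; it is mainly useful for seeing how much extra memory a longer context costs. The layer count and KV-head count come from the Architecture line.

```python
# Rough VRAM estimate: quantized weights + KV cache + fixed runtime overhead.
# Bits-per-weight values and head_dim are illustrative assumptions, not
# published figures for Hy3; the MTP layer is ignored.

GB = 1e9  # decimal gigabytes

# Approximate average bits per weight for GGUF formats (assumed values).
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q8_0": 8.5,
    "FP16": 16.0,
}


def weight_bytes(n_params: float, quant: str) -> float:
    """Bytes needed to store all weights at the given quantization."""
    return n_params * BITS_PER_WEIGHT[quant] / 8


def kv_cache_bytes(context_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """One K and one V vector per layer, per KV head, per cached token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return context_len * per_token


def vram_estimate_gb(n_params: float, quant: str, context_len: int,
                     n_layers: int = 80, n_kv_heads: int = 8,
                     head_dim: int = 128,  # assumption: not listed on this page
                     overhead_gb: float = 0.6) -> float:
    """Total estimate in GB for a given quantization and context length."""
    total = weight_bytes(n_params, quant) + kv_cache_bytes(
        context_len, n_layers, n_kv_heads, head_dim)
    return total / GB + overhead_gb


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        short_ctx = vram_estimate_gb(295e9, quant, 8_192)
        long_ctx = vram_estimate_gb(295e9, quant, 262_144)  # 256k tokens
        print(f"{quant:7s} 8k: {short_ctx:6.0f} GB   256k: {long_ctx:6.0f} GB")
```

Under these assumptions the KV cache costs 327,680 bytes (320 KiB) per token: about 2.7 GB at 8k tokens but about 86 GB at a full 256k context, which is why longer contexts need substantial extra headroom.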
Strengths
- Tencent's first frontier-scale open-weight release
- 256k context window
- Both Base and Instruct variants shipped
- MTP module accelerates long-form generation
- Fused fast/slow thinking in one model
Limitations
- Custom Tencent Hunyuan Community License — legal review required
- Needs around 177 GB of VRAM even at Q4_K_M
- No Ollama support at launch
- Preview status means rough edges in tooling
Architecture & training
Architecture: MoE 295B/21B active · 80 layers + 1 MTP layer · 192 experts top-8 · GQA 64Q/8KV · BF16
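Routing each token to only 8 of 192 experts is how a 295B-parameter MoE can run with roughly 21B parameters active per token. Below is a minimal sketch of generic top-k softmax gating for a single token in plain Python. The router logits, the toy experts, and the choice to apply softmax only over the selected experts are assumptions for illustration, not details of Hy3's published router; other MoE designs normalize the gates differently.

```python
import math
import random


def top_k_route(logits: list[float], k: int = 8) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their gates.

    Generic top-k gating for illustration; not Tencent's published router.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


def moe_layer(x: list[float], router_logits: list[float], experts,
              k: int = 8) -> list[float]:
    """Gate-weighted sum of the selected experts' outputs for one token."""
    out = [0.0] * len(x)
    for idx, gate in top_k_route(router_logits, k):
        y = experts[idx](x)  # only the k selected experts are evaluated
        for j, v in enumerate(y):
            out[j] += gate * v
    return out


if __name__ == "__main__":
    random.seed(0)
    n_experts, k = 192, 8
    # Hypothetical toy experts: each one scales its input by a different constant.
    experts = [lambda x, s=s: [s * v for v in x] for s in range(n_experts)]
    logits = [random.gauss(0.0, 1.0) for _ in range(n_experts)]
    routes = top_k_route(logits, k)
    print([i for i, _ in routes])    # the 8 chosen expert ids
    print(sum(g for _, g in routes))  # gates sum to 1 (up to float error)
    print(moe_layer([1.0, 2.0], logits, experts, k))
```

The other 184 experts are never called for this token, so compute scales with k rather than with the total expert count, while all 192 experts still have to sit in memory.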
Training: fused fast/slow thinking (both modes in one model).
Verdict
A serious frontier preview from Tencent, held back from broader adoption by its custom license.
Quick start
```
# Hugging Face: tencent/Hy3-preview
```
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.