
Tencent Hy3 Preview 295B

By Tencent · China

Tags: chat, general, reasoning, MoE
Parameters: 295B
License: Tencent Hunyuan License
Context: 256k
VRAM (Q4): 177 GB
Released: April 2026

Overview

Tencent's frontier preview: 295B MoE with 21B active params plus a 3.8B MTP module, 80 layers, top-8 of 192 experts, with fused fast/slow thinking. Released April 2026 under the custom Hunyuan license.

When to pick this model

  • Research on fused fast/slow-thinking architectures
  • Long-context workloads up to 256k tokens
  • Base or Instruct fine-tuning at frontier scale
  • Deployments where Tencent's Hunyuan license is acceptable
  • Comparing Chinese hyperscaler frontier weights

VRAM requirements by quantization

Quantization            VRAM required
Q4_K_M (recommended)    177 GB
Q5_K_M                  210 GB
Q8_0                    315 GB
FP16 (no quantization)  590 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
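As a rough sanity check on the table, the listed figures can be turned back into an implied bits-per-weight value and compared against a GPU pool. This is a minimal sketch, not the site's method: it assumes GB means 10^9 bytes and attributes the whole figure to weights, so the result slightly overstates the true quantization width. The helper names `implied_bits_per_weight` and `fits` are hypothetical.

```python
# Sanity check: effective bits per weight implied by the VRAM table.
# ASSUMPTIONS (not from the fiche): GB = 10**9 bytes, and the entire figure
# is attributed to weights, so the result is a slight overestimate.

PARAMS_BILLION = 295  # total parameters, from the fiche

VRAM_GB = {
    "Q4_K_M": 177,
    "Q5_K_M": 210,
    "Q8_0": 315,
    "FP16": 590,
}


def implied_bits_per_weight(vram_gb: float, params_billion: float) -> float:
    """Bits per parameter if all of vram_gb were spent on weights."""
    return vram_gb * 8 / params_billion


def fits(vram_gb: float, gpu_count: int, gb_per_gpu: float) -> bool:
    """True if pooled GPU memory covers the table figure (no extra headroom)."""
    return gpu_count * gb_per_gpu >= vram_gb


if __name__ == "__main__":
    for name, gb in VRAM_GB.items():
        bpw = implied_bits_per_weight(gb, PARAMS_BILLION)
        print(f"{name}: ~{bpw:.2f} bits/weight")
    # Example: does the Q4_K_M figure fit on three 80 GB cards?
    print(fits(VRAM_GB["Q4_K_M"], gpu_count=3, gb_per_gpu=80))
```

On paper, three 80 GB cards (240 GB pooled) cover the Q4_K_M figure and two do not. The check ignores per-GPU fragmentation and any context beyond the 8k baseline.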

Strengths

  • Tencent's first frontier-scale open-weight release
  • 256k context window
  • Both Base and Instruct variants shipped
  • MTP module accelerates long-form generation
  • Fused fast/slow thinking in one model

Limitations

  • Custom Tencent Hunyuan Community License — legal review required
  • Around 177 GB VRAM in Q4
  • No Ollama support at launch
  • Preview status means rough edges in tooling

Architecture & training

Architecture: MoE 295B/21B active · 80 layers + 1 MTP layer · 192 experts top-8 · GQA 64Q/8KV · BF16
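The architecture line gives enough to sketch how KV cache grows with context length. The estimate below assumes standard attention with a full KV cache on every one of the 80 layers and ignores the MTP layer. The head dimension is not stated in the fiche, so `HEAD_DIM = 128` is a hypothetical value chosen only for illustration, and the resulting numbers should be read as order-of-magnitude guidance.

```python
# Rough KV-cache size derived from the architecture line.
# ASSUMPTION: HEAD_DIM = 128 is NOT stated in the fiche; it is an illustrative
# value. Full attention on all layers is also assumed; the MTP layer is ignored.

LAYERS = 80          # from the fiche
QUERY_HEADS = 64     # GQA 64Q/8KV, from the fiche
KV_HEADS = 8         # GQA 64Q/8KV, from the fiche
HEAD_DIM = 128       # hypothetical, see note above
BYTES_PER_VALUE = 2  # 16-bit cache entries (BF16/FP16)


def kv_cache_bytes(tokens: int, layers: int = LAYERS, kv_heads: int = KV_HEADS,
                   head_dim: int = HEAD_DIM,
                   bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Bytes for keys plus values across all layers for `tokens` positions."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # 2 = K and V
    return tokens * per_token


def gqa_reduction(query_heads: int = QUERY_HEADS, kv_heads: int = KV_HEADS) -> float:
    """How much smaller the cache is than with one KV head per query head."""
    return query_heads / kv_heads


if __name__ == "__main__":
    for tokens in (8 * 1024, 256 * 1024):
        gib = kv_cache_bytes(tokens) / 2**30
        print(f"{tokens} tokens: ~{gib:.1f} GiB KV cache")
    print(f"GQA reduction vs. full multi-head KV: {gqa_reduction():.0f}x")
```

Under these assumptions the cache is about 2.5 GiB at 8k tokens and about 80 GiB at 256k tokens, which is why long-context use needs substantial headroom beyond the table figures.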

Training: Fused fast/slow thinking.

Verdict

A serious frontier preview from Tencent, held back from broader adoption by its custom license.

Quick start

# Hugging Face: tencent/Hy3-preview

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
