Model fiche
Hunyuan-A13B Instruct
By Tencent · China
chat
general
reasoning
moe
Overview
Tencent's fine-grained MoE activating 13B of 80B parameters, with dual fast/slow thinking modes and a 256k context. Released under Tencent's custom Hunyuan license.
When to pick this model
- Reasoning-heavy tasks needing toggleable thinking modes
- Long-context analysis up to 256k tokens
- Cost-sensitive deployment of a frontier-class MoE
- Chinese-language production workloads
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 48 GB |
| Q5_K_M | 57 GB |
| Q8_0 | 85 GB |
| FP16 (no quantization) | 160 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Strengths
- Competitive with o1 and DeepSeek on mainstream benchmarks
- Native 256k context
- Dual fast/slow thinking for latency-quality tradeoffs
- Only 13B active parameters keeps inference cheap
Limitations
- Tencent Hunyuan license has commercial restrictions
- No official Ollama distribution
- Tooling support trails Qwen and Llama
Architecture & training
Architecture: Fine-grained MoE · 80B/13B active · dual fast/slow thinking
Training: 256k native ctx.
Verdict
Frontier-tier MoE reasoning at a manageable active-parameter count, held back mainly by the custom Tencent license.
Quick start
# HuggingFace : tencent/Hunyuan-A13B-InstructOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.