Model fiche
Pangu Pro MoE 72B
By Huawei · China
chat
general
moe
Overview
Huawei's first open-weight release, a 72B MoE optimized for Ascend silicon. Strong on enterprise code and Chinese business scenarios, but the custom Pangu license needs careful review.
When to pick this model
- Deployments already running on Huawei Ascend hardware
- Enterprise code and business workflows in Chinese markets
- Research on non-NVIDIA training and inference stacks
- Workloads where Huawei's ecosystem integration matters
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 42 GB |
| Q5_K_M | 50 GB |
| Q8_0 | 78 GB |
| FP16 (no quantization) | 144 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Strengths
- First-class optimization for Ascend NPUs
- Solid enterprise code and business reasoning
- Open weights from a major hyperscaler
- MoE design keeps inference tractable
Limitations
- Around 42 GB VRAM in Q4
- 32k context trails modern flagships
- Custom Pangu license requires legal review
- Tooling outside Huawei's stack is thin
Architecture & training
Architecture: MoE · 72B · Huawei PanGu Pro · proprietary architecture
Training: Huawei — specialized in enterprise code and CN business scenarios.
Verdict
A reasonable pick if you're on Ascend; on NVIDIA hardware, Qwen 3.5 or DeepSeek will serve you better.
Quick start
ollama pull hf.co/huawei/pangu-pro-moe-72b-GGUFOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.