Model sheet
Qwen 3.5 397B-A17B
By Alibaba · China
Tags: chat, general, reasoning, multilingual, moe
Overview
Alibaba's flagship mixture-of-experts (MoE) model, with 397B total and 17B active parameters, ranked #5 among open-weight models on Artificial Analysis. Released under Apache 2.0 with a 262K-token context window.
When to pick this model
- Top-tier open-weight performance on a multi-GPU server
- Long-context enterprise workloads
- Replacing closed frontier models with self-hosted weights
- Commercial deployments needing Apache licensing
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 240 GB |
| Q5_K_M | 285 GB |
| Q8_0 | 425 GB |
| FP16 (no quantization) | 794 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
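As a rough cross-check of the table, weight memory can be approximated as total parameters times average bits per weight, divided by 8. The sketch below is illustrative only: the bits-per-weight values are approximate assumptions for llama.cpp-style quantization formats and do not come from this page, and real files vary with the tensor mix. It estimates weights only, so KV cache and runtime overhead come on top.

```python
# Rough weight-memory estimate: params * bits_per_weight / 8 bytes.
# Bits-per-weight values below are approximate ASSUMPTIONS, not figures
# taken from this page; actual quantized files differ by tensor mix.
TOTAL_PARAMS = 397e9  # total parameter count stated on this page

APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "FP16": 16.0,
}


def weight_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


def estimate_table(total_params: float = TOTAL_PARAMS) -> dict:
    """Return {quantization name: rounded weight size in GB}."""
    return {
        name: round(weight_gb(total_params, bpw))
        for name, bpw in APPROX_BITS_PER_WEIGHT.items()
    }


if __name__ == "__main__":
    for name, gb in estimate_table().items():
        print(f"{name}: ~{gb} GB of weights")
```

Under these assumed values the estimates land within a few GB of the table, which suggests the table is dominated by weight size; the KV cache for long contexts is the part that needs separate headroom.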
Strengths
- #5 on Artificial Analysis's open leaderboard
- 262K context window
- Only 17B active parameters keeps inference efficient
- Apache 2.0 license
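To make the efficiency claim concrete, the sketch below computes the share of parameters active per token and a rough per-token compute figure. The "2 FLOPs per active parameter per token" rule is a common rule of thumb for a forward pass, used here as an assumption; it ignores attention cost and routing overhead.

```python
# Illustrative arithmetic for MoE efficiency, using the parameter counts
# stated on this page. The 2-FLOPs-per-parameter rule is an ASSUMPTION
# (a common forward-pass approximation), not a measured figure.
TOTAL_PARAMS = 397e9
ACTIVE_PARAMS = 17e9


def active_fraction(total_params: float, active_params: float) -> float:
    """Fraction of all parameters used for each token."""
    return active_params / total_params


def approx_flops_per_token(params_used: float) -> float:
    """Rule-of-thumb forward-pass FLOPs per generated token."""
    return 2.0 * params_used


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    moe = approx_flops_per_token(ACTIVE_PARAMS)
    dense = approx_flops_per_token(TOTAL_PARAMS)
    print(f"Active share per token: {frac:.1%}")
    print(f"Approx. FLOPs per token (MoE):   {moe:.2e}")
    print(f"Approx. FLOPs per token (dense): {dense:.2e}")
```

The compute saving does not reduce memory: all 397B parameters still have to be resident (or offloaded), which is why the VRAM table is sized on the total parameter count.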
Limitations
- 240 GB+ at Q4_K_M demands a multi-GPU server
- MoE deployment adds operational complexity
- Beaten by GLM-5.1 and MiniMax-M2.7 on key benchmarks
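For sizing the multi-GPU server mentioned above, a minimal sketch of the arithmetic follows. The 80 GB per-GPU capacity and the 90% usable fraction are hypothetical inputs chosen for illustration, not recommendations from this page; substitute the values for your own hardware.

```python
import math


def gpus_needed(vram_required_gb: float,
                gpu_vram_gb: float,
                usable_fraction: float = 0.9) -> int:
    """Minimum GPU count so that usable VRAM covers the requirement.

    usable_fraction is an ASSUMED safety margin for fragmentation and
    framework overhead; it is not a figure from this page.
    """
    if vram_required_gb <= 0 or gpu_vram_gb <= 0:
        raise ValueError("VRAM values must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return math.ceil(vram_required_gb / (gpu_vram_gb * usable_fraction))


if __name__ == "__main__":
    # 240 GB is the Q4_K_M figure from the VRAM table; 80 GB per GPU is
    # a hypothetical card size.
    print(gpus_needed(240, 80))        # with the assumed 10% margin
    print(gpus_needed(240, 80, 1.0))   # with no margin at all
```

With no margin, 240 GB fits exactly on three 80 GB cards, leaving nothing for longer contexts, so the margin parameter is the part worth tuning.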
Architecture & training
Architecture: MoE, 397B total / 17B active · 262K context · hybrid thinking
Training: New flagship of the Qwen 3.5 family.
Verdict
A strong flagship MoE with permissive licensing, though no longer the top of the open leaderboard.
Quick start
# Hugging Face: Qwen/Qwen3.5-397B-A17B (more accessible local alternative: ollama run qwen3.5:122b)
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.