Model fiche
Qwen 3.5 122B-A10B
By Alibaba · China
chat
general
reasoning
multilingual
moe
Overview
Alibaba's mid-flagship Qwen 3.5 model: a mixture-of-experts design with 122B total / 10B active parameters and a 262k-token native context window. It targets frontier-class quality while fitting on a single 80 GB H100 at Q4_K_M (about 73 GB).
When to pick this model
- Frontier-quality inference on a single H100
- Long-context document and codebase analysis (262k)
- Multilingual reasoning workloads
- Apache 2.0 deployments where Qwen 397B is overkill
- Cost-sensitive agentic systems needing top-tier quality
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 73 GB |
| Q5_K_M | 88 GB |
| Q8_0 | 131 GB |
| FP16 (no quantization) | 244 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
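As a rough illustration of how to read the table, the sketch below picks the least-compressed quantization that fits a given VRAM budget. The figures are copied from the table; the function name `best_quantization` and the `headroom_gb` parameter are hypothetical, and the 80 GB figure used in the usage note assumes the 80 GB H100 variant.

```python
# VRAM requirements in GB, copied from the table above.
VRAM_GB = {
    "Q4_K_M": 73,
    "Q5_K_M": 88,
    "Q8_0": 131,
    "FP16": 244,
}


def best_quantization(available_gb, headroom_gb=0.0):
    """Return the largest listed quantization that fits, or None.

    headroom_gb is a hypothetical allowance for contexts longer than the
    8k KV cache already included in the table figures.
    """
    fitting = [
        (required, name)
        for name, required in VRAM_GB.items()
        if required + headroom_gb <= available_gb
    ]
    # The largest footprint that still fits is the least-compressed option.
    return max(fitting)[1] if fitting else None
```

With these numbers, `best_quantization(80)` returns `"Q4_K_M"`, while `best_quantization(48)` (two 24 GB consumer cards) returns `None`. Reserving 10 GB of headroom on an 80 GB card also returns `None`, which is why long-context use on a single H100 leaves little margin.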
Strengths
- Frontier-class quality with only 10B active params
- 262k native context window
- Apache 2.0
- Single-H100 deployment is realistic
- Strong multilingual coverage
Limitations
- Roughly 73 GB VRAM at Q4_K_M, so consumer cards still require a multi-GPU setup
- Mid-flagship positioning means it's eclipsed by 397B on the hardest tasks
Architecture & training
Architecture: MoE · 122B total / 10B active · Qwen 3.5 flagship · 262k context
Training: Qwen 3.5 accessible flagship; 10B of the 122B parameters are active per token, with a native 262k context.
Verdict
The sweet spot of the Qwen 3.5 lineup: H100-friendly with frontier-grade output.
Quick start
ollama run qwen3.5:122b-a10b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
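Once the model has been pulled with Ollama, it can also be queried programmatically. The following is a minimal sketch, assuming an Ollama server is running locally on its default port (11434) and exposing the `/api/generate` endpoint; the helper names `build_payload` and `generate` are illustrative, not part of any library.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server; adjust if yours differs.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "qwen3.5:122b-a10b"  # tag taken from the quick-start command


def build_payload(prompt, model=MODEL_TAG):
    """Build a non-streaming request body for Ollama's /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt, url=OLLAMA_URL, timeout=600):
    """Send the prompt to the local server and return the generated text."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling `generate("Summarize the Apache 2.0 license in one sentence.")` should return the model's reply as a string, provided the server is up and the model is loaded. The long timeout is a guess to allow for first-load latency on a model of this size.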