Model card

Qwen 3.5 122B-A10B

By Alibaba · China

chat general reasoning multilingual moe
Parameters: 122B total (10B active)
License: Apache 2.0
Context: 262k tokens
VRAM (Q4): 73 GB
Released: April 2025

Overview

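Alibaba's mid-flagship Qwen 3.5 model is a mixture-of-experts (MoE) design with 122B total and 10B active parameters and a 262k-token native context. It offers frontier-class quality, and at Q4 quantization (about 73 GB) it fits on a single 80 GB H100.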

When to pick this model

  • Frontier-quality inference on a single H100 (Q4 quantization)
  • Long-context document and codebase analysis (262k)
  • Multilingual reasoning workloads
  • Apache 2.0 deployments where Qwen 3.5 397B is overkill
  • Cost-sensitive agentic systems needing top-tier quality

VRAM requirements by quantization

Quantization              VRAM required
Q4_K_M (recommended)      73 GB
Q5_K_M                    88 GB
Q8_0                      131 GB
FP16 (no quantization)    244 GB

VRAM figures include the model weights, a KV cache sized for a typical 8k-token context, and about 600 MB of runtime overhead (Ollama / llama.cpp baseline). The KV cache grows with context length, so budget extra headroom for longer contexts.
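These figures come from simple arithmetic: the weights take roughly parameters × bits-per-weight / 8 bytes, and the KV cache grows linearly with context. Below is a minimal sketch under stated assumptions. It assumes a standard per-layer K/V cache, and the attention shape (`layers`, `kv_heads`, `head_dim`) is a hypothetical placeholder, not Qwen 3.5's published configuration. All helper names are illustrative. Q8_0 is used because its size is fixed at 8.5 bits per weight (32 int8 values plus one fp16 scale per block); Q4_K_M mixes quant types, so it has no single bits-per-weight value.

```python
import math

def weights_gb(total_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal GB."""
    return total_params * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_elem: int = 2) -> float:
    """Standard K/V cache: one K and one V tensor per layer, per token (fp16 by default)."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem / 1e9

def estimate_vram_gb(total_params: float, bits_per_weight: float,
                     kv_gb: float, overhead_gb: float = 0.6) -> float:
    """Weights + KV cache + runtime overhead (~600 MB, as stated above)."""
    return weights_gb(total_params, bits_per_weight) + kv_gb + overhead_gb

def gpus_needed(required_gb: float, gpu_gb: float) -> int:
    """Naive lower bound: ignores per-GPU overhead when layers are split across cards."""
    return math.ceil(required_gb / gpu_gb)

# HYPOTHETICAL attention shape, not the real Qwen 3.5 122B-A10B config.
HYPOTHETICAL = dict(layers=48, kv_heads=4, head_dim=128)

kv_8k = kv_cache_gb(context_tokens=8192, **HYPOTHETICAL)
q8 = estimate_vram_gb(122e9, 8.5, kv_8k)  # Q8_0: 8.5 bits per weight
print(f"Q8_0 @ 8k ctx: ~{q8:.0f} GB")
print("80 GB GPUs needed:", gpus_needed(q8, 80))
print("24 GB GPUs needed for the 73 GB Q4 build:", gpus_needed(73, 24))
```

Quadrupling the context quadruples the KV-cache term, which is where the extra headroom for long contexts goes. The GPU counts are lower bounds, because each card in a multi-GPU split carries some overhead of its own.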

Strengths

  • Frontier-class quality with only 10B active params
  • 262k native context window
  • Permissive Apache 2.0 license
  • Single-H100 deployment is realistic at Q4
  • Strong multilingual coverage

Limitations

  • Roughly 73 GB of VRAM at Q4, so consumer GPUs still require a multi-GPU setup
  • As the mid-flagship, it trails Qwen 3.5 397B on the hardest tasks

Architecture & training

Architecture: MoE · 122B total / 10B active · Qwen 3.5 family · 262k native context

Training: Qwen 3.5 series, positioned as its accessible flagship (10B active out of 122B total parameters, 262k native context).
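The "10B active out of 122B" figure reflects mixture-of-experts routing: a router scores the experts for each token and only the top-k of them run. The toy sketch below shows generic top-k gating in pure Python. The expert count, k, the router logits, and the scalar "experts" are all hypothetical and are not meant to reflect Qwen 3.5's actual router or expert layout.

```python
import math
from typing import Callable, List, Sequence, Tuple

def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]

def moe_forward(x: float, experts: Sequence[Callable[[float], float]],
                router_logits: Sequence[float], k: int) -> Tuple[float, List[int]]:
    """Evaluate only the routed experts; all others stay idle for this token."""
    routes = top_k_route(router_logits, k)
    out = sum(w * experts[i](x) for i, w in routes)
    return out, [i for i, _ in routes]

# Toy setup: 8 hypothetical experts, each of which just scales its input.
experts = [lambda x, s=s: s * x for s in range(8)]
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
out, chosen = moe_forward(1.0, experts, logits, k=2)
print("experts run:", chosen, "output:", round(out, 3))
print(f"active share of parameters: {10 / 122:.1%}")
```

All 122B parameters must still be resident in memory, which is why the VRAM table scales with total rather than active parameters. The small active count mainly cuts the compute per token.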

Verdict

The sweet spot of the Qwen 3.5 lineup: H100-friendly with frontier-grade output.

Quick start

ollama run qwen3.5:122b-a10b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
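While Ollama is running, it also serves a local HTTP API (by default at `http://localhost:11434`). The sketch below queries this model through the `/api/generate` endpoint using only the Python standard library. The helper names `build_request` and `generate` are hypothetical. The model tag is the one from the quick-start command above. `num_ctx` is Ollama's context-length option, and raising it enlarges the KV cache and therefore the VRAM needed.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "qwen3.5:122b-a10b"  # tag from the quick-start command

def build_request(prompt: str, num_ctx: int = 8192) -> urllib.request.Request:
    """Build a non-streaming generate request with an explicit context length."""
    payload = {
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"num_ctx": num_ctx},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt: str, num_ctx: int = 8192) -> str:
    """Send the request and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_request(prompt, num_ctx)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Summarize this repository.", num_ctx=32768)` only works once the model has been pulled and the Ollama server is up.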
