Model card

Qwen 3.5 397B-A17B

By Alibaba · China

Tags: chat · general · reasoning · multilingual · moe
Parameters: 397B
License: Apache 2.0
Context: 262k
VRAM (Q4): 240 GB
Released: February 2026

Overview

Alibaba's flagship mixture-of-experts (MoE) model, with 397B total and 17B active parameters, ranked #5 among open-weight models on Artificial Analysis. It is released under Apache 2.0 and supports a 262K-token context window.

When to pick this model

  • Top-tier open-weight performance on a multi-GPU server
  • Long-context enterprise workloads
  • Replacing closed frontier models with self-hosted weights
  • Commercial deployments needing Apache licensing

VRAM requirements by quantization

Quantization            VRAM required
Q4_K_M (recommended)    240 GB
Q5_K_M                  285 GB
Q8_0                    425 GB
FP16 (no quantization)  794 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
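
As a rough illustration of how figures like these can be approximated, the sketch below multiplies the parameter count by an effective bits-per-weight value and adds KV cache and runtime overhead. The bits-per-weight values are commonly cited approximations for llama.cpp quantizations and are assumptions here, not numbers from this page; the function name and the `kv_cache_gb` parameter are hypothetical. Results will only be close to the table, not identical.

```python
# Approximate effective bits per weight (assumed values, not from this page).
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "FP16": 16.0,
}


def estimate_vram_gb(params_billion, quant, kv_cache_gb=0.0, overhead_gb=0.6):
    """Estimate VRAM in decimal GB: weights + KV cache + runtime overhead.

    kv_cache_gb depends on the architecture and context length, so it is
    left as an input rather than guessed. overhead_gb defaults to the
    ~600 MB runtime overhead mentioned above.
    """
    # 1e9 parameters at N bits each take N / 8 GB.
    weights_gb = params_billion * BITS_PER_WEIGHT[quant] / 8
    return weights_gb + kv_cache_gb + overhead_gb


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        print(f"{quant}: ~{estimate_vram_gb(397, quant):.0f} GB")
```

Passing a larger `kv_cache_gb` is one way to model the extra headroom needed at longer context lengths.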

Strengths

  • #5 on Artificial Analysis's open leaderboard
  • 262K context window
  • Only 17B parameters are active per token, which keeps inference efficient
  • Apache 2.0 license
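
To put the active-parameter point in numbers, here is a minimal sketch that computes the share of weights used per token and a rough compute cost. The "about 2 FLOPs per active parameter per token" figure is a general rule of thumb for transformer inference and an assumption here, not a measurement of this model; the function names are hypothetical.

```python
def active_fraction(total_params_b, active_params_b):
    """Share of the model's parameters used for each token."""
    return active_params_b / total_params_b


def approx_flops_per_token(active_params_b):
    """Rule-of-thumb compute per generated token (~2 FLOPs per active parameter)."""
    return 2 * active_params_b * 1e9


if __name__ == "__main__":
    print(f"Active share: {active_fraction(397, 17):.1%}")
    print(f"Approx. FLOPs per token: {approx_flops_per_token(17):.2e}")
```

Note that this only describes compute: all 397B parameters still have to be held in memory, which is why the VRAM figures above track the total parameter count rather than the active one.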

Limitations

  • Needs 240 GB or more of VRAM even at Q4, which demands a multi-GPU server
  • MoE deployment adds operational complexity
  • Beaten by GLM-5.1 and MiniMax-M2.7 on key benchmarks
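
To size the multi-GPU requirement, a minimal sketch of the arithmetic follows. The 80 GB per-GPU capacity and the 90% usable fraction are illustrative assumptions, and `gpus_needed` is a hypothetical helper; real deployments also need headroom for longer contexts and parallelism overhead.

```python
import math


def gpus_needed(model_vram_gb, per_gpu_gb, usable_fraction=0.9):
    """Minimum GPU count so that the usable VRAM covers the model's needs.

    usable_fraction reserves part of each GPU for the driver, fragmentation
    and similar overhead (assumed value).
    """
    usable_per_gpu = per_gpu_gb * usable_fraction
    return math.ceil(model_vram_gb / usable_per_gpu)


if __name__ == "__main__":
    # VRAM figures from the quantization table; 80 GB GPUs are an assumption.
    for quant, vram_gb in [("Q4_K_M", 240), ("Q8_0", 425), ("FP16", 794)]:
        print(f"{quant}: at least {gpus_needed(vram_gb, 80)} x 80 GB GPUs")
```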

Architecture & training

Architecture: Mixture-of-experts (MoE), 397B total / 17B active parameters · 262k context · hybrid thinking

Training: New flagship of the Qwen 3.5 family.

Verdict

A strong flagship MoE with permissive licensing, though no longer the top of the open leaderboard.

Quick start

# Hugging Face: Qwen/Qwen3.5-397B-A17B (more accessible local alternative: ollama run qwen3.5:122b)
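
Once a model has been pulled with Ollama, it can also be queried programmatically. The sketch below builds a request for Ollama's local REST endpoint (`/api/generate` on the default port 11434) using only the standard library. It assumes an Ollama server is running locally and that the `qwen3.5:122b` tag from the line above is available; the helper names are hypothetical.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="qwen3.5:122b"):
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="qwen3.5:122b"):
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Summarize MoE in one sentence.")` would return the model's reply as a string, provided the server is up and the model fits on the machine.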

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
