Qwen 3 235B-A22B
By Alibaba · China
Overview
Alibaba's flagship mixture-of-experts (MoE) model: 235B total parameters, of which 22B are active per token (8 of 128 experts). It scores 85.7 on AIME 2024 and 70.7 on LiveCodeBench v5, putting it in frontier-open territory.
When to pick this model
- You're running a multi-GPU rig or a high-memory Apple Silicon machine and want frontier-open performance
- You need top-tier math and code reasoning under an Apache license
- You want MoE-class throughput (22B active) rather than dense 200B+ latency
- You're evaluating against closed frontier models and need a serious local baseline
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 142 GB |
| Q5_K_M | 170 GB |
| Q8_0 | 250 GB |
| FP16 (no quantization) | 470 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
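To turn the table into a quick decision, the sketch below picks the largest quantization that fits a given memory budget. The figures are copied from the table above. The `headroom_gb` parameter and the `largest_fit` helper are hypothetical names introduced here for illustration, and the right amount of headroom depends on your context length and runtime.

```python
# VRAM figures copied from the table above (GB, typical 8k context included).
VRAM_GB = {"Q4_K_M": 142, "Q5_K_M": 170, "Q8_0": 250, "FP16": 470}


def largest_fit(available_gb, headroom_gb=0.0):
    """Return the largest quantization that fits the budget, or None.

    headroom_gb is an illustrative safety margin for longer contexts;
    it is an assumption of this sketch, not a published figure.
    """
    usable = available_gb - headroom_gb
    fitting = [(gb, name) for name, gb in VRAM_GB.items() if gb <= usable]
    # max() on (gb, name) tuples selects the entry with the largest footprint.
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for budget in (96, 192, 320, 640):
        print(budget, "GB ->", largest_fit(budget, headroom_gb=8))
```

A result of `None` means no listed quantization fits the budget.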
Published benchmark scores
| Benchmark | Score |
|---|---|
| AIME 2024 | 85.7 |
| AIME 2025 | 81.5 |
| LiveCodeBench v5 | 70.7 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- Frontier-open scores on AIME 2024 (85.7) and LiveCodeBench v5 (70.7)
- Only 22B active parameters — fast for its total size
- Instruct-2507 and Thinking-2507 variants available
- Apache 2.0
Limitations
- ~142 GB at Q4_K_M: needs multi-GPU or a 192 GB+ Apple Silicon host
- Not realistic for laptop or single-GPU deployment
Architecture & training
Architecture: MoE · 128 experts, 8 active · 94 layers · GQA 64Q/4KV
Training: 36T tokens. Instruct-2507 and Thinking-2507 variants (July 2025).
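The GQA layout listed above (94 layers, 4 KV heads) determines how fast the KV cache grows with context length. A rough sketch of that arithmetic follows. The head dimension of 128 and the 16-bit cache values are assumptions not stated on this page, so treat the output as an estimate rather than a measured figure.

```python
def kv_cache_bytes(context_tokens, layers=94, kv_heads=4,
                   head_dim=128, bytes_per_value=2):
    """Estimate KV cache size in bytes for one sequence.

    layers and kv_heads come from the architecture line above.
    head_dim=128 and bytes_per_value=2 (16-bit cache) are assumptions.
    """
    # Factor of 2: one key tensor and one value tensor per layer.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


if __name__ == "__main__":
    for ctx in (8_192, 32_768, 131_072):
        gib = kv_cache_bytes(ctx) / 2**30
        print(f"{ctx:>7} tokens: {gib:.2f} GiB")
    # Share of experts routed per token, from the 128 / 8 figures above.
    print("active expert fraction:", 8 / 128)
```

Because only 4 KV heads are cached per layer instead of all 64 query heads, the cache is 16 times smaller than it would be with full multi-head attention under the same assumptions.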
Pick this when you have the hardware for frontier-open performance under an Apache license.
Quick start
ollama run qwen3:235b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
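Once the model is pulled, a locally running Ollama server can also be queried over HTTP. The sketch below is a minimal example, assuming Ollama's default local endpoint (`http://localhost:11434/api/generate`) and the `qwen3:235b` tag from the command above. The `build_request` and `generate` helpers are hypothetical names used for illustration, and the long timeout is a guess for a model of this size.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="qwen3:235b"):
    """Build a non-streaming generate request for the Ollama HTTP API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="qwen3:235b", timeout=600):
    """Send the prompt and return the generated text.

    Requires the Ollama server to be running with the model available.
    """
    req = build_request(prompt, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("What is 17 * 23?")` returns the model's reply as a string. Setting `"stream": False` asks for a single JSON object instead of a stream of partial responses.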