Model fiche
Qwen 3.6 35B-A3B
By Alibaba · China
Tags: chat · code · reasoning · moe
Overview
Alibaba's agentic coding mixture-of-experts (MoE) model, with 35B total parameters and just 3B active per token, released April 16, 2026. It scores 73.4% on SWE-Bench and runs on a single 24GB GPU at the recommended Q4_K_M quantization.
When to pick this model
- Local SWE-Bench-grade coding agents
- Single 24GB GPU coding workstations
- Repository-scale refactoring with 262K context
- Cost-sensitive autonomous coding pipelines
- Commercial code assistants under Apache 2.0
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 21 GB |
| Q5_K_M | 25 GB |
| Q8_0 | 38 GB |
| FP16 (no quantization) | 70 GB |
VRAM figures include the model weights, the KV cache for a typical 8K-token context, and about 600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer context lengths.
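As a quick way to read the table against a given card, here is a minimal sketch that filters the listed quantizations by available VRAM. The figures are copied from the table; the function name and the `extra_headroom_gb` parameter are illustrative assumptions, and the headroom value for longer contexts is something you would have to measure yourself.

```python
# VRAM figures (GB) copied from the table above. They already include
# weights, an 8K-token KV cache, and ~600 MB of runtime overhead.
VRAM_GB = {
    "Q4_K_M": 21,
    "Q5_K_M": 25,
    "Q8_0": 38,
    "FP16": 70,
}


def quantizations_that_fit(gpu_vram_gb, extra_headroom_gb=0.0):
    """Return the quantizations whose listed VRAM plus headroom fits on the GPU.

    extra_headroom_gb is a hypothetical allowance for contexts longer than 8K;
    the fiche does not say how much to add per extra token.
    """
    return [
        name
        for name, needed_gb in VRAM_GB.items()
        if needed_gb + extra_headroom_gb <= gpu_vram_gb
    ]


if __name__ == "__main__":
    print(quantizations_that_fit(24))                       # 24 GB card, default context
    print(quantizations_that_fit(24, extra_headroom_gb=4))  # same card, 4 GB reserved
```

On a 24 GB card only Q4_K_M fits, and reserving a few extra gigabytes for a longer context leaves no listed quantization that fits.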
Published benchmark scores
| Benchmark | Score |
|---|---|
| SWE-Bench | 73.4 |
Scores are published by the model author or aggregated from public leaderboards, and are re-measured monthly by our editorial team.
Strengths
- 73.4% on SWE-Bench from an MoE that fits on a 24GB GPU at Q4_K_M
- Only 3B parameters are active per token, which keeps inference fast
- 262K context handles whole repos
- Apache 2.0 license
Limitations
- No official Ollama tag yet
- Brand-new release with limited production track record
- Specialized for coding, weaker as a general chat model
Architecture & training
Architecture: MoE 35B/3B active · agentic-coding specialist
Training: Released April 16, 2026.
Verdict
The best local coding agent for a single 24GB GPU as of April 2026.
Quick start
ollama run qwen3.6:35b-a3b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
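For scripted use rather than the interactive CLI, the sketch below calls a local Ollama server over its HTTP API using only the Python standard library. It assumes Ollama is listening on its default port (11434) and that the tag from the `ollama run` command is actually available on your machine, which may not yet be the case given the limitation noted above. The helper names `build_request` and `generate` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "qwen3.6:35b-a3b"  # tag from the quick-start command; assumed to be pulled locally


def build_request(prompt, num_ctx=8192):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,                  # ask for one JSON object instead of a stream
        "options": {"num_ctx": num_ctx},  # context window; larger values need more VRAM
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, num_ctx=8192, timeout=300):
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, num_ctx), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `generate("Write a Python function that reverses a linked list.")` would return the completion as a string. The default `num_ctx` of 8192 matches the context length the VRAM table assumes.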