Qwen 3.6 27B
By Alibaba · China
Overview
Dense 27B multimodal model from Alibaba (April 2026). It scores 77.2% on SWE-bench Verified and has a 262k-token native context window, extendable to 1M with YaRN. It is the developer-friendly workhorse of the Qwen 3.6 generation.
When to pick this model
- Coding agents needing top-tier SWE-bench accuracy at single-GPU scale
- Multimodal applications with long context (up to 1M with YaRN)
- Apache 2.0 deployments replacing closed APIs
- Reasoning workloads where dense models behave more predictably than MoE
- Local inference on a single 24-32 GB GPU
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 16 GB |
| Q5_K_M | 19 GB |
| Q8_0 | 29 GB |
| FP16 (no quantization) | 54 GB |
VRAM figures include model weights plus a typical 8k-token KV cache and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer contexts, since the KV cache grows with context length.
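The figures above can be approximated with simple arithmetic: parameter count times bits per weight, plus KV cache and runtime overhead. The sketch below is a rough estimator, not the method used to produce the table. The bits-per-weight values are approximate averages commonly cited for llama.cpp quantization types and are assumptions here, as are the function names. The KV cache size is left as a parameter because it depends on context length and on architecture details not given above.

```python
# Approximate average bits per weight for each quantization type.
# These are assumptions for illustration, not values from the model card.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "FP16": 16.0,
}


def weights_gb(params_billion: float, quant: str) -> float:
    """Size of the weights alone, in decimal GB."""
    bits = BITS_PER_WEIGHT[quant]
    # (params_billion * 1e9 params) * (bits / 8 bytes) / 1e9 bytes per GB
    return params_billion * bits / 8


def estimate_vram_gb(
    params_billion: float,
    quant: str,
    kv_cache_gb: float = 0.0,   # hypothetical: supply a value for your context length
    overhead_gb: float = 0.6,   # the ~600 MB runtime overhead quoted above
) -> float:
    """Weights plus KV cache plus runtime overhead, in decimal GB."""
    return weights_gb(params_billion, quant) + kv_cache_gb + overhead_gb


for quant in BITS_PER_WEIGHT:
    print(quant, round(weights_gb(27, quant), 1), "GB of weights")
```

With these assumed bit widths, the weight term alone comes to roughly 16, 19, 29, and 54 GB for a 27B model, so the weights dominate each figure in the table.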
Published benchmark scores
| Benchmark | Score |
|---|---|
| SWE-bench Verified | 77.2 |
| Terminal-Bench | 59.3 |
| SkillsBench | 48.2 |
Scores are published by the model author or aggregated from public leaderboards, and are re-measured monthly by our editorial team.
Strengths
- 77.2% SWE-bench Verified — frontier coding accuracy
- Native multimodal text + image
- 262k context, extendable to 1M with YaRN
- Apache 2.0
- Dense 27B fits comfortably on consumer hardware
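Extending the 262k native window to 1M with YaRN comes down to a scaling factor of roughly the target length divided by the native length. A minimal sketch of that arithmetic follows, assuming "262k" means 262,144 tokens and "1M" means 1,048,576 tokens. The dictionary mirrors the `rope_scaling` shape used in Hugging Face style model configs. Whether this model's released config uses exactly these keys and values is an assumption, and the function names are illustrative.

```python
NATIVE_CONTEXT = 262_144     # assumption: "262k" read as 2**18 tokens
TARGET_CONTEXT = 1_048_576   # assumption: "1M" read as 2**20 tokens


def yarn_factor(native: int, target: int) -> float:
    """Scaling factor needed to stretch `native` positions to cover `target`."""
    if native <= 0 or target <= 0:
        raise ValueError("context lengths must be positive")
    # No scaling is needed when the target already fits in the native window.
    return max(1.0, target / native)


def rope_scaling_config(native: int, target: int) -> dict:
    """Build a rope_scaling-style dict (key names assumed, see the text above)."""
    return {
        "rope_type": "yarn",
        "factor": yarn_factor(native, target),
        "original_max_position_embeddings": native,
    }


print(rope_scaling_config(NATIVE_CONTEXT, TARGET_CONTEXT))
```

Under those assumptions the factor is 4.0. Each runtime exposes YaRN through its own settings, so check the documentation of the one you deploy with.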
Limitations
- Needs 16+ GB VRAM at Q4
- Hybrid architecture requires a recent llama.cpp build
- Dense design forgoes MoE inference efficiency: all 27B parameters are active for every token
Architecture & training
Architecture: Dense 27B · Gated DeltaNet + Gated Attention · multimodal · 64 layers
Training: Dense successor to Qwen 3.5 27B, part of the Qwen 3.6 generation.
The single-GPU coding model to beat in 2026 — Apache 2.0, multimodal, and frontier-grade on SWE-bench.
Quick start
ollama run qwen3.6:27b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
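For programmatic access, Ollama also serves a local HTTP API, by default on port 11434. The sketch below sends one non-streaming prompt to its `/api/generate` endpoint using only the Python standard library. The model tag is taken from the command above. The helper names `build_request` and `generate` are illustrative, and the default host and port are assumed unchanged.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "qwen3.6:27b"                            # tag from the quick-start command


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a POST request asking for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt to a locally running Ollama server and return its text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    # With "stream": False the reply is one JSON object with a "response" field.
    return body["response"]
```

With the Ollama server running and the model pulled, `generate("Write a function that reverses a string.")` returns the model's reply as a string. The generous timeout is a guess to allow for the first-load delay of a 27B model.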