Model fiche

Qwen 3.6 35B-A3B

By Alibaba · China

Tags: chat · code · reasoning · moe
Parameters: 35B total, 3B active
License: Apache 2.0
Context: 262K
VRAM (Q4): 21 GB
Released: April 2026

Overview

Alibaba's agentic coding MoE with 35B total and just 3B active parameters, released April 16, 2026. Scores 73.4% on SWE-Bench while running on a single 24GB GPU.

When to pick this model

  • Local SWE-Bench-grade coding agents
  • Single 24GB GPU coding workstations
  • Repository-scale refactoring with 262K context
  • Cost-sensitive autonomous coding pipelines
  • Commercial code assistants under Apache 2.0

VRAM requirements by quantization

Quantization              VRAM required
Q4_K_M (recommended)      21 GB
Q5_K_M                    25 GB
Q8_0                      38 GB
FP16 (no quantization)    70 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
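
As a rough illustration of how to use these figures, the sketch below picks the highest-precision quantization that fits a given GPU after adding KV-cache headroom for contexts beyond the 8k baseline. The per-quantization values are copied from this page. The attention shape (N_LAYERS, N_KV_HEADS, HEAD_DIM) is a placeholder assumption, not a published figure for this model, so substitute the values from the model's config before relying on the result.

```python
# VRAM per quantization in GB, copied from the table on this page.
# These figures already include an 8k KV cache and ~600 MB runtime overhead.
TABLE_VRAM_GB = {
    "Q4_K_M": 21.0,
    "Q5_K_M": 25.0,
    "Q8_0": 38.0,
    "FP16": 70.0,
}
BASELINE_CONTEXT = 8 * 1024  # tokens already covered by the table figures

# ASSUMPTION: placeholder attention shape, NOT taken from the model card.
N_LAYERS = 40
N_KV_HEADS = 2
HEAD_DIM = 128
KV_BYTES_PER_ELEMENT = 2  # FP16 KV cache


def kv_bytes_per_token() -> int:
    """Bytes of KV cache per token: one key and one value vector per layer."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES_PER_ELEMENT


def estimate_vram_gb(quant: str, context_tokens: int) -> float:
    """Table figure plus extra KV cache for tokens beyond the 8k baseline."""
    extra_tokens = max(0, context_tokens - BASELINE_CONTEXT)
    return TABLE_VRAM_GB[quant] + extra_tokens * kv_bytes_per_token() / 1e9


def best_quant_for(gpu_vram_gb: float, context_tokens: int):
    """Return the highest-precision quantization that fits, or None."""
    fitting = [
        q for q in TABLE_VRAM_GB
        if estimate_vram_gb(q, context_tokens) <= gpu_vram_gb
    ]
    # A larger table figure corresponds to a higher-precision quantization.
    return max(fitting, key=TABLE_VRAM_GB.get) if fitting else None


if __name__ == "__main__":
    for ctx in (8 * 1024, 32 * 1024, 128 * 1024):
        print(ctx, best_quant_for(24.0, ctx))
```

With these placeholder values, Q4_K_M still fits a 24 GB card at 32k context but no longer fits at 128k, which is the kind of headroom check the note above asks for.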

Published benchmark scores

Benchmark     Score (%)
SWE-Bench     73.4

Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.

Strengths

  • 73.4% SWE-Bench in an MoE that fits on a 24GB GPU
  • Only 3B parameters are active per token, so inference is fast
  • 262K context handles whole repos
  • Apache 2.0 license

Limitations

  • No official Ollama tag yet, so the ollama run command under Quick start may fail until one is published
  • Brand-new release with limited production track record
  • Specialized for coding, weaker as a general chat model

Architecture & training

Architecture: MoE 35B/3B active · agentic-coding specialist
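
To make the 35B/3B split concrete, here is a small arithmetic sketch. It uses the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token, which is an approximation that ignores attention cost. The parameter counts are the rounded figures from this page.

```python
TOTAL_PARAMS = 35e9   # every expert must be resident in memory
ACTIVE_PARAMS = 3e9   # parameters actually used for each token


def active_fraction(total: float = TOTAL_PARAMS,
                    active: float = ACTIVE_PARAMS) -> float:
    """Share of the weights that participate in a single token's forward pass."""
    return active / total


def approx_flops_per_token(active: float = ACTIVE_PARAMS) -> float:
    """Rule of thumb: ~2 FLOPs (multiply + add) per active parameter."""
    return 2.0 * active


if __name__ == "__main__":
    print(f"active fraction: {active_fraction():.1%}")
    print(f"approx. FLOPs per token: {approx_flops_per_token():.1e}")
    dense = approx_flops_per_token(TOTAL_PARAMS)
    print(f"dense 35B would need about {dense / approx_flops_per_token():.1f}x more")
```

Memory footprint follows the 35B total, which is why the VRAM figures do not shrink with the active count. Compute per token follows the 3B active parameters.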

Training: Released April 16, 2026.

Verdict

The best local coding agent for a single 24GB GPU as of April 2026.

Quick start

ollama run qwen3.6:35b-a3b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
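
For scripted use, a minimal sketch of calling the model through Ollama's local HTTP API (POST /api/generate on the default port 11434) might look like the following. It assumes an Ollama server is running locally and that the qwen3.6:35b-a3b tag from the command above actually resolves on your install. The helper names build_request and ask, and the sample prompt, are illustrative only.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "qwen3.6:35b-a3b"  # tag from the quick-start command; may not be published


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]


# Example (requires a running Ollama server with the model pulled):
# print(ask("Write a Python function that reverses a linked list."))
```

Setting "stream" to False returns a single JSON object instead of a stream of chunks, which keeps the client code short at the cost of waiting for the full completion.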
