Model fiche
Qwen3-Coder-Next 80B-A3B
By Alibaba · China
code
moe
Overview
Alibaba's hybrid Gated DeltaNet + Attention MoE with 80B total and 3B active parameters. Purpose-built as a local coding copilot that fits on a 24GB GPU.
When to pick this model
- Local Copilot-style code completion
- Long-context refactoring up to 262K tokens
- IDE plugins running on consumer hardware
- Apache-licensed commercial code tooling
- Reducing reliance on cloud coding APIs
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 48 GB |
| Q5_K_M | 58 GB |
| Q8_0 | 86 GB |
| FP16 (no quantization) | 160 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Strengths
- Runs as a local copilot on a 24GB GPU
- 262K context fits entire codebases
- Hybrid architecture keeps memory low
- Apache 2.0 license
Limitations
- Hybrid architecture means partial llama.cpp support
- Less mature than dense coder alternatives
- Tooling lags behind standard transformer models
Architecture & training
Architecture: MoE 80B/3B · Gated DeltaNet + hybrid Attention · 262k ctx
Training: Agentic code specialist.
Verdict
Choose this when you want a local Copilot replacement and can tolerate early-stage tooling friction.
Quick start
ollama run qwen3-coder-nextOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.