BestLLMfor: Your hardware. Your LLM. Your call.
Model card

Qwen3-Coder-Next 80B-A3B

By Alibaba · China

Tags: code, MoE
Parameters: 80B total (3B active)
License: Apache 2.0
Context: 262K tokens
VRAM (Q4): 48 GB
Released: February 2026

Overview

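Alibaba's mixture-of-experts (MoE) model with 80B total and 3B active parameters, built on a hybrid of Gated DeltaNet and attention layers. It is purpose-built as a local coding copilot: because only 3B parameters are active per token, it can run on a 24 GB GPU when the expert weights are offloaded to system RAM, while a fully GPU-resident Q4 build needs about 48 GB.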
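The "80B total, 3B active" split reflects mixture-of-experts routing: for each token a small router scores every expert and only the highest-scoring few run, so most weights sit idle on any given step. A generic top-k softmax router can be sketched in plain Python as below. The counts of 512 routed experts and 10 selected per token are assumptions taken from the published Qwen3-Next-80B-A3B configuration, not from this page; the experts are toy functions, and renormalizing the selected weights is one common choice, not necessarily what Qwen does.

```python
import math
import random

# Assumed values (Qwen3-Next-80B-A3B config); verify against the model's config.json.
NUM_EXPERTS = 512  # routed experts per MoE layer
TOP_K = 10         # experts activated per token


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, k=TOP_K):
    """Pick the k highest-probability experts and renormalize their weights to sum to 1."""
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, logits, experts, k=TOP_K):
    """Combine the outputs of only the selected experts; all others are skipped."""
    out = [0.0] * len(x)
    for idx, weight in route(logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    random.seed(0)
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores
    chosen = route(logits)
    print(f"{len(chosen)} of {NUM_EXPERTS} experts run for this token")
```

Only the selected experts' weights are touched for a given token, which is why compute per token tracks the active parameter count while memory still has to hold all experts somewhere (GPU or system RAM).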

When to pick this model

  • Local Copilot-style code completion
  • Long-context refactoring up to 262K tokens
  • IDE plugins running on consumer hardware
  • Apache-licensed commercial code tooling
  • Reducing reliance on cloud coding APIs

VRAM requirements by quantization

Quantization            VRAM required
Q4_K_M (recommended)    48 GB
Q5_K_M                  58 GB
Q8_0                    86 GB
FP16 (no quantization)  160 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
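A rough memory budget can be estimated as weights (parameters × bits per weight ÷ 8) plus KV cache plus runtime overhead. The sketch below does that arithmetic; it is not necessarily how the figures above were computed. The bits-per-weight averages are approximations, and the attention geometry (12 full-attention layers, 2 KV heads, head dimension 256) is an assumption taken from the Qwen3-Next-80B-A3B configuration. Only the attention layers get a KV cache here, because the Gated DeltaNet layers keep a fixed-size state that does not grow with context.

```python
# Rough VRAM budget: weights + KV cache + fixed runtime overhead.
# All constants below are assumptions for illustration, not measured values.

BITS_PER_WEIGHT = {  # approximate averages for llama.cpp quant types
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "F16": 16.0,
}

# Assumed attention geometry (Qwen3-Next-80B-A3B config); verify locally.
ATTN_LAYERS = 12  # full-attention layers; Gated DeltaNet layers add no KV cache
KV_HEADS = 2
HEAD_DIM = 256
KV_BYTES = 2      # FP16 K/V entries


def weights_gb(params_billion: float, bpw: float) -> float:
    # (params_billion * 1e9) weights * bpw bits / 8 bits per byte / 1e9 bytes per GB
    return params_billion * bpw / 8


def kv_cache_gb(ctx_tokens: int) -> float:
    per_token = 2 * ATTN_LAYERS * KV_HEADS * HEAD_DIM * KV_BYTES  # K and V
    return per_token * ctx_tokens / 1e9


def vram_gb(params_billion: float, quant: str, ctx_tokens: int = 8192,
            overhead_gb: float = 0.6) -> float:
    return (weights_gb(params_billion, BITS_PER_WEIGHT[quant])
            + kv_cache_gb(ctx_tokens) + overhead_gb)


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        short = vram_gb(80, quant)
        full = vram_gb(80, quant, 262_144)
        print(f"{quant:7s} 8K ctx: {short:6.1f} GB   262K ctx: {full:6.1f} GB")
```

Under these assumptions the KV cache is about 24 KiB per token, so moving from 8K to the full 262K context adds roughly 6 GB on top of the weights.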

Strengths

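  • Runs as a local copilot on a 24 GB GPU when expert weights are offloaded to system RAM
  • 262K-token context can hold a small or medium codebase in a single prompt
  • Hybrid architecture keeps memory low: Gated DeltaNet layers use a fixed-size state, so only the attention layers' KV cache grows with context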
  • Apache 2.0 license
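Whether a particular repository fits in the 262K-token window can be checked roughly before sending it. The sketch below walks a directory and applies a characters-per-token heuristic; the ratio of about 4 characters per token, the 8K-token reserve for the model's reply, and the extension filter are illustrative assumptions, and an exact count needs the model's own tokenizer.

```python
import math
from pathlib import Path

CONTEXT_TOKENS = 262_144  # 262K context window
CHARS_PER_TOKEN = 4.0     # assumption: rough average for source code
# Hypothetical filter: adjust to the languages in your repository.
SOURCE_SUFFIXES = {".py", ".js", ".ts", ".go", ".rs", ".java", ".c", ".h", ".cpp", ".md"}


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (rounded up)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_repo_tokens(root: str) -> int:
    """Sum estimated tokens over all matching files under root."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            total += estimate_tokens(path.read_text(encoding="utf-8", errors="ignore"))
    return total


def fits_in_context(root: str, reserve_for_output: int = 8_192) -> tuple[bool, int]:
    """Return (fits, estimated_tokens), keeping room for the model's reply."""
    used = estimate_repo_tokens(root)
    return used + reserve_for_output <= CONTEXT_TOKENS, used
```

For example, `fits_in_context("path/to/repo")` returns a pair such as `(True, 120000)`; treat a result close to the limit as "probably does not fit", since the heuristic can be off in either direction.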

Limitations

  • Hybrid architecture means partial llama.cpp support
  • Less mature than dense coder alternatives
  • Tooling lags behind standard transformer models

Architecture & training

Architecture: MoE, 80B total / 3B active · hybrid Gated DeltaNet + attention layers · 262K context
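The memory advantage of the Gated DeltaNet layers comes from replacing a growing cache of past keys and values with a fixed-size state matrix that is updated once per token. The sketch below implements the gated delta rule as stated in the Gated DeltaNet paper, S_t = α_t S_{t-1}(I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ with output o_t = S_t q_t, for a single head in plain Python. It is a didactic recurrence only: production kernels use chunked parallel forms, and how Qwen computes α_t and β_t from the input is not modeled (they are passed in directly here).

```python
def gated_delta_step(S, q, k, v, alpha, beta):
    """One token of the gated delta rule for a single head.

    S: d_v x d_k state (list of rows); q, k: length d_k; v: length d_v.
    alpha: decay gate in (0, 1]; beta: write strength in (0, 1].
    Returns (new_state, output).
    """
    d_v, d_k = len(S), len(S[0])
    # S_{t-1} k_t: the value currently stored under key k_t
    Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # S_t = alpha * (S_{t-1} - beta * (S_{t-1} k) k^T) + beta * v k^T
    new_S = [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j] for j in range(d_k)]
        for i in range(d_v)
    ]
    # o_t = S_t q_t
    out = [sum(new_S[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return new_S, out


def run(tokens, d_k, d_v):
    """Process (q, k, v, alpha, beta) tuples; the state never changes shape."""
    S = [[0.0] * d_k for _ in range(d_v)]
    outs = []
    for q, k, v, alpha, beta in tokens:
        S, o = gated_delta_step(S, q, k, v, alpha, beta)
        outs.append(o)
    return S, outs
```

With β_t = 1 and a unit-norm key, a write fully replaces whatever was stored under that key (the "delta" correction), and α_t < 1 decays the whole state. However many tokens pass through, the state stays d_v × d_k, which is why these layers need no KV cache that grows with context.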

Training: specialized for agentic coding.

Verdict

Choose this when you want a local Copilot replacement and can tolerate early-stage tooling friction.

Quick start

ollama run qwen3-coder-next

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
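Besides the interactive `ollama run` session, a running Ollama server accepts requests on its local HTTP API (by default at `http://localhost:11434`), which is how editor plugins typically talk to it. The sketch below uses only the Python standard library to call `/api/generate` with streaming disabled. The model tag is the one from the quick-start command; the optional `suffix` field asks for fill-in-the-middle completion and assumes the model's Ollama template supports it.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "qwen3-coder-next"                           # tag from the quick-start command


def build_request(prompt, suffix=None, model=MODEL, url=OLLAMA_URL):
    """Build a non-streaming /api/generate request."""
    body = {"model": model, "prompt": prompt, "stream": False}
    if suffix is not None:
        body["suffix"] = suffix  # fill-in-the-middle: text that follows the cursor
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, suffix=None, timeout=300):
    """Send the request and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, suffix), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example (requires the Ollama server running with the model pulled):
# print(generate("def fibonacci(n):\n", suffix="\n\nprint(fibonacci(10))\n"))
```

Keeping request construction separate from the network call makes it easy to reuse the same payload from an editor extension or a test without a live server.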
