Editorial ranking · 2026

Best local LLM for coding

Q: What is the best local LLM for coding?

Devstral Small 2 24B tops this ranking: a 24B model licensed under Apache 2.0 that needs about 14 GB of VRAM at Q4 quantization. See the full list below for the runners-up and how they compare.

Last updated 2026-05-26 · Page updated 2026-07-13

Top 7 open-source picks for coding, ranked by benchmark performance and real-world fit. Updated monthly.

Devstral Small 2 24B

24B · Mistral AI · Apache 2.0

Mistral AI's 24B coding specialist, co-developed with All Hands AI and released under Apache 2.0, scores 72.2% on SWE-Bench. Fits on a single RTX 4090.

VRAM Q4: 14 GB · Context: 250k

Read full model page →

Qwen 2.5 Coder 32B

32B · Alibaba · Apache 2.0

Alibaba's Qwen 2.5 Coder 32B is the strongest open-weight code model we've benchmarked, trading blows with Claude 3.5 Sonnet on HumanEval.

VRAM Q4: 19 GB · Context: 128k

Read full model page →

DeepSeek Coder V2 Lite 16B

16B · DeepSeek · MIT

A 16B MoE code specialist from DeepSeek covering 338 programming languages with a 128k context. Fast inference for its quality tier.

VRAM Q4: 10 GB · Context: 128k

Read full model page →

Qwen 2.5 Coder 14B Instruct

14B · Alibaba · Apache 2.0

Alibaba's Qwen 2.5 Coder 14B, released under Apache 2.0, scores 89.6 on HumanEval and 37.1 on LiveCodeBench. The VRAM sweet spot for serious self-hosted code generation.

VRAM Q4: 9 GB · Context: 128k

Read full model page →

Qwen 3.6 27B

27B · Alibaba · Apache 2.0

Dense 27B multimodal model from Alibaba (April 2026), scoring 77.2% on SWE-bench Verified with 262k native context (1M via YaRN). The Qwen 3.6 generation's developer-friendly workhorse.

VRAM Q4: 16 GB · Context: 256k

Read full model page →

Granite 4.1 30B Instruct

30B · IBM · Apache 2.0

IBM's dense 30B Granite 4.1: Apache 2.0, 12 languages, 131k context, with OpenAI-compatible tool calling. Built on the same GB200 NVL72 cluster as the rest of the 4.1 lineup.

VRAM Q4: 17 GB · Context: 128k

Read full model page →

Qwen 2.5 Coder 7B

7B · Alibaba · Apache 2.0

A 7B coding specialist from Alibaba covering 92 programming languages with a 128k context. Competitive with proprietary models on HumanEval at this size.

VRAM Q4: 5 GB · Context: 128k

Read full model page →

Which GPU should you buy to run Devstral Small 2 24B?

To run Devstral Small 2 24B locally at Q4, you need about 14 GB of VRAM. The best-value GPU for that is an RTX 5070 Ti (16 GB VRAM).

Check RTX 5070 Ti price on Amazon →

As an Amazon Associate, BestLLMfor earns from qualifying purchases, at no extra cost to you. This does not influence our independent rankings.

Frequently asked questions

What is the best local LLM for coding?

Devstral Small 2 24B tops this ranking: a 24B model licensed under Apache 2.0 that needs about 14 GB of VRAM at Q4 quantization. See the full list above for the runners-up and how they compare.

How much VRAM do I need to run Devstral Small 2 24B?

At Q4 quantization, Devstral Small 2 24B needs about 14 GB of VRAM and fits comfortably on a single 24 GB GPU.
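As a rough sanity check on a figure like this, the size of the quantized weights can be approximated from the parameter count. The sketch below assumes about 4.5 bits per weight, which is one plausible value for a 4-bit format that stores scaling factors alongside the weights. It ignores the KV cache and runtime overhead, so real usage will be higher. The function name and default value are illustrative and are not the method used to produce this ranking's numbers.

```python
def estimate_q4_weights_gb(params_billions: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of quantized model weights in GB (1 GB = 1e9 bytes).

    bits_per_weight is an assumption; actual values vary by quantization format.
    KV cache and runtime overhead are NOT included.
    """
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    # 24B parameters at the assumed 4.5 bits per weight
    print(f"{estimate_q4_weights_gb(24):.1f} GB of weights")
```

Under that assumption a 24B model comes to roughly 13.5 GB of weights, in line with the ~14 GB listed here. Long contexts add KV-cache memory on top.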

Which of these models fit an 8 GB GPU?

At Q4 quantization, only Qwen 2.5 Coder 7B (about 5 GB) fits within 8 GB of VRAM.
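The same check can be run for any VRAM budget. The sketch below hard-codes the "VRAM Q4" figures from the cards on this page and filters them against a budget. The `models_that_fit` helper and its optional `headroom_gb` parameter (memory reserved for context and other processes) are hypothetical names introduced for illustration.

```python
# "VRAM Q4" figures (GB) copied from the model cards on this page.
MODELS_Q4_VRAM_GB = {
    "Devstral Small 2 24B": 14,
    "Qwen 2.5 Coder 32B": 19,
    "DeepSeek Coder V2 Lite 16B": 10,
    "Qwen 2.5 Coder 14B Instruct": 9,
    "Qwen 3.6 27B": 16,
    "Granite 4.1 30B Instruct": 17,
    "Qwen 2.5 Coder 7B": 5,
}


def models_that_fit(budget_gb: float, headroom_gb: float = 0.0) -> list[str]:
    """Return models whose listed Q4 VRAM plus headroom fits the budget.

    headroom_gb is a hypothetical safety margin, not a figure from this page.
    """
    return [
        name
        for name, needed_gb in MODELS_Q4_VRAM_GB.items()
        if needed_gb + headroom_gb <= budget_gb
    ]


if __name__ == "__main__":
    print(models_that_fit(8))   # budget of an 8 GB card
    print(models_that_fit(16))  # budget of a 16 GB card
```

Setting a non-zero `headroom_gb` is a conservative choice: a model whose listed figure exactly equals the card's VRAM leaves no room for long contexts.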

Are the models on this coding list free for commercial use?

Licenses across this list include Apache 2.0 and MIT. Check the specific license of each model on its catalog page before deploying commercially, as terms vary by author.

What context window do these models support?

Context windows on this list range from 128k to 256k tokens, depending on the model.