Editorial ranking · 2026

Best local LLM for agents

Q: What is the best local LLM for autonomous agents and tool use?

Qwen 3 30B-A3B tops this ranking: a 30B model, licensed under Apache 2.0, that needs about 19 GB of VRAM at Q4 quantization. See the full list below for the runners-up and how they compare.

Last updated 2026-05-26 · Page updated 2026-07-13

Top 7 open-source picks for autonomous agents and tool use, ranked by benchmark performance and real-world fit. Updated monthly.

Qwen 3 30B-A3B

30B · Alibaba · Apache 2.0

Alibaba's Qwen 3 MoE with 30B total and just 3B active parameters, plus a hybrid thinking mode. MMLU 81.4, AIME24 80.4, 100+ languages, Apache 2.0.

VRAM Q4: 19 GB · Context: 128k

gpt-oss 20B

21B · OpenAI · Apache 2.0

OpenAI's compact open-weight MoE with 3.6B active out of 21B total parameters. OpenAI reports results similar to o3-mini on common benchmarks, and it runs on a laptop-class GPU under Apache 2.0.

VRAM Q4: 13 GB · Context: 128k

ERNIE 4.5 21B-A3B Thinking

21B · Baidu · Apache 2.0

Baidu's compact reasoning MoE with 3B active parameters out of 21B total. The small active set makes inference fast, and the model is particularly strong in Chinese.

VRAM Q4: 13 GB · Context: 128k

Kanana 2 30B-A3B Thinking

30B · Kakao · Apache 2.0

Kakao's agentic 30B MoE (3B active) with native hybrid thinking and Korean-first training. Released under Apache 2.0, it uses multi-head latent attention (MLA) and supports a 128k-token context window.

VRAM Q4: 18 GB · Context: 128k

DeepSeek R1 Distill 32B

32B · DeepSeek · MIT

DeepSeek R1 distilled into 32B parameters: the best accessible open-weight reasoner we've tested. It produces explicit chain-of-thought, is MIT-licensed, and runs on a single 24 GB GPU.

VRAM Q4: 19 GB · Context: 32k

Qwen 3 32B

32B · Alibaba · Apache 2.0

Alibaba's 32B dense flagship with a thinking mode, scoring 65.5 on MMLU-Pro and 39.8 on SuperGPQA. It is the strongest general-purpose dense model in the Qwen 3 family; going further means stepping up to Qwen 3's larger MoE models.

VRAM Q4: 19 GB · Context: 128k

QwQ 32B

32B · Alibaba · Apache 2.0

Alibaba's dedicated 32B reasoner, trained with reinforcement learning rather than distillation. Hits 79.5 on AIME24 and 90.6 on MATH-500 — a direct Apache-licensed alternative to DeepSeek R1.

VRAM Q4: 19 GB · Context: 128k

Which GPU should you buy to run Qwen 3 30B-A3B?

To run Qwen 3 30B-A3B locally at Q4, you need about 19 GB of VRAM. The best-value card for this is an RTX 4090 (24 GB of VRAM), which leaves roughly 5 GB of headroom.

As an Amazon Associate, BestLLMfor earns from qualifying purchases, at no extra cost to you. It does not influence our independent rankings.

Frequently asked questions

How much VRAM do I need to run Qwen 3 30B-A3B?

At Q4 quantization, Qwen 3 30B-A3B needs about 19 GB of VRAM and fits comfortably on a single 24 GB GPU.
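A quick way to sanity-check VRAM figures like this is a common rule of thumb: weight memory ≈ total parameters × bits per weight ÷ 8. For an MoE model such as Qwen 3 30B-A3B, the total count (30B) is what matters when the whole model runs on the GPU, because every expert has to be loaded even though only about 3B parameters are active per token. The sketch below is a minimal illustration under our own assumptions, not how this page's figures were produced: it assumes an average of roughly 4.8 bits per weight for Q4 and ignores the KV cache and runtime overhead, which is why it lands a little below 19 GB. The function name `estimate_weight_vram_gb` is hypothetical.

```python
def estimate_weight_vram_gb(total_params_billions: float, bits_per_weight: float = 4.8) -> float:
    """Rough VRAM (in GB, 1 GB = 1e9 bytes) needed for the model weights alone.

    Assumptions: ~4.8 bits per weight stands in for a typical Q4 quantization;
    KV cache, activations and runtime overhead are NOT included.
    """
    weight_bytes = total_params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes / 1e9


# MoE sizing: all experts must be resident in VRAM, so use the TOTAL count.
total_based = estimate_weight_vram_gb(30)  # Qwen 3 30B-A3B, 30B total
# Counting only the ~3B active parameters badly underestimates memory.
active_based = estimate_weight_vram_gb(3)

print(f"Using total params:  {total_based:.1f} GB")
print(f"Using active params: {active_based:.1f} GB")
```

This prints 18.0 GB and 1.8 GB. The gap between the two lines is the point: the active-parameter count tells you about speed, not about how much VRAM the model occupies.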

Which of these models fit a 16 GB GPU?

At Q4 quantization, gpt-oss 20B and ERNIE 4.5 21B-A3B Thinking (about 13 GB each) fit within 16 GB of VRAM.
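The same check can be applied to the whole list for any card size. The sketch below hard-codes the approximate Q4 VRAM figures from this ranking and keeps the models that fall within a given budget, in ranking order. The `Model` record and `fits` helper are hypothetical names for illustration; the comparison uses the quoted figures directly, so in practice you would want to leave a few GB spare for the KV cache and anything else sharing the GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    vram_q4_gb: int  # approximate Q4 VRAM figure quoted in this ranking


# Figures copied from the ranking on this page (Q4 quantization, approximate).
RANKING = [
    Model("Qwen 3 30B-A3B", 19),
    Model("gpt-oss 20B", 13),
    Model("ERNIE 4.5 21B-A3B Thinking", 13),
    Model("Kanana 2 30B-A3B Thinking", 18),
    Model("DeepSeek R1 Distill 32B", 19),
    Model("Qwen 3 32B", 19),
    Model("QwQ 32B", 19),
]


def fits(models: list[Model], vram_budget_gb: float) -> list[str]:
    """Return the names of models whose quoted Q4 VRAM is within the budget."""
    return [m.name for m in models if m.vram_q4_gb <= vram_budget_gb]


print(fits(RANKING, 16))
print(fits(RANKING, 24))
```

With a 16 GB budget this returns gpt-oss 20B and ERNIE 4.5 21B-A3B Thinking, matching the answer above; with 24 GB all seven entries pass, though the 19 GB models leave only about 5 GB to spare.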

Are the models on this list free for commercial use?

Six of the seven models are released under Apache 2.0, and DeepSeek R1 Distill 32B is under MIT; both licenses permit commercial use. Still, check the specific license of each model on its catalog page before deploying commercially, as terms vary by author.

What context window do these models support?

Context windows on this list range from 32k to 128k tokens, depending on the model.