BestLLMfor · Your hardware. Your LLM. Your call.
Editorial ranking · 2026

Best local LLM for RAG

Top 7 open-weight picks for RAG and document retrieval, ranked by benchmark performance and real-world fit. Updated monthly.

#1

Qwen 3 VL 30B-A3B

30B · Alibaba · Apache 2.0

Qwen 3 VL's sweet spot: a 30B MoE with 3B active parameters and 256k context. It delivers most of the quality of the larger 235B variant at a fraction of the hardware cost.

VRAM Q4: 19 GB · Context: 256k
#2

Nemotron Nano 3 30B-A3B

30B · NVIDIA · NVIDIA Open Model License

NVIDIA's Mamba-2 + Transformer hybrid MoE with 3B active out of 30B total parameters. It offers a native 1M-token context (about 976 × 1,024 tokens) and roughly 4× the throughput of Nemotron 2.

VRAM Q4: 19 GB · Context: 976k
#3

Qwen 3 30B-A3B

30B · Alibaba · Apache 2.0

Alibaba's Qwen 3 MoE with 30B total and just 3B active parameters, supporting hybrid thinking mode. MMLU 81.4, AIME24 80.4, 100+ languages, Apache 2.0.

VRAM Q4: 19 GB · Context: 128k
#4

Trinity Mini 26B-A3B

26B · Arcee AI · Apache 2.0

Arcee AI's US-built MoE with 3B active parameters out of 26B total. Apache-licensed, fast in practice, and tuned for agent-style workloads.

VRAM Q4: 15 GB · Context: 128k
#5

Kanana 2 30B-A3B Thinking

30B · Kakao · Apache 2.0

Kakao's agentic 30B MoE (3B active) with native hybrid thinking and Korean-first training. Apache 2.0 licensed, with MLA attention and a 131k-token (128 × 1,024) context.

VRAM Q4: 18 GB · Context: 128k
#6

Qwen 3 Omni 30B-A3B

30B · Alibaba · Apache 2.0

Alibaba's omnimodal 30B MoE (3B active) with streaming speech, 119-language ASR, and Apache 2.0 licensing. The most accessible truly omnimodal open model.

VRAM Q4: 19 GB · Context: 128k
#7

Granite 4.0 H-Small 32B-A9B

32B · IBM · Apache 2.0

IBM's Apache 2.0-licensed hybrid Mamba-2 + MoE model with 32B total and 9B active parameters, engineered to cut long-context memory use by roughly 70% versus comparable transformers.

VRAM Q4: 19 GB · Context: 125k
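
How to read the VRAM and context figures: the sketch below is a rough back-of-the-envelope estimate, not the method used to produce the numbers in this ranking. It assumes about 4.5 effective bits per weight for a 4-bit quantization and a flat 2 GB allowance for KV cache and runtime buffers; both constants, and the helper names `estimate_q4_vram_gb` and `tokens_to_k`, are hypothetical. It uses total parameters rather than active ones, because every expert of a MoE model has to be resident in memory; the active count mainly affects speed. The second helper shows how a context length looks when counted in units of 1,024 tokens, which may explain why a 131,072-token window is listed as 128k.

```python
# Hypothetical helpers for sanity-checking the figures on this page.
# The constants are assumptions, not the site's own methodology.

Q4_BITS_PER_WEIGHT = 4.5   # assumed effective bits per weight for a 4-bit quant
RUNTIME_OVERHEAD_GB = 2.0  # assumed flat allowance for KV cache and buffers


def estimate_q4_vram_gb(total_params_billion: float) -> float:
    """Rough VRAM estimate in GB for a Q4-quantized model.

    Uses TOTAL parameters: in a MoE model every expert must be resident,
    so the active-parameter count affects speed, not weight memory.
    """
    # billions of params * bits per weight / 8 bits per byte = gigabytes
    weights_gb = total_params_billion * Q4_BITS_PER_WEIGHT / 8
    return weights_gb + RUNTIME_OVERHEAD_GB


def tokens_to_k(tokens: int) -> int:
    """Express a token count in units of 1,024 tokens, rounded down."""
    return tokens // 1024


if __name__ == "__main__":
    for name, params in [("30B MoE", 30), ("26B MoE", 26), ("32B MoE", 32)]:
        print(f"{name}: ~{estimate_q4_vram_gb(params):.1f} GB at Q4")
    for tokens in (131_072, 1_000_000, 128_000):
        print(f"{tokens:,} tokens = {tokens_to_k(tokens)}k")
```

Under these assumptions a 30B model lands near the 19 GB listed for most entries, while the 26B and 32B estimates differ from the listed values by a gigabyte or two, so treat the output as a sanity check rather than a specification. Real usage also grows with the context length you actually fill.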