Editorial ranking · 2026

Best local LLM with vision

Top 7 open-weight picks for vision and multimodal tasks, ranked by benchmark performance and real-world fit. Updated monthly.

#1

Qwen 3 VL 30B-A3B

30B · Alibaba · Apache 2.0

Qwen 3 VL's sweet spot: a 30B MoE with 3B active parameters and 256k context. Delivers most of the 235B flagship's quality at a fraction of the hardware cost.

VRAM Q4: 19 GB · Context: 256k
#2

Nemotron Nano v2 VL 12B

12.6B · NVIDIA · NVIDIA Open Model License

NVIDIA's 12.6B enterprise VLM with strong DocVQA and ChartQA scores, tuned for professional document extraction workflows.

VRAM Q4: 8 GB · Context: 125k
#3

Qwen 2.5 VL 7B

7B · Alibaba · Apache 2.0

A 7B vision-language model from Alibaba with state-of-the-art results in its class, scoring 95.7 on DocVQA. Handles hour-long video, bounding-box grounding, and multilingual OCR.

VRAM Q4: 6 GB · Context: 125k
#4

Qwen 3 VL 8B

8B · Alibaba · Apache 2.0

The dense 8B entry in Qwen 3 VL, offering strong OCR and document analysis with a 256k multimodal context that is unusually long for its size.

VRAM Q4: 6 GB · Context: 256k
#5

Qwen 3 Omni 30B-A3B

30B · Alibaba · Apache 2.0

Alibaba's omni-modal 30B MoE (3B active) with streaming speech, text support for 119 languages (speech input covers 19), and Apache 2.0 licensing. The most accessible truly omnimodal open model.

VRAM Q4: 19 GB · Context: 128k
#6

LLaDA 2.0 Uni 16B

16B · Ant Group / inclusionAI · Apache 2.0

Ant Group's first open Apache 2.0 diffusion LLM: a 16B MoE (1B active) paired with a 6.2B diffusion decoder, unifying text and vision generation and editing. Released April 2026.

VRAM Q4: 18 GB · Context: 8k
#7

Mistral Small 3.1 24B

24B · Mistral AI · Apache 2.0

Mistral AI's Small 3.1 is Small 3 plus a vision encoder, with a 128k context and ~150 tok/s inference, released under Apache 2.0. Small 3.2 (June 2025) is a drop-in upgrade.

VRAM Q4: 14 GB · Context: 128k
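
To turn the "VRAM Q4" figures above into a quick hardware check, the ranking can be filtered by how much GPU memory is available. The sketch below is only an illustration: the `Pick` type, the function names, and the 2 GB default headroom (a rough allowance for KV cache and image activations) are assumptions, not something the ranking specifies. The VRAM numbers are copied from the cards above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Pick:
    rank: int
    name: str
    vram_q4_gb: int  # "VRAM Q4" figure as listed in the ranking


# Entries in rank order, with VRAM figures taken from the cards above.
PICKS = [
    Pick(1, "Qwen 3 VL 30B-A3B", 19),
    Pick(2, "Nemotron Nano v2 VL 12B", 8),
    Pick(3, "Qwen 2.5 VL 7B", 6),
    Pick(4, "Qwen 3 VL 8B", 6),
    Pick(5, "Qwen 3 Omni 30B-A3B", 19),
    Pick(6, "LLaDA 2.0 Uni 16B", 18),
    Pick(7, "Mistral Small 3.1 24B", 14),
]


def picks_that_fit(gpu_vram_gb: float, headroom_gb: float = 2.0) -> List[Pick]:
    """Return the picks whose Q4 footprint fits, keeping rank order.

    headroom_gb is an assumed safety margin, not a measured value.
    """
    budget = gpu_vram_gb - headroom_gb
    return [p for p in PICKS if p.vram_q4_gb <= budget]


def best_pick(gpu_vram_gb: float, headroom_gb: float = 2.0) -> Optional[Pick]:
    """Return the highest-ranked pick that fits, or None if nothing does."""
    fits = picks_that_fit(gpu_vram_gb, headroom_gb)
    return fits[0] if fits else None


if __name__ == "__main__":
    for vram in (8, 12, 16, 24):
        pick = best_pick(vram)
        print(vram, "GB ->", pick.name if pick else "no listed model fits")
```

Long contexts and high-resolution images can need considerably more memory than the assumed headroom, so treat the result as a first filter rather than a guarantee.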