Best local LLMs for 8 GB VRAM
Top 8 open-source picks for 8 GB VRAM budgets, ranked by benchmark performance and real-world fit. Updated monthly.
Granite 4.0 H-Tiny 7B-A1B
IBM's edge-class hybrid MoE with 7B total parameters, of which only about 1B are active per token. Apache 2.0 licensed and built for embedded and low-cost serving.
Mistral Nemo 12B Instruct
A 12B instruct model co-developed by Mistral AI and NVIDIA, with 128K context, the Tekken tokenizer, and strong European multilingual coverage.
Gemma 3 12B
The 12B sweet spot of Google's Gemma 3 line — multimodal, 128K context, and 140 languages. Fits on a single consumer GPU with room for batching.
Nemotron Nano v2 VL 12B
NVIDIA's 12.6B enterprise VLM with strong DocVQA and ChartQA scores, tuned for professional document extraction workflows.
Lucie 7B
A French-sovereign 7B model from OpenLLM-France, backed by CNRS and LINAGORA, with a fully transparent and auditable training corpus.
DeepSeek R1 Distill 7B
A 7B model that DeepSeek distilled from the 671B R1, trained to produce explicit chain-of-thought reasoning. Strong on AIME and MATH for its size.
Qwen 3 8B
Alibaba's 8B dense model with a toggleable thinking mode and broad multilingual coverage. Performs strongly for its size and runs comfortably on a single consumer GPU.
Qwen 2.5 VL 7B
A 7B vision-language model from Alibaba with state-of-the-art results in its class, scoring 95.7 on DocVQA. Handles hour-long video, bounding-box grounding, and multilingual OCR.
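As a rough check on how the parameter counts above map onto an 8 GB card, the sketch below estimates the memory taken by model weights alone at a few nominal bit widths. The bit widths, the 1.5 GiB overhead allowance (KV cache, activations, runtime), and the helper names `weight_gib` and `fits` are illustrative assumptions, not measurements of any of the listed models. Real quantization formats often use slightly more bits per weight than their nominal figure, and long contexts or image inputs can need far more overhead.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         budget_gib: float = 8.0, overhead_gib: float = 1.5) -> bool:
    """True if weights plus an assumed fixed overhead fit in the budget."""
    return weight_gib(params_billion, bits_per_weight) + overhead_gib <= budget_gib


# Total parameter counts as stated in the list above (in billions).
# For the MoE entry, all 7B weights must be resident even though
# only about 1B are active per token.
MODELS = {
    "Granite 4.0 H-Tiny 7B-A1B": 7.0,
    "Mistral Nemo 12B Instruct": 12.0,
    "Gemma 3 12B": 12.0,
    "Nemotron Nano v2 VL 12B": 12.6,
    "Lucie 7B": 7.0,
    "DeepSeek R1 Distill 7B": 7.0,
    "Qwen 3 8B": 8.0,
    "Qwen 2.5 VL 7B": 7.0,
}

if __name__ == "__main__":
    for name, params in MODELS.items():
        cells = []
        for bits in (16, 8, 4):  # nominal bit widths, assumed for illustration
            size = weight_gib(params, bits)
            verdict = "fits" if fits(params, bits) else "too big"
            cells.append(f"{bits}-bit: {size:5.2f} GiB ({verdict})")
        print(f"{name:28s} " + " | ".join(cells))
```

Under these assumptions, every model on the list is too large for an 8 GB budget at 16 bits, and the 12B-class entries only fit at around 4 bits per weight.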