Editorial ranking · 2026

Best local LLM for RTX 5080

Top 7 open-source picks for the RTX 5080, ranked by benchmark performance and real-world fit. Updated monthly.

#1

Qwen 3 14B

14B · Alibaba · Apache 2.0

A 14B dense model from Alibaba that matches Qwen 2.5 32B Base on STEM and code, with the same hybrid thinking system as the rest of the Qwen 3 family. The pragmatic sweet spot for a single 24GB GPU.

VRAM Q4: 9 GB · Context: 128k
#2

Phi-4 Reasoning 14B

14B · Microsoft · MIT

Microsoft's 14B reasoner that beats R1-Distill-Llama-70B on AIME and GPQA with 5x fewer parameters. MIT-licensed, English-first, with a 32K context.

VRAM Q4: 9 GB · Context: 32k
#3

DeepSeek R1 Distill Qwen 14B

14B · DeepSeek · MIT

DeepSeek's R1 reasoning distilled into Qwen 14B under MIT. AIME24 69.7 and MATH-500 93.9, ahead of o1-mini on both.

VRAM Q4: 9 GB · Context: 128k
#4

Phi-4 14B

14B · Microsoft · MIT

Microsoft's Phi-4 14B, trained on ultra-curated synthetic data with a heavy STEM bias. The 14B reasoning leader at the end of 2024.

VRAM Q4: 9 GB · Context: 16k
#5

Qwen 2.5 14B Instruct

14B · Alibaba · Apache 2.0

Alibaba's Apache 2.0 dense 14B hitting MMLU 79.7 and HumanEval 83.5 across 29+ languages. The pragmatic sweet spot for self-hosted general-purpose chat.

VRAM Q4: 9 GB · Context: 128k
#6

Qwen 2.5 Coder 14B Instruct

14B · Alibaba · Apache 2.0

Alibaba's Qwen 2.5 Coder 14B under Apache 2.0 with HumanEval 89.6 and LiveCodeBench 37.1. The VRAM sweet spot for serious self-hosted code generation.

VRAM Q4: 9 GB · Context: 128k
#7

gpt-oss 20B

21B · OpenAI · Apache 2.0

OpenAI's compact open-weight MoE with 3.6B active out of 21B total parameters. Matches o3-mini on a laptop-class GPU under Apache 2.0.

VRAM Q4: 13 GB · Context: 128k
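
The "VRAM Q4" figures above can be sanity-checked with a rough back-of-the-envelope estimate. The sketch below is not the method used for this ranking; it assumes about 4.5 bits per weight for a typical 4-bit quantization, a flat 1 GB of overhead for runtime buffers and a short context, and the RTX 5080's 16 GB of VRAM. The function name and both default values are illustrative assumptions.

```python
# Rough Q4 VRAM estimate. All constants below are assumptions for
# illustration, not values published by this ranking.
GPU_VRAM_GB = 16.0  # RTX 5080 memory size


def estimate_q4_vram_gb(params_billion, bits_per_weight=4.5, overhead_gb=1.0):
    """Estimate VRAM in GB for a quantized model.

    params_billion: total parameter count in billions (for MoE models,
        use total parameters, since all experts must be resident).
    bits_per_weight: assumed average bits per weight after quantization.
    overhead_gb: assumed flat allowance for buffers and a short KV cache.
    """
    weights_gb = params_billion * bits_per_weight / 8  # bits -> bytes
    return weights_gb + overhead_gb


if __name__ == "__main__":
    # Parameter counts taken from the list above.
    models = {"Qwen 3 14B": 14, "gpt-oss 20B": 21}
    for name, params in models.items():
        need = estimate_q4_vram_gb(params)
        headroom = GPU_VRAM_GB - need
        print(f"{name}: ~{need:.1f} GB needed, ~{headroom:.1f} GB headroom")
```

With these assumed defaults, the 14B and 21B entries come out near the 9 GB and 13 GB listed above. Long contexts grow the KV cache well beyond the flat overhead assumed here, so the remaining headroom on a 16 GB card shrinks as the context fills.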