BestLLMfor · Your hardware. Your LLM. Your call.
Editorial ranking · 2026

Best local LLM for Mac 24GB

Top 8 open-source picks for a Mac with 24GB of unified memory, ranked by benchmark performance and real-world fit. Updated monthly.

#1

Granite 4.0 H-Tiny 7B-A1B

7B · IBM · Apache 2.0

IBM's edge-class hybrid MoE with 7B total and only 1B active parameters — Apache 2.0 licensed and built for embedded and low-cost serving.

VRAM Q4: 4 GB · Context: 128k
#2

Qwen 3 14B

14B · Alibaba · Apache 2.0

A 14B dense model from Alibaba that matches Qwen 2.5 32B Base on STEM and code, with the same switchable thinking and non-thinking modes as the rest of the Qwen 3 family. A pragmatic sweet spot for a machine with 24GB of memory.

VRAM Q4: 9 GB · Context: 128k
#3

Phi-4 Reasoning 14B

14B · Microsoft · MIT

Microsoft's 14B reasoner that beats R1-Distill-Llama-70B on AIME and GPQA with 5x fewer parameters. MIT-licensed, English-first, with a 32K context.

VRAM Q4: 9 GB · Context: 32k
#4

DeepSeek R1 Distill Qwen 14B

14B · DeepSeek · MIT

DeepSeek's R1 reasoning distilled into Qwen 2.5 14B under MIT. It scores 69.7 on AIME 2024 and 93.9 on MATH-500, ahead of o1-mini on both.

VRAM Q4: 9 GB · Context: 128k
#5

gpt-oss 20B

21B · OpenAI · Apache 2.0

OpenAI's compact open-weight MoE with 3.6B active out of 21B total parameters. Delivers results similar to o3-mini on common benchmarks while running on laptop-class hardware, under Apache 2.0.

VRAM Q4: 13 GB · Context: 128k
#6

ERNIE 4.5 21B-A3B Thinking

21B · Baidu · Apache 2.0

Baidu's compact reasoning MoE with 3B active parameters out of 21B total. Fast inference thanks to the small active set, with Chinese-language strength.

VRAM Q4: 13 GB · Context: 128k
#7

Trinity Mini 26B-A3B

26B · Arcee AI · Apache 2.0

Arcee AI's US-built MoE with 3B active parameters out of 26B total. Apache-licensed, fast in practice, and tuned for agent-style workloads.

VRAM Q4: 15 GB · Context: 128k
#8

OLMoE 1B-7B Instruct

7B · Allen AI · Apache 2.0

Allen AI's OLMoE is a fully open MoE, released with weights, training data, and code. It has 7B total parameters with 1.3B active and is reported to match Llama2-13B-Chat quality.

VRAM Q4: 4 GB · Context: 4k
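
The "VRAM Q4" figures above can be turned into a quick fit check for a 24GB Mac. The sketch below is illustrative only: the helper names are hypothetical, the 4.5 bits-per-weight figure is an assumed average for a typical 4-bit quantization, and the 16 GB GPU budget and 1 GB headroom are assumptions, not values published by this ranking. The real GPU-addressable share of unified memory varies by machine and OS version, and long contexts need more headroom for the KV cache.

```python
# Rough fit check for Q4 models on a 24GB Mac.
# All constants below are ASSUMPTIONS for illustration, not values from the ranking.

ASSUMED_BITS_PER_WEIGHT_Q4 = 4.5  # assumed average for 4-bit quantization incl. scales
ASSUMED_GPU_BUDGET_GB = 16.0      # assumed GPU-usable share of 24 GB unified memory
ASSUMED_HEADROOM_GB = 1.0         # assumed allowance for KV cache and runtime buffers

# "VRAM Q4" figures as listed in the ranking above (GB).
LISTED_VRAM_Q4_GB = {
    "Granite 4.0 H-Tiny 7B-A1B": 4,
    "Qwen 3 14B": 9,
    "Phi-4 Reasoning 14B": 9,
    "DeepSeek R1 Distill Qwen 14B": 9,
    "gpt-oss 20B": 13,
    "ERNIE 4.5 21B-A3B Thinking": 13,
    "Trinity Mini 26B-A3B": 15,
    "OLMoE 1B-7B Instruct": 4,
}


def q4_weights_gb(params_billion: float,
                  bits_per_weight: float = ASSUMED_BITS_PER_WEIGHT_Q4) -> float:
    """Estimate weight storage in GB: parameters (billions) x bits per weight / 8."""
    return params_billion * bits_per_weight / 8


def fits(vram_q4_gb: float,
         gpu_budget_gb: float = ASSUMED_GPU_BUDGET_GB,
         headroom_gb: float = ASSUMED_HEADROOM_GB) -> bool:
    """Return True if the listed Q4 footprint plus headroom stays within the budget."""
    return vram_q4_gb + headroom_gb <= gpu_budget_gb


if __name__ == "__main__":
    for name, gb in LISTED_VRAM_Q4_GB.items():
        status = "fits" if fits(gb) else "too large"
        print(f"{name}: {gb} GB listed, {status}")
```

Under these assumptions every listed pick fits, with Trinity Mini 26B-A3B sitting right at the limit. Note that for MoE models the total parameter count, not the active count, determines the memory footprint; the active count mainly affects speed.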