
Llama 4 Scout 109B

By Meta · United States

Tags: chat, general, vision, MoE, multilingual

Parameters: 109B
License: Llama 4 Community
Context: 10M tokens
VRAM (Q4): 65 GB
Released: April 2025

Overview

Meta's compact Llama 4 MoE: 109B total parameters, 17B active per token, natively multimodal, with a 10M-token context window. At 4-bit quantization (65 GB at Q4_K_M) the weights fit on a single 80 GB H100.

When to pick this model

  • Whole-codebase or whole-corpus analysis up to 10M tokens
  • Multimodal pipelines where one H100 is the inference budget
  • Long-form document understanding without RAG
  • Multilingual chat with native image input
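Before committing to a whole-codebase prompt, it helps to check roughly how many tokens the corpus contains. The sketch below makes several assumptions. It uses a heuristic of about 4 characters per token instead of the real Llama 4 tokenizer. The file-suffix list is illustrative. The 100k-token reserve for instructions and the answer is a hypothetical default.

```python
import math
from pathlib import Path

# ASSUMPTION: ~4 characters per token, a rough rule of thumb for English text and
# source code. The real ratio depends on the Llama 4 tokenizer and the content.
CHARS_PER_TOKEN = 4.0
# Advertised Llama 4 Scout context window, in tokens.
CONTEXT_LIMIT = 10_000_000


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Approximate token count from character count, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def estimate_tree_tokens(root: str,
                         suffixes=(".py", ".md", ".ts", ".js", ".go",
                                   ".rs", ".java", ".c", ".h")) -> int:
    """Sum estimated tokens over all matching files under `root`, recursively."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += estimate_tokens(path.read_text(encoding="utf-8", errors="ignore"))
    return total


def fits_in_context(n_tokens: int, limit: int = CONTEXT_LIMIT,
                    reserve_tokens: int = 100_000) -> bool:
    """True if the corpus fits while leaving `reserve_tokens` (hypothetical
    default) free for instructions and the model's answer."""
    return n_tokens + reserve_tokens <= limit
```

For example, `n = estimate_tree_tokens("path/to/repo")` followed by `fits_in_context(n)` gives a first-pass answer. Fitting the window is not the same as fitting in memory: the KV cache grows with every token, and quality degrades well before the 10M ceiling.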

VRAM requirements by quantization

Quantization              VRAM required
Q4_K_M (recommended)      65 GB
Q5_K_M                    78 GB
Q8_0                      117 GB
FP16 (no quantization)    218 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
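To see how the three components scale, here is a back-of-the-envelope estimator. It is a hedged sketch, not the formula used to build the table above. The model shape (48 layers, 8 KV heads, head dimension 128) is an assumption to verify against the released config. The ~4.85 bits per weight for Q4_K_M is an approximate effective rate, also an assumption.

```python
GB = 1e9  # decimal gigabytes


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights. In an MoE, every expert is stored, not just the active ones."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(context_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """K and V (factor 2) for every layer, KV head and token, FP16 by default.
    Assumes full attention in every layer, so it is an upper bound for models
    that restrict some layers to local or chunked attention."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_tokens


def estimate_vram_gb(n_params: float, bits_per_weight: float, context_tokens: int,
                     n_layers: int, n_kv_heads: int, head_dim: int,
                     overhead_gb: float = 0.6) -> float:
    """Weights + KV cache + a fixed runtime overhead, in decimal GB."""
    total = weight_bytes(n_params, bits_per_weight)
    total += kv_cache_bytes(context_tokens, n_layers, n_kv_heads, head_dim)
    return total / GB + overhead_gb


# ASSUMED text-model shape for Scout; check the real config.json before relying on it.
SCOUT_SHAPE = dict(n_layers=48, n_kv_heads=8, head_dim=128)
for ctx in (8_192, 131_072, 1_000_000):
    gb = estimate_vram_gb(109e9, 4.85, ctx, **SCOUT_SHAPE)
    print(f"{ctx:>9} tokens: ~{gb:.1f} GB")
```

Under these assumed dimensions the FP16 KV cache costs 196,608 bytes per token. That is about 1.6 GB at 8,192 tokens but roughly 197 GB at 1M tokens, which is why long contexts need far more headroom than the weights alone.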

Published benchmark scores

Benchmark    Score
MMLU-Pro     74

Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.

Strengths

  • 10M token context — unmatched among open models
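  • Fits on a single 80 GB H100 at 4-bit quantization (65 GB at Q4_K_M); MoE sparsity cuts compute per token, not memory, since all 109B weights stay loaded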
  • Native multimodal input — no separate vision adapter needed
  • 17B active parameters keeps inference fast

Limitations

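  • Gated access on Hugging Face: you must accept the license before downloading the weights
  • Llama 4 Community License: organizations whose products exceed 700M monthly active users must request a separate license from Meta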
  • Long-context quality drops well before the 10M ceiling
  • Newer than Llama 3.1, so tooling support is still catching up

Architecture & training

Architecture: MoE 16 experts · 109B/17B active · iRoPE · natively multimodal
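The "109B/17B active" split comes from routing: each token is processed by only a few experts. So roughly 16% of the weights (17/109) do work per token, while all 109B must still be resident in memory. The sketch below shows generic top-k softmax routing for a single token. It is not Meta's implementation. The gating function, the choice of k and the optional shared expert (some MoE designs, reportedly including Llama 4, add one) are all assumptions for illustration.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def moe_forward(x: Vector, router_logits: Sequence[float],
                experts: Sequence[Expert], k: int = 1,
                shared: Optional[Expert] = None) -> Tuple[Vector, List[int]]:
    """Route one token to its top-k experts and mix their outputs.

    Only the k selected experts (plus the optional shared expert) are evaluated;
    the others are skipped entirely, which is where the compute saving comes from.
    """
    # Indices of the k highest-scoring experts (stable order on ties).
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Renormalize the gate weights over the selected experts only.
    gates = softmax([router_logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)
        out = [o + gate * v for o, v in zip(out, y)]
    if shared is not None:
        # A shared expert, if present, sees every token.
        out = [o + v for o, v in zip(out, shared(x))]
    return out, top
```

With 16 toy experts such as `[(lambda x, s=s: [s * v for v in x]) for s in range(16)]`, only one or two of them run per call, mirroring how a real MoE layer touches a small fraction of its parameters per token.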

Training: the compact model of Meta's Llama 4 family, alongside the larger Llama 4 Maverick.

Verdict

The long-context champion of open weights: if you actually need 10M tokens, nothing else comes close. The 4-bit weights fit on a single H100, though very long contexts need substantial extra memory for the KV cache.

Quick start

ollama run llama4:scout
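`ollama run` opens an interactive chat. For scripts, Ollama also serves a local REST API, by default at http://localhost:11434, whose `/api/generate` endpoint accepts a model name, a prompt, optional base64-encoded `images` and runtime `options` such as `num_ctx`. The stdlib-only sketch below assumes `ollama serve` is running, `llama4:scout` has been pulled and your Ollama build supports image input for this model. The 8192-token `num_ctx` default is chosen to match the 8k KV cache behind the VRAM figures above; raising it needs more memory.

```python
import base64
import json
import urllib.request
from pathlib import Path
from typing import Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(prompt: str, image_path: Optional[str] = None,
                  num_ctx: int = 8192, model: str = "llama4:scout") -> dict:
    """Build a non-streaming /api/generate request body."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                   # return one JSON object instead of chunks
        "options": {"num_ctx": num_ctx},   # context window for this request
    }
    if image_path is not None:
        # Images are sent as base64-encoded file contents.
        payload["images"] = [base64.b64encode(Path(image_path).read_bytes()).decode("ascii")]
    return payload


def generate(payload: dict, url: str = OLLAMA_URL, timeout: float = 600.0) -> str:
    """POST the payload and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Usage, with `diagram.png` as a hypothetical local file: `print(generate(build_payload("Describe this diagram.", image_path="diagram.png")))`.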

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
