Llama 4 Maverick 400B
By Meta · United States
Overview
Meta's larger Llama 4 mixture-of-experts (MoE) model: 400B total parameters with 17B active per token across 128 experts, natively multimodal. It scores 1417 on LMArena and supports a 1M-token context, but the download is 245 GB.
When to pick this model
- Frontier-quality open chat in multi-GPU production
- Multimodal agents needing 1M context
- Drop-in for teams ready to commit to the Llama 4 ecosystem
- Workloads where MMLU-Pro 80 quality justifies the storage cost
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 240 GB |
| Q5_K_M | 285 GB |
| Q8_0 | 425 GB |
| FP16 (no quantization) | 800 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
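As a rough cross-check of the table, weight size can be estimated as total parameters times bits per weight, divided by 8. The sketch below is illustrative only: the bits-per-weight values are assumptions back-calculated from the table (they are not official llama.cpp figures), the helper names `weights_gb` and `gpus_needed` are hypothetical, and the KV cache and runtime overhead are not modelled.

```python
import math

# Approximate bits per weight for each format.
# ASSUMPTION: values back-calculated from the table above, not official figures.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "FP16": 16.0,
}

TOTAL_PARAMS_B = 400  # total parameters in billions (from the overview)


def weights_gb(quant: str, params_b: float = TOTAL_PARAMS_B) -> float:
    """Estimate weight size in GB (1 GB = 1e9 bytes): params * bits / 8."""
    return params_b * BITS_PER_WEIGHT[quant] / 8


def gpus_needed(quant: str, gpu_vram_gb: float = 80.0) -> int:
    """Fewest equal-sized GPUs whose combined VRAM holds the weights.

    No headroom is added for KV cache, activations, or runtime overhead.
    """
    return math.ceil(weights_gb(quant) / gpu_vram_gb)


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        print(f"{quant}: ~{weights_gb(quant):.0f} GB of weights, "
              f"at least {gpus_needed(quant)} x 80 GB GPUs")
```

Under these assumptions, Q5_K_M works out to about 285 GB, which needs at least four 80 GB GPUs before any context headroom is added.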
Published benchmark scores
| Benchmark | Score |
|---|---|
| LMArena | 1417 |
| MMLU-Pro | 80 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- LMArena 1417 — top-tier open chat quality
- MMLU-Pro 80
- 1M token context
- Native multimodal with strong vision performance
- 17B active keeps inference cost manageable
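The "17B active" point can be made concrete with a little arithmetic. The sketch below uses the common rule of thumb of roughly 2 FLOPs per active parameter per generated token; that rule is an approximation chosen here, not a figure from the model author, and the function names are hypothetical. Note that all 400B parameters still have to be held in memory, so only compute scales with the active count.

```python
TOTAL_PARAMS = 400e9   # total parameters (from the overview)
ACTIVE_PARAMS = 17e9   # parameters active per token (from the overview)


def active_fraction(active: float = ACTIVE_PARAMS,
                    total: float = TOTAL_PARAMS) -> float:
    """Share of the model's parameters used for any single token."""
    return active / total


def forward_flops_per_token(active: float = ACTIVE_PARAMS) -> float:
    """Rough forward-pass cost per token.

    ASSUMPTION: ~2 FLOPs per active parameter (one multiply, one add),
    ignoring attention over long contexts and routing overhead.
    """
    return 2 * active


def dense_to_moe_compute_ratio(active: float = ACTIVE_PARAMS,
                               total: float = TOTAL_PARAMS) -> float:
    """How much more compute a dense model of the same total size would need."""
    return forward_flops_per_token(total) / forward_flops_per_token(active)


if __name__ == "__main__":
    print(f"Active fraction: {active_fraction():.2%}")
    print(f"Approx. FLOPs per token: {forward_flops_per_token():.2e}")
    print(f"Dense 400B / MoE compute ratio: {dense_to_moe_compute_ratio():.1f}x")
```

On these assumptions, about 4% of the weights do the work for each token, which is why per-token compute is closer to that of a 17B dense model than a 400B one.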
Limitations
- 245GB download — non-trivial storage and bandwidth
- Hugging Face gated access
- Llama 4 Community License, including its clause for products with more than 700M monthly active users (MAU)
- Outclassed on reasoning by R1-class models
Architecture & training
Architecture: MoE 128 experts · 400B/17B active · natively multimodal · 1M ctx
Training: the larger sibling of Llama 4 Scout.
Meta's biggest open chat model and a credible GPT-4-class alternative — if you can host 245GB and accept the MAU clause.
Quick start
ollama run llama4:maverick
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
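For scripted access, a locally running Ollama instance can also be queried over its HTTP API. The following is a minimal sketch, assuming Ollama is listening on its default port (11434) and that the `llama4:maverick` tag from the command above has already been pulled; the helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# ASSUMPTION: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "llama4:maverick"  # tag taken from the quick start command


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 600.0) -> str:
    """Send the prompt and return the generated text.

    With "stream": False, the reply is a single JSON object whose
    "response" field holds the full completion.
    """
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Summarize the Llama 4 license in one sentence.")` returns the completion as a string. The long default timeout is a guess that allows for slow first-token latency while a model of this size loads.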