Tülu 3 8B
By Allen AI · United States
Overview
Allen AI's fully open post-training recipe applied to Llama 3.1 8B, hitting 87.6 on GSM8K with all data, code, and evals released publicly.
When to pick this model
- Reproducible research on RLHF and DPO pipelines
- Drop-in replacement for Llama 3.1 8B Instruct with stronger math
- Instruction-following workloads needing high IFEval scores
- Teams that need to audit training data end-to-end
- Academic baselines requiring full provenance
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 6 GB |
| Q5_K_M | 7 GB |
| Q8_0 | 10 GB |
| FP16 (no quantization) | 16 GB |
VRAM figures include model weights plus a typical 8k-token KV cache and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer context lengths.
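To get a rough sense of how much headroom a longer context needs, the sketch below estimates KV cache size as a function of context length. It assumes the standard Llama 3.1 8B attention geometry (32 layers, 8 KV heads, head dimension 128) and an unquantized FP16 cache. Those values come from the base model's published configuration, not from this page, and the function names are illustrative. Actual usage in Ollama or llama.cpp may differ, for example when the KV cache itself is quantized.

```python
# Assumed Llama 3.1 8B attention geometry (from the base model's published
# config, not from this page): 32 layers, 8 KV heads (GQA), head dim 128.
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128


def kv_cache_bytes(context_tokens: int, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size for one sequence (FP16 entries by default)."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value
    return per_token * context_tokens


def extra_headroom_gib(context_tokens: int, baseline_tokens: int = 8192) -> float:
    """Additional GiB needed beyond the 8k-token baseline assumed in the table."""
    extra = kv_cache_bytes(context_tokens) - kv_cache_bytes(baseline_tokens)
    return max(extra, 0) / 1024**3


if __name__ == "__main__":
    for ctx in (8192, 32768, 131072):
        total = kv_cache_bytes(ctx) / 1024**3
        print(f"{ctx:>7} tokens: {total:.1f} GiB KV cache, "
              f"+{extra_headroom_gib(ctx):.1f} GiB over the 8k baseline")
```

Under these assumptions the cache grows linearly at 128 KiB per token, so each additional 8k tokens of context costs about 1 GiB on top of the table figures.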
Published benchmark scores
| Benchmark | Score |
|---|---|
| GSM8K | 87.6 |
| MATH | 42 |
| IFEval | 82.4 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- Among the most complete fully open post-training recipes released to date (SFT + DPO + RLVR)
- GSM8K 87.6 is class-leading at 8B
- IFEval 82.4 shows strong instruction adherence
- Training data, code, and evals all publicly available
- Stable behavior on standard chat benchmarks
Limitations
- Inherits the Llama 3.1 Community License
- No native vision or tool-use specialization
- Eclipsed at the frontier by larger open models
Architecture & training
Architecture: Dense Llama 3.1 8B · SFT + DPO + RLVR
Training: Public data + code + evals.
The reference open RLHF recipe at 8B — choose it when reproducibility and post-training transparency matter as much as benchmark scores.
Quick start
ollama run tulu3:8b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
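Once the model has been pulled, it can also be queried programmatically. The following is a minimal sketch that calls Ollama's local REST endpoint (`/api/generate` on the default port 11434) using only the Python standard library. It assumes a local Ollama server is already running with the `tulu3:8b` tag available. The helper names `build_request` and `generate` are illustrative and not part of any library.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "tulu3:8b") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "tulu3:8b") -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Requires a running Ollama server with the model already pulled.
    print(generate("What is 17 * 24? Show your work."))
```

Setting `"stream": False` returns the whole completion in a single JSON object, which keeps the example short. For interactive use, streaming responses are usually preferable.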