Tülu 3 70B
By Allen AI · United States
Overview
Allen AI's fully open post-training stack (SFT + DPO + RLVR) applied to Llama 3.1 70B, outperforming Claude Haiku, GPT-3.5 Turbo, and GPT-4o-mini on reasoning and code benchmarks such as GSM8K and HumanEval+.
When to pick this model
- Self-hosted alternative to closed mid-tier APIs
- Math-heavy chat, with a published GSM8K score of 93.5
- Code assistance where HumanEval+ matters more than agentic loops
- Research projects that need a fully documented post-training pipeline
- Workloads that justify a 2x A100 footprint
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 40 GB |
| Q5_K_M | 48 GB |
| Q8_0 | 75 GB |
| FP16 (no quantization) | 140 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
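To gauge how much headroom a longer context needs, the KV cache can be estimated from the model's attention layout. The sketch below is a rough estimate, not a measurement: the layer count, KV head count, and head dimension are assumptions taken from the public Llama 3.1 70B configuration (they are not stated on this page), and it assumes an unquantized FP16 cache. Actual usage depends on the runtime and its cache settings.

```python
# Rough KV-cache size estimate for a Llama 3.1 70B-based model.
# ASSUMPTIONS (not from this page): architecture values below come from the
# public Llama 3.1 70B configuration, and the cache is stored in FP16.
N_LAYERS = 80        # transformer blocks
N_KV_HEADS = 8       # grouped-query attention key/value heads
HEAD_DIM = 128       # hidden size 8192 / 64 attention heads
BYTES_PER_VALUE = 2  # FP16


def kv_cache_bytes(context_tokens: int) -> int:
    """Bytes needed to hold keys and values for `context_tokens` tokens."""
    # Factor 2 covers one key vector and one value vector per head per layer.
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return context_tokens * per_token


def kv_cache_gib(context_tokens: int) -> float:
    """Same estimate expressed in GiB (2**30 bytes)."""
    return kv_cache_bytes(context_tokens) / 2**30


if __name__ == "__main__":
    for ctx in (8_192, 32_768, 131_072):
        print(f"{ctx:>7} tokens: {kv_cache_gib(ctx):.1f} GiB")
```

Under these assumptions an 8k cache is about 2.5 GiB, so moving to a 32k context would add roughly 7.5 GiB on top of the table figures.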
Published benchmark scores
| Benchmark | Score |
|---|---|
| GSM8K | 93.5 |
| HumanEval+ | 92.4 |
| IFEval | 83.2 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- Beats Claude Haiku, GPT-3.5 Turbo, and GPT-4o-mini on key evals
- GSM8K 93.5 and HumanEval+ 92.4 with open weights
- Fully open SFT + DPO + RLVR recipe
- Strong instruction following and refusal calibration
- Stable, well-documented behavior for production deploys
Limitations
- ~40 GB VRAM at Q4 — needs serious hardware
- Bound by Llama 3.1 Community License
- No multimodal capabilities
Architecture & training
Architecture: Dense Llama 3.1 70B · full Tülu recipe
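Training: SFT (supervised fine-tuning) + DPO (direct preference optimization) + RLVR (reinforcement learning with verifiable rewards) on the 70B model.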
The strongest fully open post-trained 70B available — a credible self-hosted replacement for closed mid-tier chat APIs.
Quick start
ollama run tulu3:70b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
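Once the model has been pulled, it can also be queried from a script through Ollama's local HTTP API. The following is a minimal sketch, assuming an Ollama server is running on its default address (localhost, port 11434) and that the model tag matches the quick-start command; the helper names `build_request` and `generate` are illustrative, not part of any library.

```python
import json
import urllib.request

# ASSUMPTION: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "tulu3:70b"  # tag taken from the quick-start command


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request (hypothetical helper)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 600.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example (requires a running Ollama server with the model available):
# print(generate("What is 17 * 23? Answer with the number only."))
```

Setting `stream` to false returns one JSON object instead of a stream of chunks, which keeps the sketch short; a long timeout is used because a 70B model can take a while to load on first use.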