Model card

Tülu 3 8B

By Allen AI · United States

chat · general

Parameters: 8B
License: Llama 3.1 Community
Context: 125k
VRAM (Q4): 6 GB
Released: November 2024

Overview

Allen AI's fully open post-training recipe applied to Llama 3.1 8B, hitting 87.6 on GSM8K with all data, code, and evals released publicly.

When to pick this model

  • Reproducible research on RLHF and DPO pipelines
  • Drop-in replacement for Llama 3.1 8B Instruct with stronger math
  • Instruction-following workloads needing high IFEval scores
  • Teams that need to audit training data end-to-end
  • Academic baselines requiring full provenance

VRAM requirements by quantization

Quantization             VRAM required
Q4_K_M (recommended)     6 GB
Q5_K_M                   7 GB
Q8_0                     10 GB
FP16 (no quantization)   16 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
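
The arithmetic behind a figure like this can be sketched as weights + KV cache + runtime overhead. The snippet below is a rough estimate only: the layer and head counts are those of the Llama 3.1 8B base model, the bits-per-weight values are approximate averages for llama.cpp quantizations, and the FP16 KV cache is an assumption. The function names are made up for this illustration.

```python
# Shape of the Llama 3.1 8B base model; change these if you adapt the sketch.
N_PARAMS = 8.03e9
N_LAYERS = 32
N_KV_HEADS = 8      # grouped-query attention: fewer KV heads than query heads
HEAD_DIM = 128

# Approximate average bits per weight (assumed ballpark values, not exact file sizes).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.7, "Q8_0": 8.5, "FP16": 16.0}


def kv_cache_gb(context_tokens, bytes_per_value=2):
    """Size of the key and value caches in GB, assuming 2-byte (FP16) entries."""
    values = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * context_tokens  # 2 = K and V
    return values * bytes_per_value / 1e9


def estimate_vram_gb(quant, context_tokens=8192, overhead_gb=0.6):
    """Weights + KV cache + fixed runtime overhead, all in GB."""
    weights_gb = N_PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9
    return weights_gb + kv_cache_gb(context_tokens) + overhead_gb


# Compare the default 8k context with a 32k context for the recommended quantization.
for ctx in (8192, 32768):
    print(ctx, round(estimate_vram_gb("Q4_K_M", ctx), 1))
```

Under these assumptions the KV cache grows linearly with context, so moving from 8k to 32k tokens adds roughly 3 GB whatever the quantization. The results are estimates and need not match the rounded figures in the table.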

Published benchmark scores

Benchmark   Score
GSM8K       87.6
MATH        42
IFEval      82.4

Scores are those published by the model author or aggregated from public leaderboards, and are re-measured monthly by our editorial team.

Strengths

  • Best fully-open RLHF recipe shipped to date
  • GSM8K 87.6 is class-leading at 8B
  • IFEval 82.4 shows strong instruction adherence
  • Training data, code, and evals all publicly available
  • Stable behavior on standard chat benchmarks

Limitations

  • Inherits the Llama 3.1 Community License
  • No native vision or tool-use specialization
  • Eclipsed at the frontier by larger open models

Architecture & training

Architecture: Dense Llama 3.1 8B · SFT + DPO + RLVR

Training: Public data + code + evals.
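
RLVR in the architecture line stands for reinforcement learning with verifiable rewards: instead of scoring a completion with a learned reward model, the trainer checks it against a known-correct answer. A toy sketch of such a check for a GSM8K-style math problem might look like the following. The function names, the "last number in the text" heuristic, and the reward value are illustrative assumptions, not the released Tülu 3 code.

```python
import re


def extract_final_number(text):
    """Return the last number that appears in the text, or None if there is none."""
    # Drop thousands separators so "1,234" is read as 1234.
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None


def verifiable_reward(completion, gold_answer, reward_value=10.0):
    """Give a fixed reward only when the final number matches the gold answer."""
    predicted = extract_final_number(completion)
    if predicted is None:
        return 0.0
    is_correct = abs(predicted - float(gold_answer)) < 1e-6
    return reward_value if is_correct else 0.0


print(verifiable_reward("17 * 23 = 391. The answer is 391.", 391))
print(verifiable_reward("I think the answer is 400.", 391))
```

The reward is binary by design: a completion either passes the check or earns nothing, which is what makes the signal auditable.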

Verdict

The reference open RLHF recipe at 8B: choose it when reproducibility and post-training transparency matter as much as benchmark scores.

Quick start

ollama run tulu3:8b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
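
Once the model has been pulled with the command above, it can also be queried from a script through Ollama's local HTTP API. The sketch below assumes the server is running on its default port (11434) and uses only the standard library. The helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="tulu3:8b"):
    """Build a non-streaming generate request for the given prompt."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="tulu3:8b", timeout=120):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example call (needs a running Ollama server with the model pulled):
# print(generate("What is 17 * 23? Answer with the number only."))
```

Setting `stream` to false returns the whole answer in one JSON object, which keeps the example short. For interactive use, streaming is usually the better choice.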
