Granite 4.1 8B Instruct
By IBM · United States
Overview
IBM's dense 8B Granite 4.1 release: Apache 2.0, 12 languages, 131k context, MMLU 73.84, HumanEval 85.37. Trained on a CoreWeave GB200 NVL72 cluster.
When to pick this model
- Enterprise deployments needing Apache 2.0 and IBM provenance
- Tool-calling agents with predictable behavior at moderate scale
- Multilingual products across 12 languages including French
- Long-context tasks up to 131k tokens
- Coding workloads on a single mid-range GPU
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 5 GB |
| Q5_K_M | 6 GB |
| Q8_0 | 9 GB |
| FP16 (no quantization) | 16 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU | 73.84 |
| GSM8K | 92.49 |
| HumanEval | 85.37 |
| ArenaHard | 68.98 |
| AlpacaEval | 50.08 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- Apache 2.0 with full transparency on training
- Strong tool calling and instruction following
- 12 native languages including French
- 131k context window
- Excellent quality-per-parameter at the 8B tier
Limitations
- No official Ollama tag at release
- Reasoning in non-English languages still trails English
- No MoE variant at this size
Architecture & training
Architecture: Dense Transformer · 40 layers · GQA 32Q/8KV · embedding 4096 · MLP hidden 12,800 · RoPE
Training: Improved post-training: SFT + RL alignment. 12 languages: EN, DE, ES, FR, JA, PT, AR, CS, IT, KO, NL, ZH. NVIDIA GB200 NVL72 cluster (CoreWeave).
IBM's most usable open model yet — Apache 2.0, multilingual, and well-suited for enterprise tool use.
Quick start
# HuggingFace : ibm-granite/granite-4.1-8bOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.