Granite 3.2 8B Instruct
By IBM · United States
Overview
IBM's enterprise-focused 8B Granite 3.2 with a toggleable thinking mode under Apache 2.0. MMLU 65.5 and IFEval 70.9, with built-in IBM safety guardrails.
When to pick this model
- Enterprise RAG deployments needing strict instruction following
- Regulated environments requiring safety guardrails out of the box
- Internal tools where Apache 2.0 plus IBM backing matters
- Workloads benefiting from optional thinking mode
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 5 GB |
| Q5_K_M | 6 GB |
| Q8_0 | 9 GB |
| FP16 (no quantization) | 16 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU | 67 |
| HumanEval | 72 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- 128k context
- Apache 2.0 license
- Strong RAG and enterprise instruction following
- IBM Safety Guardrails included
- Toggleable thinking mode
Limitations
- Trails Llama 3.1 8B on general chat
- Very enterprise-flavored tone
- Weaker than Qwen 2.5 7B on coding tasks
Architecture & training
Architecture: Dense · 8B · IBM Granite 3.2 · RAG and enterprise agents
Training: IBM enterprise corpus, strong in code (100 languages), 2024 data.
The default open 8B for enterprise RAG and regulated workloads — picked for safety guardrails and IBM support, not chat quality.
Quick start
ollama run granite3.2:8bOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.