BestLLMfor EN Your hardware. Your LLM. Your call.
APIOpen data Find my LLM
Model fiche

Granite 3.2 8B Instruct

By IBM · United States

chat general
Parameters
8B
License
Apache 2.0
Context
125k
VRAM (Q4)
5 GB
Released
October 2024

Overview

IBM's enterprise-focused 8B Granite 3.2 with a toggleable thinking mode under Apache 2.0. MMLU 65.5 and IFEval 70.9, with built-in IBM safety guardrails.

When to pick this model

  • Enterprise RAG deployments needing strict instruction following
  • Regulated environments requiring safety guardrails out of the box
  • Internal tools where Apache 2.0 plus IBM backing matters
  • Workloads benefiting from optional thinking mode

VRAM requirements by quantization

QuantizationVRAM required
Q4_K_M (recommended)5 GB
Q5_K_M6 GB
Q8_09 GB
FP16 (no quantization)16 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.

Published benchmark scores

BenchmarkScore
MMLU67
HumanEval72

Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.

Strengths

  • 128k context
  • Apache 2.0 license
  • Strong RAG and enterprise instruction following
  • IBM Safety Guardrails included
  • Toggleable thinking mode

Limitations

  • Trails Llama 3.1 8B on general chat
  • Very enterprise-flavored tone
  • Weaker than Qwen 2.5 7B on coding tasks

Architecture & training

Architecture: Dense · 8B · IBM Granite 3.2 · RAG and enterprise agents

Training: IBM enterprise corpus, strong in code (100 languages), 2024 data.

Verdict

The default open 8B for enterprise RAG and regulated workloads — picked for safety guardrails and IBM support, not chat quality.

Quick start

ollama run granite3.2:8b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.

Tools

Is Granite 3.2 8B Instruct the right pick for you?

Compute self-hosted ROI → Back to catalog