Granite 4.1 30B Instruct
By IBM · United States
Overview
IBM's dense 30B Granite 4.1: Apache 2.0, 12 languages, 131k context, with OpenAI-compatible tool calling. Built on the same GB200 NVL72 cluster as the rest of the 4.1 lineup.
When to pick this model
- Enterprise agents requiring OpenAI-compatible function calling
- Apache 2.0 deployments where Granite 8B isn't enough
- Multilingual products across 12 languages including French
- Long-context workflows up to 131k
- Single-GPU production on RTX 5090 or A100 class hardware
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 17 GB |
| Q5_K_M | 21 GB |
| Q8_0 | 32 GB |
| FP16 (no quantization) | 60 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Strengths
- Apache 2.0 with IBM-grade transparency
- Native OpenAI function-calling schema
- 12 languages including French
- 131k context window
- Official Ollama tag with multiple quantizations
Limitations
- Needs ~32 GB VRAM at Q4 — RTX 5090 territory
- No MoE variant at this size
- Non-English reasoning trails English
Architecture & training
Architecture: Dense Transformer · 64 layers · GQA 32Q/8KV · embedding 4096 · MLP hidden 32,768 · SwiGLU · RoPE · RMSNorm
Training: Fine-tuned from Granite-4.1-30B-Base. SFT + RL alignment pipeline. 12 languages: EN, DE, ES, FR, JA, PT, AR, CS, IT, KO, NL, ZH. NVIDIA GB200 NVL72 cluster (CoreWeave).
The Granite to pick when 8B feels light: Apache 2.0, function-calling native, and built for enterprise.
Quick start
ollama run granite4.1:30bOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.