Model fiche
Granite 4.1
By IBM · United States
chat
code
Overview
IBM's Granite 4.1 in its generic 3B Ollama tag — Apache 2.0, 128k context, robust tool calling, and a sub-2GB Q4 footprint. Code- and chat-oriented.
When to pick this model
- Agent and tool-use pipelines where license clarity matters
- Enterprise deployments that require fully permissive Apache 2.0 weights
- Multilingual chat across the 12 languages IBM trained on
- Lightweight coding assistants on developer machines
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 1.7 GB |
| Q5_K_M | 2.1 GB |
| Q8_0 | 3.2 GB |
| FP16 (no quantization) | 6 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Strengths
- Fully Apache 2.0 — no click-through, no commercial caveats
- 128k context window
- OpenAI-compatible tool calling that works reliably
- Compact ~1.7GB VRAM at Q4
Limitations
- Hugging Face distribution is gated even though the license is open
- Generic Ollama tag doesn't pin a specific size variant
Architecture & training
Architecture: Dense transformer · 3B parameters · 128k context
Training: IBM's Granite 4.1 family. Multilingual training (12 languages), OpenAI-compatible tool calling.
Verdict
The pragmatic Apache 2.0 default for agentic workflows when license friction is a non-starter.
Quick start
ollama run granite4.1Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.