BestLLMfor EN Your hardware. Your LLM. Your call.
APIOpen data Find my LLM
Model fiche

Granite 4.1

By IBM · United States

chat code
Parameters
3B
License
Apache 2.0
Context
125k
VRAM (Q4)
1.7 GB
Released
May 2026

Overview

IBM's Granite 4.1 in its generic 3B Ollama tag — Apache 2.0, 128k context, robust tool calling, and a sub-2GB Q4 footprint. Code- and chat-oriented.

When to pick this model

  • Agent and tool-use pipelines where license clarity matters
  • Enterprise deployments that require fully permissive Apache 2.0 weights
  • Multilingual chat across the 12 languages IBM trained on
  • Lightweight coding assistants on developer machines

VRAM requirements by quantization

QuantizationVRAM required
Q4_K_M (recommended)1.7 GB
Q5_K_M2.1 GB
Q8_03.2 GB
FP16 (no quantization)6 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.

Strengths

  • Fully Apache 2.0 — no click-through, no commercial caveats
  • 128k context window
  • OpenAI-compatible tool calling that works reliably
  • Compact ~1.7GB VRAM at Q4

Limitations

  • Hugging Face distribution is gated even though the license is open
  • Generic Ollama tag doesn't pin a specific size variant

Architecture & training

Architecture: Dense transformer · 3B parameters · 128k context

Training: IBM's Granite 4.1 family. Multilingual training (12 languages), OpenAI-compatible tool calling.

Verdict

The pragmatic Apache 2.0 default for agentic workflows when license friction is a non-starter.

Quick start

ollama run granite4.1

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.

Tools

Is Granite 4.1 the right pick for you?

Compute self-hosted ROI → Back to catalog