BestLLMfor EN Your hardware. Your LLM. Your call.
APIOpen data Find my LLM
Model fiche

Granite 4.1 3B Instruct

By IBM · United States

chat general code multilingual small
Parameters
3B
License
Apache 2.0
Context
128k
VRAM (Q4)
2 GB
Released
29 April 2026

Overview

IBM's dense 3B Granite 4.1: Apache 2.0, 12 languages, 131k context, with tool calling and FIM code support. The smallest Granite tier, sharing data and pipeline with its larger siblings.

When to pick this model

  • Edge and embedded deployments needing ~3 GB VRAM
  • Code completion with fill-in-the-middle
  • Tool-calling agents on resource-constrained hardware
  • Multilingual apps across 12 languages
  • Long-context tasks at very small scale (131k)

VRAM requirements by quantization

QuantizationVRAM required
Q4_K_M (recommended)2 GB
Q5_K_M2.5 GB
Q8_03 GB
FP16 (no quantization)6 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.

Strengths

  • Apache 2.0 with full openness
  • Tool calling plus FIM code completion
  • 12 languages including French
  • 131k context at only 3B params
  • Fits in 3 GB VRAM

Limitations

  • Reasoning lags the 8B and 30B siblings
  • Demanding chat use cases really want the 8B model

Architecture & training

Architecture: Dense Transformer · 40 layers · GQA 40Q/8KV · embedding 2560 · MLP hidden 8,192 · SwiGLU · RoPE · RMSNorm

Training: SFT + RL alignment, same data and pipeline as the 8B and 30B. 12 languages including FR. NVIDIA GB200 NVL72 cluster.

Verdict

A serious 3B option for edge and embedded — same Granite recipe, just smaller.

Quick start

ollama run granite4.1:3b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.

Tools

Is Granite 4.1 3B Instruct the right pick for you?

Compute self-hosted ROI → Back to catalog