Model fiche
Granite 3.3 8B Instruct
By IBM · United States
chat
general
code
Overview
IBM's update to Granite 3.2 8B adding fill-in-the-middle code support and improved instruction following. Apache 2.0 with strong agent and tool-use behavior.
When to pick this model
- Enterprise agents needing tool use and structured output
- RAG pipelines where instruction-following reliability matters
- Internal developer tooling combining code and chat
- Drop-in upgrade from Granite 3.2 8B
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 5 GB |
| Q5_K_M | 6 GB |
| Q8_0 | 9 GB |
| FP16 (no quantization) | 16 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Strengths
- 128k context
- Apache 2.0 license
- Strong agentic and tool-use behavior
- Fill-in-the-middle code completion added
- Better instruction following than 3.2
Limitations
- Still very enterprise-flavored
- Less versatile than Qwen 3 8B on open-ended chat
- Code quality trails dedicated coders like Qwen 2.5 Coder 7B
Architecture & training
Architecture: Dense · 8B · IBM Granite 3.3 · improved agents and tool use
Training: Granite 3.2 evolution with improved agent/tool use and code.
Verdict
A clean upgrade over Granite 3.2 8B for enterprise agents — better tool use, better code, same Apache 2.0 backbone.
Quick start
ollama run granite3.3:8bOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.