BestLLMfor EN Your hardware. Your LLM. Your call.
APIOpen data Find my LLM
Model fiche

Granite 3.3 8B Instruct

By IBM · United States

chat general code
Parameters
8B
License
Apache 2.0
Context
125k
VRAM (Q4)
5 GB
Released
January 2025

Overview

IBM's update to Granite 3.2 8B adding fill-in-the-middle code support and improved instruction following. Apache 2.0 with strong agent and tool-use behavior.

When to pick this model

  • Enterprise agents needing tool use and structured output
  • RAG pipelines where instruction-following reliability matters
  • Internal developer tooling combining code and chat
  • Drop-in upgrade from Granite 3.2 8B

VRAM requirements by quantization

QuantizationVRAM required
Q4_K_M (recommended)5 GB
Q5_K_M6 GB
Q8_09 GB
FP16 (no quantization)16 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.

Strengths

  • 128k context
  • Apache 2.0 license
  • Strong agentic and tool-use behavior
  • Fill-in-the-middle code completion added
  • Better instruction following than 3.2

Limitations

  • Still very enterprise-flavored
  • Less versatile than Qwen 3 8B on open-ended chat
  • Code quality trails dedicated coders like Qwen 2.5 Coder 7B

Architecture & training

Architecture: Dense · 8B · IBM Granite 3.3 · improved agents and tool use

Training: Granite 3.2 evolution with improved agent/tool use and code.

Verdict

A clean upgrade over Granite 3.2 8B for enterprise agents — better tool use, better code, same Apache 2.0 backbone.

Quick start

ollama run granite3.3:8b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.

Tools

Is Granite 3.3 8B Instruct the right pick for you?

Compute self-hosted ROI → Back to catalog