Model fiche
Granite 4.0 3B Vision
By IBM · United States
Tags: vision, chat, small
Overview
IBM's 3B vision-language model, purpose-built for enterprise document extraction, including OCR, table parsing, and form understanding. It is released under Apache 2.0 and is small enough to run on a laptop.
When to pick this model
- Enterprise document and form extraction pipelines
- OCR replacement for invoices, receipts, and PDFs
- Table structure understanding at scale
- Apache-licensed on-prem document AI
- Edge deployment for sensitive enterprise data
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 2.2 GB |
| Q5_K_M | 2.7 GB |
| Q8_0 | 3.8 GB |
| FP16 (no quantization) | 6.5 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
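As a rough illustration of how the table might be used, the sketch below picks the largest quantization that fits a given amount of free VRAM. The figures are copied from the table above; the function name `best_quantization` and the default 0.5 GB headroom are assumptions made for this example, not part of any IBM or Ollama tooling.

```python
# VRAM figures in GB, copied from the table above. They already include
# model weights, a typical 8k KV cache, and ~600 MB of runtime overhead.
VRAM_GB = {
    "Q4_K_M": 2.2,
    "Q5_K_M": 2.7,
    "Q8_0": 3.8,
    "FP16": 6.5,
}


def best_quantization(available_gb, headroom_gb=0.5):
    """Return the largest quantization that fits the budget, or None.

    headroom_gb is a hypothetical safety margin for longer contexts;
    tune it for your own workload.
    """
    budget = available_gb - headroom_gb
    fitting = [(gb, name) for name, gb in VRAM_GB.items() if gb <= budget]
    if not fitting:
        return None
    # In this table a larger footprint means higher precision,
    # so the entry with the largest footprint is the best fit.
    return max(fitting)[1]


if __name__ == "__main__":
    for free_gb in (2.0, 4.0, 8.0):
        print(free_gb, "GB free ->", best_quantization(free_gb))
```

With the default headroom, 4 GB of free VRAM selects Q5_K_M, since Q8_0 would leave less than the 0.5 GB margin.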
Strengths
- Fast, accurate enterprise OCR
- Strong table and form-field extraction
- Apache 2.0 license
- Runs comfortably on a laptop
Limitations
- 16K context limits multi-page documents
- English-first, weak on non-Latin scripts
- Narrow scope, not a general-purpose VLM
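One way to work around the 16K context limit on multi-page documents is to split the pages into batches that each fit the window. The sketch below is a minimal example under stated assumptions: "16K" is taken to mean 16,384 tokens, the per-page token counts are supplied by the caller (how many tokens a page image consumes depends on the runtime), and `batch_pages` and the 2,048-token reserve for the prompt and output are hypothetical choices.

```python
CONTEXT_TOKENS = 16_384  # assumption: "16K" means 16,384 tokens


def batch_pages(page_tokens, reserve=2_048, limit=CONTEXT_TOKENS):
    """Group consecutive page indices so each batch fits the token budget.

    page_tokens: estimated token cost of each page, in document order.
    reserve: tokens held back for the prompt and the model's answer
             (a hypothetical default, adjust as needed).
    """
    budget = limit - reserve
    batches, current, used = [], [], 0
    for index, tokens in enumerate(page_tokens):
        if tokens > budget:
            raise ValueError(
                f"page {index} needs {tokens} tokens, budget is {budget}"
            )
        # Start a new batch when adding this page would overflow the budget.
        if used + tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += tokens
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    print(batch_pages([6000, 6000, 6000, 3000]))
```

Each batch can then be sent as a separate request and the extracted fields merged afterwards. Tables that span a page boundary would need extra handling that this sketch does not attempt.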
Architecture & training
Architecture: Dense 3B VLM · specialized for enterprise documents
Training: IBM Granite 4.0 family.
Verdict
The best small VLM for enterprise document workflows under an Apache license.
Quick start
# Hugging Face: ibm-granite/granite-4.0-3b-vision
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
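For the Hugging Face route, a minimal sketch of fetching the weights locally is shown below. It assumes the `huggingface_hub` package is installed and that the repository ID given above is publicly available. The helper name `download_model` and the target directory are illustrative choices, and loading the model for inference is left to whichever runtime you use.

```python
from pathlib import Path

# Repository ID as given in the quick start above.
MODEL_ID = "ibm-granite/granite-4.0-3b-vision"


def download_model(target_dir="models/granite-4.0-3b-vision"):
    """Download the model repository and return the local path.

    target_dir is a hypothetical location; any writable directory works.
    """
    # Imported lazily so that this module can be imported even when
    # huggingface_hub is not installed.
    from huggingface_hub import snapshot_download

    Path(target_dir).mkdir(parents=True, exist_ok=True)
    return snapshot_download(repo_id=MODEL_ID, local_dir=target_dir)


if __name__ == "__main__":
    print("Model files stored in:", download_model())
```

The download covers the full-precision repository, so allow several gigabytes of disk space. Quantized builds such as Q4_K_M are typically distributed separately for Ollama or llama.cpp.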