Model fiche
Gemma 4 E4B
By Google · United States
chat · general · vision · audio · multilingual · small
Overview
Google's multimodal Gemma variant with 4B effective parameters, tuned for laptops and edge devices. It handles text, image, and audio input across 140 languages with a 128K-token context window.
When to pick this model
- Multimodal apps running on laptops or mobile
- Offline assistants that need image and audio input
- Multilingual edge deployments
- Low-power on-device inference
- Prototyping multimodal flows before scaling up
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 10 GB |
| Q5_K_M | 12 GB |
| Q8_0 | 18 GB |
| FP16 (no quantization) | 33 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
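As a rough illustration of how the table might be used, the sketch below picks the highest-precision quantization that fits a given VRAM budget. The figures are copied from the table and assume its 8k KV cache; the function name `pick_quantization` and the `headroom_gb` parameter are hypothetical helpers for this example, not part of Ollama or llama.cpp.

```python
# VRAM requirements in GB, copied from the table above.
# They include weights, an 8k KV cache, and ~600 MB runtime overhead.
VRAM_GB = {
    "Q4_K_M": 10,
    "Q5_K_M": 12,
    "Q8_0": 18,
    "FP16": 33,
}


def pick_quantization(available_gb, headroom_gb=0.0):
    """Return the highest-precision quantization that fits, or None.

    headroom_gb is an illustrative reserve for longer contexts or other
    processes sharing the GPU; the right value depends on the workload.
    """
    budget = available_gb - headroom_gb
    # Pair each option with its requirement so max() selects the largest
    # footprint that still fits, which here is also the highest precision.
    fitting = [(need, name) for name, need in VRAM_GB.items() if need <= budget]
    if not fitting:
        return None
    return max(fitting)[1]
```

For example, `pick_quantization(16)` selects Q5_K_M, while `pick_quantization(12, headroom_gb=2)` falls back to Q4_K_M because only 10 GB remain after the reserve.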
Strengths
- Full text + image + audio in a 4B model
- Runs comfortably on laptops and high-end phones
- 128K context is generous for the size class
- 140-language coverage in a small footprint
Limitations
- Gemma license restricts some commercial uses
- Quality clearly trails 12B+ multimodal models
- Audio reasoning is functional but not robust
Architecture & training
Architecture: Dense E4B (4B effective) · multimodal text+image+audio
Training: Edge/mobile edition of Gemma 4.
Verdict
The most capable sub-5B multimodal model for edge deployments, with the usual Gemma license caveats.
Quick start
ollama run gemma4:e4b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
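Once the model has been pulled, it can also be called from a script through the local Ollama HTTP API. The following is a minimal sketch, assuming Ollama is serving on its default port 11434 and that the model tag matches the command above. The helper names `build_request` and `generate` are made up for this example.

```python
import json
import urllib.request

# Default local endpoint of an Ollama server (assumed to be unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="gemma4:e4b", image_b64=None):
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_b64 is not None:
        # Ollama accepts base64-encoded images for multimodal models.
        payload["images"] = [image_b64]
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Describe this model in one sentence.")` requires a running Ollama server; passing `image_b64` attaches a base64-encoded image to the prompt.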