Model fiche
Gemma 3n E4B
By Google · United States
chat
general
multilingual
small
Overview
Google's full-size Gemma 3n model, with 4B effective parameters (8B raw) and a nested MatFormer architecture. Natively multimodal, with coverage of 140 languages, and aimed at high-end mobile deployments.
When to pick this model
- High-end mobile or edge devices needing multimodal input
- Multilingual on-device assistants across 140 languages
- Image-aware mobile workflows
- Replacing E2B when accuracy matters more than RAM
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 4.5 GB |
| Q5_K_M | 5.5 GB |
| Q8_0 | 8 GB |
| FP16 (no quantization) | 14 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
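As a rough illustration of how the table can be used, the sketch below picks the highest-precision quantization that fits a given VRAM budget. The figures are copied from the table; the function name `pick_quantization` and the `headroom_gb` margin are hypothetical, introduced here only for the example.

```python
# VRAM figures (GB) copied from the table above, ordered smallest to largest.
VRAM_GB = {
    "Q4_K_M": 4.5,
    "Q5_K_M": 5.5,
    "Q8_0": 8.0,
    "FP16": 14.0,
}


def pick_quantization(available_gb, headroom_gb=0.0):
    """Return the highest-precision quantization that fits, or None.

    headroom_gb is a hypothetical safety margin (for example, extra KV cache
    for contexts longer than 8k); it is not a figure from the table.
    """
    budget = available_gb - headroom_gb
    fitting = [name for name, need in VRAM_GB.items() if need <= budget]
    # The dict is ordered by increasing size, so the last fit is the largest.
    return fitting[-1] if fitting else None
```

For instance, `pick_quantization(6)` selects `Q5_K_M`, while reserving 1 GB of headroom on an 8 GB device (`pick_quantization(8, headroom_gb=1)`) falls back from `Q8_0` to `Q5_K_M`.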
Strengths
- Quality punches well above its 4B-effective, mobile-class size
- Integrated multimodal — text and image input
- 140 language coverage
- Open Gemma license
Limitations
- 32k context only
- Beaten by Gemma 3 12B in desktop scenarios
- Gemma license — less permissive than Apache 2.0
- Multimodal support uneven across runtimes
Architecture & training
Architecture: Gemma 3n E4B · on-device architecture · 4B effective
Training: Google Gemma 3n 4B, multimodal text+image, 140 languages.
Verdict
The full-fat Gemma 3n — strong mobile multimodal with surprising quality, if Gemma's license fits your use case.
Quick start
ollama run gemma3n:e4b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
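For scripted access, the model can also be called over Ollama's local HTTP API rather than the interactive CLI. The following is a minimal sketch using only the Python standard library; it assumes Ollama is already running on its default port (11434) and that the `gemma3n:e4b` model has been pulled. The helper names `build_request` and `ask` are illustrative, not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "gemma3n:e4b"


def build_request(prompt, model=MODEL, url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, timeout=120):
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server up, `print(ask("Summarize MatFormer in one sentence."))` would print the model's reply; setting `"stream": False` keeps the response as a single JSON object instead of a stream of chunks.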