Model fiche
Gemma 3n E2B
By Google · United States
chat
general
multilingual
small
Overview
Google's Gemma 3n with 2B effective parameters (about 5B raw), built on the MatFormer architecture and covering 140+ languages. Optimized for mobile and edge; text-only on Ollama.
When to pick this model
- Mobile and embedded deployments where memory is scarce
- Multilingual edge inference across 140+ languages
- Battery-constrained on-device chat
- MatFormer-based research and experimentation
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 2 GB |
| Q5_K_M | 2.5 GB |
| Q8_0 | 3.5 GB |
| FP16 (no quantization) | 6 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
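As a rough illustration of how the table might be used, the sketch below picks the highest-precision quantization that fits a given memory budget. The figures are copied from the table; the function name `pick_quantization` and the `headroom_gb` margin are hypothetical choices made for this example, not part of Ollama or llama.cpp.

```python
from typing import Optional

# VRAM figures copied from the table above (GB, 8k context, Ollama / llama.cpp baseline).
VRAM_GB = {
    "Q4_K_M": 2.0,
    "Q5_K_M": 2.5,
    "Q8_0": 3.5,
    "FP16": 6.0,
}


def pick_quantization(available_gb: float, headroom_gb: float = 0.5) -> Optional[str]:
    """Return the highest-precision quantization that fits, or None if none does.

    headroom_gb is an assumed safety margin for longer contexts; it is an
    illustrative default, not a figure from the table.
    """
    budget = available_gb - headroom_gb
    # Keep (requirement, name) pairs so max() selects the largest requirement that fits.
    fitting = [(need, name) for name, need in VRAM_GB.items() if need <= budget]
    if not fitting:
        return None
    return max(fitting)[1]
```

With the default margin, `pick_quantization(4.0)` selects `Q8_0`, while a 2 GB budget returns `None` because no row fits once the margin is subtracted. Set `headroom_gb=0` to compare against the table figures directly.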
Strengths
- Built specifically for mobile and edge hardware
- 140+ language coverage in a tiny footprint
- MatFormer architecture maximizes memory efficiency
- Per-layer shared embeddings cut RAM use
Limitations
- 32k context only
- Absolute quality trails larger Gemma 3 models (for example, the 12B)
- Gemma license — not as permissive as Apache 2.0
- Multimodal features not exposed via Ollama
Architecture & training
Architecture: Gemma 3n E2B · on-device architecture · 2B effective · MatFormer
Training: Google Gemma 3n, optimized for mobile/edge with shared per-layer embeddings.
Verdict
Google's most memory-efficient small model, purpose-built for mobile and edge inference, with multilingual coverage to match.
Quick start
ollama run gemma3n:e2b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
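Once the model has been pulled, it can also be queried programmatically through Ollama's local HTTP API. The following is a minimal sketch using only the Python standard library. It assumes Ollama is listening on its default address (`http://localhost:11434`); the helper names `build_request` and `generate` are hypothetical, and the `num_ctx` value of 8192 is chosen here only to mirror the 8k context used for the VRAM table.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, num_ctx: int = 8192) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for gemma3n:e2b."""
    payload = {
        "model": "gemma3n:e2b",
        "prompt": prompt,
        "stream": False,                  # ask for a single JSON object, not a stream
        "options": {"num_ctx": num_ctx},  # context window; larger values need more memory
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Translate 'good morning' into Swahili.")` returns the model's reply as a string, provided the server is running and the model is available locally.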