Model fiche

Gemma 3n E4B

By Google · United States

Tags: chat, general, multilingual, small
Parameters: 4B
License: Gemma
Context: 32k
VRAM (Q4): 4.5 GB
Released: May 2025

Overview

Google's full Gemma 3n variant, with 4B effective parameters (8B raw) and a nested MatFormer architecture. It is natively multimodal, covers 140 languages, and targets high-end mobile deployments.

When to pick this model

  • High-end mobile or edge devices needing multimodal input
  • Multilingual on-device assistants across 140 languages
  • Image-aware mobile workflows
  • Replacing E2B when accuracy matters more than RAM

VRAM requirements by quantization

Quantization            VRAM required
Q4_K_M (recommended)    4.5 GB
Q5_K_M                  5.5 GB
Q8_0                    8 GB
FP16 (no quantization)  14 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
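
As a rough illustration of how these figures might be used, the sketch below picks the highest-precision quantization that fits a given VRAM budget. The numbers are copied from the table above; the function name `pick_quantization` and the `headroom_gb` margin are hypothetical, since the page does not say how much extra memory longer contexts need.

```python
# VRAM figures (GB) copied from the table above; each already includes
# weights, a typical 8k KV cache and ~600 MB runtime overhead.
# Ordered from highest to lowest precision.
VRAM_GB = {
    "FP16": 14.0,
    "Q8_0": 8.0,
    "Q5_K_M": 5.5,
    "Q4_K_M": 4.5,
}


def pick_quantization(available_gb, headroom_gb=0.0):
    """Return the highest-precision quantization that fits, or None.

    headroom_gb is a hypothetical safety margin (for example, for context
    lengths beyond 8k); choose it based on your own measurements.
    """
    budget = available_gb - headroom_gb
    # Dicts preserve insertion order, so this walks from FP16 down to Q4_K_M.
    for name, required in VRAM_GB.items():
        if required <= budget:
            return name
    return None


if __name__ == "__main__":
    for gb in (4, 6, 8, 16):
        print(gb, "GB ->", pick_quantization(gb, headroom_gb=0.5))
```

A device with less than 4.5 GB free gets `None` here, which suggests looking at a smaller model instead.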

Strengths

  • Strong quality for a mobile-class model at 4B effective parameters
  • Integrated multimodal input (text and image)
  • Coverage of 140 languages
  • Open Gemma license

Limitations

  • 32k context only
  • Beaten by Gemma 3 12B in desktop scenarios
  • Gemma license — less permissive than Apache 2.0
  • Multimodal support uneven across runtimes

Architecture & training

Architecture: Gemma 3n E4B · on-device architecture · 4B effective

Training: Google Gemma 3n 4B, multimodal text+image, 140 languages.

Verdict

The full-fat Gemma 3n — strong mobile multimodal with surprising quality, if Gemma's license fits your use case.

Quick start

ollama run gemma3n:e4b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
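
For scripted access, a minimal sketch of calling the model through Ollama's local HTTP API is shown below. It assumes an Ollama server is running on the default port 11434 and that `gemma3n:e4b` has already been pulled; the helper names `build_payload` and `generate` are illustrative, not part of any library.

```python
import json
import urllib.request

# Default local Ollama endpoint (assumption: server runs on this machine
# with the default port). Adjust if yours listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, model="gemma3n:e4b"):
    """Build a non-streaming request body for Ollama's /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt, model="gemma3n:e4b", timeout=120):
    """Send the prompt to a running Ollama server and return the reply text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False the server returns one JSON object whose
    # "response" field holds the generated text.
    return body["response"]


# Example (requires a running Ollama server with the model pulled):
# print(generate("Translate 'good morning' into French and Japanese."))
```

Setting `"stream": False` keeps the example short; for interactive use, streaming responses line by line is usually the better choice.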
