Model fiche

Gemma 3 4B

By Google · United States

Tags: chat, general, vision, multilingual, small
Parameters: 4B
License: Gemma
Context: 128K
VRAM (Q4): 10 GB
Released: March 2025

Overview

Google's compact multimodal 4B with 128K context, vision input, and 140+ language coverage. The smallest Gemma 3 with the full feature set intact.

When to pick this model

  • You need vision and long context on a low-VRAM machine or edge device
  • You're shipping multilingual apps and need broad language coverage
  • You want one small model for both text and image inputs
  • You're prototyping before scaling to 12B or 27B

VRAM requirements by quantization

Quantization             VRAM required
Q4_K_M (recommended)     10 GB
Q5_K_M                   12 GB
Q8_0                     18 GB
FP16 (no quantization)   33 GB

VRAM figures include model weights plus a typical 8K-token KV cache and roughly 600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer context lengths.
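
As a rough illustration of how the table can be used, the sketch below picks the highest-precision quantization that fits a given GPU. The figures are copied from the table above. The function name `best_quant`, the `headroom_gb` parameter, and the rule "prefer the largest footprint that still fits" are assumptions made for this example, not part of the page's methodology.

```python
# VRAM figures in GB, copied from the quantization table above.
VRAM_GB = {
    "Q4_K_M": 10,
    "Q5_K_M": 12,
    "Q8_0": 18,
    "FP16": 33,
}


def best_quant(available_gb, headroom_gb=0.0):
    """Return the largest-footprint quantization that fits, or None.

    headroom_gb is a hypothetical knob for reserving memory, for example
    when running with a context longer than the 8k baseline.
    """
    budget = available_gb - headroom_gb
    fitting = [(need, name) for name, need in VRAM_GB.items() if need <= budget]
    if not fitting:
        return None
    # max() compares the VRAM requirement first, so the most precise fit wins.
    return max(fitting)[1]
```

Calling `best_quant(12)` selects Q5_K_M, while reserving one gigabyte with `best_quant(12, headroom_gb=1)` falls back to Q4_K_M.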

Strengths

  • Multimodal in a 4B footprint
  • 140+ language coverage
  • 128K context
  • Sliding-window attention keeps memory in check

Limitations

  • Gemma License — review terms before commercial use
  • Trails the 12B and 27B on reasoning and code

Architecture & training

Architecture: Dense · multimodal text+vision · sliding-window attention (5:1 local:global)
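
To see why a 5:1 local:global layout keeps the KV cache small, the following sketch counts how many token positions each layer has to keep. It is a simplified model, not Gemma's implementation: the layer count, window size, and context length are hypothetical values chosen for illustration, and placing the global layer at the end of each group of six is an assumption.

```python
def layer_pattern(n_layers, local_per_global=5):
    """Label each layer 'local' or 'global' in a repeating local:global pattern."""
    period = local_per_global + 1
    # Assumption: the global layer closes each group of `period` layers.
    return ["global" if (i + 1) % period == 0 else "local" for i in range(n_layers)]


def cached_tokens(n_layers, context_len, window, local_per_global=5):
    """Total token positions held in the KV cache, summed over all layers."""
    total = 0
    for kind in layer_pattern(n_layers, local_per_global):
        # Global layers attend to the full context; local layers only keep
        # the most recent `window` positions.
        total += context_len if kind == "global" else min(context_len, window)
    return total


# Hypothetical sizes for illustration only (not taken from the page).
N_LAYERS, WINDOW, CONTEXT = 12, 1024, 8192

all_global = N_LAYERS * CONTEXT
mixed = cached_tokens(N_LAYERS, CONTEXT, WINDOW)
print(f"KV cache positions: {mixed} with 5:1 layout vs {all_global} all-global")
```

The gap grows with context length, because only the global layers scale with it once the context exceeds the local window.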

Training: 4T tokens, 140+ languages.

Verdict

The most capable 4B multimodal you can run locally — strong default for resource-constrained deployments.

Quick start

ollama run gemma3:4b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
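
Once `ollama run gemma3:4b` has pulled the model, the local Ollama server can also be called over HTTP. The sketch below assumes Ollama is listening on its default address, `http://localhost:11434`, and uses the `/api/generate` endpoint with an optional base64-encoded image for vision input. The helper names `build_payload` and `generate` are hypothetical and exist only for this example.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, image_bytes=None, model="gemma3:4b"):
    """Build the JSON body for a single non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_bytes is not None:
        # Images are sent as a list of base64-encoded strings.
        payload["images"] = [base64.b64encode(image_bytes).decode("ascii")]
    return payload


def generate(prompt, image_bytes=None):
    """Send the request and return the generated text."""
    data = json.dumps(build_payload(prompt, image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, `generate("Describe this image.", open("photo.jpg", "rb").read())` would send a text prompt together with an image; `photo.jpg` is a placeholder file name.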
