Model fiche
Gemma 4 31B
By Google · United States
Tags: chat · general · vision · audio · multilingual
Overview
Google's dense 31B multimodal model with native text, image, and audio support across 140+ languages. It ranks #3 on Chatbot Arena's open leaderboard and offers a 256K context window.
When to pick this model
- Multilingual production apps spanning 100+ languages
- Native audio input and analysis workflows
- Long-context document and codebase analysis
- On-prem multimodal chat backends
- Replacing GPT-4o-class APIs with local weights
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 18 GB |
| Q5_K_M | 22 GB |
| Q8_0 | 33 GB |
| FP16 (no quantization) | 62 GB |
VRAM figures include model weights plus a typical 8K-token KV cache and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer contexts.
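As a rough aid for reading the table, the sketch below picks the highest-precision quantization that fits a given VRAM budget. The figures are copied from the table above. The function name `pick_quantization` and the `extra_context_gb` allowance are illustrative assumptions; the fiche gives no formula for long-context headroom, so that value is left to the caller.

```python
from typing import Optional

# VRAM figures in GB, copied from the table above (8K KV cache assumed there).
VRAM_GB = {
    "Q4_K_M": 18,
    "Q5_K_M": 22,
    "Q8_0": 33,
    "FP16": 62,
}


def pick_quantization(available_gb: float, extra_context_gb: float = 0.0) -> Optional[str]:
    """Return the largest quantization that fits the budget, or None.

    extra_context_gb is a hypothetical allowance for contexts beyond 8K;
    it is subtracted from the available VRAM before comparing.
    """
    budget = available_gb - extra_context_gb
    fitting = [(gb, name) for name, gb in VRAM_GB.items() if gb <= budget]
    if not fitting:
        return None
    # Larger footprint means higher precision in this table.
    return max(fitting)[1]


if __name__ == "__main__":
    for card_gb in (16, 24, 48, 80):
        print(card_gb, "GB ->", pick_quantization(card_gb))
```

With these numbers, a 24 GB card fits Q5_K_M at the baseline context, but reserving 4 GB for a longer context drops the choice to Q4_K_M.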
Strengths
- #3 on Chatbot Arena's open leaderboard
- Native audio understanding, in addition to text and image input
- 256K context window in a dense 31B model
- Strong coverage across 140+ languages
- Backed by Google's training infrastructure
Limitations
- Gemma license is more restrictive than Apache 2.0
- 31B dense model needs 18 GB of VRAM at Q4_K_M, and more at longer contexts
- Audio quality trails purpose-built ASR models
Architecture & training
Architecture: Dense 31B · multimodal (text, image, audio) · 256K context
Training: 140+ languages.
Verdict
The best open multimodal generalist of the Gemma line, assuming you can live with the Gemma license.
Quick start
ollama run gemma4:31b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
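For scripted use, the model can also be called over Ollama's local HTTP API. This is a minimal sketch, assuming an Ollama server is running on its default port (11434) and that the `gemma4:31b` tag from the command above has already been pulled. The helper names `build_request` and `generate` are illustrative, and the 8192-token `num_ctx` simply mirrors the 8K baseline used in the VRAM table.

```python
import json
import urllib.request

# Assumed default endpoint of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "gemma4:31b", num_ctx: int = 8192) -> dict:
    """Build a non-streaming generate payload (hypothetical helper)."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream
        "options": {"num_ctx": num_ctx},  # context length in tokens
    }


def generate(prompt: str, **kwargs) -> str:
    """POST the prompt to the local server and return the generated text."""
    body = json.dumps(build_request(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Example call (requires the server to be running):
# print(generate("Explain KV caching in two sentences."))
```

Raising `num_ctx` toward the 256K limit increases KV cache memory, so the VRAM figures above no longer apply as-is.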