
Gemma 4 31B

By Google · United States

Tags: chat · general · vision · audio · multilingual
Parameters: 31B
License: Gemma
Context: 256K
VRAM (Q4): 18 GB
Released: April 2026

Overview

Google's dense 31B multimodal model with native text, image, and audio support across 140+ languages and a 256K context window. It ranks #3 on Chatbot Arena's open leaderboard.

When to pick this model

  • Multilingual production apps spanning 100+ languages
  • Native audio input and analysis workflows
  • Long-context document and codebase analysis
  • On-prem multimodal chat backends
  • Replacing GPT-4o-class APIs with local weights

VRAM requirements by quantization

Quantization              VRAM required
Q4_K_M (recommended)      18 GB
Q5_K_M                    22 GB
Q8_0                      33 GB
FP16 (no quantization)    62 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
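
As a rough illustration of how the table above can be used, the sketch below picks the highest-precision quantization that fits a given GPU. The VRAM figures are copied from the table; the function name `best_quantization` and the `headroom_gb` parameter are hypothetical, and the right amount of headroom for longer contexts is an assumption you need to set yourself.

```python
# VRAM figures in GB, copied from the table above (8k KV cache included).
VRAM_GB = {
    "Q4_K_M": 18,
    "Q5_K_M": 22,
    "Q8_0": 33,
    "FP16": 62,
}


def best_quantization(available_gb, headroom_gb=0.0):
    """Return the highest-precision quantization that fits, or None.

    headroom_gb is an assumed reserve for contexts longer than 8k;
    it is a placeholder, not a figure from this page.
    """
    budget = available_gb - headroom_gb
    fitting = [(need, name) for name, need in VRAM_GB.items() if need <= budget]
    if not fitting:
        return None
    # The largest VRAM requirement that still fits is the highest precision.
    return max(fitting)[1]
```

For example, a 24 GB card fits Q5_K_M with no reserve, but drops to Q4_K_M once 4 GB is set aside for a longer context.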

Strengths

  • #3 on Chatbot Arena's open leaderboard
  • Native audio understanding alongside text and image input
  • 256K context window in a dense 31B model
  • Strong coverage across 140+ languages
  • Backed by Google's training infrastructure

Limitations

  • Gemma license is more restrictive than Apache 2.0
  • Dense 31B model needs about 18 GB of VRAM even at Q4
  • Audio quality trails purpose-built ASR models

Architecture & training

Architecture: Dense 31B · multimodal text+image+audio · 256k ctx

Training: multilingual data covering 140+ languages.

Verdict

The best open multimodal generalist of the Gemma line, assuming you can live with the Gemma license.

Quick start

ollama run gemma4:31b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
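
Besides the CLI, a locally running Ollama server can be queried over HTTP. The sketch below is a minimal example, assuming Ollama is listening on its default port 11434 and that the `gemma4:31b` tag from the command above has already been pulled. The helper names `build_payload` and `ask` are hypothetical.

```python
import json
import urllib.request

# Assumed defaults: Ollama's local endpoint and the tag from the quick start.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "gemma4:31b"


def build_payload(prompt, model=MODEL_TAG):
    """Build the JSON body for a single, non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask(prompt, model=MODEL_TAG, url=OLLAMA_URL, timeout=300):
    """Send the prompt to a running Ollama server and return the reply text."""
    body = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    # Supplying data makes urllib issue a POST request.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling `ask("Summarize this page in one sentence.")` returns the model's reply as a string once the server has finished generating. The generous timeout is a guess to allow for first-load latency on a 31B model.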
