Model fiche

Gemma 4 26B-A4B MoE

By Google · United States

Tags: chat, general, vision, audio, multilingual, moe
Parameters: 26B
License: Gemma
Context: 128k
VRAM (Q4): 16 GB
Released: April 2026

Overview

Google's MoE variant of Gemma 4 with 26B total / 4B active params and full text+image+audio multimodality. The smallest open model with native audio understanding at this quality.

When to pick this model

  • Multimodal apps that need text, image, and audio in one model
  • Voice-driven assistants and audio analysis pipelines
  • Long-context reasoning over mixed-media inputs (128k)
  • On-prem deployments where Google's tooling integrates cleanly
  • Replacing three separate models (text, vision, audio) with one

VRAM requirements by quantization

Quantization            VRAM required
Q4_K_M (recommended)    16 GB
Q5_K_M                  19 GB
Q8_0                    28 GB
FP16 (no quantization)  52 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
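As a rough illustration of how the table above might be used, the following Python sketch picks the highest-precision quantization that fits a given GPU. The VRAM figures are copied from the table; the function name `best_quant` and the `headroom_gb` parameter (extra memory reserved for contexts longer than 8k) are hypothetical and chosen only for this example.

```python
from typing import Optional

# VRAM requirements in GB, copied from the table above
# (weights + typical 8k KV cache + runtime overhead).
VRAM_GB = {
    "Q4_K_M": 16,
    "Q5_K_M": 19,
    "Q8_0": 28,
    "FP16": 52,
}


def best_quant(available_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the largest quantization that fits, or None if none does.

    headroom_gb is an assumed safety margin for longer contexts;
    it is not a figure published in the fiche.
    """
    budget = available_gb - headroom_gb
    fitting = [(gb, name) for name, gb in VRAM_GB.items() if gb <= budget]
    if not fitting:
        return None
    # Tuples sort by VRAM first, so max() picks the most demanding fit.
    return max(fitting)[1]
```

With a 24 GB card and no extra headroom this selects Q5_K_M; reserving 6 GB for a longer context drops the choice to Q4_K_M.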

Strengths

  • Unified text, image, and audio in 26B/4B-active MoE
  • 128k context
  • Strong reasoning relative to size
  • Backed by Google's training infrastructure and corpus
  • 4B active params keep inference cheap
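The point about active parameters can be made concrete with a back-of-the-envelope calculation. The sketch below uses the common rule of thumb of roughly 2 FLOPs per parameter per generated token; that rule is an assumption of this example, not a figure from the fiche, and it ignores attention and routing costs.

```python
TOTAL_PARAMS = 26e9   # total parameters, from the fiche
ACTIVE_PARAMS = 4e9   # parameters active per token, from the fiche


def flops_per_token(params: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per parameter used."""
    return 2.0 * params


dense_cost = flops_per_token(TOTAL_PARAMS)   # hypothetical dense 26B model
moe_cost = flops_per_token(ACTIVE_PARAMS)    # MoE: only active params compute
ratio = moe_cost / dense_cost

print(f"MoE per-token compute is about {ratio:.0%} of a dense 26B model")
```

Under these assumptions the per-token compute is roughly 15% of an equally sized dense model. Memory is a different matter: all 26B parameters still have to be loaded, which is why the VRAM figures track the total parameter count rather than the active one.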

Limitations

  • Needs around 16 GB of VRAM even at Q4
  • Gated on Hugging Face with click-through agreement
  • Gemma license has more restrictions than Apache or MIT

Architecture & training

Architecture: MoE · 26B · Gemma 4 · multimodal text+image+audio · 128k context

Training: Google's Gemma 4 MoE 26B is natively multimodal across audio, vision, and text.

Verdict

The most capable open multimodal model under 30B if you can live with the Gemma license.

Quick start

ollama run gemma4:26b-moe

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
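Once the model has been pulled with the command above, it can also be queried programmatically through Ollama's local HTTP API. The following is a minimal sketch using only the Python standard library. It assumes an Ollama server on the default port 11434 and reuses the model tag from the quick-start command; the helper names `build_payload` and `generate` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "gemma4:26b-moe"  # tag taken from the quick-start command


def build_payload(prompt: str, model: str = MODEL_TAG) -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def generate(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # With "stream": False the server returns a single JSON object.
        return json.loads(resp.read())["response"]
```

Calling `generate("Explain mixture-of-experts in one sentence.")` should return the model's reply as a string, provided the server is running and the model is available locally.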
