Model card

Gemma 2 27B

By Google · United States

Tags: chat, general
Parameters: 27B
License: Gemma
Context: 8k
VRAM (Q4): 16 GB
Released: June 2024

Overview

The flagship of Google's Gemma 2 family. At Q4 it fits on a single 24 GB GPU and approaches 70B-class quality, with strong multilingual coverage.

When to pick this model

  • Self-hosted assistants needing high quality on one consumer GPU
  • Multilingual content workflows including French
  • Instruction-heavy tasks like classification and structured output
  • Workloads that don't require long context
  • Replacing 70B models when VRAM is constrained

VRAM requirements by quantization

Quantization            VRAM required
Q4_K_M (recommended)    16 GB
Q5_K_M                  19 GB
Q8_0                    29 GB
FP16 (no quantization)  54 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
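
The breakdown described above (weights, plus KV cache, plus fixed overhead) can be sketched as a small calculator. This is a rough sketch, not the site's actual method: the bits-per-weight values are approximate assumptions for GGUF quantizations, and the layer count, KV-head count, and head dimension are assumed values for Gemma 2 27B that should be checked against the model's published config. Only the 8k context and the ~600 MB overhead come from the note above.

```python
# Rough VRAM estimate: weights + KV cache + fixed runtime overhead.
# The constants are illustrative assumptions, except the ~600 MB overhead
# and the 8k context, which come from the note above.

GB = 1e9  # decimal gigabytes

# Approximate effective bits per weight for common quantizations (assumed).
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5, "FP16": 16.0}


def weights_gb(params_billion, quant):
    """Memory taken by the model weights alone, in GB."""
    return params_billion * 1e9 * BITS_PER_WEIGHT[quant] / 8 / GB


def kv_cache_gb(context_tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Memory for a full-length KV cache: one K and one V tensor per layer."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token_bytes * context_tokens / GB


def estimate_vram_gb(params_billion, quant, context_tokens=8192,
                     layers=46, kv_heads=16, head_dim=128, overhead_gb=0.6):
    """Sum of weights, KV cache, and runtime overhead (all assumptions above)."""
    return (weights_gb(params_billion, quant)
            + kv_cache_gb(context_tokens, layers, kv_heads, head_dim)
            + overhead_gb)


for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimate_vram_gb(27, quant):.1f} GB")
```

The absolute numbers depend heavily on the assumed bits per weight and KV-cache layout, so treat the output as an order-of-magnitude check rather than a reproduction of the table.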

Published benchmark scores

Benchmark   Score
MMLU        75.2
HellaSwag   89.5
HumanEval   51.8

Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.

Strengths

  • Quality close to 70B-class models
  • Runs in roughly 16 GB of VRAM at Q4
  • Strong instruction following
  • Robust multilingual output

Limitations

  • 8k context is a major handicap in 2025
  • Gemma license is less permissive than Apache 2.0
  • No vision in this checkpoint

Architecture & training

Architecture: Dense Transformer · Gemma 2 27B · logit soft-capping

Training: 13T tokens. This is the largest model in the Gemma 2 family.
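
The logit soft-capping named in the architecture line can be illustrated with a few lines of Python. As described in the Gemma 2 technical report, logits are squashed with a scaled tanh so they stay within a fixed range; the cap values used below (50.0 for attention logits, 30.0 for final logits) are the ones reported there, but treat them as assumptions and check the model config. The function name `soft_cap` is hypothetical.

```python
import math


def soft_cap(logits, cap):
    """Squash each logit smoothly into [-cap, cap] via cap * tanh(x / cap).

    Values much smaller than the cap pass through almost unchanged,
    while very large values saturate near +/- cap.
    """
    return [cap * math.tanh(x / cap) for x in logits]


ATTENTION_CAP = 50.0  # assumed cap for attention logits
FINAL_CAP = 30.0      # assumed cap for final (output) logits

raw = [-200.0, -5.0, 0.0, 5.0, 200.0]
print(soft_cap(raw, FINAL_CAP))
```

Unlike hard clipping, the tanh form stays differentiable everywhere, which is the usual motivation for this kind of cap.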

Verdict

Excellent raw quality undermined by an 8k context — only pick it when your prompts stay short.
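
Whether a prompt "stays short" enough can be checked roughly before sending it. The sketch below assumes the 8k window means 8192 tokens and uses a crude heuristic of about 4 characters per token for English text; both the heuristic and the helper names are assumptions, and a real tokenizer will give different counts, especially for other languages or code.

```python
import math

CONTEXT_WINDOW = 8192    # assumed token count behind the "8k" figure
CHARS_PER_TOKEN = 4.0    # crude heuristic for English text (assumption)


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Very rough token estimate based on character count."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(prompt, max_new_tokens, context_window=CONTEXT_WINDOW):
    """True if the estimated prompt tokens plus the reply budget fit the window."""
    return estimate_tokens(prompt) + max_new_tokens <= context_window


short_prompt = "Classify this ticket as bug, feature, or question."
print(fits_in_context(short_prompt, max_new_tokens=256))
```

For anything near the limit, count tokens with the model's actual tokenizer instead of relying on this estimate.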

Quick start

ollama run gemma2:27b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
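
Once the model has been pulled with the command above, a local Ollama instance can also be queried over its HTTP API. The following is a minimal sketch using only the Python standard library. It assumes Ollama is listening on its default address (`http://localhost:11434`) and sets the `num_ctx` option to 8192 to match the model's context window; the helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Default local endpoint for Ollama's generate API (assumes a default install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="gemma2:27b", num_ctx=8192):
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # return a single JSON object
        "options": {"num_ctx": num_ctx},  # context length to allocate
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["response"]


# Example, with Ollama running locally:
#   print(generate("Summarize the Gemma license in one sentence."))
```

Raising `num_ctx` increases KV-cache memory, so the VRAM figures above apply only if the chosen context length fits the available headroom.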
