Gemma 2 27B
By Google · United States
Overview
The flagship of Google's Gemma 2 family. At Q4 it approaches 70B-class quality on a single 24 GB GPU, with strong multilingual coverage.
When to pick this model
- Self-hosted assistants needing high quality on one consumer GPU
- Multilingual content workflows including French
- Instruction-heavy tasks like classification and structured output
- Workloads that don't require long context
- Replacing 70B models when VRAM is constrained
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 16 GB |
| Q5_K_M | 19 GB |
| Q8_0 | 29 GB |
| FP16 (no quantization) | 54 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
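The table above can be turned into a simple lookup when deciding which quantization to pull for a given card. The following is a minimal sketch rather than an official tool: the function name `pick_quantization` and the `headroom_gb` parameter are hypothetical, and the figures are copied directly from the table, so they carry the same 8k-context assumption.

```python
from typing import Optional

# VRAM figures in GB, copied from the table above
# (weights + typical 8k KV cache + ~600 MB runtime overhead).
VRAM_REQUIRED_GB = {
    "Q4_K_M": 16,
    "Q5_K_M": 19,
    "Q8_0": 29,
    "FP16": 54,
}


def pick_quantization(vram_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the least aggressive quantization that fits, or None.

    headroom_gb is a hypothetical safety margin, e.g. for other
    processes sharing the GPU.
    """
    budget = vram_gb - headroom_gb
    fitting = [(need, name) for name, need in VRAM_REQUIRED_GB.items() if need <= budget]
    if not fitting:
        return None
    # The largest requirement that still fits keeps the most precision.
    return max(fitting)[1]
```

For example, `pick_quantization(24)` selects `Q5_K_M` from the table, while reserving 6 GB of headroom on the same card falls back to `Q4_K_M`.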
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU | 75.2 |
| HellaSwag | 89.5 |
| HumanEval | 51.8 |
Scores are published by the model author or aggregated from public leaderboards, and are re-measured monthly by our editorial team.
Strengths
- Quality close to 70B-class models
- Runs in roughly 16GB VRAM at Q4
- Strong instruction following
- Robust multilingual output
Limitations
- 8k context is a major handicap in 2025
- Gemma license is less permissive than Apache 2.0
- No vision in this checkpoint
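Because the 8k context is the main constraint, it can help to check prompt size before sending a request. The sketch below is a rough pre-flight estimate only: it assumes about 4 characters per token, a common rule of thumb for English text that is not the Gemma tokenizer's actual behavior, and the helper names and the `max_output_tokens` default are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 8192  # the "8k" context, taken as 8192 tokens
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count by ceiling-dividing the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_context(prompt: str, max_output_tokens: int = 512) -> bool:
    """Check that the estimated prompt plus reserved output fits in the window."""
    return estimate_tokens(prompt) + max_output_tokens <= CONTEXT_WINDOW_TOKENS
```

Non-English text and code often tokenize less efficiently than this heuristic suggests, so a real tokenizer count is the safer check for borderline prompts.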
Architecture & training
Architecture: Dense Transformer · Gemma 2 27B · logit-softcapping
Training: 13T tokens. The largest in the Gemma 2 family.
Excellent raw quality, undermined by an 8k context window: only pick it when your prompts stay short.
Quick start
ollama run gemma2:27b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
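Once Ollama is serving the model locally, it can also be queried over HTTP from a script. This is a minimal sketch assuming Ollama's default local address (`http://localhost:11434`) and its `/api/generate` endpoint with streaming disabled; the helper names `build_payload` and `generate` are illustrative, and the timeout value is an arbitrary choice.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "gemma2:27b", num_ctx: int = 8192) -> dict:
    """Build a non-streaming generate request capped at the 8k context."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


def generate(prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt to the local Ollama server and return the reply text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With the server running, `print(generate("Summarize this sentence in French: ..."))` would print the model's reply. The first call can be slow while the weights load into VRAM.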