
MedGemma 4B

By Google · United States

chat vision multilingual small
Parameters: 4B
License: Gemma
Context: 128k
VRAM (Q4): 2.3 GB
Released: 20 April 2026

Overview

Google's 4B-parameter medical variant of Gemma that accepts both text and images, tuned for radiology, clinical imaging, and report drafting. It has a 128k context window and ships under the Gemma license.

When to pick this model

  • Drafting or summarizing radiology and clinical reports
  • Prototyping medical imaging assistants on a single small GPU
  • Research into multimodal clinical NLP without API costs
  • Edge deployments in healthcare workflows where data can't leave the device

VRAM requirements by quantization

Quantization              VRAM required
Q4_K_M (recommended)      2.3 GB
Q5_K_M                    2.8 GB
Q8_0                      4.3 GB
FP16 (no quantization)    8 GB

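VRAM figures approximate the size of the quantized model weights (at FP16, 4B parameters × 2 bytes is already 8 GB). Budget extra for the KV cache (a typical 8k context) and roughly 600 MB of runtime overhead (Ollama / llama.cpp baseline), and add more headroom for longer contexts.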
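For the single-small-GPU case, one quick way to choose a quantization is to take the most precise row of the table above that still leaves a safety margin. The sketch below copies the table's figures; the 1.5 GB default headroom (for KV cache and runtime overhead) is a hypothetical value, not one from this page, and should be raised for longer contexts.

```python
from typing import Optional

# VRAM figures copied from this page's quantization table, most precise first.
VRAM_TABLE_GB = [
    ("FP16", 8.0),
    ("Q8_0", 4.3),
    ("Q5_K_M", 2.8),
    ("Q4_K_M", 2.3),
]


def pick_quant(gpu_vram_gb: float, headroom_gb: float = 1.5) -> Optional[str]:
    """Return the most precise quantization whose listed VRAM plus headroom fits.

    headroom_gb is a HYPOTHETICAL safety margin (KV cache, runtime overhead),
    not a value from the page. Returns None if even Q4_K_M does not fit.
    """
    for quant, listed_gb in VRAM_TABLE_GB:
        if listed_gb + headroom_gb <= gpu_vram_gb:
            return quant
    return None


if __name__ == "__main__":
    for vram in (3.0, 4.0, 6.0, 12.0):
        print(f"{vram:4.1f} GB GPU -> {pick_quant(vram)}")
```

With the default margin, a 4 GB card lands on Q4_K_M; passing a larger `headroom_gb` pushes the choice toward smaller quantizations when you plan to use long contexts.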

Strengths

  • Domain-tuned on clinical literature and radiology imagery
  • Compact 4B footprint (~2.3 GB VRAM at Q4)
  • True multimodal — text plus medical images
  • Permissive Gemma license for research and most commercial use
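The ~2.3 GB footprint at Q4 follows from a simple weights-only estimate: bytes ≈ parameters × bits per weight / 8. The sketch below assumes the page's nominal 4B parameters and the nominal bits per weight of the underlying llama.cpp block formats (about 4.5 for Q4_K, 5.5 for Q5_K, 8.5 for Q8_0, 16 for FP16). These are assumptions: the *_M mixes keep some tensors at higher precision, so real GGUF files run somewhat larger.

```python
# Weights-only VRAM estimate: bytes ~= parameters * bits_per_weight / 8.
# ASSUMPTION: nominal bits per weight of the llama.cpp block formats; the
# *_M mixes keep some tensors at higher precision, so real files are larger.
NOMINAL_BPW = {
    "Q4_K_M": 4.5,
    "Q5_K_M": 5.5,
    "Q8_0": 8.5,
    "FP16": 16.0,
}


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = 4e9  # the page's nominal "4B"; the released checkpoint may differ
    for quant, bpw in NOMINAL_BPW.items():
        print(f"{quant:8s} {weights_gb(params, bpw):5.2f} GB")
```

These estimates (2.25, 2.75, 4.25 and 8.00 GB) sit close to the VRAM table's figures, which is why anything beyond the weights should be treated as extra headroom.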

Limitations

  • Decision-support only — not approved for direct clinical use
  • Narrow specialization; weak outside medical contexts
  • Gated on Hugging Face: you must accept the license terms before downloading the weights

Architecture & training

Architecture: Gemma 4 · 4B parameters · multimodal text + image · 128k context
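The 128k context window is where KV-cache memory starts to dominate. A standard upper-bound formula is 2 (K and V) × layers × KV heads × head dimension × tokens × bytes per element. The layer and head values below are assumptions modeled on a Gemma 3 4B-style text config, not figures from this page; check the checkpoint's config.json. The sketch also treats every layer as full attention, ignoring Gemma's sliding-window local layers, so runtimes that exploit those layers will need considerably less.

```python
# Upper-bound KV-cache size for full attention:
#   2 (K and V) * layers * KV heads * head_dim * tokens * bytes per element
# ASSUMPTION: Gemma 3 4B-style values; verify against the model's config.json.
N_LAYERS = 34
N_KV_HEADS = 4
HEAD_DIM = 256


def kv_cache_bytes(n_ctx: int, bytes_per_elem: int = 2) -> int:
    """KV-cache bytes for n_ctx tokens, treating every layer as global attention."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * n_ctx * bytes_per_elem


if __name__ == "__main__":
    for n_ctx in (8 * 1024, 32 * 1024, 128 * 1024):
        gb = kv_cache_bytes(n_ctx) / 1e9
        print(f"{n_ctx:>7} tokens: {gb:6.2f} GB (FP16 cache, upper bound)")
```

Because the cache grows linearly with tokens, going from 8k to the full 128k multiplies this term by 16, which is why the VRAM guidance above asks for extra headroom at long contexts.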

Training: Medical variant of Gemma, fine-tuned on clinical literature, radiology imaging, and medical reports.

Verdict

A pocket-sized clinical assistant for research and report drafting — never a substitute for a licensed clinician.

Quick start

ollama run medgemma

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
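Once the model is pulled, Ollama also serves a local HTTP API, so the model can be called from a script instead of the interactive prompt. The sketch below uses Ollama's `/api/generate` endpoint on its default port with `stream` disabled; it assumes the `medgemma` tag from the quick start and, for image prompts, that the pulled build includes the vision component. Helper names are illustrative.

```python
import base64
import json
import urllib.request
from typing import Optional

# Ollama's default local endpoint; "medgemma" is the tag from the quick start.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, image_bytes: Optional[bytes] = None,
                  model: str = "medgemma") -> dict:
    """Build a non-streaming /api/generate body; images go in as base64 strings."""
    body = {"model": model, "prompt": prompt, "stream": False}
    if image_bytes is not None:
        body["images"] = [base64.b64encode(image_bytes).decode("ascii")]
    return body


def generate(body: dict, url: str = OLLAMA_GENERATE_URL,
             timeout: float = 300.0) -> str:
    """POST the body to a running Ollama server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

For example, `generate(build_request("Summarize the key findings.", open("scan.png", "rb").read()))` sends a prompt with one image; the file name is hypothetical. Everything stays on the local machine, which matches the on-device use case listed above.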
