Model fiche
MedGemma 4B
By Google · United States
chat
vision
multilingual
small
Overview
Google's 4B-parameter medical variant of Gemma, handling both text and images and tuned for radiology, clinical imaging, and report drafting. It offers a 128k-token context window and is released under the Gemma license.
When to pick this model
- Drafting or summarizing radiology and clinical reports
- Prototyping medical imaging assistants on a single small GPU
- Research into multimodal clinical NLP without API costs
- Edge deployments in healthcare workflows where data can't leave the device
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 2.3 GB |
| Q5_K_M | 2.8 GB |
| Q8_0 | 4.3 GB |
| FP16 (no quantization) | 8 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
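The table above can be read as a simple lookup when deciding which quantization fits a given GPU. The following is a minimal sketch, assuming the VRAM figures listed in this fiche (which already include an 8k KV cache). The helper name `pick_quantization` and the default `headroom_gb` margin are hypothetical, chosen only for illustration and not measured.

```python
from typing import Optional

# VRAM figures (GB) copied from the table above; they assume an 8k KV cache.
VRAM_GB = {
    "Q4_K_M": 2.3,
    "Q5_K_M": 2.8,
    "Q8_0": 4.3,
    "FP16": 8.0,
}


def pick_quantization(available_gb: float, headroom_gb: float = 0.5) -> Optional[str]:
    """Return the largest listed quantization that fits, or None if none fits.

    headroom_gb is a hypothetical safety margin for longer contexts.
    """
    budget = available_gb - headroom_gb
    fitting = [(gb, name) for name, gb in VRAM_GB.items() if gb <= budget]
    if not fitting:
        return None
    # A larger footprint corresponds to less aggressive quantization.
    return max(fitting)[1]


if __name__ == "__main__":
    for gpu_gb in (2.0, 4.0, 6.0, 12.0):
        print(gpu_gb, "GB ->", pick_quantization(gpu_gb))
```

Raising `headroom_gb` is one way to account for contexts longer than 8k, although the right margin depends on the runtime and the context length actually used.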
Strengths
- Domain-tuned on clinical literature and radiology imagery
- Compact 4B footprint (~2.3 GB VRAM at Q4_K_M)
- True multimodal — text plus medical images
- Permissive Gemma license for research and most commercial use
Limitations
- Decision-support only — not approved for direct clinical use
- Narrow specialization; weak outside medical contexts
- Gated on Hugging Face
Architecture & training
Architecture: Gemma 3 · 4B parameters · multimodal text + image · 128k context
Training: Medical variant of Gemma, fine-tuned on clinical literature, radiology imaging, and medical reports.
Verdict
A pocket-sized clinical assistant for research and report drafting — never a substitute for a licensed clinician.
Quick start
ollama run medgemma
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
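Besides the interactive CLI, a locally running Ollama server can also be queried over its HTTP API. The following is a minimal sketch, assuming Ollama is listening on its default port (11434) and that the model is available under the tag `medgemma` used in the command above. The helper names `build_payload` and `generate` are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Optional

# Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, image_bytes: Optional[bytes] = None,
                  model: str = "medgemma") -> dict:
    """Build the JSON body for a non-streaming /api/generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_bytes is not None:
        # Ollama expects images as a list of base64-encoded strings.
        payload["images"] = [base64.b64encode(image_bytes).decode("ascii")]
    return payload


def generate(prompt: str, image_bytes: Optional[bytes] = None) -> str:
    """Send the request to the local server and return the generated text."""
    data = json.dumps(build_payload(prompt, image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    # Requires a running Ollama server with the model already pulled.
    print(generate("Summarize the key findings of a normal chest X-ray report."))
```

Because requests go to `localhost`, prompts and images stay on the machine, which matches the on-device use case listed earlier in this fiche.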