Phi-4 Mini 3.8B
By Microsoft · United States
Overview
Microsoft's 3.8B Phi-4 Mini under MIT with native function calling, 128k context via LongRoPE, and a 200k vocab. MMLU 67.3 and HumanEval 74.4.
When to pick this model
- Tool-using agents on minimal hardware
- On-device assistants requiring function calls
- MIT-licensed embedded deployments
- Long-context document tasks in a small model
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 10 GB |
| Q5_K_M | 12 GB |
| Q8_0 | 18 GB |
| FP16 (no quantization) | 33 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU | 67.3 |
| HumanEval | 74.4 |
| MATH | 71.5 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- Native function calling at 3.8B
- 128k context via LongRoPE
- MIT license
- 200k vocabulary improves tokenization efficiency
Limitations
- English-first — multilingual coverage is thin
- Outscored on raw quality by Qwen 2.5 3B
- Tool-calling reliability still trails larger models
Architecture & training
Architecture: Dense 3.8B · GQA · LongRoPE · shared embeddings · 200k vocab
Training: High-quality Phi corpus.
The MIT-licensed pick for small tool-using agents — strong function calling and 128k context in a 3.8B footprint.
Quick start
ollama run phi4-mini:3.8bOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.