Phi-4 14B
By Microsoft · United States
Overview
Phi-4 14B is Microsoft's 14B-parameter model, trained on ultra-curated synthetic data with a heavy STEM bias. It was the reasoning leader among 14B models at the end of 2024.
When to pick this model
- Math, science, and structured reasoning workloads
- Coding assistants where quality beats context length
- MIT-licensed commercial deployments
- Mid-size GPU deployments needing strong reasoning
- Replacing larger models on STEM-heavy benchmarks
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 9 GB |
| Q5_K_M | 11 GB |
| Q8_0 | 16 GB |
| FP16 (no quantization) | 28 GB |
VRAM figures include model weights plus the KV cache for a typical 8k-token context and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer context lengths.
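The table lends itself to a quick fit check. The sketch below is illustrative only and is not part of Ollama or llama.cpp: `pick_quantization` returns the highest-precision entry from the table that fits a given VRAM budget, and `kv_cache_gb` applies the standard KV cache size formula so that extra headroom for longer contexts can be estimated. Both function names are hypothetical, and the architecture values passed to `kv_cache_gb` (layer count, KV head count, head dimension) are not stated on this page, so they must be taken from the model's published configuration.

```python
# VRAM figures (GB) copied from the table above, ordered from lowest to highest precision.
VRAM_GB = [
    ("Q4_K_M", 9),
    ("Q5_K_M", 11),
    ("Q8_0", 16),
    ("FP16", 28),
]


def kv_cache_gb(context_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV cache size in GB (10^9 bytes) for a single sequence.

    Keys and values are each stored per layer, per KV head, per token,
    which is where the leading factor of 2 comes from.
    """
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 1e9


def pick_quantization(budget_gb, extra_gb=0.0):
    """Return the highest-precision quantization that fits the budget, or None.

    extra_gb is optional headroom added on top of each table figure,
    for example the additional KV cache needed beyond the 8k baseline.
    """
    best = None
    for name, required_gb in VRAM_GB:
        if required_gb + extra_gb <= budget_gb:
            best = name
    return best
```

With the table figures as given, a 12 GB card fits Q5_K_M and an 8 GB card fits none of the listed options. To budget for a longer context, compute the difference between `kv_cache_gb` at the target length and at 8192 tokens, and pass it as `extra_gb`.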
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU | 84.8 |
| MATH | 80.4 |
| HumanEval | 82.6 |
Scores are published by the model author or aggregated from public leaderboards, and are re-measured monthly by our editorial team.
Strengths
- Top-tier 14B reasoning at release
- MIT license
- Strong math, science, and code performance
- Tight, well-formatted outputs
Limitations
- Context window limited to 16k tokens
- Weaker multilingual coverage than Qwen
- Narrower world knowledge from synthetic training
Architecture & training
Architecture: Dense · 14B · Phi-4 · Microsoft-exclusive synthetic data
Training: Ultra-filtered Microsoft synthetic corpus. Focus on reasoning and math.
The reasoning-focused 14B to pick — just budget around its short context window.
Quick start
ollama run phi4:14b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
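Once the model has been pulled, it can also be queried from a script over Ollama's local HTTP API. The following is a minimal sketch, assuming an Ollama server is running on its default address (`http://localhost:11434`) and that the `phi4:14b` tag from the command above is installed. The helper names `build_request` and `generate` are hypothetical, chosen for this example.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address and port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="phi4:14b"):
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="phi4:14b", timeout=120):
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Calling `generate("State the quadratic formula.")` returns the model's reply as a string. Setting `"stream": False` asks the server for a single JSON object instead of a stream of partial responses, which keeps the parsing to one `json.loads` call.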