Phi-3.5 Mini
By Microsoft · United States
Overview
Phi-3.5 Mini is Microsoft's 3.8B-parameter model, trained on heavily curated synthetic data and offering a 128k-token context window. It punches above its weight on reasoning.
When to pick this model
- Long-context tasks on hardware that can't fit a 7B
- Reasoning-heavy workloads at small size
- MIT-licensed embedded or commercial deployments
- Latency-critical assistants on consumer hardware
- STEM-focused tutoring and Q&A apps
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 10 GB |
| Q5_K_M | 12 GB |
| Q8_0 | 18 GB |
| FP16 (no quantization) | 33 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
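To get a feel for how much headroom a longer context needs, the KV cache can be estimated from the attention shape. The following is a rough sketch, not the method used to produce the table above. The defaults (32 layers, 32 KV heads, head dimension 96, a 16-bit cache) are assumptions based on the publicly released Phi-3.5 Mini configuration and should be checked against the model's own config. The function names are illustrative.

```python
GIB = 1024 ** 3


def kv_cache_bytes(context_tokens, n_layers=32, n_kv_heads=32,
                   head_dim=96, bytes_per_value=2):
    """Approximate KV cache size in bytes for a given context length.

    Defaults are ASSUMED values for Phi-3.5 Mini with an FP16 cache.
    The leading factor of 2 accounts for storing both keys and values.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


def extra_headroom_gib(target_tokens, baseline_tokens=8192, **shape):
    """Extra KV cache memory (GiB) needed beyond the 8k baseline context."""
    extra = kv_cache_bytes(target_tokens, **shape) - kv_cache_bytes(baseline_tokens, **shape)
    return extra / GIB


for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens: KV cache ~{kv_cache_bytes(ctx) / GIB:.1f} GiB, "
          f"extra over 8k ~{extra_headroom_gib(ctx):.1f} GiB")
```

Under these assumptions the cache grows linearly with context length, so each doubling of the context roughly doubles the cache. Runtimes that quantize the KV cache or allocate it differently will land on different numbers.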
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU | 69 |
| HumanEval | 62.8 |
Scores are published by the model author or aggregated from public leaderboards. Our editorial team re-measures them monthly.
Strengths
- 128k context in a 3.8B footprint
- MIT license with no commercial restrictions
- Fast inference on modest hardware
- Strong reasoning relative to its size
Limitations
- Memory footprint is high for a 3.8B at full context
- Outclassed by Phi-4 14B on overall quality
- Synthetic-heavy training can show up as a narrow knowledge base
Architecture & training
Architecture: Dense · 3.8B · Phi-3.5 Mini · sliding window + FlashAttention
Training: High-quality synthetic data from Microsoft. Strong educational focus.
A clever small model with a huge context, useful when you need 128k tokens and minimal VRAM.
Quick start
ollama run phi3.5
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
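Once the model has been pulled, it can also be called from a script through Ollama's local HTTP API. This is a minimal sketch, assuming an Ollama server is running on its default port (11434) and the `phi3.5` tag is available locally. The helper names `build_payload` and `generate` are illustrative and not part of any library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_payload(prompt, num_ctx=8192):
    """Build the JSON body for a single, non-streaming completion."""
    return {
        "model": "phi3.5",                # tag used in the quick-start command
        "prompt": prompt,
        "stream": False,                  # one JSON object instead of chunks
        "options": {"num_ctx": num_ctx},  # context window to allocate, in tokens
    }


def generate(prompt, num_ctx=8192):
    """Send the prompt to the local Ollama server and return the reply text."""
    body = json.dumps(build_payload(prompt, num_ctx)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Example call (requires a running Ollama server):
# print(generate("Explain the Pythagorean theorem in two sentences."))
```

The `num_ctx` option is set explicitly because the runtime's default context window is usually far smaller than 128k. Raising it increases memory use accordingly.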