Model card

Phi-3.5 Mini

By Microsoft · United States

chat small
Parameters: 3.8B
License: MIT
Context: 128k
VRAM (Q4): 10 GB
Released: August 2024

Overview

Phi-3.5 Mini is Microsoft's 3.8B-parameter model, trained on heavily curated synthetic data and offering a 128k-token context window. It reasons well for its size, scoring 69 on MMLU.

When to pick this model

  • Long-context tasks on hardware that can't fit a 7B
  • Reasoning-heavy workloads at small size
  • MIT-licensed embedded or commercial deployments
  • Latency-critical assistants on consumer hardware
  • STEM-focused tutoring and Q&A apps

VRAM requirements by quantization

Quantization            VRAM required
Q4_K_M (recommended)    10 GB
Q5_K_M                  12 GB
Q8_0                    18 GB
FP16 (no quantization)  33 GB

VRAM figures include the model weights, the KV cache for a typical 8k-token context, and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer contexts, since the KV cache grows linearly with context length.
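The breakdown above (weights + KV cache + runtime overhead) can be sketched as a back-of-the-envelope estimator. This is a minimal sketch, not the method used to produce the table: the architecture values (32 layers, 32 KV heads, head dimension 96), the ~4.85 bits per weight for Q4_K_M, and the FP16 KV cache are all assumptions, and all function names are hypothetical. Its output need not match the table, which may rest on different assumptions.

```python
# Rough VRAM estimator: weights + KV cache + fixed runtime overhead.
# All model-specific numbers below are ASSUMPTIONS for illustration only.

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory taken by the model weights at a given average bit width."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Memory taken by the KV cache: one key and one value vector
    per layer, per KV head, per token (hence the factor of 2)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


def estimate_vram_gb(n_params: float, bits_per_weight: float,
                     n_layers: int, n_kv_heads: int, head_dim: int,
                     context_tokens: int,
                     overhead_bytes: float = 600e6) -> float:
    """Total estimate in decimal gigabytes (1 GB = 1e9 bytes)."""
    total = (weight_bytes(n_params, bits_per_weight)
             + kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens)
             + overhead_bytes)
    return total / 1e9


# Assumed architecture values (not taken from this page).
ASSUMED_ARCH = dict(n_params=3.8e9, n_layers=32, n_kv_heads=32, head_dim=96)

# Assumed average bits per weight for each quantization (approximate).
ASSUMED_BITS = {"Q4_K_M": 4.85, "Q8_0": 8.5, "FP16": 16.0}

for context in (8_192, 32_768, 131_072):
    gb = estimate_vram_gb(bits_per_weight=ASSUMED_BITS["Q4_K_M"],
                          context_tokens=context, **ASSUMED_ARCH)
    print(f"Q4_K_M at {context:>7} tokens: ~{gb:.1f} GB")
```

Under these assumptions the KV cache term scales linearly with context length while the weight term stays fixed, which is why long contexts dominate the memory budget of a small model.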

Published benchmark scores

Benchmark    Score
MMLU         69
HumanEval    62.8

Scores are those published by the model author or aggregated from public leaderboards, and are re-measured monthly by our editorial team.

Strengths

  • 128k context in a 3.8B footprint
  • MIT license with no commercial restrictions
  • Fast inference on modest hardware
  • Strong reasoning relative to its size

Limitations

  • Memory footprint is high for a 3.8B model at the full 128k context
  • Outclassed by Phi-4 14B on overall quality
  • Synthetic-heavy training can show as a narrow knowledge base

Architecture & training

Architecture: Dense · 3.8B · Phi-3.5 Mini · sliding window + FlashAttention

Training: High-quality synthetic data from Microsoft. Strong educational focus.

Verdict

A clever small model with a huge context, useful when you need 128k tokens and minimal VRAM.

Quick start

ollama run phi3.5

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
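Besides the CLI command above, a locally running Ollama server can be queried over HTTP. The following is a minimal sketch, assuming Ollama is listening on its default local port (11434) and that the model has already been pulled under the `phi3.5` tag. The helper names `build_generate_request` and `ask` are hypothetical, and the `num_ctx` value of 8192 is an illustrative choice rather than a recommendation from this page.

```python
import json
import urllib.request

# Default local endpoint of Ollama's generate API (assumes a default install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "phi3.5",
                           num_ctx: int = 8192) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # return one JSON object, not a stream
        "options": {"num_ctx": num_ctx},  # context window to allocate
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, **kwargs) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `ask("Explain the Pythagorean theorem in two sentences.")` requires the Ollama server to be running. Raising `num_ctx` toward 128k increases memory use accordingly.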
