BestLLMfor EN Your hardware. Your LLM. Your call.
APIOpen data Find my LLM
Model fiche

Phi-4 Mini 3.8B

By Microsoft · United States

chat general small
Parameters
3.8B
License
MIT
Context
125k
VRAM (Q4)
10 GB
Released
February 2025

Overview

Microsoft's 3.8B Phi-4 Mini under MIT with native function calling, 128k context via LongRoPE, and a 200k vocab. MMLU 67.3 and HumanEval 74.4.

When to pick this model

  • Tool-using agents on minimal hardware
  • On-device assistants requiring function calls
  • MIT-licensed embedded deployments
  • Long-context document tasks in a small model

VRAM requirements by quantization

QuantizationVRAM required
Q4_K_M (recommended)10 GB
Q5_K_M12 GB
Q8_018 GB
FP16 (no quantization)33 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.

Published benchmark scores

BenchmarkScore
MMLU67.3
HumanEval74.4
MATH71.5

Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.

Strengths

  • Native function calling at 3.8B
  • 128k context via LongRoPE
  • MIT license
  • 200k vocabulary improves tokenization efficiency

Limitations

  • English-first — multilingual coverage is thin
  • Outscored on raw quality by Qwen 2.5 3B
  • Tool-calling reliability still trails larger models

Architecture & training

Architecture: Dense 3.8B · GQA · LongRoPE · shared embeddings · 200k vocab

Training: High-quality Phi corpus.

Verdict

The MIT-licensed pick for small tool-using agents — strong function calling and 128k context in a 3.8B footprint.

Quick start

ollama run phi4-mini:3.8b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.

Tools

Is Phi-4 Mini 3.8B the right pick for you?

Compute self-hosted ROI → Back to catalog