SmolLM3 3B
By HuggingFace · France
Overview
HuggingFace's 3B model with dual think/no-think modes, 128k context, and full open data and recipe — punching at MMLU 59.7 and GSM8K 70.9.
When to pick this model
- Edge devices and laptops needing real reasoning at 3B
- Long-context tasks where larger models aren't viable
- Multilingual chat across the six supported European languages
- Educational and research use needing fully open training
- Latency-sensitive applications wanting toggleable thinking
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 2 GB |
| Q5_K_M | 2.5 GB |
| Q8_0 | 4 GB |
| FP16 (no quantization) | 6 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU | 59.7 |
| GSM8K | 70.9 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- Compact think mode delivers reasoning at 3B scale
- Native support for six languages
- Apache 2.0 with fully open training data and recipe
- 128k context unusual at this size
- Strong MMLU and GSM8K for the parameter count
Limitations
- No official Ollama distribution — needs manual setup
- Quality ceiling typical of 3B dense models on hard tasks
- Smaller community than competing 3B releases
Architecture & training
Architecture: Dense 3B · dual-mode think/no-think · 64k native + YaRN
Training: Fully open (data + recipe).
The reasoning-capable 3B to beat — ideal for edge deployments that still need think-mode and 128k context.
Quick start
# HuggingFace : HuggingFaceTB/SmolLM3-3BOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.