Model fiche
Falcon 3 10B Instruct
By TII · UAE
chat
general
multilingual
Overview
TII's depth-upscaled 10B successor to Falcon 3 7B, hitting MMLU 73.1 and GSM8K 83.1 — state-of-the-art under 13B at release.
When to pick this model
- General chat where 7B is too weak and 13B too costly
- Multilingual production deployments across five EU languages
- Math-leaning tasks needing GSM8K 83+ at small scale
- Replacing Llama 3 8B with stronger benchmark numbers
- Workloads benefiting from 32k context
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 6 GB |
| Q5_K_M | 8 GB |
| Q8_0 | 12 GB |
| FP16 (no quantization) | 20 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
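To turn the table into a sizing decision, the lookup below is a minimal sketch that picks the highest-precision quantization fitting a given GPU. It takes the table's figures at face value; the function name `pick_quantization` and the `headroom_gb` parameter are illustrative assumptions, not part of any Ollama or llama.cpp API.

```python
# VRAM figures copied from the table above, in GB.
# Per the note, they assume a typical 8k KV cache plus runtime overhead.
VRAM_GB = {
    "Q4_K_M": 6,
    "Q5_K_M": 8,
    "Q8_0": 12,
    "FP16": 20,
}


def pick_quantization(available_gb, headroom_gb=0.0):
    """Return the largest quantization that fits, or None if none does.

    headroom_gb is a hypothetical safety margin reserved for longer
    contexts or other processes sharing the GPU.
    """
    budget = available_gb - headroom_gb
    fitting = [(gb, name) for name, gb in VRAM_GB.items() if gb <= budget]
    # Tuples sort by VRAM first, so max() yields the highest-precision fit.
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for card_gb in (8, 12, 16, 24):
        print(card_gb, "GB ->", pick_quantization(card_gb, headroom_gb=1.0))
```

Reserving headroom is a design choice: contexts longer than 8k grow the KV cache, so a card that only just matches a table row may not hold the model at 32k.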
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU | 73.1 |
| GSM8K | 83.1 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- SOTA among sub-13B models at release
- MMLU 73.1 with strong knowledge breadth
- Efficient depth-upscaled design from the 7B base
- Five-language coverage with permissive licensing
- Strong GSM8K performance for the size class
Limitations
- TII Falcon-LLM 2.0 license, not Apache 2.0
- Limited fine-tune ecosystem versus Llama derivatives
- No multimodal version available
Architecture & training
Architecture: Dense 10B · depth-upscaled from 7B
Training: Successor to the 7B.
Verdict
The strongest sub-13B Falcon to date — a solid mid-size pick when you need multilingual quality without the Llama license.
Quick start
ollama run falcon3:10b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
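Once the model has been pulled, it can also be queried programmatically. The following is a sketch, assuming a local Ollama server listening on its default port (11434) and the `falcon3:10b` tag from the command above. The helper names `build_request` and `generate` are hypothetical and exist only for this example.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="falcon3:10b"):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="falcon3:10b", timeout=120):
    """Send the prompt and return the generated text."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        # With "stream": False the server replies with one JSON object
        # whose "response" field holds the full completion.
        return json.loads(resp.read())["response"]


# Example call (requires a running server with the model pulled):
# print(generate("Summarize the Falcon 3 model family in one sentence."))
```

Building the request separately from sending it keeps the payload easy to inspect without a running server.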