Model fiche
Falcon 3 7B Instruct
By TII · UAE
chat
general
multilingual
Overview
TII's 7B trained on 14T tokens, hitting MMLU 70.5 — on par with Qwen2.5-7B — with native support for English, French, Spanish, German, and Portuguese.
When to pick this model
- Multilingual chat across the five supported European languages
- General-purpose 7B serving where Qwen licensing is a concern
- Workloads needing 32k context at small scale
- Sovereign deployments preferring a non-Chinese-origin model
- Knowledge-heavy QA at the 7B tier
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 5 GB |
| Q5_K_M | 6 GB |
| Q8_0 | 9 GB |
| FP16 (no quantization) | 14 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU | 70.5 |
| GSM8K | 80.8 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- MMLU 70.5 matches Qwen2.5-7B
- Trained on 14T tokens for broad knowledge coverage
- Five-language native support out of the box
- Permissive commercial license under TII Falcon-LLM 2.0
- 32k context covers most production needs
Limitations
- TII license is permissive but not Apache 2.0
- Smaller community than Llama or Qwen ecosystems
- No official multimodal variants
Architecture & training
Architecture: Dense 7B · GQA · 32k ctx
Training: 14T tokens.
Verdict
A credible non-Chinese 7B with Qwen-class quality — pick it for European multilingual work that needs a permissive commercial license.
Quick start
ollama run falcon3:7bOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.