Mistral 7B Instruct
By Mistral AI · France
Overview
Mistral AI's breakout 7B instruct model. Still a go-to baseline for fast, low-cost inference and one of the most widely fine-tuned open-weight models in the wild.
When to pick this model
- Bootstrapping a local chatbot on a single consumer GPU
- Cheap, high-throughput batch inference where the stronger reasoning of 2024-and-later models isn't required
- Fine-tuning experiments thanks to the deep ecosystem of LoRAs and quants
- Edge or on-prem deployments under tight latency budgets
- Apache 2.0 commercial use with zero licensing friction
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 5 GB |
| Q5_K_M | 6 GB |
| Q8_0 | 9 GB |
| FP16 (no quantization) | 16 GB |
VRAM figures include model weights, the KV cache for a typical 8k-token context, and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer context lengths.
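As a rough illustration of how much headroom a longer context needs, the sketch below estimates the KV cache size and adds the extra over the 8k baseline to the table figures. Only the 32 layers and the table values come from this page. The 8 KV heads, the head dimension of 128, and the FP16 cache are assumptions, as is a full (non-sliding-window) cache, and GB is treated as GiB for simplicity. The function names are hypothetical.

```python
# Rough VRAM estimator built on the table above.
N_LAYERS = 32          # from this page
N_KV_HEADS = 8         # assumption: grouped-query attention with 8 KV heads
HEAD_DIM = 128         # assumption
KV_BYTES = 2           # assumption: FP16 cache entries
BASELINE_CTX = 8192    # the "8k" context already included in the table figures

# VRAM figures copied from the table (GB, 8k context included).
TABLE_GB = {"Q4_K_M": 5, "Q5_K_M": 6, "Q8_0": 9, "FP16": 16}


def kv_cache_bytes(ctx_tokens: int) -> int:
    """Bytes for keys and values across all layers at the given context length."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES * ctx_tokens


def required_gb(quant: str, ctx_tokens: int = BASELINE_CTX) -> float:
    """Table figure plus the KV cache growth beyond the 8k baseline."""
    extra = max(0, kv_cache_bytes(ctx_tokens) - kv_cache_bytes(BASELINE_CTX))
    return TABLE_GB[quant] + extra / 1024**3


def pick_quant(vram_gb: float, ctx_tokens: int = BASELINE_CTX):
    """Highest-precision quantization that fits the budget, or None."""
    fitting = [q for q in TABLE_GB if required_gb(q, ctx_tokens) <= vram_gb]
    return max(fitting, key=TABLE_GB.get) if fitting else None


print(required_gb("Q4_K_M", 32768))  # estimate for a 32k context
print(pick_quant(8))                 # best fit for an 8 GB card at 8k
```

Under these assumptions the cache costs about 128 KiB per token, so each additional 8k tokens of context adds roughly 1 GB. Runtimes that quantize the KV cache or use a sliding window would need less.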
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU | 60.1 |
| HellaSwag | 81.3 |
| HumanEval | 30.5 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- Excellent quality-to-speed ratio for a 7B
- Fully permissive Apache 2.0 license
- Mature ecosystem of fine-tunes, GGUFs, and quants
- Solid multilingual coverage, including strong French
Limitations
- Outclassed on reasoning by 2024+ models like Qwen 2.5 and Llama 3.1
- 32k context is no longer competitive
- Training data cutoff in 2023 shows on recent topics
Architecture & training
Architecture: Dense Transformer · 32 layers · Grouped-query attention
Training: Multilingual web corpus with strong French coverage. Training data dates from 2023.
A reliable, freely licensed workhorse: fine as a baseline, but newer 7Bs win on quality.
Quick start
ollama run mistral:7b-instruct
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
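Once the model has been pulled with the `ollama run` command, it can also be queried from a script. The sketch below is a minimal example that assumes an Ollama server is listening on its default local address (`http://localhost:11434`) and uses the `/api/generate` endpoint with streaming disabled. The helper names `build_request` and `generate` are hypothetical, and the host, port, and timeout may differ in your setup.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumption: default local server
MODEL_TAG = "mistral:7b-instruct"                   # tag from the quick start command


def build_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": MODEL_TAG, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]
```

With the server running, calling `generate("Summarise grouped-query attention in one sentence.")` should return the model's reply as a string. If the server is not reachable, `urlopen` raises `urllib.error.URLError`.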