Mistral Nemo 12B Instruct
By Mistral AI · France
Overview
Mistral AI and NVIDIA's co-developed 12B instruct model with 128k context, the Tekken tokenizer, and strong European multilingual coverage.
When to pick this model
- Multilingual chat across European languages
- Long-context summarization and RAG
- Replacing Mistral 7B with a noticeable quality bump
- Apache 2.0 commercial deployments on a single 24GB GPU
- NVIDIA-tuned inference stacks
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 7 GB |
| Q5_K_M | 9 GB |
| Q8_0 | 13 GB |
| FP16 (no quantization) | 24 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU | 68 |
| HellaSwag | 83.5 |
| Winogrande | 76.8 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- 128k context window
- Strong European multilingual performance
- Apache 2.0 license
- Efficient Tekken tokenizer reduces token counts
Limitations
- Reasoning trails Mistral Small 3.1
- No vision
- Eclipsed by Small 3 on most general benchmarks
Architecture & training
Architecture: Dense Transformer · GQA · Tekken tokenizer (131k vocab)
Training: Co-trained by Mistral × NVIDIA. European multilingual corpus.
A clean midsize Mistral with great multilingual chops — Small 3.1 wins overall, but Nemo's tokenizer remains attractive.
Quick start
ollama run mistral-nemo:12bOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
Is Mistral Nemo 12B Instruct the right pick for you?