Model fiche
Mistral Large 3 675B
By Mistral AI · France
chat
general
vision
multilingual
fr
moe
Overview
Mistral AI's flagship 675B MoE (41B active) with a 2.5B vision encoder, trained from scratch on 3,000 H200s and released under Apache 2.0. Currently #2 OSS non-reasoning model on LMArena.
When to pick this model
- Frontier-tier on-prem deployments needing permissive licensing
- Multimodal applications requiring top-tier text quality
- Sovereign or regulated environments that cannot ship data to closed APIs
- Multilingual production workloads across European languages
- Replacing GPT-4-class APIs in self-hosted stacks
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 405 GB |
| Q5_K_M | 485 GB |
| Q8_0 | 720 GB |
| FP16 (no quantization) | 1350 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Strengths
- Top-tier open weights — #2 OSS non-reasoning on LMArena
- Apache 2.0 — fully unrestricted commercial use
- Native multimodal with 2.5B vision encoder
- 256k context window
- Strong multilingual coverage out of the box
Limitations
- 405GB at Q4 — needs an H200 or B200 server class deployment
- Active expert count (41B) still demands substantial inference compute
- Overkill for most single-GPU or developer-laptop use cases
Architecture & training
Architecture: Granular MoE 675B/41B active + 2.5B vision encoder · 256k ctx
Training: From scratch on 3000 H200.
Verdict
The most capable open-weight non-reasoning model shipping today — if you have the H200s, this replaces closed frontier APIs.
Quick start
# HuggingFace : mistralai/Mistral-Large-3-675B-Instruct-2512Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.