Model card
Voxtral-4B-TTS
By Mistral AI · France
Tags: audio · multilingual · fr · small
Overview
Mistral AI's open 4B-parameter text-to-speech (TTS) model, covering 9 languages including French and positioned as a rival to ElevenLabs on quality. License: CC-BY-NC 4.0 (non-commercial use only).
When to pick this model
- Research and academic TTS projects
- Internal demos and prototypes
- Personal creative work and audiobooks
- Multilingual voice generation with French support
- Offline TTS on a laptop
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 10 GB |
| Q5_K_M | 12 GB |
| Q8_0 | 18 GB |
| FP16 (no quantization) | 33 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
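As a rough illustration of how to read the table, the sketch below picks the highest-precision quantization that fits a given VRAM budget. The figures are copied from the table above; the function name `best_quantization` and the `headroom_gb` parameter are hypothetical helpers introduced here, not part of any Mistral or Ollama tooling.

```python
# VRAM requirements in GB, copied from the table above.
VRAM_GB = {
    "Q4_K_M": 10,
    "Q5_K_M": 12,
    "Q8_0": 18,
    "FP16": 33,
}


def best_quantization(available_gb, headroom_gb=0.0):
    """Return the largest quantization that fits the budget, or None.

    headroom_gb is an assumed safety margin reserved for longer
    contexts or other processes sharing the GPU.
    """
    budget = available_gb - headroom_gb
    # Keep (requirement, name) pairs so max() selects the largest footprint.
    fitting = [(need, name) for name, need in VRAM_GB.items() if need <= budget]
    if not fitting:
        return None
    return max(fitting)[1]
```

For example, `best_quantization(16)` returns `"Q5_K_M"`, while `best_quantization(8)` returns `None` because even Q4_K_M needs 10 GB.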
Strengths
- Studio-quality TTS in an open model
- Native French alongside 8 other languages
- Runs on consumer laptop hardware
- Competitive with ElevenLabs on quality
Limitations
- CC-BY-NC 4.0 license blocks commercial use
- Speech synthesis only; not a text LLM, so narrower utility
- Short 4K context limits long-form scripts
Architecture & training
Architecture: Open frontier TTS · 4B · 9 languages
Training: No details given; positioned as a direct competitor to ElevenLabs.
Verdict
An ElevenLabs-class TTS for non-commercial work; commercial users need a different license path.
Quick start
# Hugging Face: mistralai/Voxtral-4B-TTS-2603
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
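A minimal sketch for fetching the weights from the Hugging Face repository named above, assuming the `huggingface_hub` package is installed and the repository is accessible from your account (it may require accepting the license first). The helper name `download_weights` and the default `local_dir` are illustrative choices; running inference is not shown because this card does not specify the inference API.

```python
REPO_ID = "mistralai/Voxtral-4B-TTS-2603"  # repository ID from the quick start line


def download_weights(local_dir="voxtral-4b-tts"):
    """Download the model repository and return the local path.

    The import is deferred so this module loads even when
    huggingface_hub is not installed.
    """
    from huggingface_hub import snapshot_download

    # snapshot_download fetches every file in the repository.
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_weights()` needs network access and enough disk space for the full checkpoint.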