Model fiche
Sarvam-M 24B
By Sarvam AI · India
chat
general
reasoning
multilingual
Overview
Sarvam AI's 24B built on Mistral Small 3.1 with hybrid think/no-think modes, gaining +86% on romanized GSM-8K Indic and covering 11 Indian languages plus English.
When to pick this model
- Indic-language chat and content across 11 Indian languages
- Math-heavy workloads in romanized Indic scripts
- Hybrid reasoning where toggleable thinking helps
- Sovereign Indian deployments needing open weights
- Replacing closed APIs for Indian-market products
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 14 GB |
| Q5_K_M | 17 GB |
| Q8_0 | 26 GB |
| FP16 (no quantization) | 48 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Strengths
- +86% gain on romanized Indic GSM-8K
- Hybrid think/no-think mode toggle
- 11 Indian languages plus English
- Apache 2.0 with permissive commercial use
- Mistral Small 3.1 base brings solid general quality
Limitations
- No official Ollama distribution yet
- Strong Indic focus limits broader multilingual use
- Smaller community ecosystem than Mistral mainline
Architecture & training
Architecture: Dense 24B · Mistral Small 3.1 base · hybrid think/non-think
Training: 11 Indian languages + EN.
Verdict
The top open model for Indic markets — pick it when you need real Indian-language coverage with hybrid reasoning.
Quick start
# HuggingFace : sarvamai/sarvam-mOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.