Model card
Codestral Mamba 7B
By Mistral AI · France
Overview
A code model from Mistral AI built on a pure Mamba SSM architecture, with linear-time inference and a 256k context window. Released under Apache 2.0, but tooling support is still patchy.
When to pick this model
- Long-context code analysis across entire repositories
- Research into state-space models for code
- Inference workloads where constant memory matters more than raw quality
- Settings where mistral-inference or vLLM is already in the stack
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 5 GB |
| Q5_K_M | 6 GB |
| Q8_0 | 9 GB |
| FP16 (no quantization) | 14 GB |
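VRAM figures are estimates covering model weights, an allowance sized for a typical 8k-token KV cache, and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Because a Mamba model keeps a fixed-size recurrent state rather than a KV cache that grows with context, longer contexts should need much less extra headroom than with a transformer of similar size.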
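The arithmetic behind figures like these can be approximated with a short sketch. It assumes a nominal 7 billion parameters (taken from the model name) and approximate bits-per-weight values for each quantization; both are illustrative assumptions, not numbers published in this card, so the results will only roughly track the table.

```python
# Rough VRAM estimate: weights + fixed runtime overhead.
# ASSUMED_PARAMS and ASSUMED_BITS are illustrative assumptions, not values
# taken from this card or from any official specification.

ASSUMED_PARAMS = 7e9  # nominal "7B" from the model name

ASSUMED_BITS = {
    "Q4_K_M": 4.85,  # approximate average bits per weight
    "Q5_K_M": 5.7,   # approximate average bits per weight
    "Q8_0": 8.5,     # approximate average bits per weight
    "FP16": 16.0,
}


def estimate_vram_gb(n_params, bits_per_weight, overhead_gb=0.6):
    """Return an approximate VRAM need in GB (1 GB = 1e9 bytes)."""
    weight_bytes = n_params * bits_per_weight / 8
    return weight_bytes / 1e9 + overhead_gb


if __name__ == "__main__":
    for name, bits in ASSUMED_BITS.items():
        print(f"{name}: ~{estimate_vram_gb(ASSUMED_PARAMS, bits):.1f} GB")
```

The overhead default mirrors the ~600 MB runtime figure quoted above; change it if your runtime behaves differently.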
Strengths
- Verified 256k context for whole-repo reasoning
- Constant memory footprint regardless of sequence length
- Apache 2.0 license
- Linear-time inference scales gracefully on long inputs
Limitations
- No official Ollama support
- Only partial llama.cpp integration
- Requires mistral-inference or vLLM for full functionality
- Quality trails transformer-based coders of similar size
Architecture & training
Architecture: Pure Mamba2 SSM · linear inference
Training: First serious Mamba for code.
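To illustrate why a state-space model can run in linear time with constant memory, here is a toy scalar recurrence. It is a deliberately simplified sketch, not the actual Mamba2 layer (which uses learned, input-dependent parameters and much larger states); the function name and the coefficients `a`, `b`, `c` are hypothetical.

```python
def ssm_stream(xs, a=0.9, b=0.1, c=1.0):
    """Yield outputs of a toy linear state-space recurrence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    One pass over the input (linear time); the only memory carried
    between steps is the single number h, whatever the sequence length.
    """
    h = 0.0  # fixed-size state: does not grow with the sequence
    for x in xs:
        h = a * h + b * x  # fold the new input into the state
        yield c * h        # read the output from the state


if __name__ == "__main__":
    # The input can be arbitrarily long; the state stays one float.
    total = 0.0
    for y in ssm_stream(float(i % 3) for i in range(100_000)):
        total += y
    print(total)
```

A transformer, by contrast, keeps keys and values for every earlier token, which is why its cache grows with context length.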
Verdict
The first serious Mamba code model: pick it for long-context experiments, not for daily completion work.
Quick start
# HuggingFace: mistralai/Mamba-Codestral-7B-v0.1
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
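As a starting point, the weights can be fetched from the Hugging Face repository named above. This is a minimal sketch that assumes the `huggingface_hub` package is installed and that you have access to the repository; the local folder layout and the helper names `default_model_dir` and `download_model` are hypothetical. Running the model afterwards still requires mistral-inference or vLLM, as noted under Limitations.

```python
from pathlib import Path

REPO_ID = "mistralai/Mamba-Codestral-7B-v0.1"  # repository named in this card


def default_model_dir(base=None):
    """Return a local folder for the weights (hypothetical layout)."""
    root = Path(base) if base is not None else Path.home() / "models"
    return root / REPO_ID.split("/")[-1]


def download_model(local_dir=None):
    """Download the full repository snapshot and return its local path."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    target = Path(local_dir) if local_dir is not None else default_model_dir()
    target.mkdir(parents=True, exist_ok=True)
    # Fetches every file in the repository into the target folder.
    return snapshot_download(repo_id=REPO_ID, local_dir=str(target))


if __name__ == "__main__":
    print(download_model())
```

The download is large, so check free disk space before running it.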