Model fiche
Jamba 1.5 Mini
By AI21 Labs · Israel
chat
general
moe
multilingual
Overview
AI21 Labs' hybrid SSM-Transformer with MoE routing, activating 12B of 52B parameters. Delivers a verified 256k context window but ships under AI21's non-OSI Jamba license.
When to pick this model
- Long-document workflows that genuinely use 200k+ tokens
- Multilingual chat across the 9 supported languages
- Benchmarking SSM-Transformer hybrids against pure attention models
- Use cases where the Jamba license terms are acceptable
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 30 GB |
| Q5_K_M | 37 GB |
| Q8_0 | 55 GB |
| FP16 (no quantization) | 104 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Strengths
- Effective 256k context (86% on RULER)
- Unique SSM-Transformer hybrid architecture
- Strong throughput vs. dense models of similar capability
- Solid 9-language coverage
Limitations
- Custom Jamba license is not OSI-approved
- Partial llama.cpp support complicates local deployment
- Superseded by Jamba 1.6 and 1.7
- Smaller fine-tune ecosystem than Llama or Qwen
Architecture & training
Architecture: Hybrid SSM-Transformer (Mamba+Attention) + MoE · 52B/12B active
Training: 256k effective ctx (86% at 256k RULER).
Verdict
A novel hybrid with real long-context performance, now eclipsed by newer Jamba releases and gated by a non-standard license.
Quick start
# HuggingFace : ai21labs/AI21-Jamba-Mini-1.5Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.