Aya Expanse 32B
By Cohere For AI · United States
Overview
The 32B member of Cohere For AI's Aya Expanse family. It reports a 25% improvement over peer models on low-resource languages and an 89.9% win rate against Mixtral 8x22B on Dolly. Licensed under CC-BY-NC 4.0.
When to pick this model
- You're doing high-quality multilingual research at the 30B tier
- You need top-tier low-resource language performance
- You're comparing against Mixtral 8x22B on multilingual benchmarks
- Non-commercial use is acceptable for your project
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 19 GB |
| Q5_K_M | 23 GB |
| Q8_0 | 35 GB |
| FP16 (no quantization) | 64 GB |
VRAM figures include the model weights plus a typical 8K-token KV cache and about 600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer contexts.
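The figures above follow from simple arithmetic: weight memory is parameter count × bits per weight ÷ 8, and the KV cache grows linearly with context length. The sketch below makes that explicit. It is a rough estimator under stated assumptions, not the method used to build the table: effective bits per weight for quantized formats such as Q4_K_M vary by tensor and are left as an input, and the layer and head counts in the usage lines are hypothetical placeholders, not Aya Expanse 32B's real configuration (read those from the model's config.json).

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """One K and one V tensor per layer, per KV head, per token (FP16 by default)."""
    values = 2 * n_layers * n_kv_heads * head_dim * context_tokens
    return values * bytes_per_value / 1e9


def total_vram_gb(weights_gb: float, kv_gb: float, overhead_gb: float = 0.6) -> float:
    """Weights + KV cache + runtime overhead (~600 MB assumed, as in the note above)."""
    return weights_gb + kv_gb + overhead_gb


# Usage with HYPOTHETICAL architecture numbers (not the real model config):
weights = weight_memory_gb(32, 16)  # FP16: 16 bits per weight for 32B params
kv = kv_cache_gb(n_layers=40, n_kv_heads=8, head_dim=128, context_tokens=8192)
print(f"{total_vram_gb(weights, kv):.1f} GB")  # prints 65.9 GB
```

Because `kv_cache_gb` is linear in `context_tokens`, doubling the context from 8192 to 16384 tokens doubles the cache term while the weight term stays fixed, which is why longer contexts need extra headroom.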
Published benchmark scores
| Benchmark | Score |
|---|---|
| Dolly win rate vs Mixtral 8x22B (%) | 89.9 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
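A pairwise win rate such as the Dolly 89.9 is the share of head-to-head judgments in which the judge preferred Aya Expanse's answer over Mixtral 8x22B's. The sketch below shows only that arithmetic. The source does not say how ties were scored or how many prompts were compared, so tie handling is an explicit parameter (an assumption to check against the benchmark's own write-up) and any counts passed in are illustrative.

```python
def win_rate(wins: int, losses: int, ties: int = 0, tie_weight: float = 0.0) -> float:
    """Percentage of pairwise judgments won.

    tie_weight=0.0 counts ties as non-wins; 0.5 counts each tie as half a win.
    Which convention a given benchmark uses is an assumption, not stated here.
    """
    total = wins + losses + ties
    if total == 0:
        raise ValueError("win rate is undefined with no comparisons")
    return 100.0 * (wins + tie_weight * ties) / total


# Illustrative counts only, not the actual Dolly evaluation data.
print(win_rate(8, 2))                          # 8 wins, 2 losses
print(win_rate(1, 1, ties=2, tie_weight=0.5))  # ties counted as half-wins
```

Making the tie weight explicit matters when comparing win rates across leaderboards: the same judgments can yield noticeably different percentages depending on whether ties count as zero or half a win.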
Strengths
- 25% improvement over peer models on low-resource languages
- Coverage of 23 languages
- 89.9% win rate on Dolly vs Mixtral 8x22B
- Strong general performance for its size
Limitations
- CC-BY-NC 4.0 — no commercial use
- Only 8K context window
- Newer Qwen 3 models close much of the gap while offering permissive licenses
Architecture & training
Architecture: Dense (Command R base) · 23 languages
Training: Multilingual fine-tune of the Command backbone.
The strongest open multilingual 32B model for research, but its license rules it out for production use.
Quick start
ollama run aya-expanse:32b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
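Once the model is pulled, it can also be called programmatically through Ollama's local HTTP API. The sketch below is a minimal example using only the Python standard library. It assumes the Ollama server is running on its default address (`http://localhost:11434`) and uses the `/api/generate` endpoint with streaming disabled, so the reply arrives as one JSON object whose `response` field holds the generated text. It does not cover the MCP server mentioned above.

```python
import json
import urllib.request

# Assumes a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "aya-expanse:32b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "aya-expanse:32b") -> str:
    """Send the prompt and return the generated text from the 'response' field."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]
```

With the server running, a call such as `generate("Translate to Swahili: Good morning, how are you?")` returns the model's reply as a string. Setting `"stream": False` trades incremental output for a simpler single-response parse.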