Aya Expanse 8B
By Cohere For AI · United States
Overview
Cohere For AI's multilingual 8B-parameter model covers 23 languages and outperforms Gemma 2 9B and Llama 3.1 8B within that language set. It is released under CC-BY-NC 4.0, which limits it to non-commercial use.
When to pick this model
- You're doing multilingual research that doesn't require commercial use
- You need strong coverage of low-resource languages at the 8B tier
- You're benchmarking against Gemma 2 9B and Llama 3.1 8B on non-English tasks
- You're building an internal evaluation harness
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 5 GB |
| Q5_K_M | 6 GB |
| Q8_0 | 9 GB |
| FP16 (no quantization) | 16 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
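To make the table easier to apply, here is a minimal Python sketch that picks the highest-precision quantization fitting a given VRAM budget. The figures are copied from the table above. The function name `pick_quantization` and the `headroom_gb` parameter are hypothetical helpers introduced for illustration, not part of any tool.

```python
from typing import Optional

# VRAM figures in GB, copied from the table above (8k context assumed).
VRAM_GB = {
    "Q4_K_M": 5,
    "Q5_K_M": 6,
    "Q8_0": 9,
    "FP16": 16,
}


def pick_quantization(available_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the largest quantization that fits the budget, or None if none fit.

    headroom_gb is an illustrative knob for reserving memory for longer
    contexts or other processes sharing the GPU.
    """
    budget = available_gb - headroom_gb
    # Keep (size, name) pairs so max() selects the largest footprint that fits.
    fitting = [(gb, name) for name, gb in VRAM_GB.items() if gb <= budget]
    if not fitting:
        return None
    return max(fitting)[1]
```

For example, `pick_quantization(8)` selects Q5_K_M, and a 12 GB card with 2 GB reserved, `pick_quantization(12, headroom_gb=2)`, selects Q8_0.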
Published benchmark scores
| Benchmark | Score |
|---|---|
| Dolly (vs Llama 3.1 8B) | 83.9 |
Scores are published by the model author or aggregated from public leaderboards, and are re-measured monthly by our editorial team.
Strengths
- Covers 23 languages, with particularly strong low-resource performance
- Beats Gemma 2 9B and Llama 3.1 8B on multilingual benchmarks
- Compact 8B footprint
Limitations
- CC-BY-NC 4.0 — no commercial deployment
- Only 8K context
- Outclassed by Qwen 3 8B on most general tasks
Architecture & training
Architecture: Dense · 32 layers · 32 heads · SwiGLU · GQA · SentencePiece ~128k vocab
Training: 23 languages, multilingual focus.
A strong multilingual research model held back by its non-commercial license.
Quick start
ollama run aya-expanse:8b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
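For scripted use, the model can also be queried over Ollama's local HTTP API. The sketch below assumes an Ollama server is running on its default port (11434) and that the `aya-expanse:8b` tag has already been pulled. The helper names `build_payload` and `generate` are hypothetical and exist only for this example.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Model tag taken from the quick-start command.
MODEL_TAG = "aya-expanse:8b"


def build_payload(prompt: str, model: str = MODEL_TAG) -> dict:
    """Build a non-streaming request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """Send one prompt to the local server and return the generated text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # With "stream": False the reply is a single JSON object whose
    # "response" field holds the full completion.
    return body["response"]


# Example call (requires the server to be running):
# print(generate("Translate 'good morning' into Turkish."))
```

Given the 8K context limit listed above, keep prompts short enough to leave room for the completion.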