Model fiche
Tiny Aya 3.35B
By Cohere For AI · Canada
chat
multilingual
small
Overview
Cohere For AI's 3.35B model in 5 regional variants covering 70+ languages, with the Water variant tuned for Europe and APAC. CC-BY-NC 4.0, non-commercial only.
When to pick this model
- Research on multilingual small models
- Internal multilingual tooling under non-commercial use
- Region-specific assistants via specialized variants
- Educational and academic projects
- Personal multilingual chat applications
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 2.2 GB |
| Q5_K_M | 2.7 GB |
| Q8_0 | 3.8 GB |
| FP16 (no quantization) | 7 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Strengths
- Best multilingual quality in the tiny tier
- Five regional variants targeting specific markets
- Native coverage of 70+ languages
- Backed by Cohere For AI research
Limitations
- CC-BY-NC 4.0 blocks commercial deployment
- 8K context is limiting for long-form work
- Quality varies across regional variants
Architecture & training
Architecture: Dense 3.35B · 5 regional variants (Base/Global/Earth/Fire/Water)
Training: 70+ languages. Water = Europe + APAC (FR).
Verdict
The strongest tiny multilingual model when commercial use is off the table.
Quick start
# HuggingFace : CohereLabs/tiny-aya-base (pas encore de tag Ollama officiel)Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.