Model fiche

Aya Expanse 32B

By Cohere For AI · United States

chat general multilingual
Parameters: 32B
License: CC-BY-NC 4.0
Context: 8K tokens
VRAM (Q4): 19 GB
Released: October 2024

Overview

The 32B sibling in the Aya Expanse family from Cohere For AI. It reports a 25% gain on low-resource languages over peer models and an 89.9% win rate against Mixtral 8x22B on Dolly. Released under CC-BY-NC 4.0, so non-commercial use only.

When to pick this model

  • You're doing high-quality multilingual research at the 30B tier
  • You need top-tier low-resource language performance
  • You're comparing against Mixtral 8x22B on multilingual benchmarks
  • Non-commercial use is acceptable for your project

VRAM requirements by quantization

Quantization              VRAM required
Q4_K_M (recommended)      19 GB
Q5_K_M                    23 GB
Q8_0                      35 GB
FP16 (no quantization)    64 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
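The arithmetic behind figures like these can be sketched as weights plus KV cache plus a fixed overhead. The snippet below is a rough estimator, not the method used to produce the table: the layer shape (40 layers, 8 KV heads, head dimension 128) and the effective bits per weight for each quantization are assumptions for illustration, so check them against the model's published config before relying on the output. Only the ~600 MB overhead and the 8K context come from this page.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal GB."""
    return params_billion * bits_per_weight / 8


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Keys and values for every layer and every cached token, in decimal GB."""
    # The leading factor 2 accounts for storing both keys and values.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 1e9


def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     cache_gb: float, overhead_gb: float = 0.6) -> float:
    """Weights + KV cache + runtime overhead (0.6 GB matches the ~600 MB note)."""
    return weights_gb(params_billion, bits_per_weight) + cache_gb + overhead_gb


if __name__ == "__main__":
    # ASSUMPTION: hypothetical layer shape, not taken from this page.
    cache = kv_cache_gb(n_layers=40, n_kv_heads=8, head_dim=128, context_tokens=8192)
    # ASSUMPTION: approximate effective bits per weight for each format.
    for name, bits in [("Q4_K_M", 4.85), ("Q5_K_M", 5.7), ("Q8_0", 8.5), ("FP16", 16.0)]:
        print(f"{name}: ~{estimate_vram_gb(32, bits, cache):.1f} GB")
```

Because the KV cache term scales linearly with `context_tokens`, doubling the context doubles that part of the budget, which is where the extra headroom goes. Note that FP16 weights alone already come to 32B × 2 bytes = 64 GB, so the table values are best read as approximate.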

Published benchmark scores

Benchmark                   Score
Dolly (vs Mixtral 8x22B)    89.9% win rate

Scores are published by the model author or aggregated from public leaderboards, and are re-measured monthly by our editorial team.

Strengths

  • 25% improvement on low-resource languages vs peers
  • Coverage of 23 languages
  • 89.9% win rate on Dolly vs Mixtral 8x22B
  • Strong general performance for its size

Limitations

  • CC-BY-NC 4.0 — no commercial use
  • Context window of only 8K tokens
  • Newer Qwen 3 models close much of the gap and ship under permissive licenses

Architecture & training

Architecture: Dense (Command R base) · 23 languages

Training: Multilingual fine-tune of the Command backbone.

Verdict

The strongest open multilingual 32B model for research, but its non-commercial license rules it out for production.

Quick start

ollama run aya-expanse:32b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
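Once the model has been pulled with the `ollama run` command above, it can also be queried from a script through Ollama's local HTTP API. The sketch below assumes a default Ollama install listening on `http://localhost:11434` and uses the `/api/generate` endpoint with streaming turned off; the helper names `build_request` and `generate` are illustrative, not part of any library.

```python
import json
import urllib.request

# ASSUMPTION: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "aya-expanse:32b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "aya-expanse:32b", timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Example call (requires a running Ollama server with the model pulled):
# print(generate("Translate 'good morning' into Turkish and Vietnamese."))
```

Keeping the request construction separate from the network call makes it easy to inspect the payload without a server running. The generous default timeout is a guess that allows for the first call to be slow while the model loads.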
