Aya Expanse 8B

By Cohere For AI · United States

Tags: chat, general, multilingual
Parameters: 8B
License: CC-BY-NC 4.0
Context: 8k
VRAM (Q4): 5 GB
Released: October 2024

Overview

Cohere For AI's multilingual 8B model, covering 23 languages and outperforming Gemma 2 9B and Llama 3.1 8B on the languages it supports. Licensed CC-BY-NC 4.0: non-commercial use only.

When to pick this model

  • You're doing multilingual research that doesn't require commercial use
  • You need strong coverage of low-resource languages at the 8B tier
  • You're benchmarking against Gemma 2 9B and Llama 3.1 8B on non-English tasks
  • You're building an internal evaluation harness

VRAM requirements by quantization

Quantization            VRAM required
Q4_K_M (recommended)    5 GB
Q5_K_M                  6 GB
Q8_0                    9 GB
FP16 (no quantization)  16 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
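As a rough illustration of how the table and the headroom advice might be applied, the sketch below picks the highest-precision quantization that fits a given GPU. The VRAM figures are copied from the table above; the function name `pick_quantization` and the `headroom_gb` parameter are hypothetical, introduced only for this example, and how much headroom a longer context really needs is left to the caller.

```python
# VRAM figures in GB, copied from the table above (they assume an 8k context).
VRAM_GB = {
    "Q4_K_M": 5,
    "Q5_K_M": 6,
    "Q8_0": 9,
    "FP16": 16,
}


def pick_quantization(available_gb, headroom_gb=0.0):
    """Return the largest listed quantization that fits, or None if none do.

    headroom_gb is a caller-chosen reserve (hypothetical parameter), for
    example to leave room for a context longer than 8k.
    """
    budget = available_gb - headroom_gb
    # Keep (size, name) pairs so max() selects the largest footprint that fits.
    fitting = [(gb, name) for name, gb in VRAM_GB.items() if gb <= budget]
    if not fitting:
        return None
    return max(fitting)[1]
```

For instance, `pick_quantization(8)` selects Q5_K_M, while `pick_quantization(6, headroom_gb=1)` falls back to Q4_K_M because only 5 GB remain after the reserve.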

Published benchmark scores

Benchmark                 Score
Dolly (vs Llama 3.1 8B)   83.9

Scores are published by the model author or aggregated from public leaderboards, and are re-measured monthly by our editorial team.

Strengths

  • Coverage of 23 languages, with particularly strong low-resource performance
  • Beats Gemma 2 9B and Llama 3.1 8B on multilingual benchmarks
  • Compact 8B footprint

Limitations

  • CC-BY-NC 4.0 — no commercial deployment
  • Only 8k context
  • Outclassed by Qwen 3 8B on most general tasks

Architecture & training

Architecture: Dense · 32 layers · 32 heads · SwiGLU · GQA · BPE tokenizer, ~256k vocab
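To make the effect of GQA on memory concrete, here is a minimal sketch of the usual KV cache size formula. The 32 layers and 32 attention heads come from the line above; the 8 key/value heads, the head dimension of 128, and the 16-bit cache are assumptions for illustration only, since this fiche does not state them.

```python
GIB = 1024 ** 3


def kv_cache_bytes(context_tokens, layers=32, kv_heads=8, head_dim=128,
                   bytes_per_value=2):
    """Approximate KV cache size in bytes for a given context length.

    kv_heads=8 and head_dim=128 are assumed values, not taken from the fiche.
    bytes_per_value=2 assumes a 16-bit (FP16) cache.
    """
    # Factor 2: one key tensor and one value tensor are cached per layer.
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value


# Grouped-query attention with the assumed 8 KV heads, at an 8k context.
gqa_bytes = kv_cache_bytes(8192)
# Same model shape if every one of the 32 attention heads kept its own K/V.
mha_bytes = kv_cache_bytes(8192, kv_heads=32)

print(f"GQA: {gqa_bytes / GIB:.2f} GiB, full multi-head: {mha_bytes / GIB:.2f} GiB")
```

Under these assumptions the cache grows linearly with context length, and sharing key/value heads across query heads cuts it by the ratio of attention heads to KV heads.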

Training: 23 languages, multilingual focus.

Verdict

A strong multilingual research model held back by its non-commercial license.

Quick start

ollama run aya-expanse:8b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
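Once the model has been pulled with the command above, it can also be queried programmatically. The sketch below assumes Ollama is running locally on its default port (11434) and uses its `/api/generate` endpoint with streaming turned off; the helper names `build_payload` and `generate` are hypothetical and exist only for this example.

```python
import json
import urllib.request

# Default local Ollama endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, model="aya-expanse:8b"):
    """Build the JSON body for a single, non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt, model="aya-expanse:8b", url=OLLAMA_URL, timeout=120):
    """Send the prompt to a local Ollama server and return the generated text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, the full completion arrives in the "response" field.
    return body["response"]


# Example usage (requires a running Ollama server with the model pulled):
# print(generate("Translate 'good morning' into Turkish."))
```

Only the standard library is used here so the example runs without extra dependencies; a client library would work equally well.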
