BestLLMfor EN Your hardware. Your LLM. Your call.
APIOpen data Find my LLM
Model fiche

Command R+ 104B (08-2024)

By Cohere · United States

chat general multilingual
Parameters
104B
License
CC-BY-NC 4.0
Context
125k
VRAM (Q4)
60 GB
Released
August 2024 (refresh)

Overview

Cohere's 104B RAG and tool-use flagship from August 2024 — 128K context, 23 languages. Licensed CC-BY-NC, so non-commercial only without a Cohere agreement.

When to pick this model

  • You're building research or internal-only RAG systems
  • You need top-tier tool-use behavior in an open weights model
  • You need broad multilingual coverage across 23 languages
  • You're evaluating before signing a commercial agreement with Cohere

VRAM requirements by quantization

QuantizationVRAM required
Q4_K_M (recommended)60 GB
Q5_K_M72 GB
Q8_0110 GB
FP16 (no quantization)208 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.

Strengths

  • Best-in-class open RAG and tool-use at release
  • 128K context window
  • 23 language coverage
  • Higher throughput and lower latency than the April 2024 release

Limitations

  • CC-BY-NC 4.0 — no commercial use without a separate license
  • 60GB+ VRAM in Q4
  • Surpassed by newer 100B-class models on general benchmarks

Architecture & training

Architecture: Dense · optimized for RAG and tool-use · GQA

Training: 23 languages, +50% throughput / -25% latency vs April 2024 version.

Verdict

Strong RAG and tool-use, but the non-commercial license rules it out of most production deployments.

Quick start

ollama run command-r-plus:104b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.

Tools

Is Command R+ 104B (08-2024) the right pick for you?

Compute self-hosted ROI → Back to catalog