BestLLMfor EN Your hardware. Your LLM. Your call.
APIOpen data Find my LLM
Model fiche

Yi 1.5 34B Chat

By 01.AI · China

chat general multilingual
Parameters
34B
License
Apache 2.0
Context
4k
VRAM (Q4)
20 GB
Released
May 2024

Overview

01.AI's dense 34B chat model under Apache 2.0, trained on 3.6T tokens with strong English-Chinese bilingual quality.

When to pick this model

  • Chinese-English bilingual chat needing open weights
  • Llama-compatible tooling pipelines at the 34B scale
  • Research baselines from the 2024 dense-34B era
  • Workloads where Apache 2.0 is mandatory at 34B
  • Use cases where Qwen 2.5 32B isn't an option

VRAM requirements by quantization

QuantizationVRAM required
Q4_K_M (recommended)20 GB
Q5_K_M24 GB
Q8_036 GB
FP16 (no quantization)68 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.

Published benchmark scores

BenchmarkScore
MMLU77.2
HumanEval75.2

Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.

Strengths

  • Excellent Chinese-language performance
  • Compatible with Llama tooling and quantization
  • Apache 2.0 license enables free commercial use
  • Stable chat behavior and well-understood quirks

Limitations

  • 4096-token context is severely limiting today
  • Outclassed by Qwen 2.5 32B in 2025
  • No multimodal or tool-use specialization

Architecture & training

Architecture: Dense Transformer · 34B · Yi 1.5 · Llama-compatible

Training: 01.AI — 3.1T multilingual EN/ZH tokens. Successor to Yi-34B.

Verdict

A competent Apache-licensed bilingual 34B from 2024 — only pick it over Qwen 2.5 32B when license terms force your hand.

Quick start

ollama run yi:34b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.

Tools

Is Yi 1.5 34B Chat the right pick for you?

Compute self-hosted ROI → Back to catalog