Yi 1.5 34B Chat
By 01.AI · China
Overview
01.AI's dense 34B chat model under Apache 2.0, trained on 3.6T tokens with strong English-Chinese bilingual quality.
When to pick this model
- Chinese-English bilingual chat needing open weights
- Llama-compatible tooling pipelines at the 34B scale
- Research baselines from the 2024 dense-34B era
- Workloads where Apache 2.0 is mandatory at 34B
- Use cases where Qwen 2.5 32B isn't an option
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 20 GB |
| Q5_K_M | 24 GB |
| Q8_0 | 36 GB |
| FP16 (no quantization) | 68 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU | 77.2 |
| HumanEval | 75.2 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- Excellent Chinese-language performance
- Compatible with Llama tooling and quantization
- Apache 2.0 license enables free commercial use
- Stable chat behavior and well-understood quirks
Limitations
- 4096-token context is severely limiting today
- Outclassed by Qwen 2.5 32B in 2025
- No multimodal or tool-use specialization
Architecture & training
Architecture: Dense Transformer · 34B · Yi 1.5 · Llama-compatible
Training: 01.AI — 3.1T multilingual EN/ZH tokens. Successor to Yi-34B.
A competent Apache-licensed bilingual 34B from 2024 — only pick it over Qwen 2.5 32B when license terms force your hand.
Quick start
ollama run yi:34bOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.