Yi Coder 9B Chat
By 01.AI · China
Overview
01.AI's 9B code model covers 52 programming languages and scores 23% on LiveCodeBench, best-in-class under 10B and ahead of DeepSeek Coder 33B.
When to pick this model
- Code completion and review on a single consumer GPU
- Polyglot codebases spanning many languages
- Self-hosted Copilot alternatives at small scale
- Code workloads needing 128k context for repo-level reasoning
- Cost-sensitive deployments where Qwen Coder is overkill
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 5.5 GB |
| Q5_K_M | 7 GB |
| Q8_0 | 10 GB |
| FP16 (no quantization) | 18 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
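To make "add headroom for higher context lengths" concrete, here is a rough sketch that adds the extra KV cache beyond the 8k baseline to the table figures above. The layer count, KV head count, and head dimension are assumptions for illustration and are not stated on this page, so check the model's `config.json` before relying on the numbers. The function names are hypothetical.

```python
# Rough VRAM headroom estimate for context lengths above the 8k baseline.
# Table figures (GB) are copied from the quantization table above.
BASELINE_VRAM_GB = {"Q4_K_M": 5.5, "Q5_K_M": 7.0, "Q8_0": 10.0, "FP16": 18.0}
BASELINE_CTX = 8 * 1024  # the table already includes an 8k KV cache

# ASSUMED architecture values (not from this page; verify in config.json).
N_LAYERS = 48          # assumption
N_KV_HEADS = 4         # assumption (grouped-query attention)
HEAD_DIM = 128         # assumption
KV_BYTES_PER_ELEM = 2  # assumption: fp16 KV cache, no cache quantization


def kv_bytes_per_token(n_layers=N_LAYERS, n_kv_heads=N_KV_HEADS,
                       head_dim=HEAD_DIM, bytes_per_elem=KV_BYTES_PER_ELEM):
    """Bytes of KV cache per token: one key and one value vector per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def estimate_vram_gb(quant, ctx_tokens):
    """Table figure plus the KV cache needed beyond the 8k baseline (1 GB = 1e9 bytes)."""
    extra_tokens = max(0, ctx_tokens - BASELINE_CTX)
    extra_gb = extra_tokens * kv_bytes_per_token() / 1e9
    return BASELINE_VRAM_GB[quant] + extra_gb


# Example usage: estimate Q4_K_M at a 32k context.
# print(round(estimate_vram_gb("Q4_K_M", 32 * 1024), 2))
```

Under these assumptions the cache grows linearly with context, so the headroom needed at 128k is far larger than at 32k. Runtimes that quantize the KV cache would need less.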
Published benchmark scores
| Benchmark | Score (%) |
|---|---|
| LiveCodeBench | 23 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- LiveCodeBench 23% leads the sub-10B field
- Outperforms DeepSeek Coder 33B at a fraction of the size
- Coverage across 52 programming languages
- Apache 2.0 license
- 128k context enables repo-scale code understanding
Limitations
- Less popular than Qwen Coder, so fewer fine-tunes exist
- No instruction-tuned variant beyond chat
- Trails Qwen 2.5 Coder 7B/14B in quality as of 2025
Architecture & training
Architecture: Dense 9B code · Llama base · 128k ctx
Training: 52 programming languages.
The strongest sub-10B code model in its release window — still a sharp pick when you need 128k context on modest hardware.
Quick start
ollama run yi-coder:9b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
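For scripted use, the same model can be reached through Ollama's local HTTP API once it has been pulled. This is a minimal sketch, assuming Ollama is running on its default port (11434) and the `yi-coder:9b` tag shown above is installed. The helper names `build_request` and `generate` are hypothetical, and the `num_ctx` value is only an example.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "yi-coder:9b"  # tag from the quick-start command


def build_request(prompt, num_ctx=8192):
    """Build a non-streaming /api/generate request for the model."""
    payload = {
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,                  # return one JSON object, not chunks
        "options": {"num_ctx": num_ctx},  # context window used for this call
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, num_ctx=8192, timeout=120):
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, num_ctx), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example usage (requires a running Ollama server):
# print(generate("Write a Python function that reverses a string."))
```

Raising `num_ctx` increases KV cache memory, so keep it in line with the VRAM available for your chosen quantization.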