Model card
Kanana 2 30B-A3B Thinking
By Kakao · South Korea
Tags: chat, general, reasoning, multilingual, moe
Overview
Kakao's agentic 30B mixture-of-experts (MoE) model with 3B active parameters per token, native hybrid thinking, and Korean-first training. It uses MLA attention, supports a 131k-token context, and is released under Apache 2.0.
When to pick this model
- Korean-language products from chat to content generation
- Multilingual deployments covering KR/EN/JP/ZH/TH/VI
- Agentic workflows that benefit from a togglable thinking mode
- Long-document analysis up to 131k tokens
- Apache 2.0 commercial use on a single 24GB GPU
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 18 GB |
| Q5_K_M | 22 GB |
| Q8_0 | 33 GB |
| FP16 (no quantization) | 60 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
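As a rough illustration of how to read the table, the sketch below picks the highest-precision quantization that fits a given GPU. The VRAM figures are copied from the table; the function name `pick_quantization` and the 2 GB default headroom are assumptions made for this example, not values published by Kakao or Ollama.

```python
from typing import Optional

# VRAM needed per quantization in GB, copied from the table above.
# Each figure already includes an 8k KV cache and ~600 MB runtime overhead.
VRAM_GB = {
    "Q4_K_M": 18,
    "Q5_K_M": 22,
    "Q8_0": 33,
    "FP16": 60,
}


def pick_quantization(gpu_vram_gb: float, headroom_gb: float = 2.0) -> Optional[str]:
    """Return the largest quantization that fits, or None if none does.

    headroom_gb is a hypothetical safety margin for contexts longer than 8k;
    raise it when planning for long-document workloads.
    """
    budget = gpu_vram_gb - headroom_gb
    fitting = [(need, name) for name, need in VRAM_GB.items() if need <= budget]
    if not fitting:
        return None
    # The entry with the largest VRAM need is the highest-precision option.
    return max(fitting)[1]


print(pick_quantization(24))                 # Q5_K_M
print(pick_quantization(24, headroom_gb=4))  # Q4_K_M
print(pick_quantization(16))                 # None
```

With the default margin a 24 GB card lands on Q5_K_M; reserving 4 GB for longer contexts drops the choice to the recommended Q4_K_M.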
Strengths
- 131k context window in a 30B MoE
- Hybrid thinking/non-thinking mode toggle
- Native Korean performance backed by Kakao's corpus
- MLA attention cuts KV-cache footprint
- Apache 2.0 with only 3B active params per token
Limitations
- Around 18 GB VRAM at Q4_K_M: fits a single 24 GB GPU, but leaves little headroom on smaller consumer cards
- Quality drops outside Korean and English
Architecture & training
Architecture: MoE · 30B · Kakao Brain Kanana 2 · 131k context · native Korean
Training: by Kakao, with a Korean-first focus and hybrid thinking/non-thinking reasoning.
Verdict
The strongest open Korean model right now, with thinking mode and a sane VRAM budget on the side.
Quick start
ollama pull hf.co/kakaoai/Kanana-2-30B-GGUF
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
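Once the model is pulled, it can also be queried from a script through Ollama's local HTTP API. The following is a minimal sketch, assuming a default Ollama install listening on `localhost:11434` and the same model reference as the pull command; the helper names `build_request` and `generate` are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Same model reference as in the pull command.
MODEL = "hf.co/kakaoai/Kanana-2-30B-GGUF"


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("Summarize this paragraph in Korean: ...")` returns the model's reply as a string. Setting `"stream": False` keeps the example short by returning one JSON object instead of a stream of chunks.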