BestLLMfor EN Your hardware. Your LLM. Your call.
APIOpen data Find my LLM
Model fiche

Kanana 2 30B-A3B Thinking

By Kakao · South Korea

chat general reasoning multilingual moe
Parameters
30B
License
Apache 2.0
Context
128k
VRAM (Q4)
18 GB
Released
April 2025

Overview

Kakao's agentic 30B MoE (3B active) with native hybrid thinking and Korean-first training. Apache 2.0 with MLA attention and 131k context.

When to pick this model

  • Korean-language products from chat to content generation
  • Multilingual deployments covering KR/EN/JP/ZH/TH/VI
  • Agentic workflows that benefit from a togglable thinking mode
  • Long-document analysis up to 131k tokens
  • Apache 2.0 commercial use on a single 24GB GPU

VRAM requirements by quantization

QuantizationVRAM required
Q4_K_M (recommended)18 GB
Q5_K_M22 GB
Q8_033 GB
FP16 (no quantization)60 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.

Strengths

  • 131k context window in a 30B MoE
  • Hybrid thinking/non-thinking mode toggle
  • Native Korean performance backed by Kakao's corpus
  • MLA attention cuts KV-cache footprint
  • Apache 2.0 with only 3B active params per token

Limitations

  • Around 18 GB VRAM in Q4 — fits a single GPU but tight on consumer cards
  • Quality drops outside Korean and English

Architecture & training

Architecture: MoE · 30B · Kakao Brain Kanana 2 · 131k context · native Korean

Training: Kakao — strong in Korean, hybrid thinking/non-thinking reasoning.

Verdict

The strongest open Korean model right now, with thinking mode and a sane VRAM budget on the side.

Quick start

ollama pull hf.co/kakaoai/Kanana-2-30B-GGUF

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.

Tools

Is Kanana 2 30B-A3B Thinking the right pick for you?

Compute self-hosted ROI → Back to catalog