BestLLMfor EN Your hardware. Your LLM. Your call.
APIOpen data Find my LLM
Model fiche

Nemotron Cascade 2 30B-A3B

By NVIDIA · United States

chat code reasoning moe
Parameters
30B
License
NVIDIA Open Model License
Context
125k
VRAM (Q4)
17 GB
Released
April 2026

Overview

NVIDIA's 30B MoE (3B active) with both thinking and instruct modes. Earned IMO 2025 and IOI 2025 gold medals — 30B-class reasoning at 3B-active inference speed. Released April 2026.

When to pick this model

  • Competition-grade math and code workloads
  • Reasoning agents needing fast inference (3B active)
  • Single-GPU deployments on 24 GB cards in Q4
  • Production systems on NVIDIA Open Model License terms
  • Tasks switching between thinking and instruct modes

VRAM requirements by quantization

QuantizationVRAM required
Q4_K_M (recommended)17 GB
Q5_K_M21 GB
Q8_032 GB
FP16 (no quantization)60 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.

Published benchmark scores

BenchmarkScore
AIME 202588

Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.

Strengths

  • Gold medal at IMO 2025 and IOI 2025 in thinking mode
  • Fast inference with only 3B active params
  • Fits on a 24 GB GPU at Q4
  • Commercial use allowed under NVIDIA Open Model License

Limitations

  • NVIDIA Open Model License — not Apache or MIT
  • 32+ GB VRAM total in Q4 (full model is 30B)
  • Thinking mode generation can be slow

Architecture & training

Architecture: MoE 30B/3B active · unified thinking mode + instruct · 128k ctx

Training: Trained by NVIDIA. Gold medal at IMO 2025 and IOI 2025 in thinking mode. Optimized for mathematical reasoning and competitive code.

Verdict

Olympic-grade reasoning at 3B-active inference cost — the sharpest open math and code model in its weight class.

Quick start

ollama run nemotron-cascade-2

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.

Tools

Is Nemotron Cascade 2 30B-A3B the right pick for you?

Compute self-hosted ROI → Back to catalog