Nemotron 3 Super 120B
By NVIDIA · United States
Overview
NVIDIA's first frontier-class release: a 120B-parameter mixture-of-experts (MoE) model with 12B active parameters that scores 60% on SWE-Bench Verified. The 10T-token training corpus is released alongside the weights.
When to pick this model
- Enterprise deployments needing NVIDIA's commercial license
- SWE-Bench-grade coding agents on a multi-GPU rig
- Long-context analysis up to 128K tokens
- Reproducible research using the released training data
- Replacing closed APIs with NVIDIA-backed weights
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 72 GB |
| Q5_K_M | 86 GB |
| Q8_0 | 132 GB |
| FP16 (no quantization) | 240 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
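As a rough aid for reading the table, the sketch below picks the highest-precision quantization that fits a given memory budget. The VRAM numbers are copied from the table; the function name, the per-GPU list input, and the `headroom_gb` parameter are illustrative assumptions. It also assumes memory can be pooled evenly across GPUs, which real multi-GPU setups only approximate.

```python
# VRAM figures (GB) copied from the table above; each already includes
# an 8k KV cache and ~600 MB of runtime overhead.
VRAM_GB = {
    "Q4_K_M": 72,
    "Q5_K_M": 86,
    "Q8_0": 132,
    "FP16": 240,
}


def pick_quantization(gpu_memory_gb, headroom_gb=0.0):
    """Return the highest-precision quantization that fits, or None.

    gpu_memory_gb: list of per-GPU memory sizes in GB (hypothetical input format).
    headroom_gb:   extra GB to reserve, e.g. for contexts longer than 8k.
    """
    # Assumption: memory is treated as one pool summed across GPUs.
    budget = sum(gpu_memory_gb) - headroom_gb
    fitting = [name for name, need in VRAM_GB.items() if need <= budget]
    # The option with the largest footprint that still fits is the
    # highest-precision one in this table.
    return max(fitting, key=VRAM_GB.get) if fitting else None


print(pick_quantization([80, 80]))                  # two 80 GB GPUs
print(pick_quantization([48, 48]))                  # two 48 GB GPUs
print(pick_quantization([80, 80], headroom_gb=40))  # reserve room for long context
```

Reserving headroom explicitly makes it harder to choose a quantization that only fits at the 8k context the table assumes.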
Published benchmark scores
| Benchmark | Score (%) |
|---|---|
| SWE-Bench Verified | 60 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- NVIDIA's first true frontier open release
- 60% on SWE-Bench Verified
- Commercially permissive NVIDIA Open Model License
- 10T-token training corpus released alongside weights
Limitations
- Needs 72 GB or more of VRAM even at Q4_K_M, beyond any single consumer GPU
- Ollama support is still partial
- License is permissive but not Apache 2.0
Architecture & training
Architecture: MoE 120B/12B active · NVIDIA Open Model License
Training: 10T tokens; the training corpus is released alongside the weights.
A credible NVIDIA-backed frontier model with the rare bonus of a public training corpus.
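The 120B/12B split can be made concrete with a little arithmetic. The sketch below assumes all expert weights stay resident in memory (typical for MoE inference), so memory scales with the total parameter count while per-token compute scales with the active count. The helper names are illustrative, and 1 GB is taken as 10^9 bytes.

```python
TOTAL_PARAMS = 120e9   # total parameters, from the architecture line
ACTIVE_PARAMS = 12e9   # parameters active per token


def active_fraction(total, active):
    """Share of the weights that participate in each token's forward pass."""
    return active / total


def weight_memory_gb(params, bits_per_weight):
    """Memory for the weights alone, ignoring KV cache and runtime overhead."""
    return params * bits_per_weight / 8 / 1e9


print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
print(f"16-bit weights:  {weight_memory_gb(TOTAL_PARAMS, 16):.0f} GB")
```

Under these assumptions only about a tenth of the weights are used per token, yet the full set still has to fit in memory, which is why the VRAM requirements track the 120B figure rather than the 12B one.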
Quick start
ollama run nemotron-3-super:120b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
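For scripted access, a locally running Ollama server can also be queried over its HTTP API. The following is a minimal sketch, assuming Ollama is listening on its default port (11434) and that the model tag from the quick-start command has already been pulled. The `build_request` and `generate` helpers are illustrative names, not part of any library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint (assumed)
MODEL_TAG = "nemotron-3-super:120b"                 # tag from the quick-start command


def build_request(prompt, model=MODEL_TAG, url=OLLAMA_URL):
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, timeout=600):
    """Send the prompt and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        # With "stream": False the server replies with one JSON object
        # whose "response" field holds the full completion.
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Calling `generate("Explain mixture-of-experts in one sentence.")` returns the completion as a string once the server responds. The generous timeout is a guess to allow for the first load of a model this size.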