Model fiche

Nemotron 3 Super 120B

By NVIDIA · United States

chat general reasoning moe
Parameters: 120B
License: NVIDIA Open Model License
Context: 125k
VRAM (Q4): 72 GB
Released: March 2026

Overview

NVIDIA's first frontier-class open release: a 120B-parameter mixture-of-experts (MoE) model with 12B active parameters that scores 60% on SWE-Bench Verified. The 10T-token training corpus is released alongside the weights.

When to pick this model

  • Enterprise deployments needing NVIDIA's commercial license
  • SWE-Bench-grade coding agents on a multi-GPU rig
  • Long-context analysis up to 128K tokens
  • Reproducible research using the released training data
  • Replacing closed APIs with NVIDIA-backed weights

VRAM requirements by quantization

Quantization            VRAM required
Q4_K_M (recommended)    72 GB
Q5_K_M                  86 GB
Q8_0                    132 GB
FP16 (no quantization)  240 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
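As a rough illustration of how the table above can be used, the sketch below picks the largest quantization that fits a given VRAM budget. The figures are copied from the table. The `best_quantization` helper and its `headroom_gb` parameter are hypothetical names introduced here, and the headroom value is an assumption you should set for your own context length.

```python
from typing import Optional

# VRAM figures in GB, copied from the table above. Per the fiche they cover
# weights, a typical 8k KV cache and ~600 MB of runtime overhead.
VRAM_GB = {
    "Q4_K_M": 72,
    "Q5_K_M": 86,
    "Q8_0": 132,
    "FP16": 240,
}


def best_quantization(total_vram_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the largest quantization that fits, or None if nothing fits.

    headroom_gb is a hypothetical safety margin reserved for longer
    contexts. It is an assumption, not a figure from the fiche.
    """
    budget = total_vram_gb - headroom_gb
    fitting = [(need, name) for name, need in VRAM_GB.items() if need <= budget]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    # Hypothetical rigs, total VRAM in GB across all GPUs.
    for total in (48, 96, 160):
        print(total, "GB ->", best_quantization(total, headroom_gb=8))
```

The helper only compares totals. It does not model how layers are split across GPUs, so treat a "fits" result as a starting point.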

Published benchmark scores

Benchmark           Score
SWE-Bench Verified  60%

Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.

Strengths

  • NVIDIA's first true frontier open release
  • 60% on SWE-Bench Verified
  • Commercially permissive NVIDIA Open Model License
  • 10T-token training corpus released alongside weights

Limitations

  • Needs 72 GB or more of VRAM even at Q4, which in practice means a multi-GPU rig
  • Ollama support is still partial
  • License is permissive but not Apache 2.0

Architecture & training

Architecture: MoE 120B/12B active · NVIDIA Open Model License

Training: 10T-token training corpus, released alongside the weights.

Verdict

A credible NVIDIA-backed frontier model with the rare bonus of a public training corpus.

Quick start

ollama run nemotron-3-super:120b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
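Once the model has been pulled with the `ollama run` command above, it can also be queried from a script. The following is a minimal sketch, assuming an Ollama server on its default local port (11434) and that the `nemotron-3-super:120b` tag from the command is available on your install. It uses Ollama's `/api/generate` endpoint with streaming disabled. The `build_request` and `generate` helper names are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Model tag taken from the quick-start command above.
MODEL_TAG = "nemotron-3-super:120b"


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(generate("Explain mixture-of-experts routing in one sentence."))
```

With `"stream": False` the server returns a single JSON object instead of a sequence of chunks, which keeps the client code short at the cost of waiting for the full completion.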
