Model card

Nemotron 3 Nano Omni 30B-A3B

By NVIDIA · United States

chat vision audio reasoning moe
Parameters: 30B
License: NVIDIA Open Model License
Context: 256k
VRAM (Q4): 21 GB
Released: 28 April 2026

Overview

NVIDIA's omnimodal MoE: 30B total parameters with 3B active per token, handling text, image, audio, and video within a 256k-token context window. The hybrid Mamba2-Transformer MoE architecture delivers 9x the throughput of competing open omni models. Released April 2026.

When to pick this model

  • High-throughput omnimodal inference on NVIDIA hardware
  • Single-GPU deployments needing text + image + audio + video
  • Long-context multimodal analysis (256k)
  • Production pipelines built on NVIDIA NIM
  • English-only voice and video assistants

VRAM requirements by quantization

Quantization              VRAM required
Q4_K_M (recommended)      21 GB
Q5_K_M                    25 GB
Q8_0                      33 GB
FP16 (no quantization)    62 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
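
As a rough illustration of how the table above can be used, the sketch below picks the highest-precision quantization that fits a given GPU. The VRAM figures are copied from the table; the function name `pick_quantization` and the `headroom_gb` parameter are hypothetical, and the headroom value is something you would need to choose yourself for context lengths beyond 8k.

```python
# VRAM figures (GB) copied from the table above. Each already includes
# an 8k KV cache and ~600 MB of runtime overhead.
VRAM_GB = {
    "Q4_K_M": 21,
    "Q5_K_M": 25,
    "Q8_0": 33,
    "FP16": 62,
}


def pick_quantization(available_gb, headroom_gb=0.0):
    """Return the largest listed quantization that fits, or None.

    headroom_gb is an assumed safety margin reserved for longer
    contexts; it is not a figure from the model card.
    """
    budget = available_gb - headroom_gb
    fitting = [(need, name) for name, need in VRAM_GB.items() if need <= budget]
    if not fitting:
        return None
    # In this table a larger footprint means higher precision,
    # so the maximum requirement that fits is the best choice.
    return max(fitting)[1]


if __name__ == "__main__":
    for gpu_gb in (16, 24, 48, 80):
        print(gpu_gb, "GB ->", pick_quantization(gpu_gb, headroom_gb=2.0))
```

With 2 GB of headroom reserved, a 24 GB card lands on Q4_K_M and a 16 GB card fits none of the listed variants.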

Strengths

  • Native omnimodal: text, image, audio, video
  • 256k context window
  • 9x throughput versus other open omni models
  • Runs on a single GPU thanks to 3B active MoE
  • First-class NVIDIA NIM pipeline

Limitations

  • English-only
  • Full multimodal requires llama.cpp or vLLM (Ollama is text-only)
  • NVIDIA Open Model License is not Apache or MIT

Architecture & training

Architecture: Hybrid Mamba2-Transformer MoE · 30B total / 3B active · Conv3D + EVS · integrated vision/audio/video

Training: 354.6M samples · ~717B tokens across 1,395 datasets. English only. BF16, FP8, NVFP4 variants released.

Verdict

The fastest open omnimodal model on a single GPU — as long as you only need English.

Quick start

# Hugging Face: nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16
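
One possible way to fetch the checkpoint named above is the `huggingface_hub` Python package, sketched below. The repository ID is the one listed on this page; the `download_checkpoint` helper and the `local_dir` default are hypothetical names chosen for illustration. This assumes `huggingface_hub` is installed and that the repository is accessible from your account (you may need to log in or accept the license first). The BF16 weights are large, so check free disk space before running it.

```python
# Repository ID as listed in the quick start above.
REPO_ID = "nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16"


def download_checkpoint(local_dir="nemotron-3-nano-omni"):
    """Download every file in the repository and return the local path.

    local_dir is an illustrative default, not a path from the model card.
    """
    # Imported lazily so this module loads even without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Example call (starts a large download):
# path = download_checkpoint()
```

This only retrieves the files. Serving them with full multimodal support still requires llama.cpp or vLLM, as noted under Limitations.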

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
