Model fiche
Nemotron Nano v2 VL 12B
By NVIDIA · United States
vision
chat
Overview
NVIDIA's 12.6B-parameter enterprise vision-language model (VLM), with strong DocVQA and ChartQA scores and tuned for professional document-extraction workflows.
When to pick this model
- Enterprise document extraction and DocVQA pipelines
- Chart and table understanding at production scale
- Single-GPU multimodal deployments
- Long-context multimodal tasks up to 128k tokens
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 8 GB |
| Q5_K_M | 10 GB |
| Q8_0 | 14 GB |
| FP16 (no quantization) | 25 GB |
VRAM figures include the model weights, a KV cache sized for a typical 8k-token context, and about 600 MB of runtime overhead (Ollama / llama.cpp baseline). Longer contexts need extra headroom, because KV-cache memory grows linearly with context length.
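The arithmetic behind a table like this can be sketched as weights (parameters × bits per weight ÷ 8) plus KV cache plus runtime overhead. This is a minimal sketch, not the method used to produce the figures above. The bits-per-weight values are approximate assumptions for GGUF quantization types, and the attention configuration passed to `kv_cache_bytes` is hypothetical, not the published Nemotron architecture. Estimates will therefore not match the table exactly.

```python
# Approximate effective bits per weight for common GGUF types (assumed values;
# K-quants mix block formats, so real file sizes vary slightly per model).
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q8_0": 8.5,   # 32 int8 weights + one fp16 scale per block
    "FP16": 16.0,
}


def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Size of the model weights in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def kv_cache_bytes(n_attn_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_tokens: int, bytes_per_elem: int = 2) -> int:
    """Standard KV-cache size: one K and one V tensor per attention layer,
    growing linearly with the number of context tokens."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * ctx_tokens * bytes_per_elem


def estimate_vram_gb(n_params: float, quant: str, kv_bytes: int,
                     overhead_gb: float = 0.6) -> float:
    """Weights + KV cache + runtime overhead (default ~600 MB, as in the note)."""
    return weights_gb(n_params, BITS_PER_WEIGHT[quant]) + kv_bytes / 1e9 + overhead_gb


# Hypothetical attention config for illustration only (NOT the real model's).
kv_8k = kv_cache_bytes(n_attn_layers=8, n_kv_heads=8, head_dim=128, ctx_tokens=8192)
for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimate_vram_gb(12.6e9, quant, kv_8k):.1f} GB")
```

Keeping the KV cache as a separate term makes the "add headroom" advice concrete: doubling `ctx_tokens` doubles that term while the weight size stays fixed.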
Strengths
- Combined vision and text in a 12B footprint
- 128k context window
- Strong DocVQA and ChartQA benchmark scores
- Released under the NVIDIA Open Model License
Limitations
- Trails Qwen3-VL 30B on complex visual reasoning
- NVIDIA license terms differ from Apache or MIT
- Smaller community than Qwen or LLaVA families
Architecture & training
Architecture: Dense vision-language model · 12.6B parameters · Nemotron-Nano-v2 VL · 128k context
Training: NVIDIA Nemotron Nano v2 multimodal training on text and images, at 12B scale.
Verdict
A focused enterprise VLM that punches above its weight on documents and charts. It is the right call when extraction is the job.
Quick start
ollama run nemotron3-v2:12b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
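For document extraction from a script rather than the interactive CLI, a request can be sent to Ollama's local REST API. This is a minimal sketch, assuming Ollama is running on its default port 11434 and that the `nemotron3-v2:12b` tag from the quick start is pulled and accepts image input. It uses the `/api/generate` fields `images` (base64-encoded), `stream`, and `options.num_ctx`. The function names, prompt, and input file are hypothetical.

```python
import base64
import json
import urllib.request

# Ollama's default local endpoint (assumes a standard local install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str, image_bytes: bytes,
                  num_ctx: int = 8192) -> dict:
    """Build a non-streaming /api/generate request with one attached image."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64 strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one JSON object instead of a token stream
        # Larger windows (up to the model's 128k) need more VRAM for the KV cache.
        "options": {"num_ctx": num_ctx},
    }


def ask(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def extract_from_file(path: str, prompt: str,
                      model: str = "nemotron3-v2:12b") -> str:
    """Read an image file (hypothetical input) and ask the model about it."""
    with open(path, "rb") as f:
        return ask(build_payload(model, prompt, f.read()))
```

For example, `extract_from_file("invoice.png", "Return the invoice number, date and total as JSON.")` would send a single scanned page. Raising `num_ctx` for multi-page inputs trades extra memory for context, as described in the VRAM note.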