Model fiche
Qwen 3 VL 30B-A3B
By Alibaba · China
vision
chat
general
moe
multilingual
Overview
The sweet spot of the Qwen 3 VL family: a 30B-parameter mixture-of-experts (MoE) model with 3B active parameters and a 256K-token (262,144) context window. It delivers most of the 235B flagship's quality at a fraction of the hardware cost.
When to pick this model
- Single-GPU multimodal deployments
- Long-context document and chart analysis
- Cost-conscious teams wanting near-flagship vision
- Apache-licensed VLM for commercial products
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 19 GB |
| Q5_K_M | 23 GB |
| Q8_0 | 35 GB |
| FP16 (no quantization) | 62 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
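As a rough illustration of how the table can be used, the sketch below picks the largest quantization that fits a given card. The VRAM figures are copied from the table and assume its 8k KV cache. The `best_quantization` helper and its `headroom_gb` margin are hypothetical names and values introduced here, not part of the fiche or of any tool.

```python
# VRAM requirements in GB, copied from the table above (8k KV cache included).
VRAM_GB = {
    "Q4_K_M": 19,
    "Q5_K_M": 23,
    "Q8_0": 35,
    "FP16": 62,
}


def best_quantization(gpu_vram_gb, headroom_gb=2.0):
    """Return the most demanding quantization that still fits, or None.

    headroom_gb is an assumed safety margin for longer contexts and other
    processes on the GPU; it is not a figure from the fiche.
    """
    budget = gpu_vram_gb - headroom_gb
    # Keep (requirement, name) pairs so max() selects the largest one that fits.
    fitting = [(need, name) for name, need in VRAM_GB.items() if need <= budget]
    return max(fitting)[1] if fitting else None


for card_gb in (16, 24, 48, 80):
    print(card_gb, "GB ->", best_quantization(card_gb))
```

With the assumed 2 GB margin, a 24 GB card resolves to Q4_K_M, which matches the recommendation in the table. Setting the margin to zero would admit Q5_K_M on the same card, with nothing left over for longer contexts.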
Strengths
- Around 19 GB VRAM at Q4 — fits a single 24 GB card
- Native 262k multimodal context
- Efficient MoE with only 3B active parameters
- Apache 2.0
Limitations
- Lags the 235B on complex scene understanding
- Fewer fine-tunes than the older Qwen2-VL family
Architecture & training
Architecture: MoE vision · 30B · Qwen3-VL · 262k context
Training: Qwen3-VL 30B — good quality/accessibility tradeoff for vision MoE.
Verdict
The pragmatic open-vision choice in 2026 — most of the flagship's quality on hardware most teams already own.
Quick start
ollama run qwen3-vl:30b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
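Once the model has been pulled, it can also be queried programmatically. The sketch below is a minimal example that assumes an Ollama server running on its default local port (11434) and uses the `/api/generate` endpoint with a base64-encoded image. The helper names `build_payload` and `ask`, the prompt, and the file name `chart.png` are illustrative assumptions.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "qwen3-vl:30b"  # tag taken from the quick-start command


def build_payload(prompt, image_bytes=None):
    """Build the JSON body for a single, non-streaming generate request."""
    payload = {"model": MODEL_TAG, "prompt": prompt, "stream": False}
    if image_bytes is not None:
        # Ollama expects images as a list of base64-encoded strings.
        payload["images"] = [base64.b64encode(image_bytes).decode("ascii")]
    return payload


def ask(prompt, image_path=None):
    """Send the prompt (and optional image file) and return the reply text."""
    image_bytes = None
    if image_path is not None:
        with open(image_path, "rb") as f:
            image_bytes = f.read()
    body = json.dumps(build_payload(prompt, image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Example call (needs a running server and a local image; both are assumed):
# print(ask("Summarize the trend in this chart.", "chart.png"))
```

Using only the standard library keeps the example free of extra dependencies. A production client would likely add timeouts, error handling, and streaming.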