Model fiche
Qwen 3 VL 235B-A22B
By Alibaba · China
vision
chat
general
moe
multilingual
Overview
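Alibaba's flagship Qwen 3 vision model: a 235B-parameter mixture-of-experts (MoE) model with 22B parameters active per token and a native 256k-token context (262,144 tokens) that extends to 1M. It is positioned as the current leader among open-weight vision models.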
When to pick this model
- Best-in-class open vision performance
- Long-context multimodal analysis (256k native, 1M extended)
- Document, chart, and video understanding at scale
- Apache-licensed alternative to closed multimodal APIs
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 142 GB |
| Q5_K_M | 170 GB |
| Q8_0 | 250 GB |
| FP16 (no quantization) | 470 GB |
VRAM figures include model weights, the KV cache for a typical 8k-token context, and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer context lengths, since the KV cache grows linearly with the number of tokens.
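As a rough cross-check of the table, weight memory can be approximated as total parameters times bits per weight, and KV-cache growth as a fixed number of bytes per token. The sketch below is a back-of-the-envelope estimate, not the method used to produce the figures above. The bits-per-weight averages are approximate values for llama.cpp quantizations, and the layer count, KV-head count, and head dimension are assumptions taken from the Qwen3-235B-A22B text model that may not match this checkpoint exactly. The vision encoder is ignored.

```python
# Approximate average bits per weight for GGUF quantizations (assumed values;
# the real average depends on the tensor mix and llama.cpp version).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.7, "Q8_0": 8.5, "FP16": 16.0}

TOTAL_PARAMS = 235e9  # all experts must be resident, not just the 22B active


def weight_gb(quant: str, params: float = TOTAL_PARAMS) -> float:
    """Estimated size of the model weights in decimal GB."""
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9


def kv_cache_gb(
    tokens: int,
    layers: int = 94,         # assumption: Qwen3-235B-A22B text backbone
    kv_heads: int = 4,        # assumption: grouped-query attention, 4 KV heads
    head_dim: int = 128,      # assumption
    bytes_per_elem: int = 2,  # FP16 cache
) -> float:
    """Estimated KV-cache size in decimal GB for a given context length."""
    # Factor 2 covers the separate key and value tensors in every layer.
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return tokens * per_token_bytes / 1e9


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        print(f"{quant}: ~{weight_gb(quant):.0f} GB of weights")
    for ctx in (8_192, 262_144):
        print(f"{ctx} tokens: ~{kv_cache_gb(ctx):.1f} GB of KV cache")
```

Under these assumptions the full 262,144-token context adds tens of gigabytes on top of the weights, which is why the table's 8k baseline should not be used to size long-context deployments.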
Strengths
- Top open-weight vision model as of May 2025
- 256k native context (262,144 tokens), extensible to 1M tokens
- Apache 2.0 license
- Only 22B parameters are active per token, which keeps per-token compute tractable (all 235B must still fit in memory)
Limitations
- Around 142 GB VRAM at Q4 — multi-GPU required
- Heavier operational lift than dense alternatives
- Overkill for simple captioning workloads
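To make the multi-GPU requirement concrete, the number of cards can be estimated by dividing the required VRAM by the usable memory per card. This is a minimal sketch: the `usable_fraction` parameter is a hypothetical safety margin for activations and fragmentation, and the calculation assumes the model splits evenly across devices, which real runtimes only approximate.

```python
import math


def gpus_needed(required_gb: float, gpu_gb: float, usable_fraction: float = 0.9) -> int:
    """Smallest number of identical GPUs whose usable VRAM covers required_gb.

    usable_fraction is an assumed margin, not a measured value.
    """
    return math.ceil(required_gb / (gpu_gb * usable_fraction))


if __name__ == "__main__":
    q4_vram_gb = 142  # Q4_K_M figure from the table above
    for gpu_gb in (24, 48, 80):
        print(f"{gpu_gb} GB cards: {gpus_needed(q4_vram_gb, gpu_gb)}")
```

With the default margin, the Q4_K_M figure works out to two 80 GB cards, four 48 GB cards, or seven 24 GB cards.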
Architecture & training
Architecture: MoE vision · 235B total / 22B active · Qwen3-VL flagship
Training: text, images, and video, with a 256k (262,144-token) native context.
Verdict
The open-vision benchmark to beat — if you can afford the GPUs, this is the model to deploy.
Quick start
ollama run qwen3-vl:235b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
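Once the model has been pulled, it can also be queried programmatically through Ollama's local HTTP API. The sketch below sends one image and a prompt to the `/api/generate` endpoint using only the standard library. It assumes Ollama is listening on its default port 11434. The helper names `build_payload` and `describe_image` are illustrative, not part of any library.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes the default local port


def build_payload(prompt: str, image_bytes: bytes, model: str = "qwen3-vl:235b") -> dict:
    """Build a non-streaming generate request with one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return a single JSON object instead of chunks
    }


def describe_image(path: str, prompt: str = "Describe this image.") -> str:
    """Send the image at `path` to the local Ollama server and return its answer."""
    with open(path, "rb") as f:
        payload = build_payload(prompt, f.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `describe_image("chart.png", "Summarize this chart.")` would return the model's text answer, provided the server is running and the model fits in available VRAM.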