Model fiche
Qwen 3 VL 8B
By Alibaba · China
vision
chat
general
multilingual
Overview
The dense 8B entry in Qwen 3 VL, offering strong OCR and document analysis with a remarkable 262k-token multimodal context for its size.
When to pick this model
- On-device or edge multimodal inference
- OCR and structured document extraction
- Long-context multimodal tasks on modest GPUs
- Quick prototyping of VLM-powered features
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 6 GB |
| Q5_K_M | 7 GB |
| Q8_0 | 10 GB |
| FP16 (no quantization) | 16 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
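To get a feel for how much headroom a longer context needs, the sketch below estimates KV cache size and picks the largest quantization from the table that still fits. The architecture numbers (36 layers, 8 KV heads, head dimension 128) are assumptions based on the Qwen3 8B text backbone and are not stated in this fiche. The FP16 KV cache and the treatment of the table's GB as GiB are also assumptions. Treat the output as a rough guide rather than a measured figure.

```python
# Rough VRAM planner. Architecture defaults are ASSUMED (Qwen3 8B-style
# text backbone), not taken from this fiche; adjust them if they differ.

GIB = 2**30

# VRAM figures from the table above (include a typical 8k KV cache).
TABLE_GB = {"Q4_K_M": 6, "Q5_K_M": 7, "Q8_0": 10, "FP16": 16}
BASELINE_CONTEXT = 8192


def kv_cache_bytes(context_tokens, n_layers=36, n_kv_heads=8,
                   head_dim=128, bytes_per_value=2):
    """Bytes for keys + values across all layers (FP16 cache assumed)."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


def estimated_vram_gb(quant, context_tokens):
    """Table figure plus the extra KV cache beyond the 8k baseline."""
    extra = kv_cache_bytes(context_tokens) - kv_cache_bytes(BASELINE_CONTEXT)
    return TABLE_GB[quant] + max(extra, 0) / GIB


def best_quant(available_gb, context_tokens):
    """Highest-precision quantization that fits, or None if none do."""
    # Check from largest footprint to smallest.
    for quant in sorted(TABLE_GB, key=TABLE_GB.get, reverse=True):
        if estimated_vram_gb(quant, context_tokens) <= available_gb:
            return quant
    return None
```

Under these assumptions, `best_quant(12, 32768)` picks a lower-precision quantization than the table alone would suggest for a 12 GB card, because the KV cache grows linearly with context length.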
Strengths
- Around 6 GB VRAM at Q4_K_M, so it fits on most 8 GB consumer GPUs
- 262k multimodal context in an 8B model
- Solid OCR and document analysis
- Apache 2.0
Limitations
- Trails the 30B variant on complex scene reasoning
- Limited capacity for advanced visual reasoning
Architecture & training
Architecture: Dense vision · 8B · Qwen3-VL · 262k context
Training: Qwen3-VL 8B — accessible version of the Qwen3 vision family.
Verdict
The go-to small open VLM — Apache-licensed, long-context, and capable enough for most production document workflows.
Quick start
ollama run qwen3-vl:8b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
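For scripted use, such as OCR over a folder of scans, the model can also be queried over Ollama's local HTTP API. The following is a minimal sketch using only the Python standard library. It assumes Ollama is running on its default port (11434) and that the `qwen3-vl:8b` tag has already been pulled. The function names and the sample file name are illustrative.

```python
import base64
import json
import urllib.request

# Default local endpoint for Ollama's generate API (assumes a stock install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes, prompt, model="qwen3-vl:8b"):
    """Build the JSON body: images are sent as base64-encoded strings."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one complete JSON response
    }


def ask_about_image(image_path, prompt):
    """Send one image plus a prompt and return the model's text reply."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

# Example call (hypothetical file name; requires a running Ollama server):
# print(ask_about_image("invoice.png", "Extract the invoice number and total."))
```

Keeping payload construction separate from the network call makes the request shape easy to inspect or log before anything is sent.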