Model card
Qwen 3.6 27B
By Alibaba · China
chat
code
reasoning
vision
multilingual
Overview
Alibaba's Qwen 3.6 27B — multimodal vision and text with a native 256k context, tuned for multilingual reasoning and code. Fits a 16GB GPU at Q4.
When to pick this model
- Multilingual code generation across mixed-language codebases
- Long-context document analysis up to 256k tokens
- Vision-grounded reasoning over screenshots, diagrams, and PDFs
- Self-hosted alternative to commercial multimodal APIs
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 16 GB |
| Q5_K_M | 19 GB |
| Q8_0 | 29 GB |
| FP16 (no quantization) | 54 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
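The table can be turned into a small lookup for choosing a quantization on a given GPU. The sketch below is illustrative only: the figures are copied from the table above, while the `pick_quantization` helper and its `headroom_gb` parameter are hypothetical names introduced here. The headroom value is left for you to estimate, since the extra KV cache needed beyond 8k context is not stated in this fiche.

```python
# VRAM figures (GB) copied from the table above; they already include
# an ~8k KV cache and ~600 MB of runtime overhead.
VRAM_GB = {
    "Q4_K_M": 16,
    "Q5_K_M": 19,
    "Q8_0": 29,
    "FP16": 54,
}


def pick_quantization(available_gb, headroom_gb=0.0):
    """Return the highest-VRAM quantization that fits, or None if none fit.

    headroom_gb is an assumption supplied by the caller: memory to reserve
    for longer contexts or other processes sharing the GPU.
    """
    budget = available_gb - headroom_gb
    fitting = [(need, name) for name, need in VRAM_GB.items() if need <= budget]
    if not fitting:
        return None
    # max() compares the VRAM requirement first, so the largest format wins.
    return max(fitting)[1]
```

For example, `pick_quantization(24)` selects `Q5_K_M`, while reserving 6 GB of headroom on the same card with `pick_quantization(24, headroom_gb=6)` falls back to `Q4_K_M`.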
Strengths
- Native 256k context handles entire repos and long PDFs
- Genuinely multimodal — vision plus text
- Strong multilingual code performance
- Reasoning sharpened over earlier Qwen generations
Limitations
- Qwen License — not strictly Apache, review terms
- Needs ~16GB VRAM at Q4
- Gated on Hugging Face
Architecture & training
Architecture: Dense transformer · 27B parameters · multimodal text + image · 256k context
Training: Qwen 3.6 family (Alibaba). Multimodal vision, focus on reasoning and multilingual code.
Verdict
Qwen's most capable mid-size open model — a strong multimodal pick for a single 16GB+ GPU.
Quick start
ollama run qwen3.6
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
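Once the model has been pulled with Ollama, it can also be queried from a script through Ollama's local HTTP API. The following is a minimal sketch, assuming Ollama is running on its default port (11434) and that the model tag is `qwen3.6` as in the command above. The `build_request` and `generate` helper names are illustrative, not part of any library.

```python
import json
import urllib.request

# Default local endpoint of the Ollama server (assumes a stock install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="qwen3.6"):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="qwen3.6", timeout=300):
    """Send the prompt and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling `generate("Explain this stack trace: ...")` returns the model's reply as a string. The generous timeout is a guess to allow for the first call, which may need to load the weights into VRAM.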