Model fiche
Qwen 3.5 27B
By Alibaba · China
Tags: chat, general, reasoning, multilingual
Overview
Alibaba's dense 27B Qwen 3.5 with a 262K context window and calibrated thinking mode. One of the best quality-to-size trade-offs in the open 25B-30B class.
When to pick this model
- Math, science, and STEM-heavy reasoning
- Long-context analysis at 100K+ tokens
- Single high-end GPU production deployments
- Multilingual technical workloads
- Replacing closed mid-tier APIs
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 16 GB |
| Q5_K_M | 19 GB |
| Q8_0 | 29 GB |
| FP16 (no quantization) | 54 GB |
VRAM figures include model weights, the KV cache for a typical 8K-token context, and ~600 MB of runtime overhead (Ollama / llama.cpp baseline). Add headroom for longer contexts.
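As a rough cross-check on figures like these, weight memory can be estimated as parameter count times bits per weight, divided by 8. The sketch below is illustrative only: the bits-per-weight values are approximate assumptions for common GGUF quantizations, not numbers from this fiche, and `extra_gb` is a hypothetical stand-in for KV cache and runtime overhead.

```python
# Approximate bits per weight for common GGUF quantizations.
# These are assumed, rounded values, not figures published in this fiche.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "FP16": 16.0,
}


def weight_gb(params_billion: float, quant: str) -> float:
    """Estimate weight memory in decimal GB: parameters x bits per weight / 8."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits(vram_gb: float, params_billion: float, quant: str, extra_gb: float = 0.6) -> bool:
    """Return True if estimated weights plus extra_gb fit in vram_gb.

    extra_gb is a hypothetical allowance for KV cache and runtime overhead;
    raise it for longer contexts.
    """
    return weight_gb(params_billion, quant) + extra_gb <= vram_gb


# Print a weight-only estimate for a 27B dense model at each quantization.
for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{weight_gb(27, quant):.1f} GB of weights")
```

For example, `fits(24, 27, "Q4_K_M")` checks whether a 24 GB card leaves room for the Q4_K_M weights plus the default allowance. Long contexts need a larger `extra_gb`, since KV cache grows with context length.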
Strengths
- 262K native context window
- Well-calibrated thinking mode
- Strong math and science reasoning
- Apache 2.0 license
Limitations
- Needs ~16 GB of VRAM even at Q4_K_M
- Gemma 3 27B is a close competitor
- Thinking mode adds latency on simple queries
Architecture & training
Architecture: Dense · 27B · Qwen 3.5 · hybrid thinking · 262k native context
Training: Enriched Qwen 3.5 corpus, strong in complex reasoning with long context.
Verdict
The best Apache-licensed dense model in the 27B class for long-context reasoning.
Quick start
ollama run qwen3.5:27b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
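Once the model has been pulled, it can also be queried programmatically. The following is a minimal sketch, assuming a local Ollama server on its default address (`http://localhost:11434`) and its `/api/generate` endpoint; the helper names `build_request` and `generate` are hypothetical and exist only for this example.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Model tag taken from the quick-start command.
MODEL_TAG = "qwen3.5:27b"


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]
```

Calling `generate("Summarize the trade-offs of Q4 quantization.")` returns the model's reply as a string. The generous default timeout reflects the latency that thinking mode can add.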