Model fiche
Qwen 3.5 9B
By Alibaba · China
chat
general
reasoning
multilingual
Overview
Alibaba's next-generation dense 9B model with a 262K native context window and an improved toggleable thinking mode. Apache 2.0 licensed.
When to pick this model
- Long-document analysis without RAG
- Multilingual assistants covering 119 languages
- Switching between fast and deep reasoning per request
- Single-GPU production deployments
- Permissive commercial use cases
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 6 GB |
| Q5_K_M | 7 GB |
| Q8_0 | 10 GB |
| FP16 (no quantization) | 18 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Strengths
- 262K native context in a 9B parameter model
- Toggleable thinking mode for cost control
- Strong multilingual performance across 119 languages
- Apache 2.0 license
Limitations
- Fine-tune ecosystem is still less mature than Qwen 2.5
- Thinking mode can be verbose by default
Architecture & training
Architecture: Dense · 9B · Qwen 3.5 · hybrid thinking · 262k native context
Training: Qwen 3 evolution with 262k context and improved thinking. 119 languages.
Verdict
The best long-context Apache-licensed 9B today, especially if you need toggleable reasoning.
Quick start
ollama run qwen3.5:9bOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.