BestLLMfor EN Your hardware. Your LLM. Your call.
APIOpen data Find my LLM
Model fiche

Qwen 3.5 0.8B

By Alibaba · China

chat general small multilingual
Parameters
0.8B
License
Apache 2.0
Context
250k
VRAM (Q4)
0.5 GB
Released
April 2026

Overview

Alibaba's ultra-compact 0.8B chat model with a 256k context window and a sub-1GB Q4 footprint, Apache 2.0 on Ollama. Runs on CPUs, integrated GPUs, and Raspberry Pi.

When to pick this model

  • Embedded assistants on phones, SBCs, and microcontrollers with NPUs
  • Cheap classification, routing, or instruction-following at scale
  • Offline chat where memory and power budgets are tight
  • Long-context retrieval scenarios that don't need deep reasoning

VRAM requirements by quantization

QuantizationVRAM required
Q4_K_M (recommended)0.5 GB
Q5_K_M0.6 GB
Q8_00.9 GB
FP16 (no quantization)1.6 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.

Strengths

  • Negligible memory footprint — under 1GB at Q4
  • 256k context, rare at this size
  • Apache 2.0 distribution via Ollama
  • Runs comfortably on CPU, integrated GPU, or Raspberry Pi

Limitations

  • Reasoning quality is inherently limited at 0.8B
  • Text-only — no vision capability
  • Hugging Face distribution uses the Qwen license rather than Apache

Architecture & training

Architecture: Dense Transformer · 0.8B parameters

Training: Qwen 3.5 family (Alibaba). Ultra-compact variant aligned for chat/instruct.

Verdict

The right pick when you need a real LLM in under a gigabyte and don't need it to think hard.

Quick start

ollama run qwen3.5

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.

Tools

Is Qwen 3.5 0.8B the right pick for you?

Compute self-hosted ROI → Back to catalog