BestLLMfor · Your hardware. Your LLM. Your call.
Model fiche

QwQ 32B

By Alibaba · China

reasoning
Parameters: 32B
License: Apache 2.0
Context: 128k (131,072 tokens)
VRAM (Q4): 19 GB
Released: March 2025

Overview

Alibaba's dedicated 32B reasoner, trained with reinforcement learning rather than distillation. Hits 79.5 on AIME24 and 90.6 on MATH-500 — a direct Apache-licensed alternative to DeepSeek R1.

When to pick this model

  • You need a frontier-class reasoner you can run on a single 48GB GPU
  • You're solving math, logic, or formal problems where chain-of-thought matters
  • You want an Apache-licensed alternative to DeepSeek R1
  • You need 131K context for long reasoning traces

VRAM requirements by quantization

Quantization            VRAM required
Q4_K_M (recommended)    19 GB
Q5_K_M                  23 GB
Q8_0                    35 GB
FP16 (no quantization)  64 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
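To get a feel for how much headroom longer contexts need, the KV cache can be estimated from the architecture listed on this page (64 layers, 8 KV heads). The sketch below is a rough estimate, not the method used to produce the table above: the head dimension of 128 and the 16-bit cache precision are assumptions not stated here, and real runtimes may quantize or page the cache differently.

```python
# Rough KV-cache size estimate for a GQA model.
# Assumptions (not from this page): head_dim=128, 2 bytes per cached value (FP16).

GIB = 1024 ** 3


def kv_cache_bytes(context_tokens: int,
                   n_layers: int = 64,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for `context_tokens` tokens."""
    # Factor 2: one key tensor and one value tensor per layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


if __name__ == "__main__":
    for ctx in (8_192, 32_768, 131_072):
        print(f"{ctx:>7} tokens: {kv_cache_bytes(ctx) / GIB:.1f} GiB")
```

Under these assumptions the cache costs 256 KiB per token, so an 8k context takes about 2 GiB and the cost grows linearly with context length from there.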

Published benchmark scores

Benchmark    Score
AIME 2024    79.5
MATH-500     90.6

Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.

Strengths

  • Direct competitor to DeepSeek R1 at a fraction of the size
  • 131K context for long thinking traces
  • Trained with RL, not just distilled
  • Apache 2.0

Limitations

  • Very verbose — token costs add up fast
  • Requires YaRN for context beyond 8K
  • Overkill for non-reasoning chat workloads
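For the YaRN limitation, here is a minimal sketch of what enabling it can look like for a Hugging Face Transformers-style `config.json`. The `rope_scaling` values are assumptions based on how Qwen-family model cards commonly document YaRN; check the QwQ 32B model card before relying on them. The helper name `with_yarn` is hypothetical, and this does not apply to the Ollama quick start, which configures context differently.

```python
import copy

# Assumed YaRN settings; verify against the official model card before use.
YARN_ROPE_SCALING = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}


def with_yarn(config: dict) -> dict:
    """Return a copy of a Transformers-style config dict with YaRN enabled."""
    patched = copy.deepcopy(config)  # leave the caller's config untouched
    patched["rope_scaling"] = dict(YARN_ROPE_SCALING)
    return patched
```

The patched dict can then be written back to `config.json` with `json.dump`. Static YaRN scaling applies to all inputs, so it may be worth enabling only when long prompts are actually expected.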

Architecture & training

Architecture: Dense · 64 layers · GQA (40Q/8KV) · RoPE · SwiGLU · trained with outcome-based RL

Training: RL on reasoning (not a simple distillation).

Verdict

The best Apache-licensed reasoner you can run on a single GPU.

Quick start

ollama run qwq:32b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
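Once the model is pulled, it can also be queried programmatically. The following is a minimal sketch against Ollama's local REST endpoint (`/api/generate` on the default port 11434), using only the Python standard library. It assumes the model wraps its reasoning trace in `<think>...</think>` tags; the helper names `build_payload`, `split_thinking`, and `ask` are illustrative, not part of any library.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(prompt: str, model: str = "qwq:32b") -> dict:
    """Request body for a single, non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def split_thinking(text: str) -> tuple[str, str]:
    """Split raw output into (reasoning trace, final answer).

    Assumes the trace is wrapped in <think>...</think>; if no such
    block is found, the whole text is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()


def ask(prompt: str) -> tuple[str, str]:
    """Send a prompt to a running local Ollama server and split the reply."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return split_thinking(body["response"])

# Example (requires `ollama serve` running with qwq:32b pulled):
#   thinking, answer = ask("What is 17 * 23?")
```

Separating the trace from the answer makes it easier to log or discard the reasoning, which matters given how verbose the model's output can be.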
