Model fiche
gpt-oss 20B
By OpenAI · United States
chat
general
reasoning
moe
small
Overview
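OpenAI's compact open-weight mixture-of-experts (MoE) model, with 3.6B active parameters per token out of 21B total. OpenAI reports results similar to o3-mini on common benchmarks, and the model runs on a laptop-class GPU under the Apache 2.0 license.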
When to pick this model
- Local development on consumer or workstation GPUs
- Edge deployments needing frontier-vendor quality
- 128k-context tasks without datacenter hardware
- Apache-licensed replacement for o3-mini API calls
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 13 GB |
| Q5_K_M | 16 GB |
| Q8_0 | 23 GB |
| FP16 (no quantization) | 42 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
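As a rough illustration of how the table can be used, the sketch below picks the largest listed quantization that fits a given VRAM budget. The figures are copied from the table; the function name `pick_quantization` and the `headroom_gb` parameter are hypothetical, and the headroom value for longer contexts is something you would need to measure yourself.

```python
from typing import Optional

# VRAM figures (GB) copied from the table above. Per the note, they already
# include weights, a typical 8k KV cache, and ~600 MB of runtime overhead.
VRAM_GB = {
    "Q4_K_M": 13,
    "Q5_K_M": 16,
    "Q8_0": 23,
    "FP16": 42,
}


def pick_quantization(available_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the largest listed quantization that fits, or None if none fits.

    headroom_gb is a hypothetical safety margin reserved for longer contexts
    or other processes sharing the GPU.
    """
    budget = available_gb - headroom_gb
    fitting = [(need, name) for name, need in VRAM_GB.items() if need <= budget]
    if not fitting:
        return None
    # max() compares on the VRAM requirement first, so this selects the
    # highest-precision option that still fits the budget.
    return max(fitting)[1]
```

With no headroom, a 16 GB card selects `Q5_K_M`, which leaves nothing spare. Reserving even 1 GB drops the choice to `Q4_K_M`, which matches the recommendation in the table.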
Strengths
- Apache 2.0 with full commercial freedom
- Around 13 GB VRAM at Q4_K_M, so it fits on a 16 GB card
- OpenAI quality in an accessible footprint
- Native 128k context
Limitations
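- All 21B parameters must sit in VRAM even though only 3.6B are active per token, so memory use is higher than for a dense model with comparable per-token compute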
- Fewer community fine-tunes than Llama or Qwen
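The MoE memory trade-off can be made concrete with a little arithmetic. This is a back-of-the-envelope sketch using the parameter counts from this fiche. It assumes 2 bytes per parameter at FP16 and ignores KV cache and runtime overhead. The "dense model of the same active size" is a hypothetical comparison point, not a real model.

```python
# Parameter counts as stated in this fiche.
TOTAL_PARAMS = 21e9    # all experts, must be resident in memory
ACTIVE_PARAMS = 3.6e9  # parameters actually used per token

BYTES_PER_PARAM_FP16 = 2  # assumption: unquantized 16-bit weights


def fp16_weight_gb(params: float) -> float:
    """Weight-only memory in GB (10^9 bytes) at FP16."""
    return params * BYTES_PER_PARAM_FP16 / 1e9


# Memory scales with the total parameter count...
moe_weights_gb = fp16_weight_gb(TOTAL_PARAMS)
# ...while per-token compute scales with the active count. A hypothetical
# dense model with the same per-token compute would need only this much:
dense_equivalent_gb = fp16_weight_gb(ACTIVE_PARAMS)

# Share of the weights that participate in any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
```

Under these assumptions the MoE holds roughly six times the weights of a dense model with the same per-token compute, which is the cost paid for its higher capacity.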
Architecture & training
Architecture: MoE · ~21B total / ~3.6B active · OpenAI's compact open-weight model
Training: The smaller of OpenAI's two open-weight gpt-oss models, intended for local deployment.
Verdict
The clear default for local OpenAI-quality inference: accessible VRAM requirements, 128k context, and a permissive Apache 2.0 license.
Quick start
ollama run gpt-oss:20b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
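Once the model is running, it can also be queried programmatically. The following is a minimal sketch, assuming a local Ollama server on its default port (11434) and its `/api/generate` endpoint. The model tag `gpt-oss:20b` is an assumption based on Ollama library naming, so pass whichever tag you actually pulled. The helper names `build_request` and `generate` are illustrative, not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "gpt-oss:20b") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "gpt-oss:20b") -> str:
    """Send the prompt and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    with urllib.request.urlopen(build_request(prompt, model), timeout=300) as resp:
        return json.loads(resp.read())["response"]


# Example call (needs the server to be up):
# text = generate("Summarize the Apache 2.0 license in two sentences.")
```

Only the standard library is used here, so nothing extra needs installing. For streaming output or chat-style message histories, the official Ollama client libraries are likely a better fit.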