Model fiche

gpt-oss 20B

By OpenAI · United States

chat general reasoning moe small
Parameters
21B
License
Apache 2.0
Context
128k
VRAM (Q4)
13 GB
Released
August 2025

Overview

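OpenAI's compact open-weight MoE, with 3.6B active parameters per token out of 21B total. OpenAI reports results comparable to o3-mini on common benchmarks, and at Q4 the model fits on a 16 GB laptop-class GPU. Released under Apache 2.0.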

When to pick this model

  • Local development on consumer or workstation GPUs
  • Edge deployments needing frontier-vendor quality
  • 128k-context tasks without datacenter hardware
  • Apache-licensed replacement for o3-mini API calls

VRAM requirements by quantization

Quantization            VRAM required
Q4_K_M (recommended)    13 GB
Q5_K_M                  16 GB
Q8_0                    23 GB
FP16 (no quantization)  42 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
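
The breakdown in the note above (weights, plus KV cache, plus runtime overhead) can be sketched as a small estimator. This is a rough illustration, not the method used to produce the table: the function name, the bits-per-weight value you pass in, and the attention dimensions (layer count, KV-head count, head size) are all assumptions that must come from the model's own configuration, and the result will not necessarily match the figures listed here.

```python
def estimate_vram_gb(
    params_billions: float,
    bits_per_weight: float,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int = 8192,   # the "typical 8k KV cache" from the note
    kv_bytes_per_value: int = 2,  # assumes an FP16 KV cache
    overhead_gb: float = 0.6,     # the ~600 MB runtime overhead from the note
) -> float:
    """Rough VRAM estimate in GB (1 GB = 1e9 bytes)."""
    # Weights: every parameter is stored, even in an MoE model.
    weights_gb = params_billions * bits_per_weight / 8
    # KV cache: one key and one value vector per layer, per KV head, per token.
    kv_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * kv_bytes_per_value
    return weights_gb + kv_bytes / 1e9 + overhead_gb


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, kv_bytes_per_value: int = 2) -> float:
    """KV cache size alone, to see how much headroom a longer context needs."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * kv_bytes_per_value / 1e9
```

Calling `kv_cache_gb` with the same dimensions at 8,192 and at 131,072 tokens shows why longer contexts need headroom: the cache grows linearly with context length while the weight term stays fixed.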

Strengths

  • Apache 2.0 with full commercial freedom
  • Around 13 GB VRAM at Q4 — runs on a 16 GB card
  • OpenAI quality in an accessible footprint
  • Native 128k context

Limitations

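  • All 21B parameters must stay resident in VRAM even though only 3.6B are active per token, so memory use is higher than for a dense model with similar per-token compute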
  • Fewer community fine-tunes than Llama or Qwen

Architecture & training

Architecture: MoE · 21B total / 3.6B active per token · OpenAI's compact open-weight model

Training: The smaller of OpenAI's two open-weight gpt-oss models (alongside gpt-oss 120B), aimed at local deployment.

Verdict

The clear default for local OpenAI-quality inference: accessible VRAM, 128k context, and a permissive Apache 2.0 license.

Quick start

ollama run gpt-oss:20b
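
Once the model has been pulled, it can also be queried from code through Ollama's local HTTP API. The sketch below is a minimal illustration using only the Python standard library; it assumes Ollama is listening on its default port (11434) and that the model tag is `gpt-oss:20b`. The helper names `build_request` and `generate` are hypothetical and chosen for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes Ollama's default local port
MODEL_TAG = "gpt-oss:20b"  # assumed tag; use whatever `ollama list` reports


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example, with the Ollama server running:
#   print(generate("Explain mixture-of-experts in two sentences."))
```

Setting `"stream": False` returns one JSON object instead of a stream of chunks, which keeps the example short; a streaming client would read the response line by line instead.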

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
