Model fiche
gpt-oss 120B
By OpenAI · United States
Tags: chat · general · reasoning · moe
Overview
OpenAI's return to open weights: a 117B-parameter MoE with 5.1B active parameters per token, matching o4-mini quality. Fits a single 80 GB GPU at Q4_K_M and ships under Apache 2.0.
When to pick this model
- Production deployments wanting OpenAI quality on owned hardware
- Reasoning and coding workloads at frontier quality
- 128k-context document analysis on a single 80 GB GPU
- Apache-licensed alternative to API-only o4-mini
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 70 GB |
| Q5_K_M | 85 GB |
| Q8_0 | 125 GB |
| FP16 (no quantization) | 234 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
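As a rough sketch of how to read the table, the snippet below checks which quantizations fit a given VRAM budget. The figures are copied from the table. The function names and the `headroom_gb` parameter are illustrative assumptions, and the headroom actually needed for longer contexts is not specified in this fiche.

```python
# VRAM figures in GB, copied from the table above (8k KV cache, ~600 MB overhead).
VRAM_REQUIRED_GB = {
    "Q4_K_M": 70,
    "Q5_K_M": 85,
    "Q8_0": 125,
    "FP16": 234,
}


def fitting_quantizations(gpu_vram_gb, num_gpus=1, headroom_gb=0.0):
    """Return the quantizations whose table figure fits the VRAM budget.

    headroom_gb is a hypothetical knob for extra KV cache at longer contexts.
    """
    budget = gpu_vram_gb * num_gpus - headroom_gb
    return [name for name, need in VRAM_REQUIRED_GB.items() if need <= budget]


def weights_gb(params_billions, bits_per_weight):
    """Approximate weight size in GB: parameters x bits per weight / 8."""
    return params_billions * bits_per_weight / 8


print(fitting_quantizations(80))              # ['Q4_K_M']
print(fitting_quantizations(80, num_gpus=2))  # ['Q4_K_M', 'Q5_K_M', 'Q8_0']
print(weights_gb(117, 16))                    # 234.0
```

Note that 117B parameters at 16 bits per weight already come to 234 GB, so the FP16 row is best treated as a floor once KV cache and runtime overhead are added.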
Strengths
- Matches o4-mini on reasoning and coding benchmarks
- Apache 2.0 license with full commercial use
- 128k context out of the box
- Fits on a single 80 GB accelerator at Q4_K_M
Limitations
- Needs around 70 GB of VRAM even at Q4_K_M; Q5_K_M and higher precisions exceed a single 80 GB GPU and require multi-GPU
- MoE deployment is operationally more complex than dense
Architecture & training
Architecture: MoE · ~117B total / ~5.1B active · OpenAI open-weight · 128k ctx
Training: OpenAI. This is OpenAI's first open-weight language model since GPT-2, released under the Apache 2.0 license.
Verdict
The most consequential open-weight release in years: o4-mini-class quality on a single 80 GB GPU under Apache 2.0.
Quick start
ollama run openai/gpt-oss:120b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
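For programmatic access, a minimal sketch of calling the locally running model through Ollama's REST API follows. It assumes Ollama is serving on its default port (11434) and that the model tag matches the one in the quick-start command. The helper names `build_request` and `generate` are illustrative, not part of any library.

```python
import json
import urllib.request

# Assumptions: Ollama is serving on its default local port, and the model tag
# matches the one used in the quick-start command above.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "openai/gpt-oss:120b"


def build_request(prompt, model=MODEL_TAG, url=OLLAMA_URL):
    """Build a non-streaming generate request for Ollama's REST API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, timeout=600):
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Example (requires the model to be pulled and Ollama running):
# print(generate("Summarize the trade-offs of MoE models in two sentences."))
```

Setting `"stream": False` returns one JSON object instead of a stream of chunks, which keeps the sketch short. For interactive use, streaming is usually the better choice.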