BestLLMfor: Your hardware. Your LLM. Your call.
Model card

gpt-oss 120B

By OpenAI · United States

Tags: chat · general · reasoning · MoE
Parameters: 117B
License: Apache 2.0
Context: 128k
VRAM (Q4): 70 GB
Released: August 2025

Overview

OpenAI's return to open weights: a 117B-parameter MoE with 5.1B active parameters per token, which OpenAI reports as reaching near-parity with o4-mini on core reasoning benchmarks. At Q4 it fits on a single 80 GB GPU, and it ships under Apache 2.0.

When to pick this model

  • Production deployments wanting OpenAI quality on owned hardware
  • Reasoning and coding workloads at frontier quality
  • 128k-context document analysis on a single 80 GB GPU
  • Apache-licensed alternative to API-only o4-mini

VRAM requirements by quantization

Quantization            VRAM required
Q4_K_M (recommended)    70 GB
Q5_K_M                  85 GB
Q8_0                    125 GB
FP16 (no quantization)  234 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
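The note above describes the estimate as weights plus KV cache plus runtime overhead. A minimal sketch of that arithmetic follows. The bits-per-weight values are rough assumptions for illustration, not figures published on this page, and the KV cache size is left as a parameter because it depends on context length and attention configuration. The helper names are hypothetical.

```python
def estimate_vram_gb(params_billion, bits_per_weight, kv_cache_gb=0.0, overhead_gb=0.6):
    """Rough VRAM estimate in GB (1 GB = 1e9 bytes).

    weights = parameters * bits per weight / 8, then add KV cache and
    a fixed runtime overhead (~600 MB, as quoted in the note above).
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + kv_cache_gb + overhead_gb


def fits_on_gpu(required_gb, gpu_gb=80.0):
    """True if the estimate fits within the given accelerator memory."""
    return required_gb <= gpu_gb


# ASSUMPTION: approximate average bits per weight for each format.
# Real GGUF files mix precisions per tensor, so actual sizes differ.
ASSUMED_BITS = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5, "FP16": 16.0}

if __name__ == "__main__":
    for name, bits in ASSUMED_BITS.items():
        need = estimate_vram_gb(117, bits)  # KV cache left at 0 here
        print(f"{name}: ~{need:.0f} GB, fits on 80 GB: {fits_on_gpu(need)}")
```

The estimates will not match the table exactly, since the table also includes an 8k KV cache and the real per-tensor quantization mix. Pass a larger `kv_cache_gb` to see how quickly longer contexts consume the remaining headroom on an 80 GB card.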

Strengths

  • Near-parity with o4-mini on core reasoning benchmarks, as reported by OpenAI
  • Apache 2.0 license with full commercial use
  • 128k context out of the box
  • Fits on a single 80 GB accelerator

Limitations

  • Needs about 70 GB of VRAM at Q4; Q5_K_M and higher precisions exceed 80 GB and require multiple GPUs
  • MoE deployment is operationally more complex than dense

Architecture & training

Architecture: MoE · ~117B total / ~5.1B active parameters · open weights · 128k ctx

Training: OpenAI. Its first open-weight language model since GPT-2, released under the Apache 2.0 license.

Verdict

The most consequential open-weight release in years — frontier OpenAI quality on a single GPU under Apache 2.0.

Quick start

ollama run gpt-oss:120b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
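Once the model is running under Ollama, it can also be queried programmatically. The sketch below assumes Ollama is listening on its default local port (11434) and uses its `/api/generate` endpoint with streaming disabled. The model tag is an assumption: use whatever tag `ollama list` shows on your machine. The helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Default local Ollama endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="gpt-oss:120b", url=OLLAMA_URL):
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="gpt-oss:120b", timeout=600):
    """Send the prompt and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.load(resp)
    # With "stream": False, the full completion is in the "response" field.
    return body["response"]


# Example (requires a running Ollama server with the model pulled):
# print(generate("Summarize the trade-offs of MoE models in two sentences."))
```

A generous timeout is used because the first call may include model load time, which can be long for a model of this size.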
