Model card
Seed-OSS 36B Instruct
By ByteDance · China
chat
general
Overview
ByteDance's first major open release: a dense 36B model with a native 524k-token context, roughly 4× the 128k window typical of comparable open models. Licensed under Apache 2.0.
When to pick this model
- Extreme long-document analysis (codebases, books, transcripts)
- RAG-free workflows that load everything into context
- Dense-model deployments preferring predictable behavior
- Apache-licensed commercial use
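Whether a corpus can skip retrieval and go straight into context comes down to a token budget. The sketch below shows one way to check that. It assumes a rough 4-characters-per-token heuristic (an assumption, not a property of this model's tokenizer) and takes the 524k window from this fiche. `estimate_tokens` and `fits_in_context` are hypothetical helper names.

```python
CONTEXT_WINDOW = 524_000  # "524k" as stated in this fiche; the exact limit may differ
CHARS_PER_TOKEN = 4.0     # rough heuristic (assumption); varies by language and tokenizer


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Rough token estimate derived from character count."""
    return int(len(text) / chars_per_token) + 1


def fits_in_context(documents, reserve_for_output=4_096, window=CONTEXT_WINDOW):
    """Return (fits, estimated_tokens) for a list of document strings.

    reserve_for_output keeps room for the model's reply.
    """
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserve_for_output <= window, total
```

For a real decision, count tokens with the model's own tokenizer. The heuristic is only good enough to tell "comfortably fits" from "clearly does not".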
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 22 GB |
| Q5_K_M | 26 GB |
| Q8_0 | 40 GB |
| FP16 (no quantization) | 72 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
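The headroom needed for longer contexts can be approximated from the KV-cache size. The sketch below uses the standard per-token formula (keys plus values, for every layer and KV head). The architecture values in `ASSUMED_ARCH` are illustrative assumptions, not figures from this fiche, and the cache is assumed to be stored in 16-bit precision. Check the model's configuration before relying on the numbers.

```python
# VRAM figures from the table above (GB, includes an 8k KV cache).
TABLE_GB = {"Q4_K_M": 22, "Q5_K_M": 26, "Q8_0": 40, "FP16": 72}

# Illustrative architecture values (assumptions, not stated in this fiche).
ASSUMED_ARCH = dict(n_layers=64, n_kv_heads=8, head_dim=128)


def kv_cache_bytes(tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for `tokens` positions."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * tokens


def extra_vram_gb(target_tokens, baseline_tokens=8_192, **arch):
    """Additional GB beyond the 8k-context baseline already in the table."""
    extra = kv_cache_bytes(target_tokens, **arch) - kv_cache_bytes(baseline_tokens, **arch)
    return max(extra, 0) / 1e9


def total_vram_gb(quant, target_tokens, **arch):
    """Table figure plus the estimated extra KV cache for a longer context."""
    return TABLE_GB[quant] + extra_vram_gb(target_tokens, **arch)
```

Under these assumed values the cache costs 256 KiB per token, so `total_vram_gb("Q4_K_M", 32_768, **ASSUMED_ARCH)` adds roughly 6.4 GB to the table's 22 GB. A quantized KV cache would shrink that proportionally.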
Strengths
- 524k native context — a record for accessible dense models
- Dense 36B is easier to deploy than equivalent MoEs
- Strong long-document comprehension
- Apache 2.0
Limitations
- Around 22 GB VRAM at Q4 (much more with full context)
- ByteDance license terms need a careful read
- Limited fine-tune ecosystem at launch
Architecture & training
Architecture: Dense · 36B · ByteDance Seed-OSS · 524k native context
Training: by ByteDance, with very long context (524k tokens) supported natively.
Verdict
Unmatched long-context capacity for a dense open model: the pick when you genuinely need to load 500k+ tokens at once.
Quick start
ollama pull hf.co/ByteDance/seed-oss-36b-GGUF
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
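Once the model is pulled, it can also be queried over Ollama's local HTTP API. The sketch below uses only the Python standard library. It assumes Ollama is running on its default port (11434) and that the model reference matches the pull command in this fiche. `build_request` and `generate` are hypothetical helper names. The `num_ctx` option raises the context length above the default, which increases VRAM use.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "hf.co/ByteDance/seed-oss-36b-GGUF"         # reference from the pull command


def build_request(prompt, num_ctx=8192, model=MODEL, url=OLLAMA_URL):
    """Build a non-streaming generate request with an explicit context size."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # return one JSON object, not a stream
        "options": {"num_ctx": num_ctx},  # context window to allocate
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, num_ctx=8192, timeout=600):
    """Send the prompt to the local Ollama server and return the reply text."""
    with urllib.request.urlopen(build_request(prompt, num_ctx), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

A call such as `generate("Summarize the attached transcript ...", num_ctx=65536)` needs a running Ollama server and enough free VRAM for the requested context.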