DBRX Instruct
By Databricks · United States
Overview
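DBRX Instruct is Databricks' 132B-parameter mixture-of-experts (MoE) model with 36B parameters active per token, trained on 12T tokens. It was state-of-the-art among open models at its March 2024 release but has since been largely surpassed by DeepSeek V3 and R1.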
When to pick this model
- Databricks-native pipelines that need an in-house model
- Code and math workloads where 36B active params shine
- Research comparisons against modern frontier MoEs
- Multi-GPU deployments already provisioned for 100B+ models
- Internal evals before migrating to DeepSeek V3 or R1
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 76 GB |
| Q5_K_M | 94 GB |
| Q8_0 | 140 GB |
| FP16 (no quantization) | 264 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
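As a rough cross-check, weight memory can be estimated as parameter count times bits per weight. The sketch below is a back-of-the-envelope estimate, not the method used to produce the table. The function name is hypothetical, and it covers weights only, so KV cache and runtime overhead come on top. The 8.5 bits-per-weight figure for Q8_0 assumes the llama.cpp block layout of 32 int8 values plus one FP16 scale.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Estimate weight storage in GB (1 GB = 1e9 bytes), excluding KV cache and overhead."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# All 132B parameters must be resident in memory, not only the 36B active per token.
print(weight_memory_gb(132, 16))   # FP16: 264.0
print(weight_memory_gb(132, 8.5))  # Q8_0 (assumed 8.5 bits/weight): 140.25
```

By this estimate the FP16 row (264 GB) matches the weights alone, so budgeting extra headroom for the KV cache at longer contexts seems prudent.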
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMLU | 73.7 |
| HumanEval | 70.1 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- State-of-the-art quality at March 2024 release
- Strong on code and math benchmarks
- Databricks Open Model License is broadly permissive
- 12T tokens of high-quality training data
Limitations
- ~76 GB VRAM at Q4 exceeds any single consumer GPU, so it typically needs multi-GPU serving
- Largely outclassed by DeepSeek V3 and R1 in 2025
- HuggingFace repo is gated, slowing access
Architecture & training
Architecture: MoE · 132B total / 36B active · 16 experts, 4 active per token
Training: Databricks — 12T high-quality tokens, strong in code and science.
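To make the routing numbers concrete, the following sketch computes how many distinct expert subsets a 4-of-16 router can select per token and what share of the total parameters is active. It uses only the figures stated above; the variable names are illustrative.

```python
from math import comb

TOTAL_EXPERTS = 16
ACTIVE_EXPERTS = 4
TOTAL_PARAMS_B = 132
ACTIVE_PARAMS_B = 36

# Number of distinct 4-expert subsets the router can choose from 16 experts.
expert_combinations = comb(TOTAL_EXPERTS, ACTIVE_EXPERTS)

# Share of all parameters that participate in a single token's forward pass.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(expert_combinations)       # 1820
print(f"{active_fraction:.1%}")  # 27.3%
```

The active share sits slightly above 4/16 = 25%, which is consistent with some parameters (such as attention layers) being used for every token rather than routed.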
Historically important but no longer competitive. Choose it only inside Databricks pipelines where the integration justifies the cost.
Quick start
ollama run dbrx
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
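For programmatic access, a locally running Ollama server can also be queried over HTTP. The sketch below assumes Ollama is serving on its default port 11434, that the `dbrx` model has already been pulled, and that the `/api/generate` endpoint is used in non-streaming mode. The helper names and the example prompt are illustrative, not part of any official client.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_request(prompt: str, model: str = "dbrx") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "dbrx") -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


# Example call, only meaningful once the server is up:
# print(generate("Write a Python function that reverses a string."))
```

Setting `stream` to `False` returns one JSON object instead of a stream of chunks, which keeps the parsing to a single `json.loads` call.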