Model fiche
Trinity Mini 26B-A3B
By Arcee AI · United States
chat
general
moe
Overview
Arcee AI's US-built MoE with 3B active parameters out of 26B total. Apache-licensed, fast in practice, and tuned for agent-style workloads.
When to pick this model
- Agent frameworks needing fast, capable open models
- Enterprise deployments preferring US-based vendors
- Single-GPU inference with 128k context
- Apache-licensed MoE for commercial products
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 15 GB |
| Q5_K_M | 18 GB |
| Q8_0 | 28 GB |
| FP16 (no quantization) | 52 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
Strengths
- Efficient MoE with around 3.5B active parameters
- 131k context window
- Tuned for agent and tool-use workflows
- Apache 2.0
Limitations
- Limited public benchmark coverage
- Less name recognition than Mistral or Qwen
- Smaller fine-tune ecosystem
Architecture & training
Architecture: MoE · 26B total / 3.5B active · Arcee AI · 131k context
Training: Arcee AI — compact MoE for agents and enterprise.
Verdict
A solid US-built MoE for agent work — worth a serious look if you value Apache licensing and a domestic vendor.
Quick start
ollama pull hf.co/arcee-ai/Trinity-Mini-26B-GGUFOr use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.