BestLLMfor EN Your hardware. Your LLM. Your call.
APIOpen data Find my LLM
Model fiche

Trinity Mini 26B-A3B

By Arcee AI · United States

chat general moe
Parameters
26B
License
Apache 2.0
Context
128k
VRAM (Q4)
15 GB
Released
March 2025

Overview

Arcee AI's US-built MoE with 3B active parameters out of 26B total. Apache-licensed, fast in practice, and tuned for agent-style workloads.

When to pick this model

  • Agent frameworks needing fast, capable open models
  • Enterprise deployments preferring US-based vendors
  • Single-GPU inference with 128k context
  • Apache-licensed MoE for commercial products

VRAM requirements by quantization

QuantizationVRAM required
Q4_K_M (recommended)15 GB
Q5_K_M18 GB
Q8_028 GB
FP16 (no quantization)52 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.

Strengths

  • Efficient MoE with around 3.5B active parameters
  • 131k context window
  • Tuned for agent and tool-use workflows
  • Apache 2.0

Limitations

  • Limited public benchmark coverage
  • Less name recognition than Mistral or Qwen
  • Smaller fine-tune ecosystem

Architecture & training

Architecture: MoE · 26B total / 3.5B active · Arcee AI · 131k context

Training: Arcee AI — compact MoE for agents and enterprise.

Verdict

A solid US-built MoE for agent work — worth a serious look if you value Apache licensing and a domestic vendor.

Quick start

ollama pull hf.co/arcee-ai/Trinity-Mini-26B-GGUF

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.

Tools

Is Trinity Mini 26B-A3B the right pick for you?

Compute self-hosted ROI → Back to catalog