Model fiche

Qwen 3 Omni 30B-A3B

By Alibaba · China

Tags: vision · audio · chat · moe
Parameters: 30B
License: Apache 2.0
Context: 128k (131,072 tokens)
VRAM (Q4): 19 GB
Released: May 2025

Overview

Alibaba's omnimodal 30B mixture-of-experts (MoE) model, with 3B parameters active per token, streaming speech, 119-language ASR, and Apache 2.0 licensing. It is the most accessible truly omnimodal open model.

When to pick this model

  • Voice-first assistants with low-latency speech in/out
  • Multilingual ASR across 119 languages
  • Real-time multimodal agents on a single GPU
  • Long-context multimodal reasoning (128k context window)
  • Apache 2.0 commercial deployments

VRAM requirements by quantization

Quantization              VRAM required
Q4_K_M (recommended)      19 GB
Q5_K_M                    23 GB
Q8_0                      35 GB
FP16 (no quantization)    62 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
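
As a quick way to apply the table above, the following minimal sketch picks the highest-precision quantization that fits a given GPU. The VRAM figures are copied from the table; the function name and the `headroom_gb` parameter (extra memory reserved for contexts longer than 8k) are hypothetical and only serve the illustration.

```python
# VRAM figures in GB, copied from the table above
# (weights + 8k KV cache + ~600 MB runtime overhead).
VRAM_GB = {
    "Q4_K_M": 19,
    "Q5_K_M": 23,
    "Q8_0": 35,
    "FP16": 62,
}


def best_quant_for(gpu_vram_gb, headroom_gb=0.0):
    """Return the largest quantization that fits, or None if none does.

    headroom_gb is an assumed, user-chosen reserve for longer contexts;
    the fiche does not say how much to add per extra token.
    """
    budget = gpu_vram_gb - headroom_gb
    fitting = [(need, name) for name, need in VRAM_GB.items() if need <= budget]
    # Sorting by required VRAM: the biggest footprint that fits
    # is also the least aggressively quantized option.
    return max(fitting)[1] if fitting else None
```

For example, `best_quant_for(24)` selects Q5_K_M on a 24 GB card, while reserving 2 GB of headroom with `best_quant_for(24, headroom_gb=2)` falls back to Q4_K_M.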

Strengths

  • Native omnimodal I/O: text, image, and audio input; text and speech output
  • 128k context (131,072 tokens)
  • Streaming speech for low-latency voice apps
  • Apache 2.0 license
  • Only 3B active params per token
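
The last point is worth a rough calculation: in an MoE model, memory scales with total parameters, while per-token compute scales with active parameters. The sketch below uses the 30B and 3B figures from this fiche and the common rule of thumb of about 2 FLOPs per active parameter per token. That rule is an approximation that ignores attention over the context and the audio and vision encoders, and the function names are illustrative.

```python
TOTAL_PARAMS = 30e9   # from the fiche: 30B total parameters (drives VRAM use)
ACTIVE_PARAMS = 3e9   # from the fiche: 3B parameters active per token (drives compute)


def active_fraction(total, active):
    """Share of the weights that participate in each token's forward pass."""
    return active / total


def forward_flops_per_token(active):
    """Rule-of-thumb forward cost: one multiply and one add per active weight."""
    return 2 * active


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical dense 30B model
print(f"active share: {fraction:.0%}")
print(f"approx. compute vs. a dense 30B model: {moe_flops / dense_flops:.0%}")
```

Under these assumptions only about a tenth of the weights are exercised per token, which is why generation can be fast even though all 30B parameters still have to fit in memory.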

Limitations

  • Needs around 19 GB of VRAM even at Q4, so it does not fit on 16 GB GPUs
  • Audio path is still maturing relative to text and vision
  • Tooling support is uneven outside vLLM

Architecture & training

Architecture: MoE · 30B · Qwen3-Omni · text + vision + audio end-to-end

Training: Qwen3-Omni 30B is a Qwen omnimodal model (text and image input, audio input and output).

Verdict

The default open choice if you actually need audio in and out, not just text and images.

Quick start

ollama run qwen3-omni:30b

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
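
Besides the CLI command above, a running Ollama instance also exposes a local HTTP API. The following is a minimal sketch of a text-only request to it, assuming Ollama is listening on its default port and that the `qwen3-omni:30b` tag from the Quick start is available locally. The helper names are illustrative, and audio input or output is not covered here.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "qwen3-omni:30b"  # tag from the Quick start; assumed to be pulled locally


def build_payload(prompt, model=MODEL_TAG):
    """Build a non-streaming generate request body."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt, url=OLLAMA_URL, timeout=120):
    """Send one text prompt and return the model's full text reply."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Example call (requires a running Ollama server with the model available):
# print(generate("Describe an omnimodal model in one sentence."))
```

Setting `"stream": False` keeps the sketch simple by returning one JSON object; a voice-first application would more likely stream partial output.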
