BestLLMfor · Your hardware. Your LLM. Your call.
Model fact sheet

LLaDA 2.0 Uni 16B

By Ant Group / inclusionAI · China

Tags: chat, vision, general, moe
Parameters: 16B
License: Apache 2.0
Context: 8k tokens
VRAM (Q4): 18 GB
Released: April 2026

Overview

Ant Group's first open Apache 2.0 diffusion LLM: a 16B-parameter MoE (1B active) paired with a 6.2B diffusion decoder, unifying text and vision generation and editing. Released April 2026.
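
The 16B/1B split follows the usual mixture-of-experts pattern: a router picks a few experts per token, so only a fraction of the weights does work on each token, although all experts normally still have to sit in memory. A toy top-k router is sketched below; the expert count, k, and softmax gating are illustrative assumptions, since this sheet does not give LLaDA 2.0 Uni's routing configuration.

```python
import math

def top_k_route(router_logits, k):
    """Return (expert_index, weight) for the k highest-scoring experts.

    Weights are a softmax over the selected logits only (one common gating choice).
    """
    chosen = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    routes = top_k_route(router_logits, k)
    y = sum(w * experts[i](x) for i, w in routes)
    return y, [i for i, _ in routes]

# Hypothetical toy setup: 8 "experts" that just scale their input.
# In a real model the router logits come from a learned layer applied to x.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
y, active = moe_forward(1.0, experts, logits, k=2)
print(active)  # [4, 1]: only 2 of 8 experts were evaluated
```

Per-token compute scales with the active experts, while memory scales with all of them, which is why the VRAM figures track the full parameter count.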

When to pick this model

  • Research on diffusion-based language models
  • Unified text + image generation and editing in one stack
  • Interleaved thinking workflows during generation
  • Apache 2.0 commercial use of dLLM architectures
  • Experiments comparing diffusion vs. autoregressive decoding

VRAM requirements by quantization

Quantization            VRAM required
Q4_K_M (recommended)    18 GB
Q5_K_M                  22 GB
Q8_0                    30 GB
FP16 (no quantization)  47 GB

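VRAM figures are estimates covering model weights, an 8k KV cache, and ~600 MB of runtime overhead, computed on an Ollama / llama.cpp baseline. Neither runtime currently supports this diffusion architecture (see Limitations), so treat the quantized rows as indicative.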
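
The arithmetic behind figures like these is roughly weights (parameters × bits per weight ÷ 8) plus KV cache plus runtime overhead. The sketch below is a back-of-envelope version, not the catalog's method: the bits-per-weight values are approximate averages assumed for each GGUF type, and the 1 GB KV cache is a hypothetical placeholder.

```python
# Approximate average bits per weight for common GGUF types (assumed values, not from this sheet).
APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "FP16": 16.0,
}

def estimate_vram_gb(params_billion, bits_per_weight, kv_cache_gb, overhead_gb=0.6):
    """Rough VRAM in decimal GB: weights + KV cache + runtime overhead (~600 MB default)."""
    # params_billion * 1e9 weights * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + kv_cache_gb + overhead_gb

if __name__ == "__main__":
    # Hypothetical inputs: the 16B MoE backbone only, with an assumed 1 GB KV cache.
    for name, bpw in APPROX_BITS_PER_WEIGHT.items():
        print(f"{name}: ~{estimate_vram_gb(16, bpw, kv_cache_gb=1.0):.1f} GB")
```

With the backbone alone, the FP16 estimate (about 33.6 GB) lands well below the table's 47 GB; adding the 6.2B diffusion decoder (22.2B parameters in total) brings it to about 46 GB, though this sheet does not say which components its figures include.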

Strengths

  • The first Apache 2.0 open diffusion LLM
  • Unified text, vision, generation, and editing
  • Interleaved 'thinking' mode during diffusion
  • Decoder-turbo distillation runs 8 diffusion steps instead of 50
  • Apache 2.0 commercial license

Limitations

  • Diffusion architecture not supported by Ollama or llama.cpp
  • Requires Flash Attention 2 and CUDA 12.4
  • Around 47 GB VRAM during active generation
  • Only 8k context window
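
Because of the Flash Attention 2 and CUDA 12.4 requirements, a quick pre-flight check can save a failed model load. A sketch, assuming PyTorch is the runtime (it reads `torch.version.cuda`) and treating 12.4 as a minimum rather than an exact version; it only checks that a `flash_attn` package is installed, not that it is version 2.

```python
import importlib.util

MIN_CUDA = (12, 4)  # assumption: "CUDA 12.4" read as a minimum version

def parse_version(v):
    """'12.4' -> (12, 4); extra components are ignored."""
    return tuple(int(p) for p in v.split(".")[:2])

def check_requirements(cuda_version, has_flash_attn):
    """Return a list of problems; an empty list means the stated requirements look met."""
    problems = []
    if cuda_version is None:
        problems.append("no CUDA runtime detected")
    elif parse_version(cuda_version) < MIN_CUDA:
        problems.append(f"CUDA {cuda_version} is older than 12.4")
    if not has_flash_attn:
        problems.append("flash_attn package not installed")
    return problems

def detect():
    """Probe the local environment without loading the model."""
    try:
        import torch
        cuda = torch.version.cuda  # None on CPU-only builds
    except ImportError:
        cuda = None
    has_fa = importlib.util.find_spec("flash_attn") is not None
    return cuda, has_fa

if __name__ == "__main__":
    print(check_requirements(*detect()) or "requirements look satisfied")
```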

Architecture & training

Architecture: MoE (16B total, 1B active) + Discrete Semantic Tokenizer (SigLIP-VQ) + 6.2B diffusion decoder + VAE
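
A discrete tokenizer of the VQ kind maps continuous image features to the index of the nearest entry in a learned codebook, so image content becomes integer IDs that a masked-token model can handle much like text tokens. The generic vector-quantization step is sketched below with a hypothetical 2-D, 4-entry codebook; the SigLIP encoder, the real codebook size, and whether text and image IDs share one vocabulary are not modeled or stated here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(features, codebook):
    """Replace each continuous feature vector with the index of its nearest code."""
    return [min(range(len(codebook)), key=lambda j: sq_dist(f, codebook[j])) for f in features]

def dequantize(indices, codebook):
    """Look the codes back up; a downstream decoder would consume these (lossy) vectors."""
    return [codebook[i] for i in indices]

# Hypothetical 2-D codebook with 4 entries; real tokenizers use far larger ones.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
patches = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.9)]  # stand-ins for image-patch features
tokens = quantize(patches, codebook)
print(tokens)  # [0, 3, 2]
```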

Training: masked token prediction paradigm. Distilled decoder-turbo runs 8 steps instead of 50, reported as a 10× speedup (the step reduction alone accounts for 6.25×). SPRINT acceleration.
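
Masked token prediction means generation starts from a fully masked sequence and fills it in over a fixed number of steps, committing several tokens per forward pass instead of one. A minimal sketch, assuming a hypothetical `predict` function in place of the model and one common unmasking rule (keep the most confident predictions); LLaDA 2.0 Uni's actual sampler and step schedule are not described in this sheet.

```python
import math

MASK = None  # marks a position that has not been decoded yet

def diffusion_decode(length, steps, predict):
    """Fill a fully masked sequence over at most `steps` denoising steps.

    predict(seq) -> {position: (token, confidence)} for every masked position.
    Each step keeps the most confident predictions and leaves the rest masked.
    """
    seq = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(seq) if t is MASK]
        if not masked:
            break
        # Spread the remaining masked positions evenly over the steps left,
        # so everything is decoded by the final step.
        n_commit = math.ceil(len(masked) / (steps - step))
        preds = predict(seq)  # one model forward pass
        keep = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:n_commit]
        for i in keep:
            seq[i] = preds[i][0]
    return seq

TARGET = "masked diffusion"  # 16 characters standing in for model output

def make_oracle():
    """Hypothetical predictor: always correct, more confident toward the left."""
    calls = {"n": 0}
    def predict(seq):
        calls["n"] += 1
        return {i: (TARGET[i], 1.0 / (1 + i)) for i, t in enumerate(seq) if t is MASK}
    return predict, calls

for steps in (50, 8):
    predict, calls = make_oracle()
    out = "".join(diffusion_decode(len(TARGET), steps, predict))
    print(f"{steps} steps -> {out!r}, forward passes: {calls['n']}")
```

With a 50-step budget the toy needs 16 forward passes (one token each, since the sequence is shorter than the budget); with 8 steps it needs 8, committing two tokens per pass. The step count bounds the number of forward passes, which is why cutting steps speeds up generation.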

Verdict

A research-first release that proves Apache 2.0 dLLMs are real — production users should wait for tooling to catch up.

Quick start

# Hugging Face: inclusionAI/LLaDA2.0-Uni (requires Flash Attention 2 + CUDA 12.4)

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
