Model card

LLaVA-OneVision 7B

By LMMs-Lab · Singapore

vision chat
Parameters: 7B
License: Apache 2.0
Context: 32k
VRAM (Q4): 5 GB
Released: August 2024

Overview

An Apache 2.0-licensed 7B vision-language model from LMMs-Lab that pairs the SigLIP SO400M vision encoder with the Qwen2-7B language model. It handles single images, multi-image inputs, and video, and draws over 170k monthly downloads.

When to pick this model

  • Self-hosted VLM apps needing a permissive license
  • Multi-image reasoning and short video understanding
  • Fine-tuning base for domain-specific vision tasks
  • Cost-sensitive image captioning and VQA pipelines

VRAM requirements by quantization

Quantization              VRAM required
Q4_K_M (recommended)      5 GB
Q5_K_M                    6 GB
Q8_0                      9 GB
FP16 (no quantization)    16 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
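The breakdown above (weights, KV cache, fixed overhead) can be sketched as a rough estimator. This is an illustrative calculation, not the method used to produce the table: the layer count, KV-head count, and head dimension are assumed values for the Qwen2-7B backbone, the 4.5 bits per weight is an assumed average for a 4-bit quantization, the KV cache is assumed to be stored in FP16, and the vision encoder is not counted.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Bytes for keys and values across all layers, KV heads, and token positions."""
    # Factor 2 covers the separate key and value tensors.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * context_tokens


def estimate_vram_gb(params_billion, bits_per_weight, n_layers, n_kv_heads,
                     head_dim, context_tokens=8192, overhead_gb=0.6):
    """Weights + KV cache + fixed runtime overhead, in decimal GB."""
    # params_billion * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB.
    weights_gb = params_billion * bits_per_weight / 8
    kv_gb = kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens) / 1e9
    return weights_gb + kv_gb + overhead_gb


# Assumed backbone shape (hypothetical constants for illustration).
QWEN2_7B = dict(n_layers=28, n_kv_heads=4, head_dim=128)

if __name__ == "__main__":
    for context in (8192, 32768):
        total = estimate_vram_gb(7, 4.5, context_tokens=context, **QWEN2_7B)
        print(f"context={context:>6} tokens: ~{total:.1f} GB")
```

Under these assumptions the KV cache grows linearly with context length, which is why the figures need extra headroom beyond 8k tokens.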

Strengths

  • Fully Apache 2.0 with no commercial gotchas
  • Genuine multi-image and video support
  • Mature ecosystem with strong community traction
  • Solid Qwen2-7B language backbone

Limitations

  • No official Ollama packaging
  • English-first; weaker on non-English vision QA
  • Outpaced by Qwen3-VL on most 2025 benchmarks

Architecture & training

Architecture: 7B vision-language model · SigLIP SO400M vision encoder + Qwen2-7B language model · image, multi-image, and video input

Training: LMMs-Lab (Singapore).

Verdict

A dependable, truly open VLM for self-hosters who value Apache licensing over the latest leaderboard score.

Quick start

# Hugging Face: lmms-lab/llava-onevision-qwen2-7b-ov
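As one possible way to fetch the checkpoint named above, the following sketch uses `snapshot_download` from the `huggingface_hub` package. It assumes that package is installed; the `download_model` helper and the `./llava-onevision-7b` target directory are hypothetical names chosen for illustration. The import is deferred so the module can be loaded without the package present.

```python
REPO_ID = "lmms-lab/llava-onevision-qwen2-7b-ov"


def download_model(local_dir="./llava-onevision-7b"):
    """Download the repository snapshot and return the local path."""
    # Deferred import: huggingface_hub is only required when downloading.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    # Downloads several GB of weights; requires network access and disk space.
    path = download_model()
    print(f"Model files stored in: {path}")
```

This only retrieves the files. Loading the weights for inference is a separate step that depends on the runtime you choose.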

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
