InternVL 3.5 8B
By OpenGVLab · China
Overview
InternVL 3.5 8B is OpenGVLab's 8B vision-language model, leading MMMU among open models. It was built at Shanghai AI Lab and is released under Apache 2.0.
When to pick this model
- Best-in-class 8B vision for OCR and chart understanding
- Single-GPU multimodal deployments
- Document and PDF analysis pipelines
- Apache-licensed VLM for commercial products
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 6 GB |
| Q5_K_M | 7 GB |
| Q8_0 | 10 GB |
| FP16 (no quantization) | 16 GB |
VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
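As a rough illustration of how the table can be used, the sketch below picks the highest-precision quantization that fits a given GPU. The VRAM numbers are copied from the table; the `pick_quantization` helper and its `headroom_gb` parameter are hypothetical names introduced here, not part of any tool.

```python
# VRAM requirements in GB, copied from the table above,
# ordered from highest to lowest precision.
VRAM_GB = [
    ("FP16", 16),
    ("Q8_0", 10),
    ("Q5_K_M", 7),
    ("Q4_K_M", 6),
]


def pick_quantization(available_gb, headroom_gb=0.0):
    """Return the highest-precision quantization that fits, or None.

    headroom_gb is an assumed safety margin (for example, extra KV cache
    for contexts longer than 8k); choose it for your own workload.
    """
    budget = available_gb - headroom_gb
    for name, required_gb in VRAM_GB:
        if required_gb <= budget:
            return name
    return None
```

For example, `pick_quantization(8)` selects Q5_K_M, while a card with less than 6 GB free gets `None`, meaning no listed quantization fits.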
Published benchmark scores
| Benchmark | Score |
|---|---|
| MMMU | 61.5 |
| DocVQA | 94.1 |
Scores published by the model author or aggregated from public leaderboards. Re-measured monthly by our editorial team.
Strengths
- Top quality-per-parameter ratio in 8B vision
- Strong OCR and chart understanding
- Apache 2.0 license
- Solid VQA and short-video performance
Limitations
- 32k context limits long-document multimodal work
- Weaker multilingual coverage than Qwen2-VL
- No native long-context extension
Architecture & training
Architecture: Dense vision · 8B · InternVL 3.5 · Qwen3 backbone
Training: OpenGVLab — OCR, VQA, charts, short videos, PDF documents.
The benchmark-leading small open VLM for OCR and charts — the right pick when you need accuracy more than context length.
Quick start
ollama run internvl3.5:8b
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
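For programmatic use, a minimal sketch of sending an image and a prompt to a locally running Ollama server is shown below. It assumes Ollama is listening on its default port (11434) and that the `internvl3.5:8b` tag from the command above accepts image input in your build. The `build_payload` and `ask` helpers are hypothetical names used only for illustration.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL_TAG = "internvl3.5:8b"  # tag taken from the quick-start command


def build_payload(prompt, image_bytes):
    """Build a non-streaming generate request with one base64-encoded image."""
    return {
        "model": MODEL_TAG,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ask(prompt, image_path, url=OLLAMA_URL):
    """Send the prompt and image to Ollama and return the generated text."""
    with open(image_path, "rb") as f:
        payload = build_payload(prompt, f.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

A call such as `ask("Extract the totals from this invoice.", "invoice.png")` would return the model's answer as a string, provided the server is running and the model has been pulled; the file name here is a placeholder.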