Model fiche
LFM2.5 Thinking 1.2B
By Liquid AI · United States
Tags: chat · general · reasoning · small
Overview
Liquid AI's 1.2B-parameter reasoning variant, with an explicit thinking mode, a sub-1 GB footprint at Q4, CPU/iGPU-friendly inference, and a 32k-token context window.
When to pick this model
- On-device reasoning on laptops and SBCs without a discrete GPU
- Latency-sensitive tasks that still benefit from chain-of-thought
- Edge agents where memory budget rules out larger models
- Privacy-first deployments that must stay fully local
VRAM requirements by quantization
| Quantization | VRAM required |
|---|---|
| Q4_K_M (recommended) | 0.7 GB |
| Q5_K_M | 0.9 GB |
| Q8_0 | 1.3 GB |
| FP16 (no quantization) | 2.4 GB |
The figures above track weight size closely (for example, 1.2B parameters × 2 bytes ≈ 2.4 GB at FP16). Budget separately for a typical 8k KV cache and ~600 MB of runtime overhead (Ollama / llama.cpp baseline), and add headroom for higher context lengths.
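As a rough cross-check of the table, the weight-only part of the footprint can be estimated from the parameter count and the average bits stored per weight. The sketch below is illustrative only. The bits-per-weight values are approximate figures commonly quoted for llama.cpp quantization formats and are assumptions, not numbers from this fiche. `weight_gb` is a hypothetical helper name. KV cache and runtime overhead are deliberately left out because they depend on architecture details not given here.

```python
# Approximate average bits per weight for each format.
# ASSUMPTION: typical llama.cpp values, not taken from this fiche.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q8_0": 8.5,
    "FP16": 16.0,
}


def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Return the weight-only size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n_params = 1.2e9  # parameter count stated in the fiche
    for name, bpw in BITS_PER_WEIGHT.items():
        print(f"{name}: ~{weight_gb(n_params, bpw):.2f} GB of weights")
```

Under these assumptions the estimates land within about 0.1 GB of the table values. Actual usage will be higher once context and runtime buffers are allocated.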
Strengths
- Very small memory footprint: under 1 GB at Q4
- Runs comfortably on CPU and integrated GPUs
- Explicit thinking mode for visible chain-of-thought
- Low-latency inference suitable for interactive use
Limitations
- 1.2B parameters cap absolute capability
- 32k context is short by 2026 standards
- Released under the LFM Open License rather than a standard Apache 2.0 license
Architecture & training
Architecture: Liquid Foundation Model · 1.2B parameters · 32k context · thinking mode
Training: Part of Liquid AI's LFM2.5 family; this is the reasoning variant, with explicit chain-of-thought output.
Verdict
The most capable sub-2B reasoning model that still fits comfortably on a CPU-only laptop.
Quick start
ollama run lfm2.5-thinking
Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
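For scripted use, a locally running Ollama instance can also be queried over its HTTP API. The following is a minimal sketch, assuming Ollama is serving on its default local port (11434) and that the model has already been pulled under the `lfm2.5-thinking` tag shown above. `build_payload` and `generate` are hypothetical helper names introduced for illustration.

```python
import json
import urllib.request

# ASSUMPTION: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "lfm2.5-thinking") -> dict:
    """Build a non-streaming request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """POST the prompt to the local server and return the generated text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # With "stream": False the server returns one JSON object whose
    # "response" field holds the generated text.
    return body.get("response", "")


if __name__ == "__main__":
    print(generate("In two sentences, why is the sky blue?"))
```

Setting `"stream": False` keeps the example short. An interactive client would more likely stream tokens as they arrive.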