Model fiche

LFM2.5 Thinking 1.2B

By Liquid AI · United States

Tags: chat, general, reasoning, small
Parameters: 1.2B
License: LFM Open License v1.0
Context: 32k
VRAM (Q4): 0.7 GB
Released: February 2026

Overview

LFM2.5 Thinking 1.2B is Liquid AI's 1.2B-parameter reasoning variant. It offers an explicit thinking mode, a Q4 footprint under 1 GB, CPU- and iGPU-friendly inference, and a 32k context window.

When to pick this model

  • On-device reasoning on laptops and SBCs without a discrete GPU
  • Latency-sensitive tasks that still benefit from chain-of-thought
  • Edge agents where memory budget rules out larger models
  • Privacy-first deployments that must stay fully local

VRAM requirements by quantization

Quantization              VRAM required
Q4_K_M (recommended)      0.7 GB
Q5_K_M                    0.9 GB
Q8_0                      1.3 GB
FP16 (no quantization)    2.4 GB

VRAM figures include model weights plus a typical 8k KV cache and ~600 MB runtime overhead (Ollama / llama.cpp baseline). Add headroom for higher context lengths.
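
As a rough cross-check of the table, the size of the weights alone can be estimated as parameters × bits per weight ÷ 8. The sketch below is illustrative only: the bits-per-weight values are assumed ballpark averages for llama.cpp GGUF quantizations, not figures published on this page, and the function names are hypothetical. Under these assumptions the weights alone land close to the table's figures, so it may be prudent to budget the KV cache and runtime overhead on top.

```python
# Assumed approximate average bits per weight for common GGUF formats.
# These are ballpark values for illustration, not numbers from this page.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "FP16": 16.0,
}


def weights_gb(params_billion: float, quant: str) -> float:
    """Estimate the size of the weights alone, in decimal gigabytes."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def budget_gb(params_billion: float, quant: str,
              kv_cache_gb: float = 0.0, overhead_gb: float = 0.0) -> float:
    """Weights plus caller-supplied KV cache and runtime overhead."""
    return weights_gb(params_billion, quant) + kv_cache_gb + overhead_gb


if __name__ == "__main__":
    # Print a weights-only estimate for a 1.2B-parameter model.
    for quant in BITS_PER_WEIGHT:
        print(f"{quant:8s} {weights_gb(1.2, quant):.2f} GB (weights only)")
```

The KV cache and overhead are left as explicit inputs because they depend on the runtime, the context length in use, and the model's architecture, none of which this page specifies in detail.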

Strengths

  • Very small memory footprint: under 1 GB at Q4
  • Runs comfortably on CPU and integrated GPUs
  • Explicit thinking mode for visible chain-of-thought
  • Low-latency inference suitable for interactive use

Limitations

  • 1.2B parameters cap absolute capability
  • 32k context is short by 2026 standards
  • Released under the LFM Open License v1.0 rather than Apache 2.0

Architecture & training

Architecture: Liquid Foundation Model · 1.2B parameters · 32k context · thinking mode

Training: part of Liquid AI's LFM2.5 family. This is the reasoning variant, which produces an explicit chain of thought.

Verdict

The most capable sub-2B reasoning model that still fits comfortably on a CPU-only laptop.

Quick start

ollama run lfm2.5-thinking

Or use the open-source MCP server to query this model from Claude Desktop, Cursor, or any MCP-compatible client.
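
For scripted access rather than the interactive CLI or an MCP client, a locally running Ollama server can also be queried over HTTP. The following is a minimal sketch, assuming Ollama is listening on its default port (11434) and that the model tag matches the quick-start command above. The helper names `build_request` and `ask` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Model tag taken from the quick-start command on this page.
MODEL_TAG = "lfm2.5-thinking"


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running and the model pulled, `ask("Is 91 prime? Think step by step.")` should return the model's reply as a single string. Whether the thinking trace appears inside that string or separately depends on the Ollama version and model template, so treat the output shape as something to verify locally.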
