Liquid AI Ships LFM2.5-230M with llama.cpp, MLX, vLLM, SGLang, and ONNX Support for On-Device Inference

AI Agents | 2026-06-28 | Tags: Liquid AI, LFM2.5-230M, on-device inference, llama.cpp, MLX, vLLM, SGLang, ONNX, edge AI, small language model, Galaxy S25 Ultra, Raspberry Pi 5

Liquid AI has released its smallest language model to date, the LFM2.5-230M, designed specifically for on-device inference. The model reaches a reported 213 tokens per second on a Samsung Galaxy S25 Ultra and 42 tokens per second on a Raspberry Pi 5. The release expands the company's lineup of efficient, hardware-friendly models suited for edge deployment.

Seamless Integration with Popular Inference Frameworks

The LFM2.5-230M is supported by several widely used inference frameworks and formats:

llama.cpp: Optimized for CPU and hybrid CPU/GPU inference on consumer hardware.
MLX: Apple's machine learning framework for efficient execution on Apple silicon.
vLLM: High-throughput serving system for production environments.
SGLang: Serving framework for language models with support for structured, formatted outputs.
ONNX: Open standard for model interchange across platforms.

This broad framework support lets developers deploy the model across a wide range of devices, from smartphones to single-board computers, without significant modifications to their existing pipelines.

Implications for Edge AI in 2026

As of 2026, the push toward running capable AI models directly on devices has intensified, driven by privacy concerns, latency requirements, and the need for offline functionality. The LFM2.5-230M targets these needs with a small footprint: at 230 million parameters, it is suited to tasks such as real-time language understanding, lightweight text generation, and embedded conversational agents.
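
To put the 230-million-parameter figure in perspective, the following minimal sketch estimates how much storage the weights alone would need at a few numeric precisions. The bit widths are illustrative assumptions, not formats Liquid AI has announced for this model, and the estimate ignores activations, caches, and runtime overhead.

```python
# Rough weight-storage estimate for a 230M-parameter model.
# The bit widths are illustrative assumptions, not published
# quantization formats for LFM2.5-230M.

PARAMS = 230_000_000  # parameter count stated in the article


def weight_megabytes(params: int, bits_per_weight: float) -> float:
    """Return approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    if bits_per_weight <= 0:
        raise ValueError("bits_per_weight must be positive")
    return params * bits_per_weight / 8 / 1_000_000


if __name__ == "__main__":
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: ~{weight_megabytes(PARAMS, bits):.0f} MB")
    # 16-bit: ~460 MB
    # 8-bit: ~230 MB
    # 4-bit: ~115 MB
```

Under these assumptions the weights would fit comfortably in the memory of a current phone or single-board computer, which is consistent with the article's edge-deployment framing.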

Liquid AI's focus on on-device inference aligns with industry moves toward edge-native AI, especially in mobile and IoT ecosystems. By supporting multiple inference backends, the company ensures that its model can be integrated into diverse software stacks, from mobile apps to robotics controllers.

Technical Highlights

Model size: 230M parameters
On-device speed: 213 tokens/s (Galaxy S25 Ultra), 42 tokens/s (Raspberry Pi 5)
Inference support: llama.cpp, MLX, vLLM, SGLang, ONNX
Target use cases: Edge AI, mobile assistants, low-latency language processing
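
As a rough illustration of what the reported speeds mean in practice, the sketch below converts them into the time needed to produce a reply. It assumes the figures are sustained generation rates and ignores prompt processing and model load time. The 100-token reply length is a hypothetical example, not a number from the release.

```python
# Time to generate a reply at the throughput reported in the article.
# Assumption: the reported figures are sustained generation rates;
# prompt processing and model loading are not counted.

REPORTED_TOKENS_PER_SECOND = {
    "Galaxy S25 Ultra": 213,
    "Raspberry Pi 5": 42,
}


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Return seconds needed to emit num_tokens at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    reply_length = 100  # hypothetical reply length in tokens
    for device, rate in REPORTED_TOKENS_PER_SECOND.items():
        seconds = generation_seconds(reply_length, rate)
        print(f"{device}: {seconds:.2f} s for {reply_length} tokens")
    # Galaxy S25 Ultra: 0.47 s for 100 tokens
    # Raspberry Pi 5: 2.38 s for 100 tokens
```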

With the LFM2.5-230M, Liquid AI provides a practical option for developers who need efficient language capabilities without relying on cloud infrastructure. The combination of a small footprint, broad framework support, and high reported on-device throughput makes it a noteworthy addition to the 2026 edge AI landscape.

via MarkTechPost
