Lossless LLM 3x Throughput Increase by LMCache

https://news.ycombinator.com/rss Hits: 12
Summary

Redis for LLMs - Infinite and Ultra-Fast

LMCache is an LLM serving engine extension that reduces TTFT and increases throughput, especially in long-context scenarios. By storing the KV caches of reusable text across several locations (GPU, CPU DRAM, and local disk), LMCache can reuse the KV cache of any repeated text (not necessarily a prefix) in any serving engine instance. This saves precious GPU cycles and reduces user response delay. Combined with vLLM, LMCache achieves 3-10x delay savings and GPU cycle reduction in many LLM use cases, including multi-round QA and RAG. Try LMCache with the pre-built vLLM Docker images here.

🚀 Performance snapshot

💻 Installation and Quickstart

Please refer to our detailed documentation for LMCache V1 and LMCache V0; a minimal usage sketch follows at the end of this summary. Interested in connecting? Fill out the interest form, sign up for our newsletter, or drop us an email, and our team will reach out to you!

🛣️ News and Milestones

- LMCache V1 with vLLM integration is live 🔥, featuring high-performance CPU KV cache offloading, disaggregated prefill, and P2P KV cache sharing
- LMCache is supported in the vLLM production stack ecosystem
- User and developer documentation
- Stable support for non-prefix KV caches
- Installation through pip and integration with the latest vLLM
- First release of LMCache

📖 Blogs and documentation

Our latest blog posts and documentation pages are available online.

Community meeting

The LMCache community meeting is hosted weekly, alternating between Tuesdays at 9:00 AM PT and Tuesdays at 6:30 PM PT. All are welcome to join!

Contributing

We welcome and value any contributions and collaborations...
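As a concrete illustration of the quickstart mentioned above, here is a minimal sketch of what an LMCache-enabled vLLM setup might look like. The connector name `LMCacheConnectorV1`, the `LMCACHE_*` environment variables, and the model name are assumptions drawn from common LMCache/vLLM integration examples, not from this post; the official documentation has the authoritative API.

```python
# Minimal sketch (assumptions, not verbatim from the post): the `lmcache`
# package is installed (`pip install lmcache vllm`), vLLM exposes
# KVTransferConfig, and LMCache registers a "LMCacheConnectorV1" connector.
import os

from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# Hypothetical LMCache knobs: token-chunk granularity for KV cache entries,
# and whether/how much CPU DRAM to use as an offloading tier (in GB).
os.environ["LMCACHE_CHUNK_SIZE"] = "256"
os.environ["LMCACHE_LOCAL_CPU"] = "True"
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5"

# Route vLLM's KV cache through LMCache so repeated text (not only shared
# prefixes) can hit cached KV entries in GPU, CPU DRAM, or local disk.
ktc = KVTransferConfig(kv_connector="LMCacheConnectorV1", kv_role="kv_both")

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # any vLLM-supported model
    kv_transfer_config=ktc,
    gpu_memory_utilization=0.8,
)

# A long shared context (e.g. RAG documents) followed by different
# questions: the second request can reuse the first request's KV cache,
# cutting TTFT instead of re-prefilling the whole context.
context = "<a long document pasted here>"
params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate([context + "\n\nQ: Summarize the document."], params)
outputs = llm.generate([context + "\n\nQ: List three key points."], params)
print(outputs[0].outputs[0].text)
```

The second `generate` call is where the claimed 3-10x delay saving would show up in this kind of workload: the long context's KV cache is served from LMCache's storage tiers rather than recomputed on the GPU.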

First seen: 2025-06-28 12:31

Last seen: 2025-06-28 23:35