LightMem (ICLR 2026): Lightweight and Efficient Memory-Augmented Generation — 10×+ gains with 100× lower cost

Posted by zxlzr@reddit | LocalLLaMA | 15 comments

We’re excited to share that our work **LightMem** has been accepted to **ICLR 2026** 🎉

**Paper:** [https://arxiv.org/abs/2510.18866](https://arxiv.org/abs/2510.18866)

**Code:** [https://github.com/zjunlp/LightMem](https://github.com/zjunlp/LightMem)

LightMem is a lightweight, modular memory system for LLM agents that enables scalable long-context reasoning and structured memory management across tasks and environments.

# 🧩 Motivation

LLMs struggle in long, multi-turn interactions:

* context grows noisy and expensive
* models get “lost in the middle”
* memory layers add latency & token cost

Existing memory systems can be accurate, but they are often heavy on tokens, API calls, and runtime.

# 💡 LightMem keeps memories compact, topical, and consistent

**1️⃣ Pre-compress sensory memory.** Filter redundant / low-value tokens before storage.

**2️⃣ Topic-aware short-term memory.** Cluster turns by topic and summarize them into precise memory units.

**3️⃣ Sleep-time long-term consolidation.** Incremental inserts at runtime + offline high-fidelity updates (no latency hit).

# 🔬 Results

On **LongMemEval**:

* Accuracy ↑ up to **~10.9%**
* Tokens ↓ up to **117×**
* API calls ↓ up to **159×**
* Runtime ↓ **>12×**

So LightMem often improves reasoning **while dramatically cutting cost**.
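
The three stages above can be illustrated with a toy Python pipeline. This is a sketch under loud assumptions, not LightMem's implementation or API (see the repository for that): every function and class name is hypothetical, a stopword filter stands in for a real token-compression model, word overlap stands in for embedding-based topic segmentation, and the "summaries" are deduplicated keywords rather than LLM output. What it tries to show is the control flow: compress each turn, group turns by topic, insert cheaply at runtime, and merge only in a separate offline `consolidate()` pass.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical filler-word list: a stand-in for a real token-compression model.
STOPWORDS = {"a", "an", "the", "i", "my", "in", "is", "it", "to", "and",
             "of", "so", "um", "uh", "like", "just", "really", "very"}


def pre_compress(turn: str) -> list[str]:
    """Stage 1 (toy): drop filler words and immediately repeated tokens."""
    kept: list[str] = []
    for tok in turn.lower().split():
        tok = tok.strip(".,!?")
        if not tok or tok in STOPWORDS:
            continue
        if kept and kept[-1] == tok:  # collapse immediate repeats such as "no no"
            continue
        kept.append(tok)
    return kept


def overlap(words: set[str], vocab: set[str]) -> float:
    """Fraction of a turn's words already seen in the current segment."""
    return len(words & vocab) / len(words) if words else 1.0


def segment_by_topic(turns: list[list[str]], threshold: float = 0.3) -> list[list[list[str]]]:
    """Stage 2 (toy): open a new segment when a turn shares too few words with
    the current one. Word overlap stands in for embedding similarity."""
    segments: list[list[list[str]]] = []
    vocab: set[str] = set()
    for turn in turns:
        words = set(turn)
        if segments and overlap(words, vocab) >= threshold:
            segments[-1].append(turn)
            vocab |= words
        else:
            segments.append([turn])
            vocab = set(words)
    return segments


@dataclass
class MemoryUnit:
    topic: str    # toy topic key: most frequent word in the segment
    summary: str  # toy "summary": unique words in first-seen order


def summarize(segment: list[list[str]]) -> MemoryUnit:
    words = [w for turn in segment for w in turn]
    topic = Counter(words).most_common(1)[0][0] if words else ""
    return MemoryUnit(topic, " ".join(dict.fromkeys(words)))


@dataclass
class LongTermStore:
    units: list[MemoryUnit] = field(default_factory=list)

    def insert(self, unit: MemoryUnit) -> None:
        """Runtime path: append only, no merging, so the agent never waits."""
        self.units.append(unit)

    def consolidate(self) -> None:
        """Sleep-time path (run offline): merge units that share a topic key."""
        merged: dict[str, list[str]] = {}
        for unit in self.units:
            words = merged.setdefault(unit.topic, [])
            for w in unit.summary.split():
                if w not in words:
                    words.append(w)
        self.units = [MemoryUnit(t, " ".join(ws)) for t, ws in merged.items()]


conversation = [
    "I really like hiking in the Alps.",
    "Hiking the Alps in summer is great.",
    "My laptop battery died.",
    "The laptop battery is so old.",
    "Hiking next summer, more hiking!",
]
compressed = [pre_compress(turn) for turn in conversation]
store = LongTermStore()
for segment in segment_by_topic(compressed):
    store.insert(summarize(segment))  # cheap, at "runtime"
print(len(store.units))               # prints 3 (before consolidation)
store.consolidate()                   # later, during "sleep time"
for unit in store.units:
    print(unit.topic, "->", unit.summary)
# hiking -> hiking alps summer great next more
# laptop -> laptop battery died old
```

The design choice the sketch highlights is the split between `insert()` and `consolidate()`: the runtime path never compares against existing memories, so its cost does not grow with the store, while the expensive merge happens when no user is waiting.
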
# 🧪 Recent updates

* Baseline evaluation framework across memory systems (Mem0, A-MEM, LangMem) on LoCoMo & LongMemEval
* Demo video + tutorial notebooks (multiple scenarios)
* MCP Server integration → multi-tool memory invocation
* Full LoCoMo dataset support
* GLM-4.6 integration with reproducible scripts
* Local deployment via Ollama, vLLM, Transformers (auto-load)

# 🧱 Positioning

LightMem is designed as a **modular memory layer** that can sit inside agent stacks:

* long-context agents
* tool-using agents
* autonomous workflows
* conversational systems

Think: structured memory that scales without exploding tokens.

# 🙌 Feedback welcome

We’d love input from:

* agent framework devs
* memory / RAG researchers
* long-context model folks
* applied LLM teams

Issues & PRs welcome: [https://github.com/zjunlp/LightMem](https://github.com/zjunlp/LightMem)

Let’s make agent memory practical, scalable, and lightweight 🚀

15 Comments

ruizibdz@reddit

I wonder how much of this huge performance improvement would carry over to real skill-based frameworks like openclaw/hermes-agent. It would have a big impact even if only 50% of it lands.

Other_Chest_1039@reddit

Interesting numbers. I've been playing with a hybrid retrieval stack (vector + BM25 + graph + temporal, fused via RRF) and noticed the biggest practical win isn't the fancy retrieval math; it's idempotent ingest and entity resolution. Without deterministic chunk IDs, I had 36% duplicates in my production collection after a few months. Without alias merging, one person ended up as 6 separate graph nodes and recall scattered. How does LightMem handle churn over time?
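
A minimal Python sketch of the two ingest fixes described in this comment, plus the RRF step it mentions. It is illustrative only and says nothing about how LightMem or any particular vector store works: the hashing scheme, the `Collection` class, and the alias table are hypothetical, and real entity resolution would need fuzzy matching rather than a hand-written dictionary. RRF itself is the standard reciprocal rank fusion formula, score(d) = Σ 1 / (k + rank), with the commonly used k = 60.

```python
import hashlib


def chunk_id(source: str, text: str) -> str:
    """Deterministic ID: the same source and whitespace-normalized text always
    hash to the same ID, so re-ingesting a document cannot create duplicates."""
    normalized = " ".join(text.split())
    digest = hashlib.sha256(f"{source}\x00{normalized}".encode("utf-8"))
    return digest.hexdigest()[:16]


class Collection:
    """Hypothetical in-memory stand-in for a vector/BM25 collection."""

    def __init__(self) -> None:
        self.chunks: dict[str, str] = {}

    def upsert(self, source: str, text: str) -> str:
        cid = chunk_id(source, text)
        self.chunks[cid] = text  # same ID -> overwrite in place, never a second row
        return cid


# Hypothetical alias table; real entity resolution needs fuzzy matching too.
ALIASES = {"bob": "robert smith", "bob smith": "robert smith", "r. smith": "robert smith"}


def canonical_entity(name: str) -> str:
    """Map every known alias to a single graph-node key."""
    key = " ".join(name.lower().split())
    return ALIASES.get(key, key)


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over retrievers of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


col = Collection()
first = col.upsert("notes.md", "Bob moved to Berlin.")
again = col.upsert("notes.md", "Bob  moved to\nBerlin.")  # same chunk, messier whitespace
print(first == again, len(col.chunks))                     # True 1
print(canonical_entity("Bob") == canonical_entity("R. Smith"))  # True
print(rrf([["d1", "d2", "d3"], ["d2", "d3", "d1"], ["d2", "d1"]]))  # ['d2', 'd1', 'd3']
```

Hashing the normalized content (rather than assigning random UUIDs at ingest time) is what makes re-runs idempotent: a second ingest of the same chunk maps to the same key and overwrites instead of adding a row.
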

crusoe@reddit

This is awesome, but having gotten back into Python coding after 15 years, I find the package management has somehow gotten worse. Conda, mamba: it's all really bad. Just a complete pain. I'm sticking to Rust because it's a billion times easier than the current Python mess.

zxlzr@reddit (OP)

Haha, Python library versions are a big headache; lots of open-source projects struggle with this.

smwaqas89@reddit

Honestly, LightMem's modular design is a significant leap for LLM performance, especially in handling long-context reasoning. The reported 10× performance gains are impressive, but we need concrete benchmarks to understand how this stacks up against current solutions. For real-world applications, scalable memory management can greatly improve usability, though integrating it into existing systems might present challenges. Transitioning may require careful planning around data pipelines and system architecture to maximize those gains. It’ll be interesting to see how this advances practical use cases as teams adopt this tech...

Thank you for your interest. You’re absolutely right: most current benchmarks are not very objective. Some newer benchmarks may provide a more accurate assessment of efficiency. We will continue to maintain and improve them going forward.

ClimateBoss@reddit

How do I use this with llama.cpp?

We now support Ollama, vLLM, and Transformers (auto-load); we will try to support llama.cpp soon.

blakeheron@reddit

The "sleep-time consolidation" approach is clever. I've tried something similar with Mem0, but the token overhead gets painful fast when you're running 100k+ context windows. One thing I'd watch out for: how do you handle memory invalidation when the underlying facts change? That's the part that always bites me with agent memory systems. Do you version individual memory units or just overwrite?

For now we just overwrite, and we’ll continue to refine and improve this going forward.

Overwrite works until something breaks and you need to trace back why. That’s where versioning pays for itself.
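
To make the overwrite-versus-versioning trade-off concrete, here is a minimal Python sketch of an append-only memory where each write keeps the previous value and the reason it changed. It is hypothetical and not part of LightMem, which per the earlier reply currently overwrites; `VersionedMemory`, `write`, `read`, and `trace` are illustrative names. Reads still return only the latest value, so normal recall costs the same as in an overwrite store, while `trace()` answers "why does the agent believe this?"

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Version:
    value: str
    reason: str            # why the fact changed, kept for later debugging
    recorded_at: datetime


@dataclass
class VersionedMemory:
    """Hypothetical append-only memory: writes never destroy earlier values."""
    history: dict[str, list[Version]] = field(default_factory=dict)

    def write(self, key: str, value: str, reason: str) -> None:
        version = Version(value, reason, datetime.now(timezone.utc))
        self.history.setdefault(key, []).append(version)

    def read(self, key: str) -> Optional[str]:
        """Normal recall: only the latest value, same cost as an overwrite store."""
        versions = self.history.get(key)
        return versions[-1].value if versions else None

    def trace(self, key: str) -> list[tuple[str, str]]:
        """Debugging: every value this key has held, oldest first, with its reason."""
        return [(v.value, v.reason) for v in self.history.get(key, [])]


mem = VersionedMemory()
mem.write("user.city", "Paris", "user said they live in Paris")
mem.write("user.city", "Berlin", "user said they moved to Berlin")
print(mem.read("user.city"))   # Berlin
print(mem.trace("user.city"))  # both versions, oldest first
```

The cost of this design is storage that grows with every update; a real system would likely cap or compact old versions, for example during an offline consolidation pass.
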

nuclearbananana@reddit

Ooh, very interesting: an actually decent memory system that's not ridiculously expensive. I see accuracy is competitive with full text, but how are costs? I'd also be curious about the LoCoMo breakdown by subset.

Thank you for your interest. We will continue maintaining the GitHub repository to improve the user experience.

AI_Novice2@reddit

This sounds like a fascinating development! Memory-augmented generation has so much potential, especially if it can achieve those kinds of efficiency gains. I'm curious about the practical implications of LightMem—how do you see it being applied in real-world scenarios? Also, if it can indeed lower costs significantly, it might make advanced AI models more accessible for smaller companies or researchers without big budgets. It would be great to hear thoughts on how this could democratize AI technology. Have any of you worked with similar memory-augmented techniques before? What challenges did you face?
