LightMem (ICLR 2026): Lightweight and Efficient Memory-Augmented Generation (up to ~10.9% higher accuracy with 100×+ fewer tokens)

Posted by zxlzr@reddit | LocalLLaMA | 15 comments

We’re excited to share that our work **LightMem** has been accepted to **ICLR 2026** 🎉

**Paper:** [https://arxiv.org/abs/2510.18866](https://arxiv.org/abs/2510.18866)

**Code:** [https://github.com/zjunlp/LightMem](https://github.com/zjunlp/LightMem)

LightMem is a lightweight, modular memory system for LLM agents that enables scalable long-context reasoning and structured memory management across tasks and environments.

# 🧩 Motivation

LLMs struggle in long, multi-turn interactions:

* the context grows noisy and expensive
* models get “lost in the middle”
* memory layers add latency and token cost

Existing memory systems can be accurate, but they are often heavy on tokens, API calls, and runtime.

https://preview.redd.it/5zoz8i0wgvlg1.png?width=672&format=png&auto=webp&s=6bb278e942b4587a5e4c4271c57a077aa59f4136

# 💡 LightMem keeps memories compact, topical, and consistent

**1️⃣ Pre-compress sensory memory.** Filter out redundant or low-value tokens before storage.

**2️⃣ Topic-aware short-term memory.** Cluster turns by topic and summarize each cluster into a precise memory unit.

**3️⃣ Sleep-time long-term consolidation.** Insert incrementally at runtime, then run high-fidelity updates offline, so consolidation adds no latency to responses.

# 🔬 Results

On **LongMemEval**:

* Accuracy ↑ up to **~10.9%**
* Tokens ↓ up to **117×**
* API calls ↓ up to **159×**
* Runtime ↓ **>12×**

So LightMem often improves answer accuracy **while dramatically cutting cost**.
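To make the three stages concrete, here is a toy Python sketch of the pipeline's shape. It is not LightMem's implementation or API: `precompress`, `segment_by_topic`, `MemoryUnit`, and `MemoryStore` are hypothetical names, a hard-coded filler-word list stands in for real token-level compression, word overlap stands in for topic segmentation, and plain concatenation stands in for LLM summarization. What it does show is the structure: compress before storing, group turns by topic, and keep the online write path cheap while merging happens later in an offline pass.

```python
from collections.abc import Set
from dataclasses import dataclass, field

# Hypothetical filler list: a crude stand-in for a learned token-importance model.
FILLER = {"um", "uh", "like", "so", "well", "okay", "just", "really"}


def precompress(turn: str) -> list[str]:
    """Stage 1 sketch: drop filler tokens and stuttered repeats before storage."""
    kept: list[str] = []
    for raw in turn.lower().split():
        tok = raw.strip(".,!?")
        if not tok or tok in FILLER:
            continue
        if kept and kept[-1] == tok:  # e.g. "the the" -> "the"
            continue
        kept.append(tok)
    return kept


def overlap(a: Set[str], b: Set[str]) -> float:
    """Fraction of the words in `a` that also appear in `b`."""
    return len(a & b) / len(a) if a else 0.0


@dataclass
class MemoryUnit:
    keywords: frozenset[str]
    text: str
    version: int = 1


def segment_by_topic(turns: list[list[str]], threshold: float = 0.3) -> list[MemoryUnit]:
    """Stage 2 sketch: group consecutive turns while their vocabulary overlaps,
    then 'summarize' each group (here plain concatenation, not an LLM call)."""
    groups: list[list[list[str]]] = []
    vocab: set[str] = set()
    for toks in turns:
        words = set(toks)
        if groups and overlap(words, vocab) >= threshold:
            groups[-1].append(toks)
            vocab |= words
        else:  # topic shift: open a new segment
            groups.append([toks])
            vocab = set(words)
    return [
        MemoryUnit(frozenset(w for t in g for w in t), " / ".join(" ".join(t) for t in g))
        for g in groups
    ]


@dataclass
class MemoryStore:
    units: list[MemoryUnit] = field(default_factory=list)
    pending: list[MemoryUnit] = field(default_factory=list)

    def insert(self, unit: MemoryUnit) -> None:
        """Stage 3, online path: a cheap append with no merging at request time."""
        self.pending.append(unit)

    def search(self, query: str) -> list[str]:
        """Recall from consolidated and not-yet-consolidated units alike."""
        q = set(precompress(query))
        return [u.text for u in self.units + self.pending if q & u.keywords]

    def consolidate(self, threshold: float = 0.5) -> None:
        """Stage 3, offline 'sleep-time' path: fold pending units into related ones."""
        for new in self.pending:
            for old in self.units:
                if overlap(new.keywords, old.keywords) >= threshold:
                    old.keywords = old.keywords | new.keywords
                    old.text = f"{old.text} / {new.text}"
                    old.version += 1
                    break
            else:  # no related unit found: keep it as a new long-term memory
                self.units.append(new)
        self.pending.clear()


if __name__ == "__main__":
    turns = [
        "Um, I'm planning a trip to Kyoto in April.",
        "Kyoto in April means cherry blossoms, right?",
        "Separately, my laptop battery drains really fast.",
    ]
    store = MemoryStore()
    for unit in segment_by_topic([precompress(t) for t in turns]):
        store.insert(unit)
    print(store.search("When is the Kyoto trip?"))
    store.consolidate()  # in a real agent this would run during idle time
```

In this sketch `insert` only appends, so a request never waits on merging; `consolidate` is the part you would schedule while the agent is idle. For the compression model, segmentation signal, and update policy LightMem actually uses, see the paper and repo.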
# 🧪 Recent updates

* Baseline evaluation framework across memory systems (Mem0, A-MEM, LangMem) on LoCoMo & LongMemEval
* Demo video + tutorial notebooks (multiple scenarios)
* MCP Server integration → multi-tool memory invocation
* Full LoCoMo dataset support
* GLM-4.6 integration with reproducible scripts
* Local deployment via Ollama, vLLM, Transformers (auto-load)

# 🧱 Positioning

LightMem is designed as a **modular memory layer** that can sit inside agent stacks:

* long-context agents
* tool-using agents
* autonomous workflows
* conversational systems

Think: structured memory that scales without exploding tokens.

# 🙌 Feedback welcome

We’d love input from:

* agent framework devs
* memory / RAG researchers
* long-context model folks
* applied LLM teams

Issues & PRs welcome: [https://github.com/zjunlp/LightMem](https://github.com/zjunlp/LightMem)

Let’s make agent memory practical, scalable, and lightweight 🚀