Agent Memory

Posted by AutomataManifold@reddit | LocalLLaMA | 31 comments

I was researching the options out there for handling memory in agent-based systems, and I figured someone else might benefit from seeing the list.

A lot of agent systems assume GPT access and aren't set up to use local models at all, even if they would theoretically outperform GPT-3. You can often hack in a call to a local server via an API, but it's a bit of a pain and there's no guarantee that the prompts will even work on a different model.
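For what it's worth, the "hack in a call to a local server" part is usually just pointing an OpenAI-style request at a different base URL, since Ollama, vLLM, and llama.cpp's server all expose an OpenAI-compatible `/v1/chat/completions` route. A rough sketch (the host and model name are just Ollama's defaults, swap in your own):

```python
import json
import urllib.request

def chat_completion_request(prompt, model="llama3.1",
                            base_url="http://localhost:11434/v1"):
    """Build a request for an OpenAI-compatible chat endpoint.

    base_url here is Ollama's default; vLLM and llama.cpp's server
    expose the same /v1/chat/completions route on their own ports.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Actually sending it requires a running server, e.g.:
# with urllib.request.urlopen(chat_completion_request("Hello")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The prompt-compatibility problem is the real pain, though: the endpoint shape transfers between models, but the prompts often don't.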

Memory-specific projects on GitHub:

Letta - "Letta is an open source framework for building stateful LLM applications." - seems to be designed to run as a server. Based around the ideas in the MemGPT paper, which involves using an LLM to self-edit memory via tool calling. You can call the server from Python with the SDK. There's documentation for connecting to vLLM and Ollama. They recommend using Q6 or Q8 models.
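To give a flavor of the MemGPT idea: the model is handed tools that rewrite a block of always-in-context "core memory". The tool name below matches the paper; the dict layout and dispatcher are just my own illustration, not Letta's actual code:

```python
# Minimal sketch of MemGPT-style self-editing memory: the LLM is
# offered a tool that rewrites a section of its persistent context.

core_memory = {
    "persona": "You are a helpful local assistant.",
    "human": "Name unknown.",
}

def core_memory_replace(section: str, old: str, new: str) -> str:
    """Tool the LLM can call to edit its own core memory block."""
    core_memory[section] = core_memory[section].replace(old, new)
    return f"Updated {section}."

TOOLS = {"core_memory_replace": core_memory_replace}

def dispatch(tool_call: dict) -> str:
    """Run a tool call shaped like the ones OpenAI-style APIs emit."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["arguments"])

# e.g. the model learns the user's name mid-conversation:
dispatch({"name": "core_memory_replace",
          "arguments": {"section": "human",
                        "old": "Name unknown.", "new": "Name: Alice."}})
```

This is also why they recommend Q6/Q8 quants: heavily quantized models tend to fumble the tool-calling that the whole scheme depends on.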

Memoripy - new kid on the block, supports Ollama and OpenAI with other support coming. Tries to model memory in a way that keeps more important memories more available than less important ones.
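The "important memories stay more available" idea is usually some mix of importance weighting and recency decay. A toy version of that scoring (the half-life and weighting are made up, not Memoripy's actual numbers):

```python
import math
import time

# Sketch of importance + recency scoring: each memory's importance is
# discounted by an exponential decay on its age, so important, recent
# items surface first. Constants here are illustrative only.

def score(memory: dict, now: float, half_life: float = 3600.0) -> float:
    age = now - memory["timestamp"]
    recency = math.exp(-math.log(2) * age / half_life)  # halves each hour
    return memory["importance"] * recency

def recall(memories: list, now: float, k: int = 2) -> list:
    return sorted(memories, key=lambda m: score(m, now), reverse=True)[:k]

now = time.time()
memories = [
    {"text": "User prefers short answers", "importance": 0.9, "timestamp": now - 7200},
    {"text": "Small talk about weather",   "importance": 0.2, "timestamp": now - 60},
    {"text": "User's project is in Rust",  "importance": 0.8, "timestamp": now - 300},
]
top = recall(memories, now)
```

Note how the two-hour-old high-importance memory still beats the minute-old small talk, which is the behavior you want.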

Mem0 - "an intelligent memory layer" - has GPT-4o as the default but can use LiteLLM to talk to open models.

cognee - "Cognee implements scalable, modular ECL (Extract, Cognify, Load) pipelines" - A little more oriented toward ingesting documents than just remembering chats; the idea seems to be that it helps you structure data for the LLM. Can talk to any OpenAI-compatible endpoint as a custom provider, with a simple way to specify the host endpoint URL (so many things hardcode the URL!), plus an Ollama-specific setting. The minimum recommended open model is Mixtral-8x7B.

Motorhead (DEPRECATED) - no longer maintained - server to handle chat application memory

Haystack Basic Agent Memory Tool - agent memory for Haystack agents, with both short and long-term memory.

memary - A bit more agent-focused, automatically generates memories from agent interactions. Assumes local models via Ollama.

kernel-memory - a Microsoft experimental research project that has memory as a plugin for other services.

Zep - maintains a temporal knowledge graph of user information to track how facts change over time. Supports using any OpenAI-compatible API, with LiteLLM explicitly mentioned as a possible proxy. Has a Community edition and a hosted Cloud version; the Cloud version supports importing non-chat data.
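The temporal-graph part is the interesting bit: instead of overwriting a fact, you close its validity interval and add a new one, so you can ask what was true at any point in time. A toy sketch of that idea (the schema is illustrative, not Zep's actual data model):

```python
# Temporal facts: (subject, predicate, object) triples with a
# validity window. Asserting a new value closes the old window
# instead of deleting the fact, so history is queryable.

facts = []

def assert_fact(subject, predicate, obj, at):
    for f in facts:
        if f["s"] == subject and f["p"] == predicate and f["to"] is None:
            f["to"] = at  # close the superseded fact's window
    facts.append({"s": subject, "p": predicate, "o": obj,
                  "from": at, "to": None})

def fact_at(subject, predicate, when):
    """Return the object that held for this subject/predicate at `when`."""
    for f in facts:
        if (f["s"] == subject and f["p"] == predicate
                and f["from"] <= when and (f["to"] is None or when < f["to"])):
            return f["o"]
    return None

assert_fact("alice", "employer", "Acme", at=1)
assert_fact("alice", "employer", "Globex", at=5)
```

With that in place, a query at time 3 still returns "Acme" while a query at time 6 returns "Globex", which is exactly the "how facts change over time" behavior.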

MemoryScope - Memory database for chatbots. Can use Qwen. Includes memory consolidation and reflection, not just retrieval.
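"Consolidation" here means folding old observations into summaries instead of just dropping them when the buffer fills. A rough sketch of that flow (in a real system `summarize()` would be an LLM call; the stub below just makes the flow runnable):

```python
# Sketch of memory consolidation: when the working buffer exceeds its
# size, compress the older entries into a long-term summary rather
# than discarding them. The class and its sizes are illustrative.

def summarize(items):
    return "Summary of: " + "; ".join(items)  # stand-in for an LLM call

class ConsolidatingMemory:
    def __init__(self, buffer_size=3):
        self.buffer, self.long_term = [], []
        self.buffer_size = buffer_size

    def observe(self, text):
        self.buffer.append(text)
        if len(self.buffer) > self.buffer_size:
            # consolidate everything but the newest observation
            self.long_term.append(summarize(self.buffer[:-1]))
            self.buffer = self.buffer[-1:]

mem = ConsolidatingMemory()
for note in ["likes tea", "works nights", "has a cat", "learning Go"]:
    mem.observe(note)
```

Reflection, as these projects use the term, is the same move run over the summaries themselves to draw higher-level conclusions.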

Just write your own:

LangGraph Memory Service - an example template that shows how to implement memory for LangGraph agents.

txtai - while txtai doesn't have an official example of implementing chatbot memory, they have plenty of RAG examples that make me think it would be a viable option.

Langroid has vector storage and source citation.

LangChain memory

Other things:

WilmerAI has assistants with memory.

EMENT: Enhancing Long-Term Episodic Memory in Large Language Models - research project, combining embeddings and entity extraction.

Agent frameworks

Did I miss anything? Anyone had success using these with open models?