Anyone else building persistent memory for local LLM agents? Here's my approach

Posted by New_Election2109@reddit | LocalLLaMA | 17 comments

Been hitting the same wall for a while: every new session with an LLM agent starts from zero. You explain your stack, your constraints, your decisions — then open a new chat and do it all again.

I've been working on an approach to this — a local daemon called Mnemostroma that sits between you and your agents and builds memory silently in the background.

**How it works:**

- Watches conversation I/O and extracts what actually matters (decisions, constraints, key facts)

- Compresses them into structured, multi-layer memory — not raw logs

- Surfaces it back via MCP tools when relevant (~20 ms retrieval)

- Forgets low-value noise gradually, keeps important decisions long-term

- Fully offline — SQLite + ONNX INT8, no cloud, no Docker, no torch
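To make the "compress, then gradually forget" idea concrete, here's a minimal sketch of that kind of store: structured rows in SQLite with an importance score that decays exponentially with age, so noise fades while high-importance decisions stay retrievable. All names (table, columns, functions, the 30-day half-life) are illustrative assumptions, not Mnemostroma's actual schema.

```python
import math
import sqlite3
import time

def open_store(path=":memory:"):
    # Structured memory rows, not raw transcript logs.
    db = sqlite3.connect(path)
    db.execute("""
        CREATE TABLE IF NOT EXISTS memories (
            id         INTEGER PRIMARY KEY,
            kind       TEXT,   -- 'decision', 'constraint', 'fact'
            text       TEXT,
            importance REAL,   -- initial weight assigned at extraction time
            created    REAL    -- unix timestamp
        )""")
    return db

def remember(db, kind, text, importance=1.0):
    db.execute(
        "INSERT INTO memories (kind, text, importance, created) VALUES (?, ?, ?, ?)",
        (kind, text, importance, time.time()),
    )
    db.commit()

def effective_score(importance, age_days, half_life_days=30.0):
    # Exponential decay: a 0.2-importance note is below threshold after
    # a few half-lives, while a 1.0-importance decision lasts months.
    return importance * math.exp(-math.log(2) * age_days / half_life_days)

def recall(db, min_score=0.1, now=None):
    now = now or time.time()
    rows = db.execute(
        "SELECT kind, text, importance, created FROM memories"
    ).fetchall()
    scored = [
        (effective_score(imp, (now - ts) / 86400), kind, text)
        for kind, text, imp, ts in rows
    ]
    # Highest effective score first; anything decayed below min_score is dropped.
    return [(k, t) for s, k, t in sorted(scored, reverse=True) if s >= min_score]
```

The same decay function can drive an actual `DELETE` pass in the background rather than just filtering at read time; filtering at read time is simply easier to show in a few lines.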

**The design choice I keep questioning:**

The agent only *reads* memory — it never writes it. A separate Observer pipeline does all the watching and storing in the background. It feels cleaner and harder to corrupt, but I'm curious whether others would want the agent to annotate its own memory directly.
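The read/write split above can be enforced by construction rather than by convention: hand the agent-facing side an object that only captures the store's read capability, and give the observer the sole write path. A minimal sketch with made-up class names (not Mnemostroma's actual internals):

```python
class MemoryStore:
    """Backing store. Only the Observer is handed a reference to it."""

    def __init__(self):
        self._entries = []

    def append(self, entry):
        self._entries.append(entry)

    def search(self, keyword):
        return [e for e in self._entries if keyword in e]

class AgentMemoryView:
    """What the agent's MCP tools would wrap: read-only by construction.

    It captures only the store's search method, so there is no write
    path for the agent to reach, even accidentally.
    """

    def __init__(self, store):
        self._search = store.search

    def recall(self, keyword):
        return self._search(keyword)

class Observer:
    """Background pipeline: the single component allowed to write."""

    def __init__(self, store):
        self._store = store

    def ingest(self, transcript_line):
        # Trivial stand-in for real extraction: keep lines that look
        # like decisions or constraints, drop everything else.
        if transcript_line.startswith(("DECISION:", "CONSTRAINT:")):
            self._store.append(transcript_line)
```

One consequence of this shape: letting the agent annotate its own memory would mean either handing it the store (losing the isolation) or adding a second, mediated write path through the Observer, which is probably where the design tension in the question lives.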

**Current state:** v1.8.1 beta, 400+ tests passing, ~420 MB RAM baseline. Not on PyPI yet.

Works with Claude Desktop, Claude Code, Cursor, Windsurf, Zed — anything that speaks MCP.
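For clients like Claude Desktop, an MCP server is typically wired up with a short entry in the client's config file (`claude_desktop_config.json`). The `mcpServers` shape below is the standard format; the `command` and `args` are guesses on my part, since the project isn't on PyPI yet and the repo's install instructions are authoritative:

```json
{
  "mcpServers": {
    "mnemostroma": {
      "command": "python",
      "args": ["-m", "mnemostroma.server"]
    }
  }
}
```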

Code and install instructions in the repo if anyone wants to poke at it:

https://github.com/GG-QandV/mnemostroma

Curious how others are handling this — stuffing everything into system prompt, RAG over transcripts, something else entirely?