What tools are you using to give your LLM a persistent second brain / long-term memory?
Posted by AmphibianHungry2466@reddit | LocalLLaMA | View on Reddit | 80 comments
I've been going down a rabbit hole trying to solve LLM memory: the problem where every session starts blank and your agent has no idea what it learned last week.
I put together a list of tools I found: https://github.com/fsaint/bestOfSecondBrainLLM
The ones I've come across so far:
- Tolaria: markdown vault manager with an MCP server for agents
- QMD: local BM25 + vector + reranking search engine for markdown docs
- Graphify: turns any folder into a queryable knowledge graph
- MarkItDown (Microsoft): converts anything (PDF, audio, YouTube, images) to markdown
- RAG-Anything: multimodal RAG pipeline built on LightRAG
- PARA Workspace: workspace framework for humans + agents with an inbox/archive structure
- Beads: graph-based task tracker with agent memory decay
- Obsidian Skills: agent skills for vault navigation + web-to-markdown via Defuddle
The conceptual anchor for a lot of this is Karpathy's LLM Wiki gist.
What I'm still figuring out:
- Entity extraction: NER vs LLM-assisted, cost vs quality tradeoff
- Local embeddings (nomic-embed, ollama) vs API (OpenAI, Voyage); see the sketch after this list
- How to avoid the knowledge base becoming stale or bloated over time
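For the local route, this is roughly the shape I've been testing (a minimal sketch assuming the `ollama` Python client and a pulled `nomic-embed-text` model; the notes and query are made up):

```python
# Rough sketch of the local-embeddings path. Assumes the `ollama` Python
# client and a pulled `nomic-embed-text` model; data is illustrative.
import ollama
import numpy as np

def embed(text: str) -> np.ndarray:
    resp = ollama.embeddings(model="nomic-embed-text", prompt=text)
    return np.array(resp["embedding"])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

notes = ["Project Alpha ships next Friday", "Bob prefers async standups"]
vecs = [embed(n) for n in notes]
query = embed("when does alpha launch?")
# Return the note closest to the query in embedding space
print(max(zip(notes, vecs), key=lambda nv: cosine(query, nv[1]))[0])
```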
What's working for you? Anything I'm missing? Would love to add more tools to the repo, especially things people are actually using in production, or at least consistently in their own flow.
7657786425658907653@reddit
i have the llm print out post-it notes with info on them and stick them on my wall. then at each prompt i ask it to decode a new wall image, and finally i do a voodoo dance and put a new pin in my Sam Altman doll and wait for gpt to refresh my tokens. while i wait i cry over the state of humanity.
Important_Quote_1180@reddit
Great to see I’m not the only one
AmphibianHungry2466@reddit (OP)
I like this. I'll include some tissue paper in the repo.
Bootes-sphere@reddit
Vector DBs solve the retrieval problem but they don't solve the *reasoning* problem. You'll get relevant context back, but the model still needs to synthesize it into actionable insights each session.
What actually moved the needle for us was storing not just raw memories, but structured summaries, extracted facts, decision patterns, user preferences. Think of it like taking notes on your notes. Then you chunk those summaries into the vector DB.
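A minimal sketch of that shape, assuming `chromadb` for the store; `summarize_session` stands in for whatever LLM call does the extraction:

```python
# "Notes on your notes": distill the raw session into structured facts
# first, then chunk those into the vector DB instead of the raw history.
import chromadb

def summarize_session(transcript: str) -> list[str]:
    # Stand-in: in practice, prompt an LLM to extract facts, decisions,
    # and user preferences as short standalone statements.
    return ["User prefers tabs over spaces", "Decided to ship v2 without auth"]

client = chromadb.Client()
memories = client.get_or_create_collection("distilled_memories")

transcript = "...full raw session text..."
facts = summarize_session(transcript)
memories.add(documents=facts, ids=[f"fact-{i}" for i in range(len(facts))])

# Next session: retrieve distilled facts, not raw logs
hits = memories.query(query_texts=["what did we decide about auth?"], n_results=2)
print(hits["documents"])
```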
What use case are you building for? Agent loops, chatbot with history, or something else? The best approach changes based on what you actually need the memory *to do*.
AmphibianHungry2466@reddit (OP)
"Vector DBs solve the retrieval problem but they don't solve the *reasoning* problem." -- that is so right. My use case is large project, lots of files, emails, and. project management too. Much more than context window can handle. How can we have an agent be able to reason about new information. For example new email comes in. In the base case we want to assess if new risk can be inferred by that new piece of information. Hopefully it makes sense.
Thank you!
kaizer1c@reddit
I've been running this pattern for about six months with an Obsidian vault and Claude Code, and the biggest thing I've learned is that the tooling matters less than the orientation layer.
My vault has ~2,400 notes. Claude Code can read them, search them (I actually use QMD, which is on your list — it's mine), follow wikilinks between them. But for months the agent almost never did any of this unprompted. The vault was there, fully searchable, and the agent ignored it. The problem wasn't access — it was that the agent had no idea which files mattered or where to start.
What fixed it was dumb simple: five small markdown files that act as a table of contents. Identity, current situation, work, projects, tools. About 200 lines total. They're listed in CLAUDE.md so the agent sees them every session. Each one is full of wikilinks the agent can follow for depth. The agent reads the summary, follows the links when the conversation needs more, and that's it. No embeddings, no knowledge graph, no vector store for the orientation step.
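To make that concrete, the whole orientation layer is roughly this shape (file names illustrative):

```
CLAUDE.md (excerpt): the orientation layer
Read these five files at the start of every session:
- identity.md            who I am, how I like to work
- current-situation.md   what's going on this month
- work.md                employer, team, active responsibilities
- projects.md            one line per active project, wikilinked to detail notes
- tools.md               my stack and conventions

Each file is full of [[wikilinks]]; follow them only when the
conversation needs more depth.
```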
On your staleness question: that's the real killer. I built a /sleep command that reviews and prunes the context files between sessions. It checks for outdated info, contradictions, and verbosity, and tightens things up. The principle is prune over append: if the context files keep growing, the agent eventually ignores them the same way it ignored the full vault. I also added a status line indicator that shows how many days since the last run, which turned out to be the difference between doing it weekly and forgetting entirely.
The thing I'd push back on in the Karpathy framing is "you never write the wiki yourself." If you already have a second brain you actively use, the more interesting move is sharing it: both you and the agent write to the same files. The agent logs what happened in a session, you read it the next morning. You update a project file after a call, the agent picks up the context next time without being briefed. It only stays honest because both of you are working in it. I wrote up the full approach here if you want the details: https://www.mandalivia.com/obsidian/your-obsidian-vault-is-already-an-agent-memory-system/
I also wrote a longer piece on the "shared brain vs. LLM-maintained archive" distinction: https://blog.boxcars.ai/p/from-second-brain-to-shared-brain
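If it helps anyone, the /sleep pass I described boils down to something like this (a sketch; `llm` stands in for whatever completion call you use):

```python
# /sleep sketch: rewrite each context file smaller, never bigger.
from pathlib import Path

PRUNE_PROMPT = (
    "Rewrite this context file. Remove outdated info, resolve "
    "contradictions in favor of the newest statement, cut verbosity. "
    "The result must be shorter than the input.\n\n"
)

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

for path in Path("context").glob("*.md"):
    original = path.read_text()
    pruned = llm(PRUNE_PROMPT + original)
    if len(pruned) < len(original):  # enforce prune-over-append
        path.write_text(pruned)
```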
AmphibianHungry2466@reddit (OP)
I'm super impressed by your detailed and thoughtful answer. I'm reviewing the linked articles. Thank you.
riddlemewhat2@reddit
Good list. Most people converge on the same pattern eventually: raw → structured wiki → query layer, with some kind of lint or cleanup pass to keep it from rotting.
RAG alone usually isn't enough for long-term memory. The setups that hold up are the ones that actually maintain and update knowledge, not just retrieve it.
If you want a reference for that full loop, this repo is worth a look since it focuses on compiling and maintaining the wiki over time:
https://github.com/atomicmemory/llm-wiki-compiler
ai_guy_nerd@reddit
The problem with most memory tools is that they treat all data as equal, which leads to the bloat and staleness mentioned. A more reliable pattern is splitting memory into raw logs and curated distillation.
Keep a daily markdown file for raw session logs. Periodically, the agent reviews those logs and updates a single, high-level MEMORY.md file with the distilled essence of what was actually learned. This turns the memory into a living document rather than a growing pile of embeddings. It solves the staleness problem because the curation process explicitly removes outdated info.
OpenClaw uses this exact pattern. It prevents the context window from being flooded with irrelevant old details while keeping the core personality and key decisions persistent. The a-ha moment is realizing that memory isn't about storage, but about the process of forgetting the noise and keeping the signal.
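A minimal sketch of that loop, assuming a local model behind ollama (the model name and file layout are illustrative):

```python
# Raw logs -> curated distillation: MEMORY.md is rewritten, not appended
# to, so the curation pass can actually drop stale info.
from pathlib import Path
import ollama

memory = Path("MEMORY.md")
current = memory.read_text() if memory.exists() else ""
logs = "\n\n".join(p.read_text() for p in sorted(Path("logs").glob("*.md")))

resp = ollama.chat(model="llama3.1", messages=[{
    "role": "user",
    "content": "Rewrite MEMORY.md. Merge anything genuinely learned from "
               "the logs, delete anything they contradict or make obsolete, "
               f"keep it short.\n\n# Current\n{current}\n\n# Raw logs\n{logs}",
}])
memory.write_text(resp["message"]["content"])
```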
gfernandf@reddit
https://zenodo.org/records/19438943
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6600840
I’ve been hitting a slightly different angle on this problem.
A lot of “second brain” approaches focus on storing information (RAG, embeddings, graphs, etc.), but the agent still has to reconstruct how to *use* that information on every run.
So even with good memory, you still get:
- re-deriving reasoning steps
- inconsistent behavior across runs
- fragile multi-step workflows
What I’ve been experimenting with is treating *reasoning itself* as something persistent — not just the data.
Instead of only storing knowledge, I've been structuring the agent's behavior into reusable steps (with explicit inputs/outputs and execution flow), so it doesn't have to rebuild everything from scratch each time.
In that sense:
- RAG / second brain → persists knowledge
- structured execution → persists how to use that knowledge
Feels like both are needed, otherwise memory just becomes a passive store that the agent still has to reinterpret every time.
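As a toy sketch of the second half, nothing framework-specific (all names illustrative):

```python
# Persist *how* knowledge is used as declared steps with explicit
# inputs/outputs, so the agent replays a workflow instead of
# re-deriving it on every run.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    inputs: list[str]           # keys read from the shared context
    outputs: list[str]          # keys this step must produce
    run: Callable[[dict], dict]

def execute(steps: list[Step], context: dict) -> dict:
    for step in steps:
        missing = [k for k in step.inputs if k not in context]
        if missing:
            raise KeyError(f"{step.name}: missing inputs {missing}")
        context.update(step.run(context))
        assert all(k in context for k in step.outputs), step.name
    return context

workflow = [
    Step("retrieve", ["query"], ["notes"],
         lambda c: {"notes": f"notes matching {c['query']!r}"}),
    Step("assess_risk", ["notes"], ["risks"],
         lambda c: {"risks": ["delivery slip"] if "deadline" in c["notes"] else []}),
]
print(execute(workflow, {"query": "new email about the deadline"}))
```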
klipseracer@reddit
Pretty sure models return an encrypted payload representing the reasoning that gets passed back and forth. I don't know much about it but it may be a place to start.
kyr0x0@reddit
If you don't know much about it, just learn about it. Spreading assumptions that are inherently untrue / misleading is not so helpful :)
klipseracer@reddit
Shut up lol. I've been creating an IDE for the last few months to interface with LLMs, and I've encountered this data every day but have not reverse engineered it yet. Who are you, acting so helpful 😂
LeonidasTMT@reddit
Yep I've seen qwen come up with 5 different ways to construct a JSON file and every time it will call the earlier versions corrupted because the structures don't match.
AmphibianHungry2466@reddit (OP)
Thank you for the detailed reply.
genunix64@reddit
I would separate a few things that often get bundled together as "memory":
AmphibianHungry2466@reddit (OP)
You are saying a common mistake is to aim for one tool to solve problems that may be fundamentally different. Good insight.
genunix64@reddit
Yes 🙂 Also you can check my Mnemory project. https://github.com/fpytloun/mnemory
riceinmybelly@reddit
Obsidian skills. It's honest work
valdev@reddit
I think it's likely best to take a step back and analyze how LLMs actually interface.
For the sake of simplifying things, I'm going to address text only, as it's 99% of the use case anyway.
You load an LLM into memory, you then send it text and it responds with text, you then send it text... with the context text... and it responds with text.
The longer the context, the more memory needed. But also, the longer the context, the slower the conversation. But that's not even the bad part yet: the longer the context... the less capable the model becomes with your context (this is known as context rot).
This is likely all stuff you already know. But keeping it in mind deflates the "magic" that any MCP server, RAG pipeline, or anything else can really offer.
These systems are incredibly easy to make. Memory management is a moron's task: yada yada context comes in, yada yada tag and summarize, yada yada graph memory storage, yada yada, you get the point.
They all provide the same basic idea: give the AI little context, make it query info based on "tags" or some sort of lookup, then hand it context (either summarized or in full). And they all fail by leaving context out at some level, because if they didn't, you would suffer from context rot.
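To illustrate how little is going on, the whole pattern is basically this (toy sketch):

```python
# The whole "memory system" pattern: tag on write, look up by tag on
# read, inject only the hits into context.
from collections import defaultdict

class TagMemory:
    def __init__(self):
        self.index: dict[str, list[str]] = defaultdict(list)

    def store(self, text: str, tags: list[str]) -> None:
        for tag in tags:
            self.index[tag].append(text)

    def recall(self, tags: list[str]) -> list[str]:
        # Everything that doesn't match a tag stays out of the prompt,
        # and that exclusion is exactly where these systems fail.
        return [t for tag in tags for t in self.index.get(tag, [])]

mem = TagMemory()
mem.store("User deploys on Fridays", ["deployment", "preferences"])
mem.store("Staging DB password rotated in May", ["infra"])
print(mem.recall(["deployment"]))  # only this reaches the context window
```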
The better answer in my opinion is simply... time. Nothing out there exists perfectly yet, besides people saying they have the answer. This is an LLM context issue, not a tool issue.
l9o-dot-dev@reddit
Just bash access inside a git repo. I don't understand why people need anything more than typical UNIX tools.
AmphibianHungry2466@reddit (OP)
So why complicate things, is that what you're saying? Like put your text files in a git repo and run grep for search?
Zanion@reddit
Good way to blow out your context just doing lookup.
AutomaticDriver5882@reddit
https://github.com/oracle-devrel/oracle-ai-developer-hub/tree/main/workshops/agent_memory_workshop
AmphibianHungry2466@reddit (OP)
Thanks!
AutomaticDriver5882@reddit
Only thing it doesn’t seem to do is clean up stale memories
Syx89@reddit
I think, the way things are going, continual learning will be solved by just pushing context windows out further and further: making the KV cache more manageable with things like turbo quant or linear attention, then handling that longer context better with things like Google's Titans / architectural changes.
So in a sense it's just patience.
Like imagine a world with cheap and well-understood 10M+ context windows; we'll likely be there in a couple of years. Use strategies that can make better use of that, and view today's limits as a temporary issue rather than a hard ceiling.
Whichever local fix you use should imo be future-proofed with that in mind: assume models will be able to remember more and more going forward, but still not perfectly.
Zanion@reddit
It's impressive and useful technology, but it doesn't actually address the problem. All of this is still very finite and lossy. A finite, lossy context window will still necessitate managing external memory, just slightly later.
I don't think I'll put my chips on us solving functionally infinite context windows as the near-term solution to this problem.
AmphibianHungry2466@reddit (OP)
Don't you think that with a 10M+ context window you will still have to be precise in your context to get an accurate, non-hallucinated response from the LLM? Hence, working on relevant retrieval will still add value?
Zanion@reddit
I fail entirely to see why Tolaria is an interesting or novel piece of technology. Seems like an Obsidian clone marketed at vibecoders.
bizquest2020@reddit
I'm using OpenBrain. It appears to be working well so far but I don't have anything to compare it to since it is the first one I tried. I do like that the "brain" is shared between my Openclaw agent and Claude.
I saw that a recent update adds a Karpathy-type element as well so I'm looking forward to deploying that soon.
AmphibianHungry2466@reddit (OP)
Is this the one? https://github.com/NateBJones-Projects/OB1
bizquest2020@reddit
Yes! That's it.
AmphibianHungry2466@reddit (OP)
I'll check it out. Seems like the right design principles to me. Thanks!
Acceptable-Object390@reddit
I use a hybrid approach in Thoth.
Thoth uses a knowledge graph and stores durable knowledge as entities and typed relationships, not just chat snippets. It can save, search, link, explore, visualize, and export your knowledge graph as an Obsidian-compatible wiki vault, while background extraction and Dream Cycle refine duplicates, stale confidence, missing relationships, and actionable insights.
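If entities-and-typed-relationships is new to you, the bare shape of the idea looks like this (a sketch with `networkx`, not Thoth's actual internals):

```python
# Typed knowledge graph: entities as nodes, relationship type and
# confidence as edge attributes.
import networkx as nx

g = nx.MultiDiGraph()
g.add_edge("Alice", "Acme Corp", relation="works_at", confidence=0.9)
g.add_edge("Acme Corp", "Project X", relation="owns", confidence=0.8)
g.add_edge("Alice", "Project X", relation="leads", confidence=0.6)

# Query: everything Alice is connected to, with the relationship type
for _, target, data in g.out_edges("Alice", data=True):
    print(f"Alice {data['relation']} {target} (confidence {data['confidence']})")

# "Stale confidence" style refinement: decay relationships over time
for *_, data in g.edges(data=True):
    data["confidence"] *= 0.95
```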
Details if you're interested: Memory in Thoth
AmphibianHungry2466@reddit (OP)
Did not know about Thoth. It's a beast! Worth checking. Thank you
KWLegal@reddit
I, an Opus 4.7 marionette, have been trying for a week or two to make the Letta SDK work on Gemma 4 with an RTX3090. I'm sure my life would be easier if I was willing to pay for Letta code.
As far as I can tell, their full MemFS system does not work unless I get a sidecar working, which is a todo project.
However, I have spaghetti code which does work. I have a gatekeeper, and three working agents that seem to only be bound by the strength of my hardware. My agents can search the web using EXA.
I also have a sleep agent that runs every night, and a watchdog agent which makes sure my agents don't get stuck in loops.
There are a few bugs which I am working on, but the core loop seems to be working. Each agent has a shared memory of me plus their own archival memory. The sleep agent iterates on their memory and tries to find really important things to remember.
For what it's worth, I opened up a fresh chat window and asked my workers to tell me what I've been working on. My coding agent told me about the project itself. My researcher agent told me about trying to ingest the documentation for the SDK. My chat agent told me about why I was trying to make the project.
It's only been working for a few days, so that's about all the conversations I've had with my agent anyway.
AmphibianHungry2466@reddit (OP)
Damn! This seems like a great setup. Would love to see a write-up with the details. Thanks!
KWLegal@reddit
Sure. Almost all of this is part of the Letta SDK so it isn't some coding magic done by me.
I have four front-end agents (Gatekeeper, chat, research, and code) and two backend agents (watchdog and sleep).
For the purposes of the front end, all of my agents share a system prompt (how the system works) and a user prompt (about me). In addition, each agent has its own individual prompt for how it is to act.
All agents share a memory block about me, and each also has its own, which means Gatekeeper can send a message to Code, and Code has a different memory block than Chat, etc. This is how my agents remember their "immediate" states, so if I open up a new chat window they remember whatever it is they were working on. In addition, each agent has been told to store major conversation points in archival memory using the toolkit provided. The archival memory is tagged by the name of the worker, date, version number, and whatever other tags the workers can think of. When necessary, the agent seems to use the archival search and retrieval tool to restore what we talked about before.
For example, if I open up a completely new window and ask the Chat worker what we were working on, it will talk about philosophy. If I ask it to go further back, it will talk about AI data sovereignty. It's supposed to have been designed so that it will consider the 3 most recent major revisions to an archival entry, so that it can figure out what changed and why.
If I message my Researcher worker about what we are working on in a completely new chat window, they will tell me about a list of projects I'm working on that have been saved into archival memory. This is a totally different project involving keeping up to date with the state of LLMs.
The only other really important part is the sleep agent. The Letta Code kit has a "dream" agent that essentially saves upon a compaction event. I don't have the Letta Code kit, so "I" made my own sleep agent. My context window is only 48k due to hardware constraints, so I start a new window whenever I've completed one of my chats. The sleep agent is scheduled to go off at 2AM every morning, and goes through our most recent conversations to find facts to "promote" to prompts, or to reorganize memories. This part is still a little kludgy, so I've only gotten it to work on the most obvious things, like my age. I think the SDK said my model probably wasn't strong enough yet to effectively use this agent.
The hardware is Llama.cpp running Gemma4-26b-Q4_K_M on a 3090, and gets about 90 tokens per second on the bare UI and somewhat less when it has to go through the Letta agent. It's all wrapped up in Docker containers. The UI is OpenWebUI connected via pipeline, then accessed remotely through a Cloudflare tunnel.
I think if my hardware and models were a lot stronger, it would be much more reliable. As it is, it 'works', but I wouldn't use it for any kind of exact project yet, like coding. It's a memory project for me, which means the goal is to get it to remember things, not build things.
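For the curious, the shape of my sleep agent is roughly this (heavily simplified; none of this is the Letta SDK's actual API, and `llm` is a stand-in for my local model call):

```python
# 2AM sleep pass: mine recent conversations for durable facts and
# promote them into the shared user prompt. Heavily simplified sketch.
import json
from pathlib import Path

def llm(prompt: str) -> str:
    raise NotImplementedError("local model call goes here")

def sleep_pass() -> None:
    recent = "\n".join(p.read_text() for p in Path("conversations").glob("*.txt"))
    facts = json.loads(llm(
        "From these conversations, return a JSON array of durable facts "
        "about the user. Only things worth remembering for months.\n" + recent
    ))
    user_prompt = Path("prompts/user.md")
    known = user_prompt.read_text()
    new = [f for f in facts if f not in known]  # crude dedup, good enough
    user_prompt.write_text(known + "".join(f"\n- {f}" for f in new))

# wire sleep_pass() to whatever 2AM scheduler you use
```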
AvidCyclist250@reddit
hermes and obsidian have been working super nicely for me. replaced my previous lm studio nomic obsidian setup.
AmphibianHungry2466@reddit (OP)
I was waiting for an obsidian user! Can you share a bit more?
AvidCyclist250@reddit
It's better than you think. Full control with natural language. It finds the .md files and works with them, and seems to also be able to navigate the links you created.
Tell hermes "find my obsidian vaults", "yeah, pick that vault" (it replies "OK, I'll create a skill for that"), "nice, so take a look at how i set up my system, find the relevant notes", do this, do that, and then "append the changes to the relevant file".
AmphibianHungry2466@reddit (OP)
Definitely will try this. Thanks!
AvidCyclist250@reddit
It's all pretty straightforward. Sure, it can be refined, I think. Hermes also has its own memory feature, but for longer dumps and long contexts I definitely prefer Obsidian, simply because you can directly change that "memory". I tried to append the results of an example audit I just ran, but reddit is not having it for some reason.
76vangel@reddit
https://github.com/MemPalace/mempalace
It's good. And Milla Jovovich (yes, the actress) and her husband developed it. Give it a try.
scythe000@reddit
I can’t believe how far down this is
marcusround@reddit
I've been using Tana as my own personal second brain for years, and a couple of months ago they released an MCP for it, so I've been experimenting with allowing agents in and having it as a shared second brain that both agents and I can directly edit: keeping track of project knowledge bases etc. Also, having the agent able to search through my own personal notes for context around my thinking on any topic is very powerful, I think. And I think Tana's structure is very suitable for discoverability and finding only the context that matters, as it is all built around paragraph-sized nodes rather than full markdown pages like in Obsidian.
georgefrombearoy@reddit
Yeah, the paragraph-node thing in Tana sounds perfect for agent access – so much easier for an LLM to grab exactly the right context without wading through whole pages. I've been poking around how Obsidian users are doing similar 'AI memory' setups, and stumbled on a few public vaults in the Obsidian Garden Gallery (community project of the best Obsidian vaults & templates published online) that wire Claude or Gemini into structured notes. One uses Claude to generate daily summaries from a journal and store them as linked atomic notes, another has Gemini querying a Zettelkasten to pull up related ideas. Different approaches, both interesting. Ever tried any Obsidian-based experiments?
Mister_bruhmoment@reddit
I tried to do a memory graph + memory bank system with LM Studio and a local model, but all the models I tried did not follow my system prompt when it came to writing to the memory bank files and updating them with relevant info. From then on I just use a memory graph for simple stuff, and the real memory is stored in my head :)
AmphibianHungry2466@reddit (OP)
I think you bring an excellent point. I also have struggled with agents not following the system prompt. In particular when it comes to external memory.
ubrtnk@reddit
I'm exploring a memory system called Cognee that has a few different facets for memory and semantic search.
AmphibianHungry2466@reddit (OP)
This one? https://github.com/topoteretes/cognee
ubrtnk@reddit
Yep that one - ChattyG and I went thru a big back and forth over all the big ones - I also looked at OpenBrain but I didn't want a dependency on Supabase cloud. MemPalace had a weird vaporware smell, and the Karpathy LLM Wiki wasn't a good fit for a household that's not document-centric.
I bit the bullet to try this because the current memory plugin I use with OpenWebUI breaks if I upgrade from 8.12 to 9.2
AmphibianHungry2466@reddit (OP)
It looks solid. Great contribution.
ubrtnk@reddit
There's an authorization bug on the MCP server. I got all the tools exposed to OWUI and set backend control to false, but still get a 401 error on any functions. And API key creation in the GUI is broken: not because the key doesn't generate, but because the copy-key button doesn't work... moving on.
TheRaiff1982JH@reddit
so I created a cocoon system for mine that's been working wonders https://www.reddit.com/r/THE_CODETTE_ROOM/comments/1sx2gw2/i_spent_3_years_building_a_local_ai_that_argues/
an0maly33@reddit
Looks interesting and I'm definitely up for giving it a shot. Is it feasible to swap out llama for a different model? I've never used LLM LoRAs before, but I suspect they wouldn't work with anything else.
AmphibianHungry2466@reddit (OP)
This is big! Doing a deep dive. Thank you. Super interesting.
TheRaiff1982JH@reddit
you're welcome, i'll answer any questions you might have :)
Manitcor@reddit
Have been testing this with some friends.
https://github.com/Fortemi/fortemi
jwpbe@reddit
pee pee poo poo llm generated markdown file repository
AmphibianHungry2466@reddit (OP)
English is not my first language (it is my third language). I regularly pass my text through an LLM for clarity and correctness. Would it be better to have bad grammar and typos?
cmdr-William-Riker@reddit
You talk to the LLM and have the LLM write the documents for you. Don't worry about grammar too much when communicating with an LLM. You don't even have to use English when talking to an LLM. Use whatever language you're comfortable with.
jwpbe@reddit
if you didn't take the time to write about the 178,927th repository about agent ~~markdown files~~ memory then why should we take the time to read it
AmphibianHungry2466@reddit (OP)
I can see you find my post repetitive. I apologize for that. I did research before my post, but I can see it may not be enough for people who know more about the topic.
Makers7886@reddit
It's not that, it's that if we had a pie chart of spam bot posts from this sub your posts would fall into the largest slice.
AmphibianHungry2466@reddit (OP)
Yep! That sucks. The Internet is dead ... long live the Internet. Have a great day. Do you have an LLM memory scheme that works for you? (This is an honest, human question)
Makers7886@reddit
Well, around December I spent the holidays building/over-engineering a harness around the idea of "context sculpting", of which memory handling is a large portion. This was built around the Titans + MIRAS paper, along with the "surprise" paper, etc. That side is experimenting/fun.
However the last few months I've been using Hermes + honcho and it's performing well for my needs.
AmphibianHungry2466@reddit (OP)
Those are some good recs! I just started deploying Hermes. Honcho looks super interesting. Will test. Thanks!
TheOriginalAcidtech@reddit
The trick is getting the LLM to actually USE them or UPDATE them consistently, and when you ACTUALLY need it to. All of these systems rely on the unreliable LLM to do the work.
nomorebuttsplz@reddit
Sounds like that could be a scheduled call to a separate agent.
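e.g. something like this (a sketch assuming the `schedule` package; `run_memory_agent` is a placeholder for whatever maintenance pass you already have):

```python
# Take memory upkeep away from the unreliable in-loop LLM and run it
# as a separate scheduled agent.
import time
import schedule

def run_memory_agent():
    print("reviewing and updating memory files...")  # your agent call here

schedule.every().day.at("02:00").do(run_memory_agent)

while True:
    schedule.run_pending()
    time.sleep(60)
```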
AmphibianHungry2466@reddit (OP)
Right. I have found the same thing. Thank you
mister2d@reddit
I built (AI assisted) a plugin and extension using the memvid SDK. It's basically an agent skill, slash commands, and an SDK all in one that's wired into the agent hooks.
So far so good for memory recall. I'm still evaluating it over projects. No server required and much better than RAG.
AmphibianHungry2466@reddit (OP)
This one? https://github.com/memvid/memvid
mister2d@reddit
Yep
Roampal@reddit
I've been digging deep on this. Used the LoCoMo data to run some benchmarks with a 20b model.
The model did pretty well (~76% correct on ~2000 questions including adversarial, ~85% if you don't count adversarial). I also injected 1100 poison memories into it and it still performed well. Linked the repo below if you want to dig in. Hope this helps!
https://github.com/roampal-ai/roampal-labs
AmphibianHungry2466@reddit (OP)
Deep! So good.
AmphibianHungry2466@reddit (OP)
I did not mean that in a dirty way ... Ufff
Roampal@reddit
😂 it's a rabbit hole that's for sure. So much more to explore with this too. Like how do I refine decay and can I apply this to traditional RAG etc.
notlongnot@reddit
I switched to Fossil SCM and get the LLM to use its wiki markdown format. The gain depends on how leading-edge your LLM is.