Anyone else running fully local persistent agents with a real “living brain” + dreaming cycle? (open source experiment)
Posted by Notforyou23@reddit | LocalLLaMA | 6 comments
I’ve been deep in the local agent game and keep hitting the same wall: most setups still feel stateless. You restart, context evaporates, and the agent forgets everything you taught it last week. So I spent the last few months building something different — a complete local AI operating system called Home23 that treats the agent’s memory like a living, growing brain:
- Drop files/PDFs/notes into a dashboard feeder → gets compiled into structured knowledge (not just chunks)
- Continuous cognitive loop + actual dreaming/consolidation phase when idle (prunes noise, finds connections)
- The agent can launch its own research runs (11 atomic tools) and auto-ingest the results back into its own brain
- Persistent identity layers (SOUL.md, MISSION.md, etc.) so it stays “you” across sessions
- Bonus: Evobrew IDE that talks directly to the brain + live pulse dashboard
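Roughly, a consolidation pass like the one above can be sketched as follows (simplified illustration only; the `salience` field and the promotion threshold here are made-up assumptions, not the actual Home23 internals):

```python
def dream_cycle(memory_store):
    """One idle-time consolidation pass: prune noise, then promote
    facts that recur across episodes into semantic memory."""
    # Prune noise: drop low-salience episodic entries.
    memory_store["episodic"] = [
        m for m in memory_store["episodic"] if m["salience"] >= 0.2
    ]
    # Find connections: promote facts seen in >= 3 surviving episodes.
    counts = {}
    for episode in memory_store["episodic"]:
        for fact in episode["facts"]:
            counts[fact] = counts.get(fact, 0) + 1
    for fact, n in counts.items():
        if n >= 3 and fact not in memory_store["semantic"]:
            memory_store["semantic"].append(fact)
```

The real thing does more (cross-document connection finding, research auto-ingest), but prune-then-promote is the core shape of the loop.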
It runs 100% local (Ollama path is dead simple) or with any provider, no Claude tax required. Repo is here if you want to poke around: https://github.com/notforyou23/home23
I'm not here to shill — I genuinely want to know what other people are doing for long-term agent memory and autonomous research in 2026. Have you tried anything similar (Engram, Mem0 forks, OpenClaw local setups, custom RAG + skills, etc.)?
What would you actually use a persistent, dreaming local agent OS for? Personal life OS? Codebase co-pilot that never forgets? Research companion? Would love to hear your setups or wild ideas — I can't have all the good ones. (Quick start is literally 3 commands.)
ahbond@reddit
This resonates. I've been building something similar but coming at it from the HPC/cognitive science angle rather than the personal assistant angle.
My setup (Atlas AI, running on an HP Z840 with 2x Quadro GV100 32GB):
- Freudian cognitive architecture — Superego (safety/ethics), Id (creative generation), and a 4-agent "Divine Council" (Judge/Advocate/Synthesizer/Ethicist) that debates responses via Tree-of-Thought. Each agent is a separate Gemma 4 26B-A4B instance with its own persona, running on llama.cpp.
- Persistent memory — episodic (what happened), semantic (what it knows), procedural (how to do things). Not just RAG chunks — structured memory with consolidation, similar to your dreaming phase. The metacognition subsystem tracks confidence and calibrates itself over time.
- Self-calibrating loop — disagreement between council agents triggers temperature adjustment and knowledge gap detection. If the agents can't agree, the system knows it doesn't know.
- 10 subsystems total — event fabric, LH/RH hemispheres, memory, metacognition, safety gateway, DHT, LLM routing, environment, integration. Built on the AGI-HPC framework (786 tests passing across 6 development sprints).
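The core of the disagreement signal can be sketched like this (illustrative only: the real council debates via Tree-of-Thought rather than bare voting, so treat this as the degenerate case where each agent just emits a final answer):

```python
from collections import Counter

def council_verdict(agent_answers, agree_threshold=0.75):
    """Aggregate one answer per council agent; flag a knowledge gap
    on disagreement. Returns (answer or None, confidence)."""
    counts = Counter(agent_answers)
    top_answer, top_votes = counts.most_common(1)[0]
    confidence = top_votes / len(agent_answers)
    if confidence < agree_threshold:
        # Below threshold: "the system knows it doesn't know".
        return None, confidence
    return top_answer, confidence
```

A `None` verdict is what triggers the temperature adjustment and knowledge-gap logging in my loop.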
The dreaming/consolidation piece you mention is interesting — we have a similar concept where the system reviews episodic memory during idle time and promotes recurring patterns to semantic memory. What's your consolidation strategy? Simple frequency-based pruning or something more structured?
For the "what would you use it for" question, mine started as a research companion (I work on AI Safety and Alignment) but evolved into something closer to a cognitive testbed. The real value isn't any single feature, it's that the system accumulates context about your work over weeks and months. My agent knows my codebase, my publication deadlines, my thermal constraints on the GPU cluster, and which Reddit comments led to actual research improvements. ;-)
The statelessness problem you describe is real. Context window compression helps in-session but cross-session memory is the hard part. How are you handling memory conflicts when old knowledge contradicts new information?
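For reference, one common policy for that conflict question is recency-wins with provenance: the new value supersedes, but the old one is kept and flagged rather than deleted. A generic sketch (not either project's implementation):

```python
from dataclasses import dataclass

@dataclass
class Fact:
    key: str        # what the fact is about, e.g. "user.editor"
    value: str
    timestamp: float
    superseded: bool = False

def ingest(store, new):
    """Recency-wins: mark older conflicting values as superseded,
    but keep them so the agent can still answer 'what did I believe before?'"""
    history = store.setdefault(new.key, [])
    for old in history:
        if old.value != new.value and new.timestamp > old.timestamp:
            old.superseded = True
    history.append(new)

def current(store, key):
    live = [f for f in store.get(key, []) if not f.superseded]
    return max(live, key=lambda f: f.timestamp).value if live else None
```
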
Cheers,
Andrew.
MR_Weiner@reddit
Just to clarify, are you running four separate llama instances of Gemma? If so, you may be able to get better performance or run a larger model by running one llama server and just pointing all 4 agents to it and they can run in parallel.
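For example, with llama.cpp's `llama-server` (flag names from recent builds; the model filename is a placeholder):

```shell
# One shared server with 4 parallel slots. -np serves 4 sequences
# concurrently (one per council agent); -c is the total KV-cache
# context, divided across the slots.
llama-server -m gemma-27b-Q4_K_M.gguf --port 8080 -np 4 -c 32768

# Each agent then points at the same OpenAI-compatible endpoint:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"ping"}]}'
```

One set of weights in VRAM instead of four copies, which frees room for a bigger quant.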
ahbond@reddit
I hope you don't mind; I'm going to integrate your document-to-structured-knowledge pipeline and autonomous research loop concepts into my agi-hpc architecture.
:-)
Cheers,
Andrew.
Miriel_z@reddit
This is probably one of the directions I'll need to explore in more detail next. How do you control when it fires up so it doesn't affect normal operation? Are there any internal checks so it doesn't occupy the system 100% of the time?
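A typical gating policy for this kind of idle-time loop looks something like the sketch below (a generic illustration, not Home23's actual checks; the thresholds are made up):

```python
import os
import time

def may_dream(last_user_activity_ts, idle_after_s=300.0, max_load=1.0):
    """Allow background consolidation only when (a) the user has been
    idle long enough and (b) the 1-minute load average is under budget,
    so dreaming never starves foreground work."""
    idle_long_enough = (time.time() - last_user_activity_ts) >= idle_after_s
    load_ok = os.getloadavg()[0] <= max_load  # Unix-only
    return idle_long_enough and load_ok
```

The consolidation loop calls this between passes and bails out as soon as it returns False, so a user action interrupts dreaming within one pass.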
BidWestern1056@reddit
yeah i been doing this for abt a year now in npcpy/npcsh