Built a local-first AI memory system that indexes screen activity, meetings, and voice notes (MCP + automations)

Posted by Top_Speaker_7785@reddit | LocalLLaMA | 11 comments

Been experimenting with an idea: what if your AI assistant actually remembered everything you did on your computer? Not stateless chats, but real persistent context. So I built ScreenMind. It continuously captures your screen (using perceptual hashing so it only triggers when content actually changes), runs each frame through Gemma 4 E2B via llama.cpp, and builds a searchable timeline of your day. You can:

Honestly, I'm still figuring out the agent/automation side. Right now it's more workflow-driven than truly autonomous, and I'm trying not to oversell it. The retrieval quality and onboarding friction also need work. But the core idea I keep coming back to is that local AI gets way more useful once it has real context about what you're actually doing (your screen, your conversations, your patterns) instead of starting from zero every time.
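For anyone curious what the change-triggered capture mentioned earlier can look like, here is a minimal sketch. It is not taken from the ScreenMind code. It uses a plain average hash (one simple kind of perceptual hash) and assumes frames arrive as grayscale pixel grids at least 8x8 in size. The names `average_hash`, `ChangeDetector`, and the threshold of 5 bits are made up for illustration.

```python
from typing import List, Optional

# Assumed input format: grayscale pixels (0-255) as rows of ints.
Frame = List[List[int]]


def average_hash(frame: Frame, size: int = 8) -> int:
    """Average hash: shrink to size x size by block averaging, then emit
    one bit per cell that is brighter than the mean of all cells.
    Assumes the frame is at least size x size pixels."""
    height, width = len(frame), len(frame[0])
    cells = []
    for cy in range(size):
        y0, y1 = cy * height // size, (cy + 1) * height // size
        for cx in range(size):
            x0, x1 = cx * width // size, (cx + 1) * width // size
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where the two hashes differ."""
    return bin(a ^ b).count("1")


class ChangeDetector:
    """Reports a frame as new only when its hash differs from the last
    processed frame by more than `threshold` bits (illustrative default)."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self.last_hash: Optional[int] = None

    def should_process(self, frame: Frame) -> bool:
        current = average_hash(frame)
        if (
            self.last_hash is not None
            and hamming_distance(current, self.last_hash) <= self.threshold
        ):
            return False  # visually the same screen, skip the model call
        self.last_hash = current
        return True
```

Comparing against the last processed frame rather than the last seen frame means slow drift (a page scrolling a little at a time) still triggers eventually. A real capture loop would likely convert screenshots to grayscale with an imaging library first; that step is left out here.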

Would love feedback, especially on inference optimization ideas. The E2B model handles everything right now (vision analysis, chat, audio), so GPU scheduling between those tasks has been the main challenge.
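To make the scheduling problem concrete, here is one possible shape for it, sketched under assumptions and not a description of how ScreenMind works today. A single worker thread serializes every model call, and a priority queue lets pending chat requests jump ahead of queued background frames. `InferenceScheduler` and the three priority constants are hypothetical names.

```python
import itertools
import queue
import threading
from concurrent.futures import Future
from typing import Any, Callable

# Lower number is served first. These levels are illustrative assumptions.
PRIORITY_CHAT, PRIORITY_AUDIO, PRIORITY_VISION = 0, 1, 2


class InferenceScheduler:
    """Runs all model calls on one worker thread so tasks never hit the GPU
    at the same time; queued interactive work goes before background work."""

    def __init__(self) -> None:
        self._queue: queue.PriorityQueue = queue.PriorityQueue()
        # Unique counter: keeps FIFO order within a priority level and
        # ensures the callables themselves are never compared.
        self._counter = itertools.count()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, priority: int, fn: Callable[[], Any]) -> Future:
        """Queue a zero-argument callable; the Future resolves with its result."""
        future: Future = Future()
        self._queue.put((priority, next(self._counter), fn, future))
        return future

    def _run(self) -> None:
        while True:
            _, _, fn, future = self._queue.get()
            try:
                future.set_result(fn())
            except Exception as exc:  # surface task errors to the caller
                future.set_exception(exc)
```

Usage would look like `scheduler.submit(PRIORITY_CHAT, lambda: run_chat(prompt))`, where `run_chat` stands in for whatever function actually calls the model. Note that this only reorders waiting work. It does not interrupt a frame analysis that is already running, so a chat request still waits for at most one in-flight task.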

GitHub: https://github.com/ayushh0110/ScreenMind
Demo: https://youtu.be/CxkkBT_EvPw