I reverse-engineered Claude Desktop's storage to build a local memory layer (no API, 100% offline)

Posted by foufouadi@reddit | LocalLLaMA

Hey r/LocalLLaMA,

Claude Desktop has no memory API. So I reverse-engineered its local storage.

Getting to the conversation data required cracking several layers:

- `FF 11 02` header → Snappy-compressed IDB blob (from `idb_value_wrapping.cc`)

- 15-byte Blink metadata prefix to strip

- Custom V8 deserializer in C# (Node's `v8.deserialize()` chokes on Blink host objects)

- Then I discovered that the HTTP cache (zstd-compressed `f_*` files) was actually a much cleaner source for real-time interception.
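To make the unwrapping order concrete (header check → Snappy decompress → strip Blink prefix → V8 payload), here's a minimal sketch in Python. The actual project does this in C#; `unwrap_idb_value` and the injected `decompress` callable are names I've made up, and the header bytes / prefix length come straight from the list above:

```python
from typing import Callable

WRAP_HEADER = b"\xff\x11\x02"   # marker for a Snappy-wrapped IDB value
BLINK_PREFIX_LEN = 15           # Blink metadata bytes to strip after decompression

def unwrap_idb_value(blob: bytes, decompress: Callable[[bytes], bytes]) -> bytes:
    """Return the raw V8-serialized payload from a wrapped IndexedDB blob.

    `decompress` is injected (e.g. snappy.decompress) so this sketch
    stays dependency-free.
    """
    if not blob.startswith(WRAP_HEADER):
        raise ValueError("not a wrapped IDB value")
    payload = decompress(blob[len(WRAP_HEADER):])
    # Strip the Blink metadata prefix before handing off to a V8 deserializer.
    return payload[BLINK_PREFIX_LEN:]
```

The remaining payload is what a V8 value deserializer would consume (the step where stock `v8.deserialize()` fails on Blink host objects).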

The result is Mnemos — a local MCP server that:

- Watches Cache/Cache_Data in real-time (FileSystemWatcher + zstd)

- Syncs history from IndexedDB blobs (Snappy + V8 deserialization)

- Vectorizes everything locally with MiniLM-L6-v2 via ONNX

- Exposes hybrid search (BM25 + cosine, merged with RRF) back to the LLM

Because it's a standard MCP server, you can hook up your own local LLMs to your entire Claude chat history too.
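For anyone unfamiliar with the RRF merge step in the search bullet above, here's a hedged Python sketch (the project itself is C#, and `rrf_merge` is my own name; `k=60` is the constant from the original RRF paper, not necessarily what Mnemos uses):

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse multiple ranked lists (e.g. BM25 and cosine results) into one.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by several retrievers float to the top without
    having to normalize the two scoring scales against each other.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The nice property for conversational logs is that RRF only looks at ranks, so a keyword hit (BM25) and a semantic hit (cosine) count on equal footing.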

100% offline. Nothing leaves your machine. Full reverse engineering writeup in the repo.

For those building local memory layers: what chunking or retrieval strategies are you finding most effective for raw conversational logs?