Lethe: local markdown memory for Claude Code, DuckDB per project, no server
Posted by Technical_Gur_3858@reddit | LocalLLaMA | View on Reddit | 5 comments
Memory store for coding agents that lives entirely in a .lethe/ directory per project. Markdown files as the source of truth, DuckDB as the index, BM25 + dense + cross-encoder rerank for retrieval. Cross-project search via DuckDB ATTACH instead of a central store. Ships as a Claude Code plugin (writes session summaries via hooks, retrieves via a memory-recall skill) and also works as a CLI and Python library.
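The cross-project piece works by federating each project's read-only index rather than keeping a central store. A minimal sketch of the idea, assuming a hypothetical `index.duckdb` filename inside each `.lethe/` directory (the plugin's real file layout may differ):

```python
from pathlib import Path

def attach_statements(project_dirs):
    """Build DuckDB ATTACH statements federating per-project indexes.

    Assumes a hypothetical 'index.duckdb' file inside each project's
    .lethe/ directory; Lethe's actual filename may differ.
    """
    stmts = []
    for d in project_dirs:
        db_path = Path(d) / ".lethe" / "index.duckdb"
        alias = Path(d).name.replace("-", "_")  # keep the alias a valid identifier
        stmts.append(f"ATTACH '{db_path}' AS {alias} (READ_ONLY);")
    return stmts

# Run these statements in one DuckDB connection, then query
# proj_a.memories UNION ALL proj_b.memories, etc.
print(attach_statements(["/code/proj-a", "/code/proj_b"]))
```

Once attached, a single SQL query can scan every project's index without any data leaving the per-project directories.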
A learned per-cluster suppression layer on top of the hybrid pipeline adds a statistically significant +0.017 NDCG@10 on LongMemEval's full benchmark. The arXiv draft in the repo includes a second-dataset replication on NFCorpus where the mechanism does not transfer, so it's scoped to long-term conversational memory specifically.
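The post doesn't spell out the suppression mechanism, but "learned per-cluster suppression" can be read as downweighting candidates from clusters the model has learned are unhelpful for a query. A toy sketch under that reading; the cluster assignments and weights here are placeholders, not Lethe's learned parameters:

```python
def suppress(scored, cluster_of, suppression):
    """Downweight hybrid-pipeline scores per cluster.

    scored:      list of (doc_id, score) from the hybrid retriever
    cluster_of:  doc_id -> cluster label (placeholder mapping)
    suppression: learned per-cluster weight in [0, 1]; 0 means no suppression
    """
    out = []
    for doc_id, score in scored:
        w = suppression.get(cluster_of[doc_id], 0.0)
        out.append((doc_id, score * (1.0 - w)))
    return sorted(out, key=lambda x: -x[1])

# A doc from a heavily suppressed cluster can drop below a lower-scored one
print(suppress([("a", 0.9), ("b", 0.8)], {"a": 0, "b": 1}, {0: 0.5}))
```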
Link in comments.
sebakubisz@reddit
how much latency does the cross-encoder rerank add on top of BM25 + dense? I wonder what the tradeoff looks like in practice
Technical_Gur_3858@reddit (OP)
BM25 + dense retrieval + query encode: ~45 ms. Cross-encoder rerank over the top-30 candidate pool: ~25 ms. Adaptive deep pass (200 candidates, triggered when cross-encoder confidence on the shallow pass is below threshold): ~150 ms.
So ~25 ms extra per query in the common case, ~150 ms on the rarer deep pass. That 25 ms buys a +63% NDCG jump over bi-encoder-only on LongMemEval, so the per-ms ROI is strong. Cold start is a separate cost (~700 ms for the first call to load ONNX weights) but amortizes to zero if you keep the process alive, which is the common case for hooks or a long-lived CLI.
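The shallow/deep routing described above can be sketched like this. The threshold value, pool sizes, and function names are illustrative, not the plugin's actual API:

```python
def adaptive_rerank(query, candidates, cross_encoder,
                    shallow_k=30, deep_k=200, threshold=0.5):
    """Rerank a shallow pool; escalate to the deep pass only when the
    cross-encoder's top confidence falls below threshold.

    candidates:    docs already ordered by BM25 + dense fusion
    cross_encoder: (query, docs) -> list of (doc, confidence); stub here
    """
    shallow = cross_encoder(query, candidates[:shallow_k])  # the ~25 ms pass
    if max(conf for _, conf in shallow) >= threshold:
        return sorted(shallow, key=lambda x: -x[1])
    # Low confidence on the shallow pool: widen to 200 (the ~150 ms pass)
    deep = cross_encoder(query, candidates[:deep_k])
    return sorted(deep, key=lambda x: -x[1])

# Stub scorer: only the doc "hit" gets high confidence
fake_ce = lambda q, docs: [(d, 0.9 if d == "hit" else 0.1) for d in docs]
# "hit" sits outside the top 30, so the deep pass is triggered and finds it
print(adaptive_rerank("q", ["x"] * 40 + ["hit"], fake_ce)[0])
```

The common-case latency stays at the shallow pass; the deep pass only fires when the reranker itself signals uncertainty.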
paulqq@reddit
Well-chosen name. The Lethe, I love it
conockrad@reddit
That’s very interesting - thanks for sharing!
Technical_Gur_3858@reddit (OP)
https://github.com/teimurjan/lethe