Lethe: local markdown memory for Claude Code, DuckDB per project, no server
Posted by Technical_Gur_3858@reddit | LocalLLaMA | View on Reddit | 5 comments
Memory store for coding agents that lives entirely in a .lethe/ directory per project. Markdown files as the source of truth, DuckDB as the index, BM25 + dense + cross-encoder rerank for retrieval. Cross-project search via DuckDB ATTACH instead of a central store. Ships as a Claude Code plugin (writes session summaries via hooks, retrieves via a memory-recall skill) and also works as a CLI and Python library.
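The cross-project piece works by federating each project's read-only index rather than keeping a central store. A minimal sketch of the idea, assuming a hypothetical `index.duckdb` filename inside each `.lethe/` directory (the plugin's real file layout may differ):

```python
from pathlib import Path

def attach_statements(project_dirs):
    """Build DuckDB ATTACH statements federating per-project indexes.

    Assumes a hypothetical 'index.duckdb' file inside each project's
    .lethe/ directory; Lethe's actual filename may differ.
    """
    stmts = []
    for d in project_dirs:
        db_path = Path(d) / ".lethe" / "index.duckdb"
        alias = Path(d).name.replace("-", "_")  # keep the alias a valid identifier
        stmts.append(f"ATTACH '{db_path}' AS {alias} (READ_ONLY);")
    return stmts

# Run these statements in one DuckDB connection, then query
# proj_a.memories UNION ALL proj_b.memories, etc.
print(attach_statements(["/code/proj-a", "/code/proj_b"]))
```

Once attached, a single SQL query can scan every project's index without any data leaving the per-project directories.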
A learned per-cluster suppression layer on top of the hybrid pipeline adds a statistically significant +0.017 NDCG@10 on LongMemEval's full benchmark. The arXiv draft in the repo includes a second-dataset replication on NFCorpus where the mechanism does not transfer, so it's scoped to long-term conversational memory specifically.
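The post doesn't spell out the suppression mechanism, but "learned per-cluster suppression" can be read as downweighting candidates from clusters the model has learned are unhelpful for a query. A toy sketch under that reading; the cluster assignments and weights here are placeholders, not Lethe's learned parameters:

```python
def suppress(scored, cluster_of, suppression):
    """Downweight hybrid-pipeline scores per cluster.

    scored:      list of (doc_id, score) from the hybrid retriever
    cluster_of:  doc_id -> cluster label (placeholder mapping)
    suppression: learned per-cluster weight in [0, 1]; 0 means no suppression
    """
    out = []
    for doc_id, score in scored:
        w = suppression.get(cluster_of[doc_id], 0.0)
        out.append((doc_id, score * (1.0 - w)))
    return sorted(out, key=lambda x: -x[1])

# A doc from a heavily suppressed cluster can drop below a lower-scored one
print(suppress([("a", 0.9), ("b", 0.8)], {"a": 0, "b": 1}, {0: 0.5}))
```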
Link in comments.
sebakubisz@reddit
how much latency does the cross-encoder rerank add on top of BM25 + dense? I wonder what the tradeoff looks like in practice
Technical_Gur_3858@reddit (OP)
BM25 + dense retrieval + query encode: ~45 ms. Cross-encoder rerank over the top-30 candidate pool: ~25 ms. Adaptive deep pass (200 candidates, triggered when cross-encoder confidence on the shallow pass is below threshold): ~150 ms.
So ~25 ms extra per query in the common case, ~150 ms on the rarer deep pass. That 25 ms buys a +63% NDCG jump over bi-encoder-only on LongMemEval, so the per-ms ROI is strong. Cold start is a separate cost (~700 ms for the first call to load ONNX weights) but amortizes to zero if you keep the process alive, which is the common case for hooks or a long-lived CLI.
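The shallow/deep routing described above can be sketched like this. The threshold value, pool sizes, and function names are illustrative, not the plugin's actual API:

```python
def adaptive_rerank(query, candidates, cross_encoder,
                    shallow_k=30, deep_k=200, threshold=0.5):
    """Rerank a shallow pool; escalate to the deep pass only when the
    cross-encoder's top confidence falls below threshold.

    candidates:    docs already ordered by BM25 + dense fusion
    cross_encoder: (query, docs) -> list of (doc, confidence); stub here
    """
    shallow = cross_encoder(query, candidates[:shallow_k])  # the ~25 ms pass
    if max(conf for _, conf in shallow) >= threshold:
        return sorted(shallow, key=lambda x: -x[1])
    # Low confidence on the shallow pool: widen to 200 (the ~150 ms pass)
    deep = cross_encoder(query, candidates[:deep_k])
    return sorted(deep, key=lambda x: -x[1])

# Stub scorer: only the doc "hit" gets high confidence
fake_ce = lambda q, docs: [(d, 0.9 if d == "hit" else 0.1) for d in docs]
# "hit" sits outside the top 30, so the deep pass is triggered and finds it
print(adaptive_rerank("q", ["x"] * 40 + ["hit"], fake_ce)[0])
```

The common-case latency stays at the shallow pass; the deep pass only fires when the reranker itself signals uncertainty.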
paulqq@reddit
Well-chosen name. The Lethe, I love it
conockrad@reddit
That’s very interesting - thanks for sharing!
Technical_Gur_3858@reddit (OP)
https://github.com/teimurjan/lethe