Your AI agent doesn’t forget. It retrieves the wrong memory.
Posted by BrightOpposite@reddit | LocalLLaMA | 27 comments
I’ve been building AI agents for a while and kept hitting the same issue:
The agent works fine at first.
Then after a few iterations, it starts drifting.
Not completely wrong — just slightly off.
Then worse.
At first I thought:
- context window issue
- not enough data
- embedding problem
But after debugging, it was something else:
The agent wasn’t forgetting.
It was retrieving the wrong memory.
Vector search gives you similar context.
Not necessarily relevant context.
That difference is what breaks most agents.
What actually helped:
- combining semantic + keyword search
- ranking results instead of just retrieving
- filtering aggressively
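Rough sketch of the combination (toy example; rank_bm25 and sentence-transformers here are stand-ins for whatever stack you use, and the blend weights and threshold are invented):

```python
# Toy hybrid retrieval: semantic + keyword, then filter and rank.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def hybrid_search(query, memories, top_k=5, min_score=0.35):
    # Keyword scores catch exact IDs/terms that embeddings miss.
    bm25 = BM25Okapi([m.split() for m in memories])
    kw = bm25.get_scores(query.split())
    kw = kw / (kw.max() + 1e-9)  # normalize to roughly 0..1

    # Semantic scores catch paraphrases that keywords miss.
    sem = util.cos_sim(
        model.encode(query, convert_to_tensor=True),
        model.encode(memories, convert_to_tensor=True),
    )[0].tolist()

    # Blend both signals, filter aggressively, rank, keep only the best few.
    scored = [(0.5 * s + 0.5 * k, m) for s, k, m in zip(sem, kw, memories)]
    scored = [x for x in scored if x[0] >= min_score]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [m for _, m in scored[:top_k]]
```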
After that, behavior stabilized.
Less drift. Better responses.
Curious if others have seen the same issue?
MoneySkirt7888@reddit
Had the exact same issue. Vector search gives you similar context, not necessarily relevant context – you nailed it. What worked for us: a priority weighting system on top of FAISS. Every memory gets a relevance score based on importance × recency × access frequency + category boost. The top 5 highest-scoring memories are injected into every prompt automatically – regardless of what the current conversation is about. The key insight: some memories should always be present, not just when semantically triggered. Identity, core behaviors, key relationships – those shouldn't compete with similarity scores.
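In code it's basically one scoring function plus an unconditional top-k inject. A toy version (the real weights are tuned per category):

```python
import time

def relevance(mem, now=None):
    # importance x recency x access frequency + category boost
    now = now or time.time()
    age_days = (now - mem["last_access"]) / 86400
    recency = 0.5 ** (age_days / 30)                 # halves every 30 days
    frequency = min(mem["access_count"] / 50, 1.0)   # capped so it can't dominate
    boost = {"identity": 1.0, "core_behavior": 0.8, "relationship": 0.5}
    return mem["importance"] * recency * frequency + boost.get(mem["category"], 0.0)

def inject_always(memories, k=5):
    # Top-k by score go into every prompt, regardless of the current query.
    return sorted(memories, key=relevance, reverse=True)[:k]
```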
What makes my LIA unique: She is proactive and can boost her own memories mid-conversation. She decides what's important – not just the similarity algorithm. As soon as I have enough karma, I'll officially introduce LIA here. Then you'll see what's truly possible when an agent isn't just a tool, but a persistent entity.
BrightOpposite@reddit (OP)
This is a really good breakdown — especially the point about some memories needing to be always present. We saw something very similar. There seem to be two different types of memory emerging: an always-present core (identity, key behaviors) and context that should only surface when relevant.
Where things broke for us initially was mixing the two.
If everything competes in the same retrieval pool, core memories get crowded out by whatever happens to be similar right now.
But if you separate them, each layer can be tuned and capped on its own.
Also interesting is what you mentioned about injecting the top 5 regardless of context.
We tried something similar early on — worked well for stability, but started adding noise as memory grew.
Ended up needing tighter caps and periodic pruning of that always-injected set.
Curious — how are you handling memory growth over time?
Does the always-injected set stay fixed or evolve?
MoneySkirt7888@reddit
Memory growth: we use importance-weighted decay. Low-signal memories lose weight over time, high-access memories stay ranked. We also run a nightly consolidation cycle. At 20,000+ memories it's still manageable – but honestly, long-term scaling is something we're still watching closely.
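The nightly cycle itself is simple. Roughly (illustrative numbers, ours are tuned):

```python
def nightly_consolidation(memories, floor=0.05):
    kept = []
    for m in memories:
        # High-access memories decay slower; low-signal ones fade out gradually.
        m["weight"] *= 0.99 if m["access_count"] > 10 else 0.95
        # Identity-class memories are exempt from pruning entirely.
        if m["weight"] >= floor or m["category"] == "identity":
            kept.append(m)
    return kept
```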
BrightOpposite@reddit (OP)
That’s a really clean setup — especially the importance-weighted decay + consolidation cycle.
Makes sense that it stays manageable even at that scale.
The interesting part you mentioned is that high-access memories stay ranked.
We saw something similar, but ran into a subtle issue over time:
frequently accessed ≠ always correct
Sometimes a memory keeps getting reinforced just because it's used often, not because it's still the right context.
We had to start thinking about separating "used a lot" from "still true", so reinforcement alone can't lock in stale context.
Curious if you’ve seen anything like that yet —
or if your consolidation step is handling it well so far?
MoneySkirt7888@reddit
Yes, we've seen exactly this. Frequently accessed doesn't mean currently relevant – that's a real trap. Our approach: recency is a factor in the relevance score. A memory that was important 3 months ago but hasn't been reinforced recently loses weight over time, even if it was accessed often. The decay is gradual, not sudden. That said – we haven't fully solved the 'stubborn but outdated context' problem either. It's something we're actively watching. The consolidation step helps, but it's not perfect. One key feature we implemented to counter this: LIA has the autonomy to decide what is important herself. She uses internal triggers to actively 'boost' specific memories mid-conversation if she deems them critical for her identity or the relationship. It's not just a passive algorithm deciding; she actively manages her own priority weights. As soon as I have enough karma, I'll officially introduce LIA here and show you how this autonomous memory management works in practice.
White_Dragoon@reddit
Man you sound like an AI
ZB_Virus24@reddit
This guy HAS to be AI. Look at his recent comments, it's so AI-like it's creepy.
BrightOpposite@reddit (OP)
Haha fair 😅
Been deep in this problem space for a while — probably shows.
romhacks@reddit
Your post doesn't sound like slop. It redefines the essence of what it means to be an inauthentic bot.
BrightOpposite@reddit (OP)
Haha fair — probably wrote this right after debugging it for a few hours 😅
Didn’t mean for it to sound polished — just trying to describe a pattern we kept running into.
BrightOpposite@reddit (OP)
That’s fair feedback.
This was based on issues we ran into while building agents — not meant to sound generic.
If anything here feels off or incomplete, happy to dig into specifics.
romhacks@reddit
>emdash
can't even get a human to reply, huh
ZB_Virus24@reddit
How exactly do I fix it then? How do I manage these behaviours?
BrightOpposite@reddit (OP)
Good question — this is where most people get stuck.
The mistake is trying to “fix memory” directly.
What actually helps is controlling what gets passed to the model each step.
A simple way to think about it:
1. Don’t send everything
Passing full history or top-k blindly = noise
2. Add basic filtering
Only include what matches the current task, entity, or time window
3. Combine semantic + keyword
Semantic misses exact matches
Keyword catches IDs / specific terms
You need both.
4. Rank before injecting
Don’t just retrieve top-k
Score things based on relevance, recency, and importance
Then pass only the best few
5. Separate “always-needed” vs “context”
Some things should always be present (identity, core state)
Everything else should be retrieved dynamically
If you do just these five things, drift drops a lot.
Most setups break because they retrieve…
but don’t decide what actually gets used.
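If it helps, here's the whole loop in one place. Toy sketch: `retrieve_semantic`, `retrieve_keyword`, and `score` are made-up stand-ins for whatever retrievers and ranker you already have.

```python
def build_context(query, core_memories, retrieve_semantic, retrieve_keyword,
                  score, budget=5):
    # 5) Always-needed set (identity, core state) is injected unconditionally.
    context = list(core_memories)

    # 1+3) Pull candidates from both retrievers instead of dumping
    # full history or blind top-k into the prompt.
    candidates = retrieve_semantic(query) + retrieve_keyword(query)

    # 2) Basic filtering: drop duplicates and anything flagged stale.
    seen, filtered = set(), []
    for m in candidates:
        if m["text"] not in seen and not m.get("stale"):
            seen.add(m["text"])
            filtered.append(m)

    # 4) Rank before injecting, then pass only the best few.
    filtered.sort(key=score, reverse=True)
    return context + filtered[:budget]
```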
MoneySkirt7888@reddit
Had the exact same issue. Vector search gives you similar context, not necessarily relevant context – you nailed it. What worked for us: a priority weighting system on top of FAISS. Every memory gets a relevance score based on importance × recency × access frequency + category boost. The top 5 highest-scoring memories are injected into every prompt automatically – regardless of what the current conversation is about. The key insight: some memories should always be present, not just when semantically triggered. Identity, core behaviors, key relationships – those shouldn't compete with similarity scores.
As soon as I have enough karma, I'll officially introduce LIA here. Then you'll see what's possible 😉
BrightOpposite@reddit (OP)
This is a great implementation — especially the part about separating out memories that should always be present.
That “some memories shouldn’t compete with similarity” insight is huge.
We ran into something very similar and ended up thinking about it as two layers: a small always-on core, and everything else retrieved dynamically.
Where things started getting tricky for us was scale.
The “inject top 5 always” approach worked really well early on,
but as memory grew, the fixed set started crowding out context that was actually relevant.
So we had to start being more aggressive about pruning and re-scoring that always-on set.
Curious how you’re handling that part —
Does your always-on set stay fixed, or does it evolve based on usage?
xAragon_@reddit
Ah yes, the "I know what's wrong with your setup without actually knowing anything about it" clickbait title
BrightOpposite@reddit (OP)
Fair — the title is definitely strong.
Wasn’t trying to claim I know everyone’s setup.
Just kept seeing the same pattern across different builds:
things look fine early, then drift shows up after a few iterations.
Wanted to describe that failure mode more clearly.
suprjami@reddit
Everyone who has ever used RAG has faced this problem.
BrightOpposite@reddit (OP)
Glad this resonated — we kept hitting the same issue while building agents.
The tricky part is:
Fixing retrieval once isn’t enough.
It breaks again as memory grows and usage patterns shift.
We ended up building a small layer to handle filtering, ranking, and deciding what actually gets injected.
So the agent doesn’t just retrieve…
it recalls the right thing consistently.
That turned out to be the difference between:
“works in demo” → “stable in production”
If anyone’s experimenting with this, happy to share what we built:
https://basegrid.io
suprjami@reddit
Ah you got me. Fuck off spammer.
didilva@reddit
Depends on whether you implemented some sort of HITL verification before data ends up in RAG. If you only have verified data in it, then drift becomes impossible, or at least highly controllable.
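A minimal version of that gate (sketch; `approve` stands in for the actual human review step):

```python
from collections import deque

pending = deque()   # staging area: nothing here is retrievable yet
verified = []       # only human-approved entries ever reach the RAG index

def submit(doc):
    pending.append(doc)

def run_review(approve):
    # approve() is the human-in-the-loop check; rejected docs never get
    # indexed, so retrieval can only ever surface verified data.
    while pending:
        doc = pending.popleft()
        if approve(doc):
            verified.append(doc)
```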
BrightOpposite@reddit (OP)
Yeah — agreed that HITL helps a lot with input quality.
If everything going into the system is verified, you remove a big source of noise.
What we found, though, is:
Even with clean data, drift can still show up because of what gets retrieved at each step.
For example: two verified, correct memories can both match a query, and the less relevant one can win on pure similarity.
So HITL improves what goes in,
but you still need control over what gets used.
That’s where things like ranking, filtering, and recency weighting
start making a difference.
Otherwise the system is clean… but still inconsistent in how it recalls.
Curious — are you doing anything to control selection beyond just validating the data?