One thing we found while building long-horizon agents: context density mattered more than context length
Posted by Ok_Celery_4154@reddit | LocalLLaMA | 1 comment
We’ve been experimenting with a long-horizon agent setup, and one thing that became increasingly obvious was this:
Most failures weren’t coming from insufficient context window size, but from low information density inside the active context.
In other words, even when the model had “enough room,” decision quality still degraded once too much low-value state, tool history, and irrelevant memory accumulated.
So we started testing a different design approach:
- keep the tool interface minimal
- retrieve memory on demand instead of loading everything
- explicitly convert successful task experience into reusable SOPs/scripts
- compress or trim context aggressively when it stops being decision-relevant
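The retrieval and trimming ideas above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: all names (`ContextEntry`, `ContextManager`, the `relevance` score) are hypothetical, and the relevance scores are assumed to be maintained elsewhere by the agent.

```python
# Hypothetical sketch of on-demand memory retrieval plus
# relevance-based context trimming. Not the OP's actual code.
from dataclasses import dataclass, field

@dataclass
class ContextEntry:
    text: str
    kind: str          # e.g. "instruction", "tool_result", "memory"
    relevance: float   # decision-relevance score, maintained by the agent
    tokens: int        # approximate token count

@dataclass
class ContextManager:
    budget: int                      # token budget for the active context
    entries: list = field(default_factory=list)
    memory_store: dict = field(default_factory=dict)  # memory kept outside context

    def add(self, entry: ContextEntry) -> None:
        self.entries.append(entry)
        self.trim()

    def trim(self) -> None:
        """Drop the lowest-relevance non-instruction entries once over budget."""
        def used() -> int:
            return sum(e.tokens for e in self.entries)
        droppable = sorted(
            (e for e in self.entries if e.kind != "instruction"),
            key=lambda e: e.relevance,
        )
        for entry in droppable:
            if used() <= self.budget:
                break
            self.entries.remove(entry)

    def retrieve(self, key: str):
        """Load a memory item into the active context only when needed."""
        text = self.memory_store.get(key)
        if text is None:
            return None
        entry = ContextEntry(text, "memory", relevance=1.0,
                             tokens=len(text.split()))
        self.add(entry)
        return entry
```

The point of the sketch is the shape of the loop: memory lives outside the active context and is pulled in per-task, and anything that stops being decision-relevant is evicted rather than left to accumulate.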
A few things we observed:
- repeated runs of similar tasks became much cheaper over time
- token usage dropped by as much as 89.6% on repeated tasks in our setup
- the system showed a pretty visible cold-start → convergence pattern
- on some harder web tasks, reducing context noise mattered more than adding more structure
My current takeaway is that for agent systems, context management may be a more fundamental bottleneck than raw context length.
Curious whether others here have seen similar behavior:
- with memory-heavy agents
- with tool-using workflows
- or with long web / desktop task chains
If useful, we wrote up the implementation and evaluation details here:
Would be especially interested in pushback on:
- whether “context information density” is actually a useful framing
- how others handle reusable skill formation
- whether repeated-task convergence holds outside narrow task families
FatheredPuma81@reddit
Oh, so that's why OpenCode deletes tool calls every time an agent finishes its turn, even if you have plenty of context left. Well, other than the obvious reasons of preventing long-context rot and improving speed.