why do agents still fail in multi-step workflows even when each step works fine?

Posted by weoraage@reddit | LocalLLaMA | 13 comments

testing a few agent setups lately and something keeps bothering me. individually, each step usually works: calling tools, generating outputs, even simple reasoning. but once you chain them into a real workflow, things start breaking in weird ways. the agent either loses track halfway, doesn’t recover from a small failure, or just stops without finishing the task

it feels like the problem isn’t capability anymore, but consistency across steps. like there’s no real notion of finishing the job, just executing pieces of it. curious if others here have found a setup that actually handles multi-step workflows reliably, especially when something goes wrong midway
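
to make "a notion of finishing the job" concrete, here's a rough sketch of what i'd expect a runner to do: retry a step that fails, and only report success if an explicit done-check passes at the end. this is not from any real framework. `Step`, `Result`, `run_workflow` and `is_done` are all made-up names for illustration, and the steps are plain functions standing in for tool calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

State = Dict[str, object]


@dataclass
class Step:
    # Hypothetical step wrapper: `run` returns updates to merge into the state.
    name: str
    run: Callable[[State], State]
    max_retries: int = 2


@dataclass
class Result:
    finished: bool
    state: State
    log: List[str] = field(default_factory=list)


def run_workflow(
    steps: List[Step],
    is_done: Callable[[State], bool],
    state: Optional[State] = None,
) -> Result:
    """Run steps in order, retrying failures, then check an explicit goal."""
    state = dict(state or {})
    log: List[str] = []
    for step in steps:
        for attempt in range(step.max_retries + 1):
            try:
                state.update(step.run(state))
                log.append(f"{step.name}: ok")
                break
            except Exception as exc:
                log.append(f"{step.name}: failed attempt {attempt + 1} ({exc})")
        else:
            # Retries exhausted: stop and say so instead of silently quitting.
            return Result(False, state, log)
    # Running every step is not the same as finishing: check the goal itself.
    return Result(is_done(state), state, log)


# Usage: a step that fails once (simulated tool timeout), then succeeds.
calls = {"n": 0}


def flaky_fetch(state: State) -> State:
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("tool timeout")
    return {"data": [1, 2, 3]}


def summarize(state: State) -> State:
    return {"summary": sum(state["data"])}


result = run_workflow(
    [Step("fetch", flaky_fetch), Step("summarize", summarize)],
    is_done=lambda s: "summary" in s,
)
print(result.finished, result.log)
```

the point of the sketch is just the separation: recovery lives in the retry loop, and "done" is decided by a check on the final state rather than by the last step returning. the setups i've tried seem to have neither in any explicit form.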