Sandboxed agent runs for GitHub repos with replayable video output

Posted by delecioushelix@reddit | programming

I’ve been experimenting with a workflow for making coding-agent runs more observable.

Instead of asking an agent to summarize a repo, the system runs the agent in a sandbox against a GitHub repo and records the actual terminal/browser session. The result is a replayable video of what happened: setup, failures, retries, browser state, and final output.

The motivation is that an agent’s text summary hides most of what happened along the way, such as failed setup steps and retries. For repo evaluation, the path matters as much as the final answer.

High-level flow:

GitHub repo → sandbox → agent run → terminal/browser recording → processed replay
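To make the recording step of that flow concrete, here is a minimal sketch in Python. It is not the project's actual implementation: `Event`, `record_run`, and `to_replay` are hypothetical names, it captures terminal output only (no browser), and it does no sandboxing at all, so a real setup would run these commands inside a container or VM.

```python
import json
import subprocess
import sys
import time
from dataclasses import asdict, dataclass


@dataclass
class Event:
    t: float    # seconds since the run started
    step: str   # label of the command that produced this line
    text: str   # one line of combined stdout/stderr, or an exit marker


def record_run(commands, cwd=None):
    """Run (label, argv) pairs in order and return timestamped events.

    Hypothetical helper: no isolation is applied here. Failures are
    recorded and the run continues, so they stay visible in the replay.
    """
    events = []
    start = time.monotonic()
    for step, argv in commands:
        proc = subprocess.Popen(
            argv,
            cwd=cwd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # interleave stderr with stdout
            text=True,
        )
        for line in proc.stdout:
            events.append(Event(time.monotonic() - start, step, line.rstrip("\n")))
        code = proc.wait()
        events.append(Event(time.monotonic() - start, step, f"[exit {code}]"))
    return events


def to_replay(events):
    """Serialize events to JSON that a player could step through in order."""
    return json.dumps([asdict(e) for e in events], indent=2)


if __name__ == "__main__":
    run = record_run([
        ("setup", [sys.executable, "-c", "print('installing deps')"]),
        ("agent", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ])
    print(to_replay(run))
```

Keeping the failed step and its exit code in the event list is the point: a summary would likely drop it, while the replay shows it in sequence.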

Demo: https://www.trymyrepo.com

Architecture notes: https://www.trymyrepo.com/how-it-works

Planning to open source it next week. Curious if people here think this kind of “visual evidence” is useful for agent workflows, or if logs/traces are enough.