Sandboxed agent runs for GitHub repos with replayable video output

Posted by delecioushelix@reddit | programming | 2 comments

I’ve been experimenting with a workflow for making coding-agent runs more observable.

Instead of asking an agent to summarize a repo, the system runs the agent in a sandbox against a GitHub repo and records the actual terminal/browser session. The result is a replayable video of what happened: setup, failures, retries, browser state, and final output.

The motivation is that an agent's text summary hides much of the run: failed setup steps and retries rarely make it into the final write-up. For repo evaluation, the path matters as much as the final answer.

High-level flow:

GitHub repo → sandbox → agent run → terminal/browser recording → processed replay
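
To make the recording and replay steps concrete, here is a minimal sketch of the terminal side only. It is not the project's implementation: the names `Event`, `record_run`, `to_jsonl`, and `replay` are hypothetical, the command is assumed to already be running inside whatever sandbox is used, and browser capture and video rendering are left out.

```python
import json
import subprocess
import sys
import time
from dataclasses import asdict, dataclass


@dataclass
class Event:
    offset: float  # seconds since the run started
    stream: str    # "stdout" for output lines, "exit" for the final status
    data: str


def record_run(cmd, cwd=None):
    """Run cmd and capture its merged stdout/stderr as timestamped events.

    Assumption: isolation (container, VM, etc.) is handled by the caller.
    """
    start = time.monotonic()
    events = []
    proc = subprocess.Popen(
        cmd,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave errors with normal output
        text=True,
    )
    # Read line by line so each line gets its own timestamp.
    for line in proc.stdout:
        events.append(Event(time.monotonic() - start, "stdout", line))
    code = proc.wait()
    events.append(Event(time.monotonic() - start, "exit", str(code)))
    return events


def to_jsonl(events):
    """Serialize events as one JSON object per line for later processing."""
    return "\n".join(json.dumps(asdict(e)) for e in events)


def replay(events, speed=0.0, out=sys.stdout):
    """Write recorded output back in order; speed=0 skips the waits."""
    previous = 0.0
    for e in events:
        if speed > 0:
            time.sleep((e.offset - previous) / speed)
        previous = e.offset
        if e.stream == "stdout":
            out.write(e.data)
```

Calling `replay(record_run(["git", "--version"]), speed=1.0)` would print the captured output with the original pacing. Keeping failures and the exit status in the same event stream is what lets a replay show the path of the run rather than only its result.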

Demo: https://www.trymyrepo.com

Architecture notes: https://www.trymyrepo.com/how-it-works

Planning to open source it next week. Curious if people here think this kind of “visual evidence” is useful for agent workflows, or if logs/traces are enough.

programming-ModTeam@reddit

r/programming is not a place to post your project, get feedback, ask for help, or promote your startup.

Technical write-ups on what makes a project technically challenging, interesting, or educational are allowed and encouraged, but just a link to a GitHub page or a list of features is not allowed.

The technical write-up must be the focus of the post, not just a tickbox-checking exercise to get us to allow it. This is a technical subreddit.

We don't care what you built, we care how you build it.

fiskfisk@reddit

We don't need LLM generated spam. Sod off.