Running AI agents in sandboxes vs. isolated VMs with full desktops: what's your setup?

Posted by Different-Degree-761@reddit | LocalLLaMA | 7 comments

I've been experimenting with different ways to give AI agents access to a real computer (not just code execution) and wanted to share what I've found.

The problem: Most agent sandboxes (E2B, containers, etc.) work fine for running Python scripts, but they break down when your agent needs to:

Open and navigate a browser
Use GUI applications
Persist files and state across sessions
Install system-level packages

What actually works: Giving the agent a full Linux desktop inside an isolated VM. It gets a real OS, a screen, a file system, and persistence, and the isolation means it can't touch anything outside its own workspace.

Three approaches I've looked at:

DIY with QEMU/KVM: Full control, but you own all the infra (image management, VNC, networking, cleanup)
Cloud VMs (EC2/GCE): Isolation out of the box, but slow to provision and no built-in screen capture for Computer Use
Purpose-built platforms: Sub-second provisioning, native Computer Use API, persistent workspaces
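
To make the DIY option concrete, here is a minimal sketch of what the QEMU/KVM side might look like. It only builds the command line and does not launch anything. The image name `agent-desktop.qcow2` and the helper `build_qemu_command` are illustrative assumptions, not something from a specific setup, and the memory and CPU sizes are arbitrary.

```python
from pathlib import Path


def build_qemu_command(disk_image: Path, vnc_display: int = 1,
                       memory_mb: int = 4096, cpus: int = 2,
                       ephemeral: bool = True) -> list[str]:
    """Return an argv list for a KVM-accelerated desktop VM (does not launch it)."""
    cmd = [
        "qemu-system-x86_64",
        "-enable-kvm",                    # hardware acceleration via KVM
        "-m", str(memory_mb),             # guest RAM in MiB
        "-smp", str(cpus),                # number of virtual CPUs
        "-drive", f"file={disk_image},format=qcow2,if=virtio",
        # VNC bound to loopback only; display N listens on TCP port 5900 + N
        "-vnc", f"127.0.0.1:{vnc_display}",
        # User-mode networking; restrict=on cuts the guest off from the host
        # and the outside network (drop it if the agent needs internet access)
        "-netdev", "user,id=net0,restrict=on",
        "-device", "virtio-net-pci,netdev=net0",
    ]
    if ephemeral:
        # Writes go to a temporary overlay, so the base image stays clean.
        # Pass ephemeral=False when the workspace must persist across sessions.
        cmd.append("-snapshot")
    return cmd


if __name__ == "__main__":
    # Hypothetical image name, for illustration only
    print(" ".join(build_qemu_command(Path("agent-desktop.qcow2"))))
```

The image management and cleanup mentioned above are still on you; this only covers the launch flags.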

For those running agents that need more than code execution, what's your isolation setup? Anyone else moved from sandboxes to full VMs?

aniketmaurya@reddit

curious what kind of applications you automate with full-desktop use?

Deep_Ad1959@reddit

there's a middle ground between full VM isolation and raw container access that most people skip: using the OS accessibility APIs directly from the host without giving the agent pixel-level screen access. on both windows and mac, you can query the entire UI tree of any running application programmatically, get every button label, text field value, and menu item, then perform targeted clicks through the same API layer. the agent never needs a "screen" at all, so there's no VNC overhead, no screenshot round-trips, and the blast radius stays small because you can scope which apps the agent can see and interact with.
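
The scoping idea above can be sketched without touching any real accessibility API. In this platform-neutral sketch, `UINode` is a hypothetical stand-in for whatever element type the OS API returns, and `find` is an illustrative helper: it refuses to look inside any app that is not on the allowlist, then searches the tree by role and label.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class UINode:
    """Hypothetical stand-in for one element returned by an accessibility API."""
    role: str
    label: str
    children: list["UINode"] = field(default_factory=list)


def walk(node: UINode) -> Iterator[UINode]:
    """Depth-first traversal of a UI tree, yielding the root first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find(apps: dict[str, UINode], allowed: set[str], app: str,
         role: str, label: str) -> Optional[UINode]:
    """Look up an element by role and label, but only inside allowlisted apps."""
    if app not in allowed:
        raise PermissionError(f"agent is not allowed to inspect {app!r}")
    root = apps.get(app)
    if root is None:
        return None
    for node in walk(root):
        if node.role == role and node.label == label:
            return node
    return None


if __name__ == "__main__":
    # Made-up trees for two apps; only "Editor" is visible to the agent
    apps = {
        "Editor": UINode("window", "Editor", [UINode("button", "Save")]),
        "Mail": UINode("window", "Mail", [UINode("button", "Send")]),
    }
    print(find(apps, {"Editor"}, "Editor", "button", "Save"))
```

In a real setup the trees would come from the platform's accessibility layer and the click would go back through it; the allowlist check is the part that keeps the blast radius small.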

ai_guy_nerd@reddit

Full VMs for Computer Use is the right call if you need real persistence and stateful interactions. The sequential execution problem you mentioned (relay race → Amdahl's Law) is real — I've hit it with multi-GPU setups too.

One thing worth testing: container-based approach with volume mounts plus host access via socket binding. You get most of the isolation benefits without VM provisioning overhead, and agents can still interact with the host desktop via local sockets. Not a perfect fit for everyone, but the latency is way better than VM snapshot/restore cycles.

The purpose-built platforms (like the ones Anthropic documented for Computer Use) handle the screen capture plus isolation combo elegantly. If you need that level of production polish, they're worth the cost. For experimentation though, QEMU plus VNC plus a simple agent loop works fine if you can stomach the provisioning.

What's your primary blocker right now — the VM provisioning latency, or agent state management across runs?
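
For reference, the "simple agent loop" mentioned above might look roughly like this. It is a sketch under assumptions: `capture_screen`, `choose_action`, and `apply_action` are hypothetical callables you would supply yourself (for example a VNC screenshot, a model call, and a VNC input event), and the action format is made up.

```python
from typing import Callable, Optional


def run_agent_loop(capture_screen: Callable[[], bytes],
                   choose_action: Callable[[bytes], Optional[dict]],
                   apply_action: Callable[[dict], None],
                   max_steps: int = 20) -> int:
    """Observe, decide, act; stop when the policy returns None or max_steps is hit.

    Returns the number of actions that were applied.
    """
    steps = 0
    while steps < max_steps:
        frame = capture_screen()          # observe: grab the current screen
        action = choose_action(frame)     # decide: ask the model what to do
        if action is None:                # the policy signals it is done
            break
        apply_action(action)              # act: send the click or keystroke
        steps += 1
    return steps


if __name__ == "__main__":
    # Stubbed run: a scripted "policy" that clicks once and then stops
    script = [{"type": "click", "x": 10, "y": 20}, None]
    done = run_agent_loop(lambda: b"fake-frame",
                          lambda frame: script.pop(0),
                          lambda action: print("applying", action))
    print("steps:", done)
```

The `max_steps` cap is there so a confused policy cannot run the VM forever.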

CommonPurpose1969@reddit

As far as I know, Claude Code and Codex use bubblewrap. If you use Linux, you can decide how much of your desktop you want to share.
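
To illustrate "decide how much you want to share", here is a minimal sketch that builds a bubblewrap command line. It is not the configuration any particular tool uses. The helper `build_bwrap_command` and the project path are hypothetical, and the `/bin`, `/lib`, and `/lib64` symlinks assume a distro with a merged `/usr`.

```python
from pathlib import Path


def build_bwrap_command(project_dir: Path, command: list[str],
                        allow_network: bool = False) -> list[str]:
    """Return a bwrap argv that exposes /usr read-only and one writable project dir."""
    project = str(project_dir)  # assumed to be an absolute path
    argv = [
        "bwrap",
        "--unshare-all",            # new user, pid, net, ipc, uts, cgroup namespaces
        "--die-with-parent",        # kill the sandbox if the launcher exits
        "--new-session",            # detach from the controlling terminal
        "--ro-bind", "/usr", "/usr",
        "--symlink", "usr/bin", "/bin",      # assumption: merged-/usr layout
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--bind", project, project,  # the only writable host path
        "--chdir", project,
    ]
    if allow_network:
        argv.append("--share-net")   # re-enable networking after --unshare-all
    return argv + command


if __name__ == "__main__":
    # Hypothetical project path, for illustration only
    print(" ".join(build_bwrap_command(Path("/home/me/projects/demo"), ["bash"])))
```

Everything not bound in (home directory, SSH keys, the display socket) simply does not exist inside the sandbox, which is where the "how much to share" knob comes from.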

draconisx4@reddit

I've found isolated VMs are solid for keeping agents in check, especially for runtime oversight and preventing unintended system access, which cuts down on governance risks. It's a bit of a setup grind, but monitoring tools make it way easier to track what your agents are up to.

Chupa-Skrull@reddit

The problem: Most agent sandboxes (E2B, containers, etc.) work fine for running Python scripts, but they break down when your agent needs to: [redacted]

I mean, if you configure them that way, sure.

If you're working on Linux you can give a podman container access to your Wayland session and D-Bus, and mount your projects directory from outside so the agent can work on code without seeing anything else about your system (you of course want to back it up regularly), whatever you want.

If you set it up that way, it feels basically native, but gives you a nice blast radius containment layer should things go crazy. It has the nice bonus that third-party providers never learn anything meaningful about your host system or personal files, though I imagine that's less of a concern for this sub.

Is it necessary compared to just running a VM? Of course not, but it was fun to set up!
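
A rough sketch of the podman setup described above, as a command builder rather than a tested recipe. Assumptions: the session D-Bus socket lives at `$XDG_RUNTIME_DIR/bus` (the usual systemd layout), `WAYLAND_DISPLAY` is a socket name rather than an absolute path, and the image name and the `/run/agent` guest path are made up. SELinux hosts may also need `:Z` on the mounts or `--security-opt label=disable`.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def build_podman_command(image: str, project_dir: Path,
                         env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return a podman argv that shares Wayland, session D-Bus, and one project dir."""
    env = os.environ if env is None else env
    runtime_dir = env["XDG_RUNTIME_DIR"]
    wayland = env.get("WAYLAND_DISPLAY", "wayland-0")
    guest_runtime = "/run/agent"  # hypothetical runtime dir inside the container
    return [
        "podman", "run", "--rm", "-it",
        "--userns=keep-id",  # same UID inside, so sockets and files stay accessible
        "-e", f"XDG_RUNTIME_DIR={guest_runtime}",
        "-e", f"WAYLAND_DISPLAY={wayland}",
        "-e", f"DBUS_SESSION_BUS_ADDRESS=unix:path={guest_runtime}/bus",
        # Wayland compositor socket: lets GUI apps in the container draw on the host
        "-v", f"{runtime_dir}/{wayland}:{guest_runtime}/{wayland}",
        # Session bus socket: convenient, but it widens what the agent can reach
        "-v", f"{runtime_dir}/bus:{guest_runtime}/bus",
        # The only host directory the container can see
        "-v", f"{project_dir}:/work",
        "--workdir", "/work",
        image,
    ]


if __name__ == "__main__":
    # Hypothetical image and path; the env dict stands in for a real session
    demo_env = {"XDG_RUNTIME_DIR": "/run/user/1000", "WAYLAND_DISPLAY": "wayland-0"}
    print(" ".join(build_podman_command("localhost/agent-dev:latest",
                                        Path("/home/me/projects"), demo_env)))
```

Dropping the D-Bus mount is the easy way to tighten this if the agent does not need desktop integration.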

Different-Degree-761@reddit (OP)

Wrote up the full comparison here: https://lebureau.talentai.fr/blog/run-ai-agent-isolated-vm