Looking for a reliable browser use agent that handles most daily tasks.
Posted by TheReedemer69@reddit | LocalLLaMA
I am open to any option whether it's local or service based.
For online services I tried:
- ChatGPT agent: it's almost the worst option ever. Way too slow, stupid, limited, and it gets blocked on most sites.
- Manus agent: it's capable and versatile, but its cost is simply unsustainable, and even then it still gets blocked by a lot of sites (because of bot detection and its data center IP).
- Perplexity computer: it's capable of achieving almost any task, but it's cost prohibitive.
- Perplexity Comet browser: it's the most balanced option so far. It uses your own browser, so it avoids almost all bot detection and reliably navigates most sites. The only problem is that on a Pro account you hit your account limits really quickly.
- qwen2.5:3b-instruct locally via Ollama + Playwright MCP via CDP (Chrome DevTools Protocol): my PC can't handle any larger models, so this was the only one I was able to run locally. Besides being slow, it got stuck all the time on the simplest of tasks, so it wasn't usable at all.
- Gemini 3.1 Flash-Lite + the same setup as qwen: it's a little bit better but still not good enough.
The tasks I usually do revolve around job applications, simple automation (go to login-protected site x and fetch x data, use my account to make x post or follow x, solve x assignment for me and report the results), and even heavy troubleshooting, API discovery, etc.
BuffMcBigHuge@reddit
I've had success using agent-browser with CDP.
You can launch your own chrome with a custom profile, install any extensions you wish, authenticate with any website you want, and have your agent point to the cdp server. Then your agent controls the browser itself, you can watch it perform actions, and help it when it needs to, instead of going full headless.
Agent-Browser helps with reducing token usage and DOM manipulation.
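For anyone who wants to see the "own Chrome + CDP" idea in code, here is a minimal sketch, not agent-browser's actual implementation. It assumes Chrome's `--remote-debugging-port` and `--user-data-dir` flags and Playwright's Python `connect_over_cdp`; the Chrome path, profile directory, port 9222, and the helper names are placeholders I made up for illustration.

```python
import subprocess


def chrome_launch_command(chrome_path: str, profile_dir: str, port: int = 9222) -> list[str]:
    """Build the argv for a visible (non-headless) Chrome that exposes a CDP endpoint.

    profile_dir is a dedicated profile folder, so logins and extensions persist.
    """
    return [
        chrome_path,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={profile_dir}",
    ]


def cdp_endpoint(port: int = 9222, host: str = "127.0.0.1") -> str:
    """HTTP address of the CDP server that the agent should point to."""
    return f"http://{host}:{port}"


def attach_and_get_title(url: str, port: int = 9222) -> str:
    """Attach to the already-running Chrome, open url in a new tab, return its title."""
    # Imported lazily so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_endpoint(port))
        # Reuse the existing profile context so the logged-in sessions are available.
        context = browser.contexts[0] if browser.contexts else browser.new_context()
        page = context.new_page()
        page.goto(url)
        return page.title()


if __name__ == "__main__":
    # Placeholder paths: adjust for your OS and Chrome install.
    cmd = chrome_launch_command("/usr/bin/google-chrome", "/tmp/agent-profile")
    subprocess.Popen(cmd)  # log in manually in this window first, then attach
    print("Agent should connect to:", cdp_endpoint())
```

Because the browser window stays visible, you can watch the agent and step in (captcha, 2FA) before calling something like `attach_and_get_title`.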
TheReedemer69@reddit (OP)
seems good but any idea how it compares with
https://github.com/browser-use/browser-use
Oren_Lester@reddit
agent-browser is faster and consumes fewer tokens in its browser loops / snapshots compared to browser-use.
Take a look at caphub.io for flights, search, Reddit, jobs, news, etc. It gives the agent clean structured results instead of making it grind through browser steps. That can mean ~10x faster runs and ~3x lower token spend.
TheReedemer69@reddit (OP)
sounds useless for my general use case buddy.
Forward_Compute001@reddit
I would want to know the same.
the_omicron@reddit
Why not use Hermes instead? It uses Camofox.
TheReedemer69@reddit (OP)
pls elaborate
the_omicron@reddit
Hermes is an agent; it's like a hybrid of Claude Code and OpenClaw. It uses Camofox as its browser, so it doesn't get walled by websites that hate bots. It has never failed on me.
__JockY__@reddit
Google “Hermes” and “camofox”. Or if that’s too much work, let me google it for you.
Here you go: https://hermes-agent.nousresearch.com/docs/user-guide/features/browser
You’re welcome.
TheReedemer69@reddit (OP)
I was asking about his personal experience with it.
__JockY__@reddit
Maybe inside your head that’s what you were feeling, but the words you expressed on the outside did not match that at all.
You asked for elaboration and got it.
If you wanted personal experience you should have asked for it.
My wife communicates in the same way and 20 years later she still forgets that I don’t possess in my head all the context that she has in her head. She gets frustrated when she feels like I’m not understanding her requests but really she just said “pls elaborate” when what she meant was “tell me how you feel about using Hermes and Camofox, and what did you find it brings that the other integrations do not, perhaps with a focus on my use case that I haven’t explained but you should know anyway.”
As I tell my kids: if you don’t ask for what you want, you won’t get what you want.
TheReedemer69@reddit (OP)
I appreciate the feedback tho.
Forward_Compute001@reddit
I just imagine how we humans will evolve with AI understanding our context better than other humans do.
TheReedemer69@reddit (OP)
r/thatescalatedquickly
cstocks@reddit
Most browser agents I've tried fall apart on anything beyond simple single-page tasks. The ones that work best give the LLM structured tools (click by selector, fill form, run JS) rather than relying on vision. I've been running an open-source MCP server I built. It connects any MCP-compatible model to Playwright for real browser control, and the key differentiator is parallel sessions: the agent can control multiple browser tabs simultaneously, which is huge for daily tasks that involve multiple sites. Check it out if interested: https://github.com/ItayRosen/parallel-browser-mcp. It works locally with Playwright, or with cloud browsers for heavier workloads.
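To make "structured tools plus parallel sessions" concrete, here is a rough sketch. It is not the code from the linked repo. The tool-call schema (`{"tool": ..., "args": ...}`) and every name below are hypothetical, and `FakePage` is a stub standing in for a real page object. The dispatcher only relies on async `click(selector)`, `fill(selector, value)`, and `evaluate(expression)` methods, which Playwright's async `Page` also provides.

```python
import asyncio
from typing import Any, Protocol


class PageLike(Protocol):
    """The minimal page surface the tools need."""

    async def click(self, selector: str) -> None: ...
    async def fill(self, selector: str, value: str) -> None: ...
    async def evaluate(self, expression: str) -> Any: ...


async def run_tool(page: PageLike, call: dict) -> Any:
    """Dispatch one structured tool call (hypothetical schema) to the page."""
    name, args = call["tool"], call.get("args", {})
    if name == "click":
        return await page.click(args["selector"])
    if name == "fill":
        return await page.fill(args["selector"], args["value"])
    if name == "run_js":
        return await page.evaluate(args["expression"])
    raise ValueError(f"unknown tool: {name}")


async def run_session(page: PageLike, calls: list[dict]) -> list[Any]:
    """Run one session's tool calls strictly in order."""
    return [await run_tool(page, call) for call in calls]


async def run_parallel(sessions: list[tuple[PageLike, list[dict]]]) -> list[list[Any]]:
    """Run several sessions concurrently, one page (tab) per session."""
    return await asyncio.gather(*(run_session(page, calls) for page, calls in sessions))


class FakePage:
    """Stub page that records actions, for a dry run without a browser."""

    def __init__(self) -> None:
        self.log: list[str] = []

    async def click(self, selector: str) -> None:
        self.log.append(f"click {selector}")

    async def fill(self, selector: str, value: str) -> None:
        self.log.append(f"fill {selector}={value}")

    async def evaluate(self, expression: str) -> Any:
        self.log.append(f"js {expression}")
        return "stub-result"
```

The point of the design is that the model emits small JSON-like calls instead of reading screenshots, and each tab keeps its own ordered queue, so two sites can be worked on at the same time without their steps interleaving incorrectly.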
opentabs-dev@reddit
The Comet insight is the right one: using your own browser to sidestep bot detection and auth is the key. There's an open source MCP server called opentabs that formalizes this into a proper tool layer. A Chrome extension routes structured tool calls through your existing logged-in tabs, so instead of screenshot-and-click it calls the app's own internal APIs directly through your session. For "login protected site x fetch x data" or "make x post follow x" it just works because you're already authenticated: zero bot detection overhead, and you get actual JSON back, not screen pixels. Works with Claude Code, Cursor, or any MCP client. It needs a capable model (not 3b, unfortunately), but if you're OK with Claude Code or similar, it covers ~100 web apps including Reddit, X, GitHub, etc: https://github.com/opentabs-dev/opentabs
setec404@reddit
try the claude browser plugin
pmv143@reddit
Yeah, this is pretty much the tradeoff everywhere right now. You either go local and get limited by hardware, or go cloud and deal with cost, limits, or weird reliability issues.
Have you tried stuff like RunPod or Modal? They're decent for experimenting, but you'll still run into cold starts / scaling quirks.
You can also try inferx, mainly because you get access to a pretty large catalog of models and can switch between them easily without setting everything up each time. That makes it way easier to experiment with different models for agent workflows instead of being stuck with one setup.