Browser-use alternatives
Posted by Comfortable-Baby-719@reddit | LocalLLaMA | View on Reddit | 15 comments
I'm not sure how many people know about browser-use, but we have an app powered by it and it's working pretty well. It's not super fast, but it always finds what we need within 1 min. Are there any better browser-automation alternatives that are more production-ready?
Our app basically has a browser agent look at different grocery websites and find certain products.
Aggressive_Bed7113@reddit
A 1-minute execution time indicates the LLM is bottlenecked by parsing raw DOM structure. For a production alternative, check out predicate-runtime (PredicateSystems/sdk-python). It approaches the problem differently:
Semantic Snapshots: It extracts only actionable elements instead of feeding the full HTML tree to the model, which significantly reduces token load and latency.
State Determinism: It uses hard-coded assertions (e.g., element_exists) to verify state changes after an action, rather than relying on the LLM to validate its own clicks.
This prevents the execution drag and endless hallucination loops common when navigating complex e-commerce sites.
Read the blog which hit the front page of hacker news last month: https://predicatesystems.ai/blog/verification-layer-amazon-case-study
The repo: https://github.com/PredicateSystems/sdk-python
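The assertion-after-action idea is easy to reproduce outside any particular SDK. Below is a minimal, stdlib-only sketch of an element_exists-style hard check; the page state, names, and helper functions are all hypothetical stand-ins, not predicate-runtime's actual API:

```python
import time

def wait_until(predicate, timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Poll a state predicate until it holds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Simulated page state standing in for a real DOM query (hypothetical).
page_state = {"elements": {"search-box"}}

def element_exists(element_id: str):
    """Build a hard assertion the runtime can check after an action."""
    return lambda: element_id in page_state["elements"]

# After a click, verify the expected state change deterministically
# instead of asking the LLM whether the click "worked".
assert wait_until(element_exists("search-box"), timeout=0.5)
assert not wait_until(element_exists("results-list"), timeout=0.2)  # fails fast
```

The point is that a failed check surfaces as a boolean within the timeout, so the agent can retry or abort one step instead of looping the model on a page that never changed.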
a_lit_bruh@reddit
Very interesting approach. I'm trying to build a similar agent that works locally using FSMs. What's the cost per step (thinking and executing), both in input/output tokens and in time taken?
Aggressive_Bed7113@reddit
It’s totally free if you use local LLM models (they can be as small as 3B for the executor and 8B for the planner). If you want to be more token-efficient, you can use the cloud version of the SDK gateway to aggressively prune the DOM with ML re-ranking to support ordinality (e.g., first product with a rating of at least 4.0).
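The "ordinality" mentioned here (e.g., "first product with a rating of at least 4.0") reduces to an ordered filter over whatever structured snapshot the pruner emits. A stdlib sketch with made-up product data:

```python
def first_matching(products, min_rating: float):
    """Return the first product meeting the rating threshold, preserving page order."""
    return next((p for p in products if p["rating"] >= min_rating), None)

# Hypothetical snapshot of a grocery search results page, in page order.
products = [
    {"name": "Store-brand oats", "rating": 3.6},
    {"name": "Organic oats", "rating": 4.3},
    {"name": "Premium oats", "rating": 4.8},
]
print(first_matching(products, 4.0))  # first hit in page order, not the best-rated
```

Note the result is the first qualifying item in document order ("Organic oats" here), which is what an ordinal query means, as opposed to a max-by-rating query.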
nightwing_2@reddit
hey, can you tell me more about the token efficiency? i'm using browser-use and my input tokens are close to 16M and output is 1M. how can i reduce the input tokens when the entire DOM structure is passed as input at every step?
Aggressive_Bed7113@reddit
Feeding the entire DOM tree to the LLM is absolutely unnecessary.
See the two demos with code examples:
https://www.reddit.com/r/LocalLLM/s/oDsul7P1lh https://www.reddit.com/r/LocalLLM/s/rZ80ClyGJS
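The pruning idea those demos are built on can be sketched with nothing but the stdlib: keep only actionable elements instead of serializing the whole DOM. The tag set and attribute hints below are illustrative assumptions, not the linked demos' actual code:

```python
from html.parser import HTMLParser

# Assumed set of element types an agent can act on.
ACTIONABLE = {"a", "button", "input", "select", "textarea"}

class SnapshotParser(HTMLParser):
    """Collect only actionable elements instead of the full DOM tree."""
    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in ACTIONABLE:
            a = dict(attrs)
            self.elements.append({
                "tag": tag,
                "id": a.get("id", ""),
                # Attribute-based text hints only; inner text is ignored here.
                "text_hint": a.get("aria-label") or a.get("placeholder") or a.get("value", ""),
            })

def semantic_snapshot(html: str) -> list[dict]:
    p = SnapshotParser()
    p.feed(html)
    return p.elements

html = """
<html><body>
  <div class="hero"><p>Lots of marketing copy the agent never needs...</p></div>
  <input id="search" placeholder="Search products">
  <button id="go">Go</button>
  <a href="/cart" id="cart" aria-label="View cart">Cart</a>
</body></html>
"""
print(semantic_snapshot(html))
```

The prompt then carries three small dicts instead of the whole HTML document, which is where the large token savings come from on element-heavy e-commerce pages.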
nightwing_2@reddit
RUN SUMMARY

```
Token usage:
  total_prompt_tokens=1255331          total_prompt_cost=0.0
  total_prompt_cached_tokens=591033    total_prompt_cached_cost=0.0
  total_completion_tokens=54701        total_completion_cost=0.0
  total_tokens=1310032                 total_cost=0.0
  entry_count=60
  by_model={'claude-sonnet-4-6': ModelUsageStats(model='claude-sonnet-4-6',
      prompt_tokens=1255331, completion_tokens=54701, total_tokens=1310032,
      cost=0.0, invocations=60,
      average_tokens_per_invocation=21833.866666666665)}
Total duration: 1379.5s
Total steps: 64
Errors: all None except one Pydantic validation error:
  1 validation error for AgentOutput
  action
    Input should be a valid list [type=list_type,
    input_value='[{"evaluate": {"code": "...join(\', \'); })()"}]}]',
    input_type=str]
  For further information visit https://errors.pydantic.dev/2.12/v/list_type
```
this is my current usage for a single test workflow. i'm using claude sonnet.
a_lit_bruh@reddit
Whatever program this is, it's not written well. And frankly, you don't need claude-sonnet-4.6 to drive a browser. Depending on how clear your browser-use instructions are, a much smaller model like Qwen3.5 should theoretically achieve the same thing with orders of magnitude fewer tokens. Take OP's agent for example: local models are amazing.
I'm actually working on a prototype that tackles much more complex objectives while using less than 500k input tokens and less than 10k output tokens, with an additional verification-control layer. I will post a similar demo here soon. Please stay tuned!
nightwing_2@reddit
sure, i'll wait! but we have tested so many models other than sonnet, and we're facing flakiness in our test cases. it's not guaranteed whether the test will run, or whether it will hit the right locator every time we run it.
a_lit_bruh@reddit
Oh. Can you give me one example workflow which smaller models did not achieve?
I believe the larger models, with their "Agentic Capabilities", have a lot of fluff inserted as priming prompts before your prompt. They also receive lots of junk that's unnecessary for the browser's current task. Being able to fully control that context gives you a lot of leverage over the LLM.
Staying true to LocalLlama here, I'd suggest you start looking into cloud GPU providers and open-source models. I can help you with some recos. Feel free to DM me.
Aggressive_Bed7113@reddit
Ouch, that burns a lot of tokens. Try the examples I linked above; they should save you at least 80% of the tokens. And you can run small local models (3B-8B) on your machine.
Loud-Television-7192@reddit
Depending on what "better for production" means to you, a few things to consider:
If the bottleneck is speed and resource usage, Lightpanda is an open-source headless browser designed specifically for agent workflows. It skips CSS rendering, image loading, and GPU compositing, since your agent doesn't necessarily need any of that. On real-world benchmarks (933 pages over the network), it uses 215MB vs Chrome's 2GB at 25 parallel tasks and finishes 9x faster.
For your grocery use case specifically, two features might help:
- It has a native MCP server built into the binary, and it's also compatible with Puppeteer/Playwright through CDP if you want to keep your existing automation code.
- It's beta, so not every site works yet, but for a defined set of grocery sites you could test compatibility pretty quickly. We're in active development, and if a site you need isn't working, open a GitHub issue and we'll get to it.
GitHub: https://github.com/lightpanda-io/browser
Loud-Television-7192@reddit
Technical details on the LP commands: https://lightpanda.io/blog/posts/lp-domain-commands-and-native-mcp
Effective_Ad1215@reddit
Yeah, browser-use is decent for prototyping, but that 1min per task adds up fast in production. I was scraping grocery sites too and hit the same wall - everything worked, but sooooo slowly.
Switched to Playwright with some custom caching for DOM snapshots, which helped. But honestly, what finally made it production-ready was using Actionbook for the action manuals + caching layer. It cut our agent runtime from about 45 seconds to 5-8 seconds per product search, and token usage dropped to almost nothing. Still requires tuning, but it's a night-and-day difference.
The manual part sucks to set up initially but once you have reliable selectors it just... works. Mostly. Grocery sites change layouts constantly though, fair warning.
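The manual-plus-cache pattern described here can be sketched generically: cache a known-good selector per (site, task), and fall back to an expensive rediscovery path only when the cached one stops matching. This is not Actionbook's actual API; every name here (resolve_selector, selector_cache.json, the stub callbacks) is made up for illustration:

```python
import json
from pathlib import Path

CACHE_FILE = Path("selector_cache.json")  # hypothetical cache location

def load_cache() -> dict:
    return json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}

def save_cache(cache: dict) -> None:
    CACHE_FILE.write_text(json.dumps(cache, indent=2))

def resolve_selector(site: str, task: str, page_matches, rediscover) -> str:
    """Use the cached selector while it still matches the live page;
    otherwise fall back to rediscovery (LLM or manual) and re-cache."""
    cache = load_cache()
    key = f"{site}:{task}"
    sel = cache.get(key)
    if sel and page_matches(sel):
        return sel                      # cache hit: no LLM tokens spent
    sel = rediscover(site, task)        # cache miss: pay once, then reuse
    cache[key] = sel
    save_cache(cache)
    return sel

# Demo with stubbed callbacks standing in for a real page check and an LLM call.
sel = resolve_selector("freshmart", "search",
                       page_matches=lambda s: False,               # cached selector broke
                       rediscover=lambda site, task: "#search-input")
print(sel)
```

Steady-state runs then skip the model entirely, which matches the "5-8 seconds and almost no tokens" behavior described above; the rediscovery path only fires when a site changes its layout.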
Comfortable-Baby-719@reddit (OP)
Thank you so much!! Now I'm developing an app that needs to fetch US insurance payor policy websites (like UHC, Aetna...), and those contain not only the DOM but also embedded PDFs, etc., where we need the agent to actually go and fetch the info.
By any chance, does Actionbook handle those non-DOM cases as well? Thanks again!
Guinness@reddit
Browserless, which is based on Playwright. Crawl4ai, Selenium, etc.