Stop Fixing Brittle Scrapers: Using Local Vision (Qwen-VL) + OpenClaw to Automate Complex SPAs (Airbnb Demo)
Posted by Lumpy-Accountant-750@reddit | LocalLLaMA | View on Reddit | 5 comments
Hey everyone! 👋
As someone who has spent way too many hours fixing scrapers because a website changed a single CSS class, I decided to try a different approach. I wanted to share a project I’ve been working on that moves away from brittle DOM parsing and uses AI Vision to "see" the web like a human does.
I’ve just open-sourced a case study and a set of tools to build an Airbnb Price Tracker that is practically maintenance-free.
Why this is a game-changer:
- Visual-to-Data: It uses Qwen-VL to extract data directly from screenshots. It doesn't care about dynamic Tailwind classes or obfuscated SPAs. If it’s on the screen, the AI can find it.
- 100% Local & Private: I’m running the model locally (via vLLM/Ollama). No API costs, no privacy leaks, and much faster for batch processing.
- Deterministic RPA: The AI acts as the "brain" that generates native code, but a separate RPA engine (OpenClaw) "drives the car." This avoids the usual "agentic hallucinations" you see in LLM-based testing.
- Open Source & Extensible: This isn't just a script; it’s a reusable "skill" that you can plug into any automation workflow.
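To make the Visual-to-Data bullet concrete, here is a minimal sketch assuming a local Ollama server with a Qwen-VL model at the default endpoint (the model name, prompt wording, and JSON schema are illustrative, not the project's actual code): send a screenshot, ask for JSON, and parse the reply defensively since VLMs often wrap output in code fences.

```python
import base64
import json
import re
import urllib.request

PROMPT = (
    "Extract every listing visible in this screenshot as a JSON array of "
    'objects with keys "title", "price_per_night", and "rating". '
    "Reply with JSON only."
)

def ask_vlm(screenshot_path, model="qwen2.5vl", host="http://localhost:11434"):
    """Send a screenshot to a local Ollama VLM and return its raw text reply."""
    with open(screenshot_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    payload = json.dumps({
        "model": model,
        "prompt": PROMPT,
        "images": [image_b64],
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        f"{host}/api/generate", data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def parse_listings(reply):
    """Pull the first JSON array out of the reply, tolerating ```json fences."""
    match = re.search(r"\[.*\]", reply, re.DOTALL)
    if not match:
        raise ValueError("no JSON array in model reply")
    return json.loads(match.group(0))
```

Because extraction works on pixels, this keeps working even if every CSS class on the page is regenerated.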
Check out the demo and the code here:
- 📺 Full Case Study & Video: GitHub - Airbnb Compare Scenario
- 🛠️ The Toolset: OpenClaw Skill Hub
I’m curious: how many of you are already moving toward vision-based automation for complex web apps? Let's discuss in the comments!
laterbreh@reddit
So is this a scraper or browser automation? You know there are many solutions to this that don't require AI, so I'm not sure "how long you have been doing this".
Universal scrapers use Playwright or Puppeteer plus Readability to get clean content. No AI required. You can script actual browser usage that doesn't rely on CSS. In fact, your AI is using those same automation toolsets to do what you are describing.
Gonna categorize this post as "Slop".
Lumpy-Accountant-750@reddit (OP)
I appreciate the skepticism, but you’re missing the core problem this solves: stability and maintenance overhead.
Yes, Playwright and Puppeteer are the foundation (I use them too), but traditional 'readability' or CSS-based scrapers are notoriously brittle on complex, obfuscated SPAs where the DOM structure changes weekly.
The goal here isn't just 'scraping'—it's AI-recorded RPA.
The vision model records the workflow once and generates deterministic code, so the script keeps working when the div nesting changes. That's the 'zero-maintenance' part.
Calling a solution to a decade-old engineering problem (brittle selectors) 'slop' because it leverages a modern tool is a bit ironic. This is about spending less time fixing broken scripts and more time actually using the data.
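A minimal sketch of the "AI-recorded RPA" split described above, with a hypothetical action schema (not OpenClaw's actual format): the vision model emits steps like these once during recording, and every later run replays them without calling the LLM at all, which is what keeps execution deterministic.

```python
import json

# Hypothetical recorded output from the vision model (illustrative schema).
RECORDED_STEPS = json.loads("""
[
  {"action": "navigate", "url": "https://example.com/search"},
  {"action": "click",    "x": 412, "y": 188, "label": "date picker"},
  {"action": "screenshot", "name": "results"}
]
""")

def replay(steps, driver):
    """Replay recorded steps against any driver exposing matching verbs."""
    log = []
    for step in steps:
        verb = step["action"]
        handler = getattr(driver, verb, None)
        if handler is None:
            raise ValueError(f"driver has no handler for {verb!r}")
        handler(step)  # deterministic: no model call on the replay path
        log.append(verb)
    return log

class DryRunDriver:
    """Stand-in driver that accepts the verbs; a real one would wrap Playwright."""
    def navigate(self, step): pass
    def click(self, step): pass
    def screenshot(self, step): pass
```

The LLM only re-enters the loop when a replay fails, to re-record the workflow from fresh screenshots.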
StupidityCanFly@reddit
I have custom vision based agents for web browsing deployed in production. The model used is Qwen3.5-27B (NVFP4). Nothing groundbreaking there, just make the LLM act like a human. “See then do”, simple as that. Minimal DOM use (just enough to annotate the current viewport) and only if really needed.
The agents have simple tools only: scroll, click, look (screenshot passed to VLM), look_closely (annotated screenshot passed to VLM), navigate, back and provide_input. Works with playwright and CDP.
The fun part? Reading the reasoning traces. The agents can browse an ecommerce store, add some products to cart and pass through the checkout, they can do the same on SaaS sites, or any kind of site.
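The "see then do" agent described in this comment can be sketched as a dispatch over exactly the tools listed (scroll, click, look, look_closely, navigate, back, provide_input). All names and return values here are hypothetical; in the real setup each turn's observation would come from the VLM and Playwright/CDP rather than stubs.

```python
from dataclasses import dataclass, field

@dataclass
class BrowserAgent:
    """Minimal "see then do" loop: each tool returns an observation string."""
    history: list = field(default_factory=list)

    def scroll(self, arg): return "scrolled viewport"
    def click(self, arg): return f"clicked {arg}"
    def look(self, arg): return "screenshot -> VLM"
    def look_closely(self, arg): return "annotated screenshot -> VLM"
    def navigate(self, arg): return f"went to {arg}"
    def back(self, arg): return "went back"
    def provide_input(self, arg): return f"typed {arg}"

    def step(self, tool, arg=None):
        """Dispatch one tool call and keep a trace (the fun reasoning log)."""
        observation = getattr(self, tool)(arg)
        self.history.append((tool, observation))
        return observation
```

Usage: `agent.step("navigate", "https://shop.example")` then `agent.step("look")`; the `history` trace is what makes the runs readable afterwards.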
__JockY__@reddit
SLOP
Good god, the slop. The SLOP. Make it stop.