How do you handle broken selectors when scraping e-commerce sites?
Posted by XxAlucard95xX@reddit | learnprogramming | 4 comments
I’ve got scrapers set up for like 30 different product pages, and every week at least 3 or 4 of them stop working because the HTML changes. It’s getting super annoying to maintain this stuff. Is there a better way to automate fixing these?
hasdata_com@reddit
Keeping scrapers working is just part of the job: HTML changes, you fix selectors. That's normal. LLM libraries can auto-update selectors, or you can use a scraping API to offload the maintenance.
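Before reaching for an LLM, a low-tech way to survive markup churn is a fallback chain: keep the old extractor around, try candidates in order, and record which one matched so you notice when a layout changes. A minimal sketch; the extractor names and page structure are made up for illustration, and in real code each lambda would be a BeautifulSoup/lxml selector call instead of a dict lookup:

```python
def first_match(extractors, page):
    """Try each (name, extractor) pair in order; return the first
    non-None result along with the name of the extractor that won.

    Extractors that raise (e.g. a selector that no longer matches)
    are skipped, so stale entries don't break the scraper."""
    for name, fn in extractors:
        try:
            result = fn(page)
            if result is not None:
                return name, result
        except Exception:
            continue  # stale selector; fall through to the next one
    return None, None  # every candidate failed -> alert a human


# Hypothetical example: the site moved the price field, so the
# "old-layout" extractor now raises KeyError and the chain falls
# through to "new-layout".
extractors = [
    ("old-layout", lambda p: p["product"]["price"]),
    ("new-layout", lambda p: p.get("price")),
]
page = {"price": "19.99"}  # stand-in for a parsed product page

matched, price = first_match(extractors, page)
# matched == "new-layout", price == "19.99"
```

Logging `matched` per run gives you an early-warning signal: when a page starts hitting a later fallback (or none at all), that's the scraper telling you the HTML changed before the data silently goes bad.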
Salty_Dugtrio@reddit
Using the proper API for these websites instead of scraping them is the actual solution. Otherwise you're always fighting against changes, and your scraping is most likely against the TOS of these platforms anyway.
shelledroot@reddit
Not every website has an API though. But yes that would be the most correct option.
shelledroot@reddit
Them's the breaks with scraping: you don't have a standardized or contractually stable format you can rely on.
Paying for an API isn't always an option either; some of these APIs have gotten super expensive as sites fight AI scraping, assuming there even is an API. You could contact the websites themselves and ask if they're willing to expose one for you, but that'll likely cost you something.