Scraping Yellow Pages for a uni project, ran into some trouble.
Posted by Optimal_Ad7577@reddit | learnprogramming | 1 comment
Hey guys (or maybe this belongs in r/nocode, idk)
So I've got a web scraping assignment for one of my data management classes and the target is Yellow Pages. Need to scrape business names, phone numbers, addresses, categories, and website URLs across a few hundred listings in a specific region.
I wrote a basic requests + BeautifulSoup script but it keeps breaking. Sometimes the pagination doesn't load right, sometimes I get rate limited after like 50 requests, and I genuinely can't figure out if it's my code or the site blocking me. Error I keep hitting:
AttributeError: 'NoneType' object has no attribute 'text'
I know it means the element isn't being found, but the selector looks right when I inspect it manually 🙃
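For reference, here is a minimal sketch of how that AttributeError can be avoided while parsing, so one missing element doesn't crash the whole run. The HTML snippet and the class names (`listing`, `business-name`, `phone`) are made up for illustration and are not the real Yellow Pages markup. The helper `text_or_none` is a hypothetical name too.

```python
from bs4 import BeautifulSoup

# Hypothetical markup and class names, for illustration only;
# the real Yellow Pages HTML will differ.
SAMPLE_HTML = """
<div class="listing">
  <a class="business-name">Acme Plumbing</a>
  <div class="phone">555-0100</div>
</div>
<div class="listing">
  <a class="business-name">No Phone Cafe</a>
</div>
"""

def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None

def parse_listings(html):
    """Turn each listing card into a dict, leaving missing fields as None."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.listing"):
        rows.append({
            "name": text_or_none(card, ".business-name"),
            "phone": text_or_none(card, ".phone"),
        })
    return rows

rows = parse_listings(SAMPLE_HTML)
print(rows)
```

Calling `.text` directly on the result of `find()` or `select_one()` is what raises the error when nothing matches. Checking for `None` first keeps the row and records the gap, which also feeds straight into a completeness analysis later.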
The deliverable is a cleaned CSV that I'll run some basic pandas analysis on: business density by category, contact info completeness rate, that kind of thing. The scraping itself isn't being graded, just the dataset quality and the analysis.
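A rough sketch of what that analysis could look like in pandas, assuming the CSV ends up with columns named `name`, `category`, `phone`, and `website` (those names and the sample rows are invented here). "Density" is approximated as a plain count of listings per category, which is only one possible reading of it.

```python
import pandas as pd

# Made-up rows standing in for the scraped CSV; column names are assumptions.
df = pd.DataFrame([
    {"name": "Acme Plumbing", "category": "Plumbers",
     "phone": "555-0100", "website": "https://acme.example"},
    {"name": "Best Pipes", "category": "Plumbers",
     "phone": None, "website": None},
    {"name": "No Phone Cafe", "category": "Restaurants",
     "phone": None, "website": "https://cafe.example"},
    # Exact duplicate of the first row, as can happen with overlapping pages.
    {"name": "Acme Plumbing", "category": "Plumbers",
     "phone": "555-0100", "website": "https://acme.example"},
])

# Cleaning step: drop listings that repeat the same name and phone.
df = df.drop_duplicates(subset=["name", "phone"])

# Listings per category, used here as a simple stand-in for "density".
listings_per_category = df["category"].value_counts()

# Share of rows with a non-missing value in each contact column.
completeness = df[["phone", "website"]].notna().mean()

print(listings_per_category)
print(completeness)
```

With a real file, the `DataFrame` construction would be replaced by `pd.read_csv(...)` on whatever the scraper writes out.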
What should I do? Or should I just use a no-code web scraping tool (Octoparse, Oxylabs, Thunderbit)? I found these have templates for this kind of thing.
I'm also on a deadline lol
jkbruhhehe@reddit
Had the exact same NoneType issue scraping directory sites. It usually means the page is rendering some elements client-side, so BeautifulSoup never sees them. If you're on a deadline, Octoparse has a free tier and handles pagination + rate limiting automatically, got me a clean CSV in like 20 mins for a similar project. Def come back and learn the requests/BS4 approach after submission tho, it's worth it 👍
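For anyone taking the requests/BS4 route, the rate limiting part can be sketched as a retry loop with exponential backoff. This assumes the site signals rate limiting with HTTP 429, which is the standard status for it but has not been checked against Yellow Pages. `fetch_with_backoff` and `fake_fetch` are hypothetical names, and the fetch function is passed in so the example runs without touching the network. In a real script it could be a small wrapper around `requests.get` that returns `(response.status_code, response.text)`.

```python
import random
import time

def fetch_with_backoff(fetch, url, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns a non-429 status or tries run out.

    `fetch` is any callable returning (status_code, body).
    """
    status, body = None, None
    for attempt in range(max_tries):
        status, body = fetch(url)
        if status != 429:
            return status, body
        if attempt < max_tries - 1:
            # Wait 1s, 2s, 4s, ... plus a little jitter before the next try.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)
    return status, body

# Stand-in for a real request: rate limited twice, then succeeds.
calls = []
def fake_fetch(url):
    calls.append(url)
    return (429, "") if len(calls) < 3 else (200, "<html>ok</html>")

# Record the delays instead of actually sleeping, to keep the demo instant.
delays = []
status, body = fetch_with_backoff(
    fake_fetch, "https://example.com/page/1", sleep=delays.append
)
print(status, len(calls), delays)
```

A quick way to test the client-side rendering theory is to search the raw response text for a business name that shows up in the browser. If it is not in the response body, the selector is not the problem and BeautifulSoup has nothing to find.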