Getting deeper into Web Scraping.
Posted by jonfy98@reddit | Python | 47 comments
I am currently getting deeper into web scraping and trying to figure out if it's still worth doing.
What kind of niche is worth getting into?
I would love to hear about your own experience, and whether it's still possible to make a small career out of it or if that's total nonsense.
sweetbeems@reddit
My current job requires a lot of scraping. It's a lot more annoying these days because you probably need to render JavaScript and use something like scrapy-splash. Pair that with needing a proxy server that charges by the megabyte downloaded, and you have to be very selective in your request filtering.
Even then, you'll get frequent random 503s and will need to wait and retry, which is very annoying. I will say that using Pydantic for the incoming data is very nice.
It's a valuable skill. Ultimately you'll learn how to deal with data validation, error handling and error monitoring, which are useful skills in any programming endeavor.
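A minimal sketch of the "wait and retry" handling for random 503s mentioned above, assuming exponential backoff with a little jitter. The names `TransientError` and `fetch_with_retry` are hypothetical, and `fetch` stands in for whatever HTTP call you actually use; it is not the commenter's own code.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error standing in for an HTTP 503 response."""


def fetch_with_retry(fetch, url, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on TransientError.

    `fetch` is any callable you supply (an assumption for this sketch);
    `sleep` is injectable so the waiting can be skipped in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts, let the caller see the failure
            # Wait 1s, 2s, 4s, ... plus up to 0.5s of jitter so that
            # parallel workers do not all retry at the same moment.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            sleep(delay)


if __name__ == "__main__":
    attempts = []

    def flaky_fetch(url):
        # Fails twice, then succeeds, to imitate intermittent 503s.
        attempts.append(url)
        if len(attempts) < 3:
            raise TransientError("503 Service Unavailable")
        return "<html>ok</html>"

    page = fetch_with_retry(flaky_fetch, "https://example.com", sleep=lambda s: None)
    print(len(attempts), page)
```

Capping the number of attempts matters when the proxy bills per megabyte, since every retry is another paid request.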
Proof_Resource7669@reddit
I used to burn through proxy budgets the same way until I switched to Qoest Proxy for the unlimited credentials and rotating sessions. No more watching megabytes or babysitting 503s.
The per GB pricing model is basically a tax on not knowing better options exist.
jonfy98@reddit (OP)
Wow, that’s amazing. Thank you for your point of view on it. Could you recommend some tools for this problem that I can look into further?
thundersack-3000@reddit
I want to scrape data from the dark web and pull sites but do it politely. Could there be use cases for this?
ResourceSea5482@reddit
Web scraping is definitely still worth it — I just built a food delivery recommendation bot that scrapes Uber Eats in real-time using async Playwright. The key thing I learned is that modern scraping is less about simple HTTP requests and more about handling dynamic JS-rendered pages.
A few tips from my experience: use storage_state instead of persistent_context if you need to manage auth cookies across multiple async tasks, since it avoids a lot of multi-threading headaches. Also, control your request frequency and add human-like delays (2-5s between actions) to stay under the radar.
As for career potential, I'd say the niche that's growing is combining scraping with AI/automation: not just collecting data, but building agents that act on it. That's where the real value is.
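A rough sketch of the two tips above, assuming the third-party Playwright package (`pip install playwright`) and its async API. The function names, the `state.json` path and the omitted log-in steps are hypothetical. Here `storage_state` refers to Playwright's saved cookie and local-storage snapshot, and "persistent_context" presumably means `launch_persistent_context`.

```python
import asyncio
import random


def human_delay(low=2.0, high=5.0):
    """Pick a random pause length in seconds (the 2-5s range from the tip)."""
    return random.uniform(low, high)


async def save_login_state(login_url, state_path="state.json"):
    """Log in once and save cookies/local storage to a file."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        context = await browser.new_context()
        page = await context.new_page()
        await page.goto(login_url)
        # Site-specific log-in steps would go here (omitted on purpose).
        await context.storage_state(path=state_path)
        await browser.close()


async def fetch_title(browser, url, state_path="state.json"):
    """Open a fresh context that reuses the saved auth state."""
    # Each task gets its own context, so tasks do not share a profile
    # directory the way a single persistent context would.
    context = await browser.new_context(storage_state=state_path)
    try:
        page = await context.new_page()
        await asyncio.sleep(human_delay())  # pause before acting
        await page.goto(url)
        return await page.title()
    finally:
        await context.close()
```

Several `fetch_title` calls can then run under `asyncio.gather` against one shared `browser` object, each with an isolated context loaded from the same state file.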
Prudent_Run_8039@reddit
I suggest you learn about MCPs. It's like doing scraping but with AI, without touching code, and it may well be the future of the web.
Significant-Ad-2654@reddit
One path worth exploring: instead of building and maintaining individual scrapers, consider building on top of scraping APIs. You still need Python skills to process and analyze the data, but the actual scraping infrastructure (proxies, anti-bot bypass, JS rendering) is handled by the API. This lets you focus on the valuable part: the data analysis and business logic. I've seen many scrapers burn out maintaining infrastructure when they could be building products on top of the data.
MindlessBand9522@reddit
Scraping is more alive than ever, bro. Just look at the number of scrapers on platforms like Apify. But your attitude is not the right one. I believe the best use cases are usually found out of necessity, when you hit a roadblock in your own work.
Specialist_Golf8133@reddit
The market for pure web scraping work is pretty saturated on freelance platforms, but it's extremely valuable as a complementary skill. Most sustainable opportunities come from combining scraping with domain expertise like finance, healthcare data, or supply chain. The technical challenge is shifting too since more sites use JavaScript rendering and anti-bot measures, so you'd want to learn Selenium or Playwright alongside BeautifulSoup. Career-wise, it's less about being a dedicated scraper and more about being a data engineer or analyst who can source hard-to-get data when APIs don't exist. Focus on solving specific business problems rather than scraping for its own sake.
hasdata_com@reddit
Scraping is alive and well as long as data is valuable. The barrier to entry is just higher now.
jonfy98@reddit (OP)
I understand. Then my next step needs to be getting over this entry barrier and keeping going. Thanks for your reply.
hasdata_com@reddit
Good luck )
jonfy98@reddit (OP)
Thank you :)
woodside007@reddit
I'll just say, the bots are getting smarter at detecting scrapers and banning IPs. You definitely need a VPN or proxy service. It is becoming a pain in the ass these days.
jonfy98@reddit (OP)
Thank you for your reply. So the biggest issue might be getting blocked from scraping when no API is available?
sawkurawr@reddit
+1 It's still worth it, maybe a little bit harder to start but it always will be hard.
jonfy98@reddit (OP)
I like to have some challenges and feel motivated to extend my knowledge about it even more now.
Key_Investment_6818@reddit
Yep, still worth it, but the headache has increased a lot. Simple Beautiful Soup doesn't help much anymore.
jonfy98@reddit (OP)
Yeah, I realized that very quickly and stepped up a little too, but the more complex it gets, the harder it is.
Key_Investment_6818@reddit
curl_cffi and playwright are your new friends then
jonfy98@reddit (OP)
I will look it up and get to know these tools. Thank you for your suggestion, well appreciated
Dame-Sky@reddit
This is a great deep dive. I'm currently using a similar stack for a portfolio analytics project. One thing I found critical as I scaled was moving to asyncio with semaphores.
If you're hitting smaller regional servers (like I am with the JSE), it’s easy to accidentally trigger a 403 or overwhelm them. Using a semaphore to limit concurrent requests to 3 or 5, combined with a custom User-Agent, has been a game-changer for reliability without being an 'aggressive' scraper.
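A small sketch of the semaphore pattern described above, using only the standard library. `fake_fetch` is a hypothetical stand-in for a real HTTP call, and the User-Agent string and contact address are made up for illustration.

```python
import asyncio

# Hypothetical custom User-Agent that identifies the scraper and its owner.
HEADERS = {"User-Agent": "portfolio-research-bot/0.1 (contact: you@example.com)"}


async def fake_fetch(url, headers):
    """Stand-in for a real request; swap in your HTTP client of choice."""
    await asyncio.sleep(0.01)
    return f"fetched {url}"


async def fetch_all(urls, fetch=fake_fetch, max_concurrent=3, pause=0.0):
    """Fetch every URL, never running more than max_concurrent at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(url):
        async with semaphore:  # waits here while all slots are in use
            result = await fetch(url, HEADERS)
            # Optionally hold the slot a little longer to space out requests.
            await asyncio.sleep(pause)
            return result

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(worker(u) for u in urls))


if __name__ == "__main__":
    pages = asyncio.run(fetch_all([f"https://example.com/{i}" for i in range(10)]))
    print(len(pages))
```

All ten tasks are created immediately, but the semaphore lets only three of them talk to the server at any moment, which is what keeps a small server from being overwhelmed.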
Ethical scraping is as much about the asyncio.sleep() and retry logic as it is about the BeautifulSoup selectors!
ethmad@reddit
I use encryptedproxydotnet for web scraping, as it’s fast and reliable!
jonfy98@reddit (OP)
Does it also work for sites that have bot detection?
ethmad@reddit
Yes! You can visit the web
jed_l@reddit
Yes. You will run into the typical problems with bot detection. That’s really the hardest problem to solve.
jonfy98@reddit (OP)
I understand. Then I will need to learn even more about it. Thank you.
PoeGar@reddit
The same answer as every other joke: Porn.
deceze@reddit
Well, web scraping is getting information from "unsupported" sources. By that I mean, if something has an API that supplies the data, you should definitely use that, as it's supported, stable and documented. If the data you want does not come with an API and is only on some random website, well, you gotta scrape it.
Personally I have not needed to work with data which only exists on websites. I work with APIs, and I build products that interact with and bridge APIs to create something useful. That's just the field I'm in. If you're in some other field, then scraping information may be useful to you. But it's always a brittle and unsupported system, and you'll mostly be fighting uphill battles.
jonfy98@reddit (OP)
That’s also true. APIs are mostly the best way, but not every site has one, which makes scraping harder in my opinion.
themagicman_1231@reddit
How did you get into web scraping? What sources are you using to learn more? Sounds like a lot of fun.
jonfy98@reddit (OP)
I basically started looking into programming because I also have a lot of knowledge about Siemens PLCs for automation. I researched what is beginner friendly, especially for freelancing, and mostly got the answer of web scraping etc. I learned most of it from one of my tutors and by teaching myself to understand the functions needed.
Fragrant_Ad3054@reddit
Yes, it's worth it; it's not too late to get started.
Indeed, some types of web scraping are saturated. Focusing on competitive intelligence, for example, still seems like a viable option.
Key_Investment_6818@reddit
Hey, I like the idea. Can you tell me more about it? Or is there any repo where I can contribute?
Fragrant_Ad3054@reddit
Thanks, that's kind of you. Unfortunately, there's no public repository because there's a confidentiality agreement for this type of project. However, I can explain its overall operation without any problem.
You can PM me so we can discuss it without hijacking the OP's post :)
Key_Investment_6818@reddit
I know web scraping at a pretty high level since I do it daily for my org, so I was wondering: won't your task require access to chats? And if you do that, won't it breach privacy laws? Can you still explain it? I was lacking motivation, but for something like this I might build one for my local area to help children.
jonfy98@reddit (OP)
Sounds pretty interesting and salute for your work against this organization.
Definitely going to look into that and try to adjust my skills.
Thank you.
Fragrant_Ad3054@reddit
If you need information or have any questions about web scraping, you can PM me and I'll try to help if I can, with pleasure :)
jonfy98@reddit (OP)
Thank you for that. I will come back to your offer soon, once I have gathered a little bit of knowledge :)
OryxRSA@reddit
Ya, it's a good skill set. Just get familiar with the terms of sites if you are looking to monetise.
Many sites have no-scraping terms.
jonfy98@reddit (OP)
Then I'll go deeper into it. Yes, it would be nice if I could get into monetizing this work.
But you're right, I heard many sites strictly forbid scraping.
eudaimoniclux@reddit
Definitely worth it. In my current company, I have a project where I need to scrape pricing data from a website that is rendered with dynamic JavaScript. Kind of hard, actually, but it will be really valuable if I am able to do it.
jonfy98@reddit (OP)
Interesting for sure, because I read that web scraping is generally oversaturated and often labeled as easy, but reading your comment, it seems that it's more complex. How would you handle dynamic JavaScript?
sugarkrassher@reddit
What's that?
jonfy98@reddit (OP)
Web scraping? Basically, scraping tons of data from websites and organizing that data into sheets.
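To make that concrete, here is a toy sketch using only the standard library: it pulls the cells out of an HTML table and writes them as CSV, which any spreadsheet program can open. The sample HTML and the names `TableRowParser` and `html_table_to_csv` are invented for illustration; real pages are messier and usually call for a library like BeautifulSoup.

```python
import csv
import io
from html.parser import HTMLParser

# Made-up page fragment standing in for a downloaded web page.
SAMPLE_HTML = (
    "<table>"
    "<tr><td>Widget</td><td>9.99</td></tr>"
    "<tr><td>Gadget</td><td>19.50</td></tr>"
    "</table>"
)


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def html_table_to_csv(html):
    """Return the table cells found in `html` as CSV text."""
    parser = TableRowParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML))
```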