Trying to scrape an HTML page with the requests module in Python
Posted by Abdul7676@reddit | programming | 8 comments
I want to get the search results for pyautogui from https://pypi.org/search/
and print them as HTML, but for some reason it prints an error message instead.
If you have any idea how to solve this issue, please help.
code:
import requests
query = "https://pypi.org/search/?q=" + "pyautogui"
print()
print(query)
# Fetch the results page with the requests module
res = requests.get(query)
print(res.text)
output:
https://pypi.org/search/?q=pyautogui
A required part of this site couldn’t load. This may be due to a browser
extension, network issues, or browser settings. Please check your
connection, disable any ad blockers, or try using a different browser.
programming-ModTeam@reddit
This post was removed for violating the "/r/programming is not a support forum" rule. Please see the side-bar for details.
revereddesecration@reddit
Use BeautifulSoup my dude.
pip install beautifulsoup4
deceze@reddit
To do what? Parse this HTML and extract the error message?
revereddesecration@reddit
If you’re scraping HTML with Python, you’ll want to use BS4. “Using the requests module” is inane.
deceze@reddit
BS4 only does HTML parsing. You’ll still need to request the HTML with something. BS4 is not going to fetch anything other than what requests gets you.
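A minimal sketch of that division of labor, assuming the `beautifulsoup4` package is installed: `requests` downloads the page, and BeautifulSoup only parses the string it is handed. The function names are illustrative. Because the parser sees exactly what `requests` received, running this on the PyPI search URL would most likely just return the same "couldn't load" message.

```python
import requests
from bs4 import BeautifulSoup


def page_text(html: str) -> str:
    """Parse an HTML string with BeautifulSoup and return its text content."""
    soup = BeautifulSoup(html, "html.parser")
    # Join text fragments with single spaces, dropping surrounding whitespace
    return soup.get_text(" ", strip=True)


def fetch_text(url: str) -> str:
    """requests does the fetching; BeautifulSoup only parses the result."""
    res = requests.get(url, timeout=10)
    res.raise_for_status()
    return page_text(res.text)


# Usage (needs network access):
# print(fetch_text("https://pypi.org/search/?q=pyautogui"))
```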
fiskfisk@reddit
Use the PyPI API. It's made for programmatic access to PyPI. No need to parse HTML.
https://docs.pypi.org/api/
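A minimal sketch using PyPI's JSON API (`https://pypi.org/pypi/<project>/json`), which returns metadata for a single project looked up by its exact name. As far as I know, PyPI's APIs do not offer full-text search (the old XML-RPC `search` method has been disabled), so this only replaces the search page when you already know the package name. The helper names are illustrative.

```python
import requests

PYPI_JSON_URL = "https://pypi.org/pypi/{project}/json"


def project_json_url(project: str) -> str:
    """Build the JSON API URL for one project (exact name lookup, not a search)."""
    return PYPI_JSON_URL.format(project=project)


def summarize(metadata: dict) -> str:
    """Pull name, latest version and summary out of the 'info' section."""
    info = metadata["info"]
    return f"{info['name']} {info['version']}: {info['summary']}"


def fetch_project(project: str) -> dict:
    """Fetch project metadata; raises requests.HTTPError (e.g. a 404) for unknown names."""
    res = requests.get(project_json_url(project), timeout=10)
    res.raise_for_status()
    return res.json()


# Usage (needs network access):
# print(summarize(fetch_project("pyautogui")))
```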
riyosko@reddit
PyPI requires JavaScript (i.e. a real browser session) to get you past the check on their site and acquire a browser cookie. You can get this cookie from a normal browser session in the Network tab of the browser devtools, or with Selenium WebDriver, and then send your requests with that cookie attached.
Go to the Network tab and reload; you will see a /search request, which you can click to view details like the cookie, HTTP headers, etc.
Copy it and look up how to send a cookie with Python requests.
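A minimal sketch of the cookie approach with requests, assuming (as the comment says) that PyPI's check is satisfied by a cookie taken from a browser session. `BROWSER_COOKIES` and `USER_AGENT` are hypothetical placeholders, not PyPI's real cookie names; copy the actual values from the /search request in devtools. Such cookies may expire or be tied to other browser details, so this could stop working at any time.

```python
import requests

SEARCH_URL = "https://pypi.org/search/"

# Hypothetical placeholders: replace with the real values copied from the
# /search request in your browser's devtools Network tab.
BROWSER_COOKIES = {"example_cookie_name": "example_cookie_value"}
USER_AGENT = "Mozilla/5.0 (copy the exact string from your browser)"


def build_search_request(query: str, cookies: dict) -> requests.PreparedRequest:
    """Prepare (but do not send) a search request carrying the browser cookies."""
    req = requests.Request(
        "GET",
        SEARCH_URL,
        params={"q": query},  # requests URL-encodes the query string
        cookies=cookies,  # sent as a Cookie header
        headers={"User-Agent": USER_AGENT},
    )
    return req.prepare()


def search(query: str, cookies: dict) -> str:
    """Send the prepared request and return the HTML of the results page."""
    with requests.Session() as session:
        res = session.send(build_search_request(query, cookies), timeout=10)
        res.raise_for_status()
        return res.text


# Usage (needs network access and real cookie values):
# print(search("pyautogui", BROWSER_COOKIES))
```

If you collect the cookie with Selenium instead, `driver.get_cookies()` returns a list of dicts with `"name"` and `"value"` keys, which can be turned into the same `{name: value}` mapping.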
soapbleachdetergent@reddit
r/learnpython might be a better subreddit for this question.