I'm trying to scrape an HTML element on a webpage. The content of this element is generated by JavaScript and thus cannot be scraped by simply calling requests.get:
response = requests.get(url)
I read in other posts that Selenium can be used to solve this issue, but it requires an actual browser to be installed, along with the corresponding driver. This code is meant to be run on different machines that change frequently, so I cannot write it so that it only works if a particular browser is installed.
If there is a way to scrape the JavaScript-generated content without relying on a particular browser, that is what I'm looking for, no matter the module.
Asked Feb 4, 2022 at 17:05 by Put Me
1 Answer
Aside from automating a browser, your other two options are as follows:
1. Try to find the backend request that loads the data via JavaScript. There is no guarantee that one exists, but open your browser's Developer Tools, go to the Network tab, filter by Fetch/XHR, and refresh the page. Hopefully you'll see requests to a backend API that loads the data you want. If you do find such a request, click on it and explore the endpoint, the headers, and possibly the payload that is sent to get the response you are looking for. These can all be recreated in Python by using requests to call that hidden endpoint.
2. The other possibility is that the data is hidden in the HTML within a script tag, possibly as JSON. Open the Elements tab of your Developer Tools, where you can see the HTML of the page, right-click the top-level tag, and click "Expand recursively". This opens every tag (it might take a second), and you can then scroll down and search for the data you want. Ignore the regular HTML tags: we know the data is loaded by JavaScript, so look through any "script" tag. If you do find it, you can hopefully extract it in your own script with a combination of Beautiful Soup to get the script tag and string slicing to pull out just the JSON.
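Both options above can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names fetch_backend_json and extract_json_from_scripts, the window.__DATA__ marker, and the example.com endpoint are all hypothetical, and the real endpoint, headers, and variable name have to come from what you see in Developer Tools. For the second option the answer suggests Beautiful Soup; this sketch swaps in the standard library's html.parser so that it runs without extra packages, and uses json.JSONDecoder.raw_decode to do the slicing.

```python
import json
from html.parser import HTMLParser


def fetch_backend_json(url, params=None, headers=None, session=None):
    """Option 1: call the hidden backend endpoint directly and return its JSON."""
    if session is None:
        # Third-party package, imported lazily so the rest of the sketch
        # works even where requests is not installed.
        import requests
        session = requests.Session()
    response = session.get(url, params=params, headers=headers, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.json()


class _ScriptCollector(HTMLParser):
    """Collects the text content of every <script> tag in a document."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def extract_json_from_scripts(html, marker):
    """Option 2: return the first JSON value that follows `marker` in a script tag.

    Returns None if no script contains the marker followed by valid JSON.
    """
    collector = _ScriptCollector()
    collector.feed(html)
    decoder = json.JSONDecoder()
    for script in collector.scripts:
        start = script.find(marker)
        if start == -1:
            continue
        # String slicing: drop everything up to and including the marker,
        # then locate the first "{" or "[" that could open a JSON value.
        rest = script[start + len(marker):]
        positions = [p for p in (rest.find("{"), rest.find("[")) if p != -1]
        if not positions:
            continue
        try:
            # raw_decode parses one JSON value and ignores trailing text
            # such as the ";" that ends the JavaScript statement.
            value, _end = decoder.raw_decode(rest, min(positions))
        except json.JSONDecodeError:
            continue
        return value
    return None


# Usage (hypothetical URL, parameters, and marker):
#   data = fetch_backend_json("https://example.com/api/items", params={"page": 1})
#   html = requests.get("https://example.com/items").text
#   data = extract_json_from_scripts(html, "window.__DATA__")
```

Neither function needs a browser. Which one applies depends entirely on how the site delivers its data, so it is worth checking the Network tab first and falling back to the script-tag search.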
If neither of those produces results, then try the requests_html package, and specifically its "render" method. It automatically downloads a headless browser (Chromium) the first time you call render in your script, so nothing has to be installed on the machine beforehand.
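A minimal sketch of the render approach, assuming the requests_html package is installed (pip install requests-html) and that the machine has network access for the one-time Chromium download. The function name scrape_rendered_text, the URL, and the CSS selector are hypothetical placeholders, not something taken from the question.

```python
def scrape_rendered_text(url, css_selector, session=None, render_timeout=20):
    """Return the text of the first element matching css_selector after
    the page's JavaScript has run, or None if nothing matches."""
    if session is None:
        # Third-party package, imported lazily; the first render() call
        # downloads a headless Chromium build.
        from requests_html import HTMLSession
        session = HTMLSession()
    response = session.get(url)
    # Execute the page's JavaScript and replace the HTML with the result.
    response.html.render(timeout=render_timeout)
    element = response.html.find(css_selector, first=True)
    return element.text if element is not None else None


# Usage (hypothetical URL and selector):
#   text = scrape_rendered_text("https://example.com/page", "#content")
```

Rendering is much slower than the two options above, so it is probably best kept as a fallback.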
What site is it? Perhaps I can offer more help if I can see it.