Let's say I'm using urllib2 and cookiejar (like so) to get responses from websites. Now I'm looking for an easy way to use jQuery to scrape data from the response returned by the web server.
I understand that there are other modules that can be used in Python for web scraping, but is it possible with just jQuery commands? I'm assuming I'd need some sort of JS parser within Python?
The reason I want to use jQuery is that I have ~20 Greasemonkey scripts (mostly written by others) that do some interesting modifications to numerous websites and web games. They do all of the DOM modifications with jQuery. Instead of completely refactoring most of this working and dependable code, I'd like to be able to simply port it to Python (enabling simple and effective automation).
Asked Oct 5, 2012 at 14:37 by g19fanatic; edited May 23, 2017 at 12:07.

2 Answers
pyquery is perfectly suited for this task.
It allows you to use jQuery-like selectors on (X)HTML/XML from Python.
For example:
>>> from pyquery import PyQuery as pq
>>> d = pq('<html><p id="hello">Foo</p></html>')
>>> d("#hello")
[<p#hello>]
>>> d('p:first')
[<p#hello>]
See the complete API documentation for details, and the project page on Bitbucket for the source and issue tracker.
Use lxml to parse the HTML, and use its cssselect module:
from lxml import etree
from lxml.cssselect import CSSSelector

# `document` is a filename, URL, or file-like object holding the HTML
tree = etree.parse(document, etree.HTMLParser())
elements = CSSSelector('div.content')(tree)