Let's say I'm using urllib2 and cookiejar (like so) to get responses from websites. Now I'm looking for an easy way to use jQuery to scrape data from the response returned by the web server.
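For context, the fetching step could look roughly like the sketch below. It assumes Python 3, where `urllib.request` and `http.cookiejar` are the successors of `urllib2` and `cookielib`; the helper names `build_opener_with_cookies` and `fetch_html` and the example URL are made up for illustration and are not from the original post.

```python
import http.cookiejar
import urllib.request


def build_opener_with_cookies():
    """Return (opener, jar); the jar keeps cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def fetch_html(opener, url, timeout=10):
    """Fetch a URL through the opener and return the body as text."""
    with opener.open(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


opener, jar = build_opener_with_cookies()
# html = fetch_html(opener, "https://example.com/")  # hypothetical URL
```

The string returned by `fetch_html` is the "response" that the rest of the question wants to query with jQuery-style selectors.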

I understand that there are other modules that can be used in Python for web scraping (like), but is it possible with just jQuery commands? I'm assuming I'd need some sort of JS parser within Python?

The reason I want to use jQuery is that I have ~20 Greasemonkey scripts (mostly written by others) that make some interesting modifications to numerous websites and web games. They do all of their DOM modifications with jQuery. Instead of completely refactoring most of this working and dependable code, I'd like to be able to simply port it to Python (enabling simple and effective automation).

asked Oct 5, 2012 at 14:37 by g19fanatic; edited May 23, 2017 at 12:07

2 Answers

pyquery is perfectly suited to this task.

It lets you use jQuery-like selectors on (X)HTML/XML from Python.

For example:

>>> from pyquery import PyQuery as pq
>>> d = pq('<html><p id="hello">Foo</p></html>')

>>> d("#hello")
[<p#hello>]

>>> d('p:first')
[<p#hello>]

See the complete API documentation for details, and the project page on Bitbucket for the source and issue tracker.

Use lxml to parse the HTML and use its cssselect module:

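from lxml import etree
from lxml.cssselect import CSSSelector  # recent lxml versions need the separate cssselect package

# `document` is a filename or file-like object containing the HTML.
# HTMLParser tolerates markup that is not well-formed XML.
tree = etree.parse(document, etree.HTMLParser())
elements = CSSSelector('div.content')(tree)  # list of matching elements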
