I need to parse an HTML document that contains JavaScript code with a JSON object in it.
Something like this:
<html>
<head>
</head>
<body>
<script type="text/javascript">
myJSONObject = {"name": "steve", "city": "new york"}
</script>
<p>Hello World.</p>
</body>
</html>
How can I extract the value of myJSONObject with Python?
asked Oct 14, 2011 at 8:48 by Shahaf
- Could you extract out your .js file first? – Kit Ho, Oct 14, 2011 at 8:52
- No. I have only the HTML file and the JavaScript code inside it. – Shahaf, Oct 14, 2011 at 8:55
1 Answer
You can use lxml to parse the HTML, and then extract the JSON:
>>> import lxml.etree,json
>>> s = '''<html><body><script type="text/javascript">
myJSONObject = {"name": "steve", "city": "new york"}
</script></body></html>'''
>>> js = lxml.etree.HTML(s).find('.//body/script').text
>>> jsonCode = js.partition('=')[2].strip()
>>> json.loads(jsonCode)
{'name': 'steve', 'city': 'new york'}
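Note that splitting on the first '=' only works when the script holds nothing but that one assignment; a trailing semicolon or a second statement makes json.loads fail. As a rough sketch of one way around that (not part of the original answer), the version below uses only the standard library: html.parser to collect the script text and json.JSONDecoder.raw_decode to parse a single JSON value and ignore whatever follows it. The names ScriptCollector and extract_js_object are made up for this illustration, and the code assumes the assigned value is strict JSON (double-quoted keys, no comments).

```python
import json
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Hypothetical helper: collects the text of every <script> element."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the last entry.
        if self._in_script:
            self.scripts[-1] += data


def extract_js_object(html, var_name):
    """Return the JSON value assigned to var_name, or None if it is not found.

    Assumes the right-hand side of the assignment is valid JSON.
    """
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()

    # Find "var_name =" and consume the whitespace after the equals sign,
    # because raw_decode does not skip leading whitespace itself.
    pattern = re.compile(r"\b" + re.escape(var_name) + r"\s*=\s*")
    decoder = json.JSONDecoder()

    for script in collector.scripts:
        match = pattern.search(script)
        if match:
            # raw_decode parses one JSON value starting at the given index
            # and returns it along with the index where the value ended.
            value, _end = decoder.raw_decode(script, match.end())
            return value
    return None


if __name__ == "__main__":
    page = """<html><body><script type="text/javascript">
    var myJSONObject = {"name": "steve", "city": "new york"};
    console.log(myJSONObject);
    </script><p>Hello World.</p></body></html>"""
    print(extract_js_object(page, "myJSONObject"))
    # prints {'name': 'steve', 'city': 'new york'}
```

If the value is a JavaScript object literal that is not strict JSON (unquoted keys, single quotes), raw_decode raises json.JSONDecodeError and a real JavaScript parser would be needed instead.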
Tags: Extract javascript variable value from html document with python - Stack Overflow