I need to parse an HTML document that contains JavaScript code with a JSON object.

Something like this:

<html>
   <head>
   </head>
<body>
    <script type="text/javascript">
        myJSONObject = {"name": "steve", "city": "new york"}
    </script>

   <p>Hello World.</p>
</body>
</html>

How can I extract the value of myJSONObject with Python?

asked Oct 14, 2011 at 8:48 by Shahaf
  • Could you extract out your .js file first? – Kit Ho Commented Oct 14, 2011 at 8:52
  • No. I have only the html file and the javascript code inside it. – Shahaf Commented Oct 14, 2011 at 8:55

1 Answer


You can use lxml to parse the HTML, and then extract the JSON:

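>>> import json
>>> import lxml.etree
>>> s = '''<html><body><script type="text/javascript">
...              myJSONObject = {"name": "steve", "city": "new york"}
...            </script></body></html>'''
>>> # Parse the HTML and take the text of the first <script> inside <body>
>>> js = lxml.etree.HTML(s).find('.//body/script').text
>>> # Keep everything after the first '=' and trim surrounding whitespace
>>> json_code = js.partition('=')[2].strip()
>>> json.loads(json_code)
{'name': 'steve', 'city': 'new york'}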
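
The partition('=') approach above works when the script holds a single assignment and nothing follows the object; a trailing semicolon or a second statement would make json.loads fail. As a rough sketch of a more tolerant variant, the following uses only the standard library (html.parser in place of lxml, which is a substitution and not the answer's method) together with json.JSONDecoder.raw_decode, which parses one JSON value starting at a given index and ignores whatever comes after it. ScriptCollector and extract_js_object are hypothetical helper names introduced for illustration, and the sketch still assumes the assigned value is strict JSON (double-quoted keys and strings).

```python
import json
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Hypothetical helper: collects the text of every <script> element."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._in_script:
            self.scripts[-1] += data


def extract_js_object(html, var_name):
    """Return the JSON value assigned to var_name in any <script>, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()

    # Match "var_name =" and consume the whitespace before the value,
    # because raw_decode does not skip leading whitespace itself.
    pattern = re.compile(r"\b" + re.escape(var_name) + r"\s*=\s*")
    decoder = json.JSONDecoder()

    for script in collector.scripts:
        match = pattern.search(script)
        if match:
            # raw_decode returns (value, end_index); text after the value,
            # such as ";" or further statements, is left unparsed.
            value, _end = decoder.raw_decode(script, match.end())
            return value
    return None


if __name__ == "__main__":
    page = '''<html><body><script type="text/javascript">
        myJSONObject = {"name": "steve", "city": "new york"};
    </script><p>Hello World.</p></body></html>'''
    print(extract_js_object(page, "myJSONObject"))
```

If the assigned value is a JavaScript literal that is not valid JSON (single quotes, unquoted keys, trailing commas), raw_decode raises json.JSONDecodeError, and a real JavaScript parser would be needed instead.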

Tags: Extract javascript variable value from html document with python, Stack Overflow