javascript - Python lxml library fails to parse &lt; and &gt; - Stack Overflow


Updated: 2025-03-18


I have an XSLT stylesheet with embedded JavaScript that uses "&lt;" (and "&gt;") inside a for loop:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head></head>
      <body>
        <script language="javascript" type="text/javascript">
          function example() {
            var trs = document.getElementsByTagName("tr");
            for (var i = 0; i &lt; trs.length; i++) {
            }
          }
        </script>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

I am using the Python lxml library to generate HTML from the XML and the XSLT:

import lxml.etree as ET

xml = ET.parse('sample.xml')
xslt = ET.parse('sample.xsl')
transform = ET.XSLT(xslt)
content = transform(xml)

# tostring() returns bytes, so open the file in binary mode
with open('output.html', 'wb') as f:
    f.write(ET.tostring(content, pretty_print=True))

However, lxml does not turn the escaped characters back into literal ones in the output HTML file: "&lt;" stays "&lt;" instead of becoming "<", and "&gt;" stays "&gt;" instead of becoming ">".

Is there a standard way in lxml to get "<" instead of "&lt;" in the output?

To work around this, I currently post-process the serialized output before writing it to the file:

html = ET.tostring(content, pretty_print=True).decode()
html = html.replace("&gt;", ">")
html = html.replace("&lt;", "<")
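One side effect of the blanket replace() workaround is that it also unescapes "&lt;" and "&gt;" in ordinary text, not only inside <script>. A minimal check, using a hypothetical document (not from the question) whose paragraph legitimately contains the text "<b>":

```python
from lxml import etree

# Hypothetical document: "<" in a script, plus a paragraph that mentions a tag as text
doc = etree.fromstring(
    "<html><body>"
    "<script>if (1 &lt; 2) {}</script>"
    "<p>Use the &lt;b&gt; tag for bold.</p>"
    "</body></html>"
)

# The workaround: serialize as XML, then undo every escape with str.replace()
html = etree.tostring(doc).decode()
html = html.replace("&gt;", ">").replace("&lt;", "<")

# The script is fixed, but the paragraph now contains an unclosed <b> element
print(html)
```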


Asked Sep 26, 2013 at 0:15 by Venkatesh; edited Jun 2, 2015 at 22:48 by JCKE.


2 Answers


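To have these characters written out literally, pass method="html" to the tostring() call. The HTML serializer writes the contents of <script> and <style> elements without escaping, so "<" and ">" appear as-is there, while text in other elements is still escaped: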

ET.tostring(content, method="html", pretty_print=True)

or use lxml.html, whose tostring() defaults to method="html":

import lxml.html
lxml.html.tostring(content, pretty_print=True)
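A small sketch comparing the two calls on a hypothetical document (not from the question) that has "<" both inside a <script> and inside an ordinary <p>. Assuming current lxml behaviour, both calls produce the same output, and only the script contents are left unescaped:

```python
import lxml.html
from lxml import etree

# Hypothetical document: "<" both inside <script> and inside an ordinary <p>
doc = etree.fromstring(
    "<html><body>"
    "<script>if (1 &lt; 2) {}</script>"
    "<p>1 &lt; 2</p>"
    "</body></html>"
)

via_etree = etree.tostring(doc, method="html").decode()
via_html = lxml.html.tostring(doc).decode()  # method="html" is the default here

# Script contents come out raw; text in <p> is still escaped
print(via_etree)
```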

DEMO:

from lxml import etree


text = """<html>
  <body>
    <script> 1 &lt; 2 </script>
  </body>
</html>
"""

tree = etree.fromstring(text)
print(etree.tostring(tree, method="html").decode())

prints:

<html>
  <body>
    <script> 1 < 2 </script>
  </body>
</html>
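The demo parses a plain HTML string; the same call applies directly to the result of an XSLT transform. A minimal end-to-end sketch, assuming in-memory stand-ins for the question's sample.xsl and sample.xml (the stylesheet is trimmed down and the input document is a hypothetical <root/>):

```python
from lxml import etree

# In-memory stand-ins for the question's sample.xsl and sample.xml (trimmed, hypothetical)
XSL = b"""<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head></head>
      <body>
        <script type="text/javascript">
          var trs = document.getElementsByTagName("tr");
          for (var i = 0; i &lt; trs.length; i++) {}
        </script>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>"""
XML = b"<root/>"

transform = etree.XSLT(etree.fromstring(XSL))
result = transform(etree.fromstring(XML))

# Default XML serialization re-escapes "<" inside the script...
as_xml = etree.tostring(result, pretty_print=True).decode()
# ...while the HTML serializer writes the script contents verbatim
as_html = etree.tostring(result, method="html", pretty_print=True).decode()

print(as_html)
```

To write the result to a file as in the question, open it in binary mode ("wb"), since tostring() returns bytes unless an encoding such as "unicode" is passed.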

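You can also wrap the script contents in a CDATA section, so the stylesheet can use a literal "<" instead of "&lt;". Note that the XML parser turns the CDATA section into ordinary text, so this makes the stylesheet source easier to read but does not by itself change the output: the default XML serialization still writes "&lt;", and method="html" is what emits a literal "<". For example: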

<script language="javascript" type="text/javascript">
  <![CDATA[
    function example() {
          var trs = document.getElementsByTagName("tr");
      for (var i = 0; i < trs.length; i++) {
      }
    }
  ]]>
</script>
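A quick way to check what the CDATA wrapper changes, assuming a hypothetical minimal stylesheet that is rendered once with "&lt;" and once with a CDATA section, each result then serialized with both the XML and the HTML method:

```python
from lxml import etree

# Hypothetical minimal stylesheet; {body} is filled with the script contents
TEMPLATE = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html><body><script type="text/javascript">{body}</script></body></html>
  </xsl:template>
</xsl:stylesheet>"""

escaped = TEMPLATE.format(body="if (1 &lt; 2) {}")
cdata = TEMPLATE.format(body="<![CDATA[if (1 < 2) {}]]>")


def render(xsl_source, method):
    """Apply the stylesheet to a dummy input and serialize the result."""
    transform = etree.XSLT(etree.fromstring(xsl_source))
    result = transform(etree.fromstring("<root/>"))
    return etree.tostring(result, method=method).decode()


for name, source in (("escaped", escaped), ("cdata", cdata)):
    # The parser turns CDATA into plain text, so both variants serialize alike
    print(name, "xml: ", render(source, "xml"))
    print(name, "html:", render(source, "html"))
```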
