I have an XSLT stylesheet with JavaScript in it, which uses "&lt;" and "&gt;" inside a for loop:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<head> </head>
<body>
<script language="javascript" type="text/javascript">
function example() {
var trs = document.getElementsByTagName("tr");
for (var i = 0; i &lt; trs.length; i++) {
}
}
</script>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
I am using the Python lxml library to generate HTML from XML with this XSLT.
import lxml.etree as ET

xml = ET.parse('sample.xml')
xslt = ET.parse('sample.xsl')
transform = ET.XSLT(xslt)
content = transform(xml)
# ET.tostring() returns bytes, so the file is opened in binary mode
with open('output.html', 'wb') as f:
    f.write(ET.tostring(content, pretty_print=True))
But lxml does not convert the escaped characters in the output HTML file: "&lt;" is written as-is instead of '<', and "&gt;" instead of '>'.
Is there a standard practice in lxml for turning "&lt;" into '<'?
To overcome this issue I have to add another piece of code that runs before writing to the file:
content = content.replace("&gt;", ">")
content = content.replace("&lt;", "<")
asked Sep 26, 2013 at 0:15 by Venkatesh; edited Jun 2, 2015 at 22:48 by JCKE
2 Answers
In order to decode/convert HTML entities, you should use method="html" in the tostring() call:
ET.tostring(content, method="html", pretty_print=True)
or:
lxml.html.tostring(content, pretty_print=True)
DEMO:
from lxml import etree

text = """<html>
<body>
<script> 1 &lt; 2 </script>
</body>
</html>
"""
tree = etree.fromstring(text)
# tostring() returns bytes; decode before printing
print(etree.tostring(tree, method="html").decode())
prints:
<html>
<body>
<script> 1 < 2 </script>
</body>
</html>
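The same idea applied to the XSLT case from the question can be sketched as follows. This is only an illustration: the stylesheet and the `<root/>` input are inlined as hypothetical stand-ins for sample.xsl and sample.xml, and the variable names `as_xml` and `as_html` are made up for the comparison.

```python
from lxml import etree

# Hypothetical inline stand-in for sample.xsl (the "<" is escaped as &lt;,
# as it has to be inside an XML document).
XSLT_SOURCE = b"""<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<head> </head>
<body>
<script type="text/javascript">
function example() {
  var trs = document.getElementsByTagName("tr");
  for (var i = 0; i &lt; trs.length; i++) {
  }
}
</script>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
"""

transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))
# Hypothetical stand-in for sample.xml; the template ignores its content.
result = transform(etree.fromstring(b"<root/>"))

# Default (XML) serialization keeps the text escaped.
as_xml = etree.tostring(result, pretty_print=True).decode("utf-8")

# HTML serialization writes the content of <script> without escaping.
as_html = etree.tostring(result, method="html", pretty_print=True).decode("utf-8")

print("i &lt; trs.length" in as_xml)
print("i < trs.length" in as_html)
```

With this approach the string replace() workaround from the question should not be needed. Another route, not exercised here, is to declare `<xsl:output method="html"/>` in the stylesheet and serialize with `str(result)`, which lxml documents as honoring the stylesheet's output settings.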
You can also just surround the script contents in a CDATA wrapper to stop it getting eaten, like so:
<script language="javascript" type="text/javascript">
<![CDATA[
function example() {
var trs = document.getElementsByTagName("tr");
for (var i = 0; i < trs.length; i++) {
}
}
]]>
</script>
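A small check of what the CDATA wrapper changes, assuming lxml's default parser settings (which turn CDATA sections into ordinary text). The element strings and variable names below are illustrative only. The sketch suggests that CDATA mainly makes the script easier to write in the stylesheet, while the serialized output still depends on the output method.

```python
from lxml import etree

# Two ways of writing the same script text in an XML source.
escaped = etree.fromstring("<script> i &lt; n </script>")
wrapped = etree.fromstring("<script><![CDATA[ i < n ]]></script>")

# After parsing, both elements hold the same text node.
same_text = escaped.text == wrapped.text

# XML serialization escapes the "<" again, even for the CDATA variant.
xml_out = etree.tostring(wrapped).decode("utf-8")

# HTML serialization leaves the content of <script> unescaped.
html_out = etree.tostring(wrapped, method="html").decode("utf-8")

print(same_text)
print(xml_out)
print(html_out)
```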