My code is in Scala.js, but I think the gist of it should be easy to understand from a JavaScript perspective:
def htmlToXHTML(input: String)
               (implicit parser: DOMParser, serializer: XMLSerializer): String = {
  val doc = parser.parseFromString(input, "text/html")
  val body = getElementByXpath("/html/body", doc).singleNodeValue
  val bodyXmlString = serializer.serializeToString(body)
  val xmldoc = parser.parseFromString(bodyXmlString, "application/xml")
  val xmlDocElems: NodeList = xmldoc.getElementsByTagName("*")
  xmlDocElems.foreach {
    case elem: Element =>
      elem.removeAttribute("xmlns")
      println(s"Found element $elem with html: ${elem.outerHTML}")
    case node => println(s"Warning: found unexpected non-element node: $node.")
  }
  xmldoc.firstElementChild.innerHTML
}
This is used above, so I am including it for completeness (https://stackoverflow.com/a/14284815/3096687):
def getElementByXpath(xpath: String, doc: Document): XPathResult =
  doc.evaluate(
    xpath, doc, null.asInstanceOf[XPathNSResolver],
    XPathResult.FIRST_ORDERED_NODE_TYPE, null
  )
In short, this function reads an HTML string, parses it into an HTML document, serializes the body to XML, reparses that as XML, loops over all the elements in the resulting document (foreach), and removes the xmlns attribute from each one. However, the resulting innerHTML still has xmlns attributes on its elements. The first println (aka console.log) shows that we are finding the elements in question, yet the xmlns attributes are not going away.
The problem may derive from default values specified in a DTD:
If a default value for the attribute is defined in a DTD, a new attribute immediately appears with the default value
asked Jan 19, 2018 at 14:37 by bbarker, edited Jan 19, 2018 at 15:26
- I just tried elem.setAttribute("xmlns", "") instead of remove, but the result is weird: <a0:br xmlns:a0="http://www.w3.org/1999/xhtml" xmlns="" />. I guess it doesn't deal well with the lack of namespace. – bbarker Commented Jan 19, 2018 at 14:50
- You misunderstand how the DOM API works. The element's qualified name still includes its namespace URI, even if you remove the xmlns attribute. Think of the xmlns attribute as a cross-check of the namespace URI, rather than the namespace URI itself. – Alohci Commented Jan 19, 2018 at 17:23
- @Alohci, thanks, yeah I guess I understood that at some level, and that this is ultimately more an issue of serialization format (but XMLSerializer does not appear to be configurable, so I was trying to work around that) – bbarker Commented Jan 19, 2018 at 18:26
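The point made in these comments can be checked directly. The following is a minimal sketch, assuming a browser-like environment where the caller supplies a DOMParser and an XMLSerializer; the function name namespaceSurvivesRemoval is made up for illustration and does not appear in the question.

```javascript
// Hypothetical helper (not from the original post): shows that removing the
// xmlns attribute does not change the namespace the element was created in.
// `parser` is assumed to be a DOMParser, `serializer` an XMLSerializer.
function namespaceSurvivesRemoval(parser, serializer) {
  const xml = '<br xmlns="http://www.w3.org/1999/xhtml"/>';
  const el = parser.parseFromString(xml, 'application/xml').documentElement;

  el.removeAttribute('xmlns'); // removes the attribute node only

  return {
    hasXmlnsAttribute: el.hasAttribute('xmlns'),
    namespaceURI: el.namespaceURI, // fixed when the element was created
    serialized: serializer.serializeToString(el),
  };
}
```

In a browser, calling namespaceSurvivesRemoval(new DOMParser(), new XMLSerializer()) should report that the attribute is gone while namespaceURI is unchanged, which is why the serializer writes an xmlns declaration again.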
3 Answers
As mentioned, the easiest approach is to remove it from the result string:
xmls.serializeToString(domNode).replace(/xmlns="[^"]+"/, '')
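Note that the regexp above has no g flag, so it replaces only the first match. If the serialized string carries the declaration on several elements, as in the question, a global variant may be needed. Here is a minimal sketch; the helper name stripDefaultXmlns is an assumption for illustration, and it works on the serialized string only, so text content that happens to look like an xmlns attribute would be altered too.

```javascript
// Hypothetical helper: strips every default-namespace declaration
// (xmlns="...") from an already serialized XML string.
// Prefixed declarations such as xmlns:svg="..." are left alone.
function stripDefaultXmlns(serialized) {
  // \s+ also removes the whitespace in front of the attribute;
  // the g flag handles more than one occurrence.
  return serialized.replace(/\s+xmlns="[^"]*"/g, '');
}

const input =
  '<p xmlns="http://www.w3.org/1999/xhtml">a</p>' +
  '<br xmlns="http://www.w3.org/1999/xhtml" />';
console.log(stripDefaultXmlns(input)); // <p>a</p><br />
```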
I would probably cheat and remove the xmlns from the resulting string, as it's a huge pain to make the elements lose their namespace.
If you insist on doing that, you could try building a document from scratch while walking the original DOM, pedantically copying everything but the namespaces (i.e. using createElementNS with an empty namespace?).
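The rebuild idea could look roughly like the sketch below. It is not the answerer's code: the name cloneWithoutNamespace is hypothetical, targetDoc is assumed to be an XML document (for example the one returned by parseFromString with "application/xml"), and only elements and text are copied.

```javascript
// Hypothetical helper: deep-copies `node` into `targetDoc`, creating every
// element in the null namespace so that no xmlns declaration is needed when
// the copy is serialized. Comments and processing instructions are dropped,
// and prefixed attributes lose their prefix in this sketch.
function cloneWithoutNamespace(node, targetDoc) {
  const ELEMENT_NODE = 1, TEXT_NODE = 3, CDATA_SECTION_NODE = 4;

  if (node.nodeType === TEXT_NODE || node.nodeType === CDATA_SECTION_NODE) {
    return targetDoc.createTextNode(node.nodeValue);
  }
  if (node.nodeType !== ELEMENT_NODE) {
    return null; // node type not handled by this sketch
  }

  const copy = targetDoc.createElementNS(null, node.localName);
  for (const attr of Array.from(node.attributes)) {
    // Skip namespace declarations (xmlns and xmlns:prefix).
    if (attr.name === 'xmlns' || attr.name.startsWith('xmlns:')) continue;
    copy.setAttribute(attr.localName, attr.value);
  }
  for (const child of Array.from(node.childNodes)) {
    const childCopy = cloneWithoutNamespace(child, targetDoc);
    if (childCopy !== null) copy.appendChild(childCopy);
  }
  return copy;
}
```

Serializing the returned copy with XMLSerializer should then produce markup without the XHTML namespace declaration, at the cost of a full tree walk.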
If you only want to remove the xmlns="..." from <html>, you can use this specific regexp:
xmls.serializeToString(domNode)
  .replace(/^<html xmlns="[^"]+">/, "<html>");
Tags: javascript, How to serialize XML without xmlns attributes, Stack Overflow