For some reason, DOMParser adds an extra #text node for each newline (\n) when parsing this URL:

http://rt./Root.rss

...as well as many other RSS feeds I've tried. I checked the CNN and BBC feeds; they don't contain newlines, and DOMParser handles them fine. So I have to add the following before parsing:

var xmlText = htmlText.replace(/\n[ ]*/g, "");
var xmlDoc = parser.parseFromString(xmlText, "text/xml");

The server returns the feed as text/xml.

var channel = xmlDoc.documentElement.childNodes[0];

Without the replacement above, this returns the \n text node; with it, this returns the channel element.

asked May 8, 2010 at 4:42 by Pablo

3 Answers

Yes, that's what XML parsers are supposed to do by default. Get used to walking through child nodes checking to see whether they're elements (nodeType===1) or text nodes (3).
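As a minimal sketch of that check, the helper below collects only the element children of a node, so the whitespace-only text nodes between tags are skipped. The name elementChildren is made up for illustration and is not part of the DOM API; the helper relies only on childNodes and nodeType, and the sample XML string is an assumption standing in for the real feed.

```javascript
// Hypothetical helper: return the children of `node` that are elements.
// nodeType 1 is an element node; 3 is a text node (e.g. the "\n" between tags).
function elementChildren(node) {
    var result = [];
    for (var i = 0; i < node.childNodes.length; i++) {
        var child = node.childNodes[i];
        if (child.nodeType === 1) {
            result.push(child);
        }
    }
    return result;
}

// Browser-only usage with a made-up sample document.
if (typeof DOMParser !== "undefined") {
    var doc = new DOMParser().parseFromString("<rss>\n  <channel/>\n</rss>", "text/xml");
    // childNodes[0] would be the "\n  " text node; this picks the first element instead.
    var channel = elementChildren(doc.documentElement)[0];
}
```

With this in place, the question's childNodes[0] lookup could be replaced by elementChildren(xmlDoc.documentElement)[0], leaving the feed text untouched.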

From Firefox 3.5 you get the Element Traversal API, giving you properties like firstElementChild and nextElementSibling. This makes walking over the DOM whilst ignoring whitespace easier. Alternatively you could use XPath (doc.evaluate) to find the elements you want.
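For the feed in the question, both routes might look roughly like the sketch below. The function name getChannel and the sample XML string are assumptions made for illustration; firstElementChild, nextElementSibling and doc.evaluate are the features mentioned above, and the code assumes a browser that supports them.

```javascript
// Hypothetical helper: first element child of the root, ignoring text nodes.
function getChannel(xmlDoc) {
    return xmlDoc.documentElement.firstElementChild;
}

// Browser-only usage with a made-up sample document.
if (typeof DOMParser !== "undefined") {
    var feed = new DOMParser().parseFromString(
        "<rss>\n  <channel>\n    <title>Example</title>\n  </channel>\n</rss>",
        "text/xml"
    );

    // Element Traversal: whitespace text nodes are skipped automatically.
    var channel = getChannel(feed);           // the <channel> element
    var title = channel.firstElementChild;    // the <title> element
    var afterTitle = title.nextElementSibling; // null, no further elements

    // XPath alternative: ask for the element by path.
    if (typeof XPathResult !== "undefined") {
        var viaXPath = feed.evaluate(
            "/rss/channel", feed, null,
            XPathResult.FIRST_ORDERED_NODE_TYPE, null
        ).singleNodeValue;
    }
}
```

Neither approach modifies the document, so the whitespace text nodes are still present in childNodes; they are simply never visited.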

If you want to remove whitespace nodes for good, it's a much better idea to do it on the parsed DOM than by using a regex hack:

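function removeWhitespace(node) {
    // Walk backwards so that removing a child does not shift the indices still to visit.
    for (var i = node.childNodes.length; i-- > 0;) {
        var child = node.childNodes[i];
        if (child.nodeType === 3 && /^\s*$/.test(child.data)) {
            // Whitespace-only text node: remove it.
            node.removeChild(child);
        } else if (child.nodeType === 1) {
            // Element: clean up its children recursively.
            removeWhitespace(child);
        }
    }
}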

"For some reason DOMParser is adding some additional #text elements for each newline \n for this url"

That is standard behaviour. Only IE ignores whitespace between element nodes. (XML Whitespace Handling, Whitespace @ MSDN, Whitespace @ MDC)

What is your question? Do you want to avoid the workaround? I think a workaround is necessary, since the parser is working as expected.
