Sanitizing html string with javascript using browser to interpret html - Stack Overflow


I want to use a whitelist of tags, attributes and values to sanitize an HTML string before I place it in the DOM. Can I safely construct a DOM element and traverse it to implement the whitelist filter, assuming that no malicious JavaScript can execute until I append the DOM element to the document? Are there pitfalls to this approach?

asked Feb 13, 2014 at 0:37 by Piwakawaka

I haven't used the library in the accepted answer myself, but you might check out stackoverflow.com/questions/5575559/… , with the help pages of perhaps most relevance: owasp/index.php/… and code.google.com/p/owasp-esapi-js/wiki/MitigatingDOMBasedXSS – Brett Zamir Commented Feb 13, 2014 at 0:52
The advantage of this over HTMLPurifier, etc. would be that it can run dynamically on the client-side without round-tripping to the server. – Brett Zamir Commented Feb 13, 2014 at 0:54
As far as the whitelist that you need, while owasp/index.php/… does make mention of one JS library, github.com/ecto/bleach and perhaps it could be adapted for client-side usage, it appears to rely on regular expressions which I would not trust to do the job very well (e.g., it doesn't currently match newlines within tags). – Brett Zamir Commented Feb 13, 2014 at 1:10
I also found: github.com/gbirke/Sanitize.js. I like both answers here - what is the protocol about choosing the correct answer? – Piwakawaka Commented Feb 13, 2014 at 17:45
Haven't examined it, but its approach definitely sounds like the way to go. As far as liking both answers, do you mean liking both libraries or liking both of our Stack Overflow answers? If the latter, no worries. Normally, it's whatever you liked the best (I like to pick the first poster if the answers were similar.). Once you have enough reputation, you can also up-vote other answers. – Brett Zamir Commented Feb 13, 2014 at 22:55

3 Answers

It doesn't appear that anything will execute until you insert into the document, as per @rvighne's answer, but there are at least these (unusual) exceptions (tested in FF 27.0):

var userInput = '<a href="http://example.com" onclick="alert(\'boo!\')">Link<\/a>';
var el = document.createElement('div');
el.innerHTML = userInput;
el.addEventListener("click", function(e) {
    if (e.target.nodeName.toLowerCase() === 'a') {
        alert("I will also cause side effects; I shouldn't run on the wrong link!");
    }
});
el.getElementsByTagName('a')[0].click(); // Alerts "boo!" and "I will also cause side effects; I shouldn't run on the wrong link!"

...or...

var userInput = '<a href="http://example.com" onclick="alert(\'boo!\')">Link<\/a>';
var el = document.createElement('div');
el.innerHTML = userInput;
el.addEventListener("cat", function(e) { this.getElementsByTagName('a')[0].click(); });
var event = new CustomEvent("cat", {"detail":{}});
el.dispatchEvent(event); // Alerts "boo!"

...or... (though setUserData is deprecated, it was still working when tested):

var userInput = '<a href="http://example.com" onclick="alert(\'boo!\')">Link<\/a>';
var span = document.createElement('span');
span.innerHTML = userInput;
// The handler runs whenever the node is cloned, imported or adopted
span.setUserData('key', 10, {handle: function (n1, n2, n3, src) {
    src.getElementsByTagName('a')[0].click();
}});
var div = document.createElement('div');
div.appendChild(span);
span.cloneNode(); // Alerts "boo!"
var imprt = document.importNode(span, true); // Alerts "boo!"
var adopt = document.adoptNode(span); // Alerts "boo!"

...or during iteration...

var userInput = '<a href="http://example.com" onclick="alert(\'Boo!\');">Link</a>';
var span = document.createElement('span');
span.innerHTML = userInput;
var treeWalker = document.createTreeWalker(
  span,
  NodeFilter.SHOW_ELEMENT,
  // The filter itself triggers the click as a side effect of iterating
  { acceptNode: function(node) { node.click(); return NodeFilter.FILTER_ACCEPT; } },
  false
);
var nodeList = [];
while(treeWalker.nextNode()) nodeList.push(treeWalker.currentNode); // Alerts 'Boo!'

But without these kinds of (unusual) event interactions, merely building the DOM would not, as far as I have been able to detect, cause any side effects (and of course the examples above are contrived, and one wouldn't expect to encounter them very often, if at all!).
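The examples above all depend on the inline onclick attribute still being present when something calls click(). A whitelist pass run right after parsing removes it before that can happen. What follows is only a minimal sketch: the ALLOWED map, the SAFE_URL pattern and the function names are hypothetical, and a real whitelist would need more thought (text and comment nodes are left untouched here).

```javascript
// Hypothetical whitelist: tag name -> attribute names allowed on that tag.
const ALLOWED = new Map([
  ['a', ['href', 'title']],
  ['b', []],
  ['i', []],
  ['p', []],
]);

// URL schemes accepted in href values (an assumption for this sketch).
const SAFE_URL = /^(?:https?:|mailto:)/i;

// Decide whether one attribute may stay on an element with the given tag.
function isAllowedAttribute(tag, name, value) {
  const names = ALLOWED.get(tag);
  if (!names || !names.includes(name)) return false;
  return name !== 'href' || SAFE_URL.test(value.trim());
}

// Recursively drop elements and attributes that are not whitelisted.
function filterTree(root) {
  // Copy the collections first, because the loops mutate them.
  for (const el of Array.from(root.children)) {
    const tag = el.tagName.toLowerCase();
    if (!ALLOWED.has(tag)) {
      el.remove(); // removes the element together with its subtree
      continue;
    }
    for (const attr of Array.from(el.attributes)) {
      if (!isAllowedAttribute(tag, attr.name, attr.value)) {
        el.removeAttribute(attr.name);
      }
    }
    filterTree(el);
  }
  return root;
}
```

In a browser, root would be the detached element from the examples, e.g. filterTree(el) immediately after el.innerHTML = userInput. Using a Map rather than a plain object avoids accidental matches on inherited names such as "constructor".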

No <script> embedded in the HTML executes while the string is parsed into a detached element. Try running this code on any page:

var html = "<script>document.body.innerHTML = '';</script>";
var div = document.createElement('div');
div.innerHTML = html;

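You will notice nothing change. If the "malicious" script in the HTML had run, the document would have vanished. So you can use the DOM to sanitize HTML without worrying about <script> elements in the HTML running during the parse, as long as your sanitizer snips them out, of course. Inline event handlers are a separate matter: an <img> with an onerror attribute, for example, can fire its handler on a detached element, so the whitelist has to strip those attributes right away as well.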
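The detached div in the test above still belongs to the live page, so resources such as images start loading as soon as they are parsed. One option is to parse into a separate document that has no browsing context, where, as far as I know, neither scripts nor inline handlers run. A minimal sketch follows; parseInert and its doc parameter are hypothetical names, not part of any library.

```javascript
// Parse untrusted HTML into a separate document that is not attached to a
// window. `doc` defaults to the page's document; it is a parameter only so
// the sketch can be exercised outside a browser.
function parseInert(html, doc = document) {
  const inert = doc.implementation.createHTMLDocument('');
  inert.body.innerHTML = html;
  return inert.body; // traverse and filter this before using any of it
}

// Browser usage:
// const body = parseInert('<img src="x" onerror="alert(1)">');
// body.querySelector('img').removeAttribute('onerror');
```

The filtered nodes still have to be clean before they are moved into the live document, since handlers left on them become active there.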

By the way, your approach is pretty safe and smarter than what most people try (parse it with regex, the poor fools). However, it's best to rely on good, trusted HTML sanitizing libraries for this, like HTML Purifier. Or, if you want to do it client-side, you can use ESAPI-JS (recommended by @Brett Zamir).

You can use a "sandboxed" iframe that won't execute anything.

var iframe = document.createElement('iframe');
iframe['sandbox'] = 'allow-same-origin';

From w3schools:

The sandbox attribute enables an extra set of restrictions for the content in the iframe. When the sandbox attribute is present, and it will:

block form submission

block script execution

disable APIs

...

P.S. That's, by the way, exactly how we do it in our Html Sanitizer https://github.com/jitbit/HtmlSanitizer - we use the browser to interpret HTML and convert it to DOM. Feel free to check the code (or actually use the component).

(disclaimer: I'm the contributor to that OSS project)
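The two-line iframe snippet above stops before any HTML is parsed. One way it could be completed is sketched below, under the assumption that the frame's initial blank document is available synchronously after insertion. withSandboxedBody and its parameters are hypothetical names; this is not taken from the project mentioned above.

```javascript
// Parse `html` inside a sandboxed iframe, pass the resulting <body> to `fn`,
// and return whatever `fn` returns. `doc` defaults to the page's document;
// it is a parameter only so the sketch can be exercised outside a browser.
function withSandboxedBody(html, fn, doc = document) {
  const iframe = doc.createElement('iframe');
  // Without an allow-scripts token the frame's content cannot run script;
  // allow-same-origin lets the parent page reach contentDocument.
  iframe.setAttribute('sandbox', 'allow-same-origin');
  iframe.style.display = 'none';
  doc.body.appendChild(iframe); // the frame gets its initial blank document
  try {
    const body = iframe.contentDocument.body;
    body.innerHTML = html;
    return fn(body);
  } finally {
    iframe.remove(); // clean up even if fn throws
  }
}

// Browser usage:
// const clean = withSandboxedBody(userInput, function (body) {
//   /* run the whitelist filter over body here */
//   return body.innerHTML;
// });
```

The sandbox only keeps the markup inert while it sits in the frame. The whitelist filter still has to run before the result is placed in the main document.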
