I need to decode html in javascript. e.g.:

var str = 'apple &amp; banana';
var strDecoded = htmlDecode(str); // I expect 'apple & banana'

There is no guarantee that the given str is already encoded, and the common jQuery and DOM tricks are vulnerable to XSS:

var attackStr = '&amp;</textarea><img src=x onerror=alert(1)>&#x30cf;&#x30ed;&#x30fc;&#x30ef;&#x30fc;&#x30eb;&#x30c9;'; // if you see 1 alerted, it means it is XSS vulnerable
var strDecoded; // I wish to get: &</textarea><img src=x onerror=alert(1)>ハローワールド

strDecoded = $('<div/>').html(attackStr).text(); // vulnerable in all browsers

strDecoded = $('<textarea/>').html(attackStr).text(); // vulnerable in ie 9 and firefox


var dv = document.createElement('div');
dv.innerHTML = attackStr; // vulnerable in all browsers
strDecoded = dv.innerText;

var ta = document.createElement('textarea');
ta.innerHTML = attackStr; // vulnerable in ie 9 and firefox
strDecoded = ta.value;

Is there any XSS-safe way to html-decode?

asked Nov 3, 2014 at 9:34 by daghan (edited Nov 9, 2014 at 12:50)
  • What is it that you are trying to accomplish, exactly? The code that you show doesn't do HTML decoding at all, but HTML parsing. – Guffa Commented Nov 3, 2014 at 9:42
  • Use innerText or jQuery .text() method instead of innerHTML/.html() – Alex Commented Nov 3, 2014 at 9:46
  • hopefully clarified the question – daghan Commented Nov 3, 2014 at 10:11
  • @daghan, how are you Obtaining the string that might be malicious? That could point the way for a best Answer. – vernonner3voltazim Commented Nov 3, 2014 at 14:29
  • @vernonner3voltazim, it is user input, which sometimes comes encoded and sometimes unencoded – daghan Commented Nov 3, 2014 at 15:17

6 Answers

Taking a mix of your code and the highest-voted (not the accepted) answer at HTML Entity Decode, how about this:

var decodeEntities = (function() {
  // this prevents any overhead from creating the object each time
  var element = document.createElement('textarea');

  function decodeHTMLEntities (str) {
    if(str && typeof str === 'string') {
      str = str.replace(/</g,"&lt;");
      str = str.replace(/>/g,"&gt;");
      element.innerHTML = str;
      str = element.textContent;
      element.textContent = '';
    }

    return str;
  }

  return decodeHTMLEntities;
})();

Fiddle here: http://jsfiddle/ursu67z6/
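
To see why the two replace calls above matter, here is a minimal sketch of just that pre-escaping step, runnable without a DOM. The helper name escapeAngleBrackets is hypothetical and not part of the answer's code; it only mirrors the two replace() calls inside decodeHTMLEntities. Once every < and > has been turned into an entity, the string handed to innerHTML contains nothing the parser could treat as a tag, so the textarea only ever receives text and entities.

```javascript
// Hypothetical helper: mirrors the two replace() calls in decodeHTMLEntities.
function escapeAngleBrackets(str) {
  return str.replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

var attackStr = '&amp;</textarea><img src=x onerror=alert(1)>';
var prepared = escapeAngleBrackets(attackStr);

console.log(prepared);
// &amp;&lt;/textarea&gt;&lt;img src=x onerror=alert(1)&gt;

// No raw angle brackets remain, so no tag can be opened or closed.
console.log(/[<>]/.test(prepared)); // false
```

Existing entities such as &amp; pass through this step unchanged, which is why the later innerHTML assignment can still decode them.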

You could also have a look at https://github.com/mathiasbynens/he. I haven't gone through it myself, but it might handle some cases better. I expect that if you are only decoding rather than encoding, the DOM-based approach is better.

DOMPurify is a DOM-only, super-fast, uber-tolerant XSS sanitizer for HTML, MathML and SVG. It's written in JavaScript and works in all modern browsers (Safari, Opera (15+), Internet Explorer (9+), Firefox and Chrome - as well as almost anything else using Blink or WebKit). It doesn't break on IE6 or other legacy browsers. It simply does nothing there.

DOMPurify is written by security people who have vast background in web attacks and XSS. Fear not.

I've tested and used DOMPurify, and it's really good at sanitizing untrusted data on the client side. Using it is very simple.

Import purify.js:

<script type="text/javascript" src="purify.js"></script>

Then pass your untrusted variable to DOMPurify.sanitize:

var attackStr = '</textarea><img src=x onerror=alert(1)>';
var clean = DOMPurify.sanitize(attackStr);

The output will look like the following:

<img src="x">

You can test your XSS payload at https://cure53.de/purify

Source code, examples, and documentation can be found at https://github.com/cure53/DOMPurify

If you want to display the content safely, use innerText or the jQuery .text() method instead of innerHTML/.html().

You can use jQuery functions like the ones below to encode or decode the input string:

function htmlEncode(value){
  return $('<div/>').text(value).html();
}

function htmlDecode(value){
  return $('<div/>').html(value).text();
}

htmlDecode('&lt;b&gt;test&lt;/b&gt;')
// result "<b>test</b>"

htmlDecode('test')
// result "test"

In this code:

  1. I create a div that is not actually attached to the page
  2. The input string is passed to the htmlDecode function
  3. jQuery automatically encodes/decodes the string
  4. The new html/text is returned
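
The encoding direction in the steps above does not strictly need jQuery or a DOM. As a rough sketch, assuming you only need to escape the characters that are significant in HTML text and attribute values, a plain string function is enough. The name htmlEncodeNoDom is hypothetical, and escaping the two quote characters is an extra choice made here, not something the jQuery version does.

```javascript
// Hypothetical DOM-free counterpart to htmlEncode; the name is illustrative.
function htmlEncodeNoDom(value) {
  return String(value)
    .replace(/&/g, '&amp;')   // must run first so the entities added below are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')  // extra: makes the result usable inside double-quoted attributes
    .replace(/'/g, '&#39;');  // extra: same for single-quoted attributes
}

console.log(htmlEncodeNoDom('<b>test</b>'));
// &lt;b&gt;test&lt;/b&gt;
```

Because nothing is parsed as HTML here, the input cannot trigger script execution during encoding.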

Hope this helps!

Here is a clean solution that does not involve injecting the HTML anywhere. Copy both of these functions somewhere in your code: http://phpjs/functions/html_entity_decode/ and http://phpjs/functions/get_html_translation_table/

You'll have to remove "this" in "html_entity_decode" on line 26.

console.log( html_entity_decode('&amp;</textarea><img src=x onerror=alert(1)>') );
// &</textarea><img src=x onerror=alert(1)>
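
For readers who only want the idea rather than the two library functions, a DOM-free decoder can be sketched roughly as follows. This is not the phpjs code: the names decodeEntitiesNoDom and NAMED_ENTITIES are hypothetical, and the named-entity table is a tiny illustrative subset, so any named entity outside it is left untouched. Numeric references (decimal and hexadecimal) are handled in general.

```javascript
// Illustrative subset only; a complete decoder needs the full HTML named-entity table.
var NAMED_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", nbsp: '\u00a0' };

// Hypothetical helper: decodes entities with string operations only, never parsing HTML.
function decodeEntitiesNoDom(str) {
  if (typeof str !== 'string') return str;
  var pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g;
  return str.replace(pattern, function (match, body) {
    if (body.charAt(0) === '#') {
      var isHex = body.charAt(1) === 'x' || body.charAt(1) === 'X';
      var codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave NUL, surrogates, and out-of-range references as they are.
      if (codePoint === 0 || codePoint > 0x10ffff ||
          (codePoint >= 0xd800 && codePoint <= 0xdfff)) {
        return match;
      }
      return String.fromCodePoint(codePoint);
    }
    // Unknown names are returned unchanged rather than guessed.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

var attackStr = '&amp;</textarea><img src=x onerror=alert(1)>&#x30cf;&#x30ed;&#x30fc;&#x30ef;&#x30fc;&#x30eb;&#x30c9;';
console.log(decodeEntitiesNoDom(attackStr));
// &</textarea><img src=x onerror=alert(1)>ハローワールド
```

The replace runs in a single pass, so input such as &amp;lt; becomes &lt; and is not decoded a second time. The result is a plain string, so it is only safe if it is later written with textContent or .text(), not with innerHTML or .html().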

Cheers.

-- EDIT --

Your textarea trick looks good. Did it cover all your use cases?

The only other JavaScript solution I can think of is to use a sandboxed, same-domain iframe. It gives me good results but would only work in recent web browsers. I'm posting the code just in case.

function safeHtmlDecode(str, callback)
{
    var sameDomainBlankPage = document.location.href; // This should be a blank html page located on same domain
    var $iframe = $('<iframe sandbox="allow-same-origin"/>').attr("src", sameDomainBlankPage); // declared with var to avoid an implicit global
    $iframe.on("load", function() {
        var body = $iframe.contents()[0].body;
        body.innerHTML = str;
        callback(body.innerText);
    });
    $("body").append($iframe);
}
$(document).ready(function(){
    var attackStr = '&amp;</textarea><img src=x onerror=alert(1)>&#x30cf;&#x30ed;&#x30fc;&#x30ef;&#x30fc;&#x30eb;&#x30c9;';
    safeHtmlDecode(attackStr, function(htmlString) {
        console.log( htmlString );
    });
});

The best I could get so far:

function htmlDecode(str){
    if (typeof str !== "string") return str;
    str = str.replace(/</g, "&lt;");
    str = str.replace(/>/g, "&gt;");
    var ta = document.createElement("textarea");
    ta.innerHTML = str;
    return ta.value;
}

//test:
var attackStr = '&amp;</textarea><img src=x onerror=alert(1)>&#x30cf;&#x30ed;&#x30fc;&#x30ef;&#x30fc;&#x30eb;&#x30c9;';
alert(htmlDecode(attackStr)); // &</textarea><img src=x onerror=alert(1)>ハローワールド
