I'm using JavaScript to set the value of an input with text that may contain HTML-specific characters such as &amp;, &nbsp;, etc. I'm trying to find one regex that will match these values and replace each with the appropriate character ("&" and " ", respectively), but I can't figure out the regex to do it.

Here's my attempt:

Make an object that contains the matches and reference to the replacement value:

var specialChars = {
  "&nbsp;" : " ",
  "&amp;"  : "&",
  "&gt;"   : ">",
  "&lt;"   : "<"
}

Then, I want to match my string

var stringToMatch = "This string has special chars &amp; and &nbsp;"

I tried something like

stringToMatch.replace(/(&nbsp;|&amp;)/g,specialChars["$1"]);

but it doesn't work. I don't really understand how to capture the special tag and replace it. Any help is greatly appreciated.

asked Aug 4, 2009 at 19:41 by brad; edited May 15, 2018 at 2:16 by wp78de
  • Perhaps "&amp;nbsp;" will show your &nbsp;? – lance Commented Aug 4, 2009 at 19:43
  • Why not use escaping? w3schools.com/jsref/jsref_escape.asp – Joel Commented Aug 4, 2009 at 19:46
  • escape would turn &amp; into %26amp%3B. Definitely not what I'm looking for – brad Commented Aug 4, 2009 at 20:36

5 Answers

I think you can use the functions from a question on a slightly different subject (Efficiently replace all accented characters in a string?).

Jason Bunting's answer has some nice ideas plus the necessary explanation. Here is his solution with some modifications to get you started (if you find this helpful, upvote his original answer as well, since this is essentially his code).

var replaceHtmlEntites = (function() {
    var translate_re = /&(nbsp|amp|quot|lt|gt);/g,
        translate = {
            'nbsp': String.fromCharCode(160), 
            'amp' : '&', 
            'quot': '"',
            'lt'  : '<', 
            'gt'  : '>'
        },
        translator = function($0, $1) { 
            return translate[$1]; 
        };

    return function(s) {
        return s.replace(translate_re, translator);
    };
})();

callable as

var stringToMatch = "This string has special chars &amp; and &amp;nbsp;";
var stringOutput  = replaceHtmlEntites(stringToMatch);

Numeric entities are even easier: you can replace them much more generically using a little math and String.fromCharCode().
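
As a rough illustration of that idea, here is a minimal sketch that decodes decimal and hexadecimal numeric references in a single pass. The helper name replaceNumericEntities is made up for this example, and it swaps in String.fromCodePoint() (ES2015) for String.fromCharCode() so that code points above U+FFFF also work. Treat it as a starting point rather than a complete decoder.

```javascript
// Hypothetical helper (not from the original answer): decodes numeric
// character references such as &#169; (decimal) and &#xA9; (hexadecimal).
function replaceNumericEntities(s) {
    return s.replace(/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/g, function (match, hex, dec) {
        // Exactly one of the two groups participates in any given match.
        var codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
        // Leave references outside the Unicode range untouched.
        if (codePoint > 0x10FFFF) {
            return match;
        }
        return String.fromCodePoint(codePoint);
    });
}

var numericOutput = replaceNumericEntities("&#169; 2009, caf&#xE9;");
// numericOutput is "© 2009, café"
```

Using one regex for both forms means the output of one replacement is never scanned again, so a string like "&#38;#x41;" decodes to "&#x41;" and not to "A".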


Another, much simpler possibility would be like this (works in any browser)

function replaceHtmlEntites(string) {
    var div = document.createElement("div");
    div.innerHTML = string;
    return div.textContent || div.innerText;
}

replaceHtmlEntites("This string has special chars &lt; &amp; &gt;");
// -> "This string has special chars < & >"

Another way would be creating a div object

var tmp = document.createElement("div");

Then assigning the text to its innerHTML

tmp.innerHTML = mySpecialString;

And finally reading the element's text content

var output = tmp.textContent || tmp.innerText //for IE compatibility

And there you go...

You can use a function based replacement to do what you want to do:

var myString = '&'+'nbsp;&'+'nbsp;&tab;&copy;';
// replace() returns a new string; myString itself is left unchanged
var replaced = myString.replace(/&\w+?;/g, function( e ) {
    switch(e) {
        case '&nbsp;': 
            return ' ';
        case '&tab;': 
            return '\t';
        case '&copy;': 
            return String.fromCharCode(169);
        default: 
            return e;
    }
});
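
The same callback idea also works with a lookup object like the one in the question. The original attempt fails because specialChars["$1"] is evaluated once, before replace() runs, and yields undefined; "$1" is only expanded inside a replacement string. A callback receives each actual match instead. A minimal sketch, where the function name decodeSpecialChars is made up for illustration:

```javascript
// Lookup table keyed by the full entity text, as in the question.
var specialChars = {
    "&nbsp;": " ",
    "&amp;": "&",
    "&gt;": ">",
    "&lt;": "<"
};

// Hypothetical helper: the callback is invoked once per match and
// looks up the replacement for the text that was actually matched.
function decodeSpecialChars(s) {
    return s.replace(/&(?:nbsp|amp|gt|lt);/g, function (match) {
        return specialChars[match];
    });
}

var decoded = decodeSpecialChars("This string has special chars &amp; and &nbsp;");
// decoded is "This string has special chars & and  " (note the two trailing spaces)
```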

However, I do urge you to consider your situation. If you're receiving &nbsp; and &copy; and other HTML entities in your text values, do you really want to replace them? Should you be converting them afterwards?

Just something to keep in mind.

Cheers!

A modern variation that doesn't use painful switch/case statements:

const toEscape = `<code> 'x' & "y" </code> <\code>`

toEscape.replace(
  /[&"'<>]/g,
  (char) => ({
      "&": '&amp;',
      "\"": '&quot;',
      "'": '&#39;',
      "<": '&lt;',
      ">": '&gt;',
    })[char]
)

Or, since this really should be turned into a function:

const encodeHTML = function(str) {
    const charsToEncode = /[&"'<>]/g
    const encodeTo = {
      "&": '&amp;',
      "\"": '&quot;',
      "'": '&#39;',
      "<": '&lt;',
      ">": '&gt;',
    }
    return str.replace(charsToEncode, char => encodeTo[char])
}

(This list of characters is based on the list of XML escape character codes available on Wikipedia.)
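
Note that the snippets above encode, while the question asks for the reverse direction. A minimal sketch of the decoding counterpart in the same lookup-table style follows. The name decodeHTML is an assumption for this example, and it only handles the five entities that the encoder produces.

```javascript
// Hypothetical inverse of encodeHTML: maps the five entities back to characters.
const decodeHTML = function (str) {
    const decodeTo = {
        "&amp;": "&",
        "&quot;": "\"",
        "&#39;": "'",
        "&lt;": "<",
        "&gt;": ">",
    };
    // Single pass: replaced text is not scanned again, so "&amp;lt;"
    // becomes "&lt;" rather than "<".
    return str.replace(/&(?:amp|quot|#39|lt|gt);/g, (entity) => decodeTo[entity]);
};

const decodedMarkup = decodeHTML("&lt;code&gt; &#39;x&#39; &amp; &quot;y&quot; &lt;/code&gt;");
// decodedMarkup is "<code> 'x' & \"y\" </code>"
```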

Another approach, if you want to strip HTML tags and special characters entirely rather than decode them, is to remove them with a regex:

str.replace(/<[^>]*>/g, '').replace(/[^\w\s]/gi, '')
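
To make the effect of that one-liner concrete, here is a small sketch with an invented sample string. The second replace() removes every character that is not a letter, digit, underscore, or whitespace, so entities lose their "&" and ";" and ordinary punctuation disappears as well.

```javascript
// Sample input is made up for illustration.
const input = "<p>Fish &amp; chips, please!</p>";

const stripped = input
    .replace(/<[^>]*>/g, '')     // drop anything that looks like a tag
    .replace(/[^\w\s]/gi, '');   // drop all remaining non-word, non-space characters

// stripped is "Fish amp chips please"
```

This is lossy by design, so it suits cases like building a plain search key and not cases where the decoded text must be preserved.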
