I have a page with UTF-8 header:

<meta charset="utf-8" />

In the page I use the Umbraco dictionary to fetch content in various languages. When I print a German item on the page, it appears fine:

<h1>@library.GetDictionaryItem("A")</h1>

resolves to:

<h1>Ä</h1> in German

However, if I output it inside a script:

<script type="text/javascript" charset="utf-8">
  var a = "@library.GetDictionaryItem("A")";
  alert(a);
</script>

The alert prints:

&#228;

If I do

<script type="text/javascript" charset="utf-8">
  var a = "Ä";
  alert(a);
</script>

The alert prints:

Ä

So what could explain this behaviour, and how can I fix the alert? As far as I can see, everything is UTF-8, and the dictionary and the page encoding are fine. The problem happens within JavaScript.

From what I can see from the table here, JavaScript resolves the character into its numeric value. I tried escape, encodeURI, decodeURI, etc. with no luck.

chr  HexCode  Numeric   HTML entity     escape(chr)  encodeURI(chr) 

ä    \xE4     &#228;    &auml;          %E4          %C3%A4 

asked Apr 1, 2014 at 10:51 by Nick
  • "Javascript resolves the character into its Numeric value" — No. The character reference will be generated by your server side code. It looks OK when you output it as HTML because the character reference has special meaning in HTML (but doesn't in JavaScript). – Quentin Commented Apr 1, 2014 at 10:54
  • Okay so how can I get the alert to behave as if I typed it in? – Nick Commented Apr 1, 2014 at 10:56
  • That's (presumably) an Umbraco dictionary problem, and I've never heard of that before. – Quentin Commented Apr 1, 2014 at 10:58
  • This has nothing to do with character encoding. – T.J. Crowder Commented Apr 1, 2014 at 10:58
  • It could very well be an Umbraco issue, but the dictionary works well everywhere in all languages, and the alert also works well even in Japanese, Chinese, Arabic, etc. The problem is only with accented characters within the JavaScript. – Nick Commented Apr 1, 2014 at 11:02

1 Answer


(FWIW: Character entity &#228; is ä, not Ä.)

This has nothing to do with character encoding. You're outputting an HTML entity to a JavaScript string, and then asking the browser to display that JavaScript string without doing anything to interpret HTML (via alert). It's exactly as though you actually typed:

<h1>&#228;</h1>

...(which will show ä on the page), and

<script>
var a = "&#228;";
alert(a);
</script>

...which won't. The HTML entity isn't being used anywhere that understands HTML entities. alert doesn't interpret HTML.
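To make that concrete, here is a small check (an illustrative sketch, not taken from the original page; the variable names are made up). To JavaScript the entity is just six ordinary characters, whereas a JavaScript escape sequence is decoded by the string literal itself:

```javascript
// The HTML entity is plain text to JavaScript: six separate characters.
var entity = "&#228;";
console.log(entity.length);          // 6
console.log(entity.charAt(0));       // "&"

// A JavaScript escape sequence is interpreted by the string literal.
var escaped = "\u00E4";
console.log(escaped.length);         // 1
console.log(escaped.charCodeAt(0));  // 228
```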

But if you did this:

<script>
var a = "&#228;";
var div = document.createElement('div');
div.innerHTML = a;
document.body.appendChild(div);
</script>

...you'd see the character on the page, because we're giving the entity to something (innerHTML) that will interpret HTML. And so if you make that first line:

var a = "@library.GetDictionaryItem("A")";

...and then use a in an HTML context (as above), you'll get the ä in the document.

If you always get a decimal numeric character entity (like &#228;) from Umbraco, you can parse the entity easily enough, since those entities define Unicode code points and JavaScript (mostly) uses Unicode code points in its strings*:

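// Converts a single decimal numeric entity such as "&#228;" to its character.
// Returns null if the string is not exactly one such entity.
function characterFromDecimalNumericEntity(str) {
    var decNumEntRex = /^&#(\d+);$/;
    var match = decNumEntRex.exec(str);
    var codepoint = match ? parseInt(match[1], 10) : null;
    // Only valid for code points up to U+FFFF (see the note below).
    var character = codepoint ? String.fromCharCode(codepoint) : null;
    return character;
}
alert(characterFromDecimalNumericEntity("&#228;")); // ä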
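
Dictionary items are usually whole words or sentences rather than a single entity, so a variant that replaces every decimal entity inside a longer string may be more practical. The following is only a sketch: `decodeDecimalEntities` is a hypothetical helper name, and it assumes the server emits only decimal entities. Named entities (like &auml;), hexadecimal ones, and anything above U+FFFF are deliberately left untouched.

```javascript
// Hypothetical helper (not part of Umbraco or any library):
// replaces each decimal numeric entity such as "&#228;" with its character.
function decodeDecimalEntities(str) {
    return str.replace(/&#(\d+);/g, function (match, digits) {
        var codepoint = parseInt(digits, 10);
        // String.fromCharCode only handles one UTF-16 code unit,
        // so leave anything above U+FFFF as it was.
        if (codepoint > 0xFFFF) {
            return match;
        }
        return String.fromCharCode(codepoint);
    });
}

console.log(decodeDecimalEntities("Gr&#246;&#223;e")); // Größe
```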


* Why "mostly": JavaScript strings are made up of 16-bit "characters" that correspond to UTF-16 code units, not Unicode code points (you can't store a Unicode code point in 16 bits, you need 21). All characters from the Basic Multilingual Plane fit within one UTF-16 code unit, but characters from the Supplementary Multilingual Plane, Supplementary Ideographic Plane, and so on require two UTF-16 code units for a character. One of those characters will occupy two "characters" in a JavaScript string. The function above would fail for them. More in the JavaScript spec and the Unicode FAQ.
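
To illustrate the limitation, here is a short sketch comparing String.fromCharCode with String.fromCodePoint for a character outside the Basic Multilingual Plane. U+1F600 (decimal 128512) is just an example chosen here, and String.fromCodePoint is an ES2015 addition, so it is not available in older browsers without a polyfill.

```javascript
// U+1F600 lies in the Supplementary Multilingual Plane.
var codepoint = parseInt("128512", 10);

// String.fromCharCode keeps only the low 16 bits, giving the wrong character.
var truncated = String.fromCharCode(codepoint);
console.log(truncated.length);         // 1
console.log(truncated.charCodeAt(0));  // 62976 (0xF600)

// String.fromCodePoint emits the two UTF-16 code units that are required.
var full = String.fromCodePoint(codepoint);
console.log(full.length);              // 2
console.log(full.codePointAt(0));      // 128512
```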
