How do I check equality of Unicode strings in Javascript? - Stack Overflow


I have two strings in Javascript: "_strange_chars_µö¬é@zendesk.eml" (f1) and "_strange_chars_µö¬é@zendesk.eml" (f2). At first glance, they look identical (and, indeed, on StackOverflow, they may be; I'm not sure what happens when they are pasted into a form like this.) In my application, however,

f1[16] // ö
f2[16] // o
f1[17] // ¬
f2[17] // ̈

That is, where f1 uses the ö character, f2 uses an o and a diacritic ¨ as a separate character. What comparison can I do that will show these two strings to be "equal"?

asked Aug 17, 2011 at 18:49 by James A. Rosen; edited Sep 19, 2015 at 3:59 by Deduplicator

4 One solution -- perhaps the only one -- would be to "canonicalize" (in the Unicode sense) the two strings, but I haven't been able to find a library or function for that yet. – James A. Rosen Commented Aug 17, 2011 at 18:53
1 Are you sure that you have declared UTF-8 in your meta tags? – cwallenpoole Commented Aug 17, 2011 at 18:56
Great question, @cwallenpoole. I'm not, but I'll double-check now. The two strings I've described definitely can both be valid Unicode, but I'm not certain they are. – James A. Rosen Commented Aug 17, 2011 at 19:02
@cwallenpoole the page declares <meta charset="utf-8"> and the form (a file input is the source of the first string) declares accept-charset="UTF-8". And, of course, the HTTP request and response are also UTF-8. I think this is just a case of different systems (browser vs. server) using different Unicode canonicalization. (Or using versus not using canonicalization.) – James A. Rosen Commented Aug 17, 2011 at 19:13


1 Answer


f1 uses the ö character, f2 uses an o and a diacritic ¨ as a separate character.

f1 is in Normal Form C (composed) and f2 in Normal Form D (decomposed). In general Normal Form C is the most common on Windows and the web, with the Unicode FAQ describing it as “the best form for general text”. Unfortunately the Apple world plumped for Normal Form D in order to be gratuitously different.

The strings are canonically equivalent by the rules of Unicode equivalence.
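To see the difference concretely, here is a small sketch that lists the code points of each form. The helper name codePoints is made up for illustration, and the two sample strings are written with escapes so that nothing re-normalizes them on the way through a form or editor.

```javascript
// Hypothetical helper (not from the original post): list a string's
// code points as zero-padded hex.
function codePoints(s) {
  return Array.from(s, (ch) => ch.codePointAt(0).toString(16).padStart(4, "0"));
}

const composed = "\u00F6";    // ö as a single precomposed code point (NFC)
const decomposed = "o\u0308"; // o followed by U+0308 COMBINING DIAERESIS (NFD)

console.log(codePoints(composed));    // [ '00f6' ]
console.log(codePoints(decomposed));  // [ '006f', '0308' ]
console.log(composed === decomposed); // false: === compares code units, not equivalence
```

Both strings render as ö, but a plain === sees two different sequences, which is exactly the f1[16] / f2[16] mismatch in the question.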

What comparison can I do that will show these two strings to be "equal"?

In general, you convert both strings to one Normal Form of your choosing and then compare them. For example in Python:

>>> import unicodedata
>>> a = u'\u00F6'   # ö composed
>>> b = u'o\u0308'  # o then combining umlaut
>>> unicodedata.normalize('NFC', a) == unicodedata.normalize('NFC', b)
True

Similarly Java has the Normalizer class, .NET has String.Normalize, and many languages have bindings available to the ICU library, which also offers this feature.

Unfortunately, at the time this answer was written, JavaScript had no native Unicode normalisation ability. This meant either:

doing it yourself, carting around large Unicode data tables to cover it all in JavaScript (see eg here for an example implementation); or
sending it back to the server-side (eg via XMLHttpRequest), where you've got a better-equipped language to do it.
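Since ECMAScript 2015 (ES6), which postdates this answer, strings have a built-in normalize() method, so in current browsers and Node.js neither workaround should be needed. A minimal sketch, assuming an ES2015+ engine: the helper name unicodeEquals is hypothetical, and f1 and f2 are reconstructions of the question's strings using escapes (only the ö is assumed to differ between them).

```javascript
// Hypothetical helper (not from the original post): compare two strings
// after normalizing both to the same Unicode normalization form.
function unicodeEquals(a, b, form = "NFC") {
  return a.normalize(form) === b.normalize(form);
}

// Reconstructed from the question; escapes keep the two forms distinct.
const f1 = "_strange_chars_\u00B5\u00F6\u00AC\u00E9@zendesk.eml";  // ö precomposed
const f2 = "_strange_chars_\u00B5o\u0308\u00AC\u00E9@zendesk.eml"; // o + combining diaeresis

console.log(f1 === f2);                    // false
console.log(unicodeEquals(f1, f2));        // true
console.log(unicodeEquals(f1, f2, "NFD")); // true
```

Which form you pick does not matter for the comparison as long as both sides use the same one. NFC and NFD only cover canonical equivalence; the compatibility forms NFKC and NFKD fold more characters together and are a separate decision.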
