I'm planning to use client-side AES encryption for my web app.
Right now I'm looking for a way to break multibyte characters into single-byte "non-characters", encrypt them (so that the encrypted text has the same length),
decrypt them, and convert those single-byte "non-characters" back into multibyte characters.
I've read the wiki pages for UTF-8 (supposedly the default encoding for JS?) and UTF-16, but I can't figure out how to detect "fragmented" multibyte characters or how to combine them back.
Thanks : )
asked Aug 5, 2013 at 11:54 by user1894397
- "I'm planning to use a client-side AES encryption for my web-app." Why? Is HTTPS not applicable? – Halcyon Commented Aug 5, 2013 at 11:55
- Are you sure your AES library doesn't already have some methods to convert strings to/from UTF8? Which library are you using? – xanatos Commented Aug 5, 2013 at 12:09
- @FritsvanCampen I'm doing some experiment here - not anything production, but something like a demo page – user1894397 Commented Aug 6, 2013 at 9:00
- @xanatos I'm using cryptoJS, but can't figure out what encoding it's using & etc. – user1894397 Commented Aug 6, 2013 at 9:00
- @xanatos updates response, added jsfiddle example – xanatos Commented Aug 6, 2013 at 9:24
2 Answers
JavaScript strings are UTF-16, stored in 16-bit "characters" (code units). A Unicode character ("code point") that does not fit in 16 bits is encoded in UTF-16 as two 16-bit units (32 bits in total), so in that case each JavaScript "character" is actually only half of the code point.
So to "break" a JavaScript character into bytes, you just take the character code and split off the high byte and the low byte:
var code = str.charCodeAt(0); // The first character, obviously you'll have a loop
var lowbyte = code & 0xFF;
var highbyte = (code & 0xFF00) >> 8;
(Even though JavaScript's numbers are floating point, the bitwise operators work in terms of 32-bit integers, and of course in our case only 16 of those bits are relevant.)
You'll never have an odd number of bytes, because again this is UTF-16.
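To make the loop mentioned above concrete, here is a minimal sketch that splits every UTF-16 code unit of a string into two bytes and later puts them back together. The function names `stringToBytes` and `bytesToString` are made up for this illustration, and the high-byte-first order is just one possible choice; any order works as long as both directions agree.

```javascript
// Hypothetical helpers, not part of any library.
// Split each 16-bit code unit into two bytes (high byte first).
function stringToBytes(str) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    bytes.push((code & 0xFF00) >> 8); // high byte
    bytes.push(code & 0xFF);          // low byte
  }
  return bytes;
}

// Rebuild the string: every pair of bytes becomes one code unit again.
function bytesToString(bytes) {
  var chars = [];
  for (var i = 0; i < bytes.length; i += 2) {
    chars.push(String.fromCharCode((bytes[i] << 8) | bytes[i + 1]));
  }
  return chars.join('');
}
```

Because the split works on code units rather than code points, characters outside the 16-bit range simply yield four bytes (two code units) and come back intact, as long as the byte array is not truncated in between.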
You could simply convert to UTF-8, for example by using this trick:
function encode_utf8(s) {
  return unescape(encodeURIComponent(s));
}
function decode_utf8(s) {
  return decodeURIComponent(escape(s));
}
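As an alternative to the trick above (not what the original answer used), runtimes that provide `TextEncoder` and `TextDecoder` can produce real UTF-8 byte arrays directly. The sketch below assumes such a runtime and also touches on the "fragmented character" part of the question: in UTF-8, every continuation byte has the bit pattern `10xxxxxx`, so any byte that does not match it starts a new character. The helper names are invented for this example.

```javascript
// Assumes TextEncoder/TextDecoder are available (modern browsers, Node.js).
function toUtf8Bytes(s) {
  return new TextEncoder().encode(s); // Uint8Array of UTF-8 bytes
}

function fromUtf8Bytes(bytes) {
  return new TextDecoder('utf-8').decode(bytes);
}

// A UTF-8 continuation byte always looks like 10xxxxxx.
function isContinuationByte(b) {
  return (b & 0xC0) === 0x80;
}

// Count code points by counting the bytes that start a character.
function countCodePoints(bytes) {
  var count = 0;
  for (var i = 0; i < bytes.length; i++) {
    if (!isContinuationByte(bytes[i])) count++;
  }
  return count;
}
```

With this, a byte sequence that begins with a continuation byte, or ends before a multibyte character has all its continuation bytes, can be recognised as cut in the middle of a character.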
Since you are using CryptoJS, you can use its own methods to convert a string to UTF-8 and back to a string:
var words = CryptoJS.enc.Utf8.parse('