unicode - Javascript encoding breaking & combining multibyte characters? - Stack Overflow


I'm planning to use a client-side AES encryption for my web-app.

Right now, I'm looking for a way to break multibyte characters into one-byte 'non-characters', encrypt them (so that the encrypted text has the same length), decrypt them, and convert those one-byte 'non-characters' back into multibyte characters.

I've read the Wikipedia articles on UTF-8 (supposedly the default encoding for JS?) and UTF-16, but I can't figure out how to detect "fragmented" multibyte characters or how to combine them back.

Thanks : )


asked Aug 5, 2013 at 11:54 by user1894397

"I'm planning to use a client-side AES encryption for my web-app." Why? Is HTTPS not applicable? – Halcyon Commented Aug 5, 2013 at 11:55
Are you sure your AES library doesn't already have some methods to convert strings to/from UTF8? Which library are you using? – xanatos Commented Aug 5, 2013 at 12:09
@FritsvanCampen I'm doing some experiment here - not anything production, but something like a demo page – user1894397 Commented Aug 6, 2013 at 9:00
@xanatos I'm using cryptoJS, but can't figure out what encoding it's using & etc. – user1894397 Commented Aug 6, 2013 at 9:00
@xanatos updates response, added jsfiddle example – xanatos Commented Aug 6, 2013 at 9:24


2 Answers

JavaScript strings are UTF-16, stored as 16-bit "characters" (code units). Unicode characters ("code points") outside the Basic Multilingual Plane need 32 bits in UTF-16 (a surrogate pair), so for those each JavaScript "character" is actually only half of the code point.

So to "break" a JavaScript character into bytes, you just take the character code and split off the high byte and the low byte:

var code = str.charCodeAt(0); // The first character, obviously you'll have a loop
var lowbyte = code & 0xFF;
var highbyte = (code & 0xFF00) >> 8;

(Even though JavaScript's numbers are floating point, the bitwise operators work in terms of 32-bit integers, and of course in our case only 16 of those bits are relevant.)

You'll never have an odd number of bytes, because again this is UTF-16.
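To go the other way, you shift the high byte back up and OR it with the low byte. Here is a rough round-trip sketch of the idea above; the function names stringToBytes and bytesToString are made up for illustration, and the high-byte-first order is just one possible choice:

```javascript
// Hypothetical helpers, not part of any library.
// Split every UTF-16 code unit into two bytes (high byte first).
function stringToBytes(str) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    bytes.push((code & 0xFF00) >> 8); // high byte
    bytes.push(code & 0xFF);          // low byte
  }
  return bytes;
}

// Recombine byte pairs into UTF-16 code units and rebuild the string.
// Assumes an even number of bytes in the same high/low order as above.
function bytesToString(bytes) {
  var units = [];
  for (var i = 0; i < bytes.length; i += 2) {
    units.push(String.fromCharCode((bytes[i] << 8) | bytes[i + 1]));
  }
  return units.join("");
}
```

Surrogate pairs need no special handling here: both halves are split and rejoined like any other 16-bit unit, so the original string comes back as long as the byte order is kept.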

You could simply convert to UTF-8, for example by using this trick:

function encode_utf8(s) {
  return unescape(encodeURIComponent(s));
}

function decode_utf8(s) {
  return decodeURIComponent(escape(s));
}
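As a possible alternative to the escape/unescape trick (both functions are deprecated), environments that provide the standard TextEncoder and TextDecoder can do the same conversion directly. This is only a sketch, and the wrapper names encodeUtf8 and decodeUtf8 are made up here. Note that it returns a Uint8Array of bytes rather than a string with one character per byte:

```javascript
// Hypothetical wrappers around the standard TextEncoder/TextDecoder APIs.
// Assumes an environment that provides them (modern browsers, recent Node.js).
function encodeUtf8(s) {
  return new TextEncoder().encode(s); // Uint8Array of UTF-8 bytes
}

function decodeUtf8(bytes) {
  return new TextDecoder("utf-8").decode(bytes); // back to a JS string
}
```

With this approach an ASCII character takes one byte, while other characters take two to four bytes each.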

Since you are using crypto-js, you can use its own methods to convert a string to UTF-8 and back. See here:

var words = CryptoJS.enc.Utf8.parse('