I'm planning to use client-side AES encryption for my web app.

Right now, I'm looking for a way to break multibyte characters into one-byte 'non-characters', encrypt them (so the encrypted text has the same length), decrypt them, and convert those one-byte 'non-characters' back into multibyte characters.

I've read the wiki pages for UTF-8 (supposedly the default encoding for JS?) and UTF-16, but I can't figure out how to detect "fragmented" multibyte characters or how to combine them back.

Thanks : )

asked Aug 5, 2013 at 11:54 by user1894397
  • 1 I'm planning to use a client-side AES encryption for my web-app. -- why? Is HTTPS not applicable? – Halcyon Commented Aug 5, 2013 at 11:55
  • Are you sure your AES library doesn't already have some methods to convert strings to/from UTF8? Which library are you using? – xanatos Commented Aug 5, 2013 at 12:09
  • @FritsvanCampen I'm doing some experiment here - not anything production, but something like a demo page – user1894397 Commented Aug 6, 2013 at 9:00
  • @xanatos I'm using cryptoJS, but can't figure out what encoding it's using & etc. – user1894397 Commented Aug 6, 2013 at 9:00
  • @xanatos updates response, added jsfiddle example – xanatos Commented Aug 6, 2013 at 9:24

2 Answers

JavaScript strings are UTF-16 stored in 16-bit "characters". For Unicode characters ("code points") that require more than 16 bits (some code points require 32 bits in UTF-16), each JavaScript "character" is actually only half of the code point.

So to "break" a JavaScript character into bytes, you just take the character code and split off the high byte and the low byte:

var code = str.charCodeAt(0); // The first character, obviously you'll have a loop
var lowbyte = code & 0xFF;
var highbyte = (code & 0xFF00) >> 8;

(Even though JavaScript's numbers are floating point, the bitwise operators work in terms of 32-bit integers, and of course in our case only 16 of those bits are relevant.)

You'll never have an odd number of bytes, because again this is UTF-16.
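As a rough illustration of the approach above, here is a minimal sketch that splits every UTF-16 code unit of a string into two bytes and joins them back afterwards. The function names `stringToBytes` and `bytesToString`, the high-byte-first ordering, and the sample string are assumptions made for this example, not part of any library.

```javascript
// Split each 16-bit code unit into two bytes, high byte first (an arbitrary choice).
function stringToBytes(str) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    bytes.push((code & 0xFF00) >> 8); // high byte
    bytes.push(code & 0xFF);          // low byte
  }
  return bytes;
}

// Reverse of stringToBytes: combine each pair of bytes into one code unit.
function bytesToString(bytes) {
  var chars = [];
  for (var i = 0; i < bytes.length; i += 2) {
    chars.push(String.fromCharCode((bytes[i] << 8) | bytes[i + 1]));
  }
  return chars.join("");
}

// The emoji is outside the 16-bit range, so it occupies two code units (a surrogate pair).
var original = "aé日😀";
var bytes = stringToBytes(original);
var restored = bytesToString(bytes);
```

Because surrogate pairs are just two ordinary code units here, there is nothing special to detect: as long as all bytes come back in the same order, the pairs are reassembled automatically.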

You could simply convert to UTF-8, for example by using this trick:

function encode_utf8(s) {
  return unescape(encodeURIComponent(s));
}

function decode_utf8(s) {
  return decodeURIComponent(escape(s));
}
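To see what the trick produces, here is a small round-trip sketch. The sample string and the variable names are assumptions for illustration. Note that `escape` and `unescape` are legacy functions, so treat this as a demonstration of the idea rather than a recommendation.

```javascript
function encode_utf8(s) {
  return unescape(encodeURIComponent(s));
}

function decode_utf8(s) {
  return decodeURIComponent(escape(s));
}

var sample = "aé日😀";
var encoded = encode_utf8(sample);

// Every "character" of the encoded string should hold exactly one UTF-8 byte.
var allSingleByte = true;
for (var i = 0; i < encoded.length; i++) {
  if (encoded.charCodeAt(i) > 0xFF) {
    allSingleByte = false;
  }
}

var decoded = decode_utf8(encoded);
```

Unlike the two-bytes-per-code-unit split, the encoded length varies per character: in UTF-8, `a` takes 1 byte, `é` 2, `日` 3, and the emoji 4.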

Since you are using CryptoJS, you can use its own methods to convert a string to UTF-8 and back to a string. See here:

var words = CryptoJS.enc.Utf8.parse('
