At the moment I have a large JavaScript string I'm attempting to write to a file, but in a different encoding (ISO-8859-1). I was hoping to use something like downloadify. Downloadify only accepts normal JavaScript strings or base64 encoded strings.

Because of this, I've decided to compress my string using JSZip which generates a nicely base64 encoded string that can be passed to downloadify, and downloaded to my desktop. Huzzah! The issue is that the string I compressed, of course, is still the wrong encoding.

Luckily JSZip can take a Uint8Array as data, instead of a string. So is there any way to convert a JavaScript string into an ISO-8859-1 encoded string and store it in a Uint8Array?

Alternatively, if I'm approaching this all wrong, is there a better solution altogether? Is there a fancy JavaScript string class that can use different internal encodings?

Edit: To clarify, I'm not pushing this string to a webpage so it won't automatically convert it for me. I'm doing something like this:

var zip = new JSZip();
zip.file("genSave.txt", result);

return zip.generate({compression:"DEFLATE"});

And for this to make sense, I would need result to be in the proper encoding (and JSZip only takes strings, arraybuffers, or uint8arrays).

Final Edit (This was -not- a duplicate question because the result wasn't being displayed in the browser or transmitted to a server where the encoding could be changed):

This turned out to be a little more obscure than I had thought, so I ended up rolling my own solution. It's not nearly as robust as a proper solution would be, but it'll convert a JavaScript string into windows-1252 encoding, and stick it in a Uint8Array:

var enc = new string_transcoder("windows-1252");
var tenc = enc.transcode(result); //This is now a Uint8Array

You can then either use it in the array like I did:

//Make this into a zip
var zip = new JSZip();   
zip.file("genSave.txt", tenc);   
return zip.generate({compression:"DEFLATE"});

Or decode the windows-1252 bytes back into a JavaScript string using this string encoding library:

var string = new TextDecoder("windows-1252").decode(tenc);

To use this function, either use:

<script src="//www.eu4editor.com/string_transcoder.js"></script>

Or include this:

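function string_transcoder (target) {

    //Look up the mapping table for the requested encoding
    this.encodeList = encodings[target];
    if (this.encodeList === undefined) {
        //Returning undefined from a constructor has no effect, so fail loudly instead
        throw new Error("Unsupported encoding: " + target);
    }

    //Initialize the easy encodings: 0x00-0x7F and 0xA0-0xFF map to themselves
    if (target === "windows-1252") {
        var i;
        for (i = 0x0; i <= 0x7F; i++) {
            this.encodeList[i] = i;
        }
        for (i = 0xA0; i <= 0xFF; i++) {
            this.encodeList[i] = i;
        }
    }

}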

string_transcoder.prototype.transcode = function (inString) {


    var res = new Uint8Array(inString.length), i;


    for (i = 0; i < inString.length; i++) {
        var temp = inString.charCodeAt(i);
        var tempEncode = (this.encodeList)[temp];
        if (tempEncode === undefined) {
            return undefined; //This encoding is messed up
        } else {
            res[i] = tempEncode;
        }
    }

    return res;
};

var encodings = {

    "windows-1252": {0x20AC:0x80, 0x201A:0x82, 0x0192:0x83, 0x201E:0x84, 0x2026:0x85, 0x2020:0x86, 0x2021:0x87, 0x02C6:0x88, 0x2030:0x89, 0x0160:0x8A, 0x2039:0x8B, 0x0152:0x8C, 0x017D:0x8E, 0x2018:0x91, 0x2019:0x92, 0x201C:0x93, 0x201D:0x94, 0x2022:0x95, 0x2013:0x96, 0x2014:0x97, 0x02DC:0x98, 0x2122:0x99, 0x0161:0x9A, 0x203A:0x9B, 0x0153:0x9C, 0x017E:0x9E, 0x0178:0x9F}     

};

Asked Sep 18, 2013 at 18:39 by David; edited Sep 18, 2013 at 21:51
  • Wouldn't something like utfstring = unescape(encodeURIComponent(originalstring)); work? – Joren Commented Sep 18, 2013 at 18:44
  • Unfortunately not. My goal is to see 'Île' when viewing the final file as ISO-8859-1. When writing the file normally, it writes as UCS-2, which results in 'ÃŽle' when viewed as ISO-8859-1. When using your method, it results in 'Île'. This is not the same issue as the proposed duplicate, as I'm not asking the browser to display this, and thus changing the HTML5 meta tag won't solve the issue. – David Commented Sep 18, 2013 at 19:05
  • Did you answer your own question? Or am I missing something? – Enigmadan Commented Sep 18, 2013 at 22:31
  • Yeah, I did. It was incorrectly closed as a duplicate and I didn't want to leave it hanging there unanswered. – David Commented Sep 18, 2013 at 23:08
  • @David: If that edit was the answer, please roll it back and post it as a self-answer (which you can then accept). – Bergi Commented Sep 18, 2013 at 23:55

3 Answers

This turned out to be a little more obscure than [the author] had thought, so [the author] ended up rolling [his] own solution. It's not nearly as robust as a proper solution would be, but it'll convert a JavaScript string into windows-1252 encoding, and stick it in a Uint8Array:

var enc = new string_transcoder("windows-1252");
var tenc = enc.transcode(result); //This is now a Uint8Array

You can then either use it in the array like [the author] did:

//Make this into a zip
var zip = new JSZip();   
zip.file("genSave.txt", tenc);   
return zip.generate({compression:"DEFLATE"});

Or decode the windows-1252 bytes back into a JavaScript string using this string encoding library:

var string = new TextDecoder("windows-1252").decode(tenc);

To use this function, either use:

<script src="//www.eu4editor.com/string_transcoder.js"></script>

Or include this:

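function string_transcoder (target) {

    //Look up the mapping table for the requested encoding
    this.encodeList = encodings[target];
    if (this.encodeList === undefined) {
        //Returning undefined from a constructor has no effect, so fail loudly instead
        throw new Error("Unsupported encoding: " + target);
    }

    //Initialize the easy encodings: 0x00-0x7F and 0xA0-0xFF map to themselves
    if (target === "windows-1252") {
        var i;
        for (i = 0x0; i <= 0x7F; i++) {
            this.encodeList[i] = i;
        }
        for (i = 0xA0; i <= 0xFF; i++) {
            this.encodeList[i] = i;
        }
    }

}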

string_transcoder.prototype.transcode = function (inString) {


    var res = new Uint8Array(inString.length), i;


    for (i = 0; i < inString.length; i++) {
        var temp = inString.charCodeAt(i);
        var tempEncode = (this.encodeList)[temp];
        if (tempEncode === undefined) {
            return undefined; //This encoding is messed up
        } else {
            res[i] = tempEncode;
        }
    }

    return res;
};

var encodings = {

    "windows-1252": {0x20AC:0x80, 0x201A:0x82, 0x0192:0x83, 0x201E:0x84, 0x2026:0x85, 0x2020:0x86, 0x2021:0x87, 0x02C6:0x88, 0x2030:0x89, 0x0160:0x8A, 0x2039:0x8B, 0x0152:0x8C, 0x017D:0x8E, 0x2018:0x91, 0x2019:0x92, 0x201C:0x93, 0x201D:0x94, 0x2022:0x95, 0x2013:0x96, 0x2014:0x97, 0x02DC:0x98, 0x2122:0x99, 0x0161:0x9A, 0x203A:0x9B, 0x0153:0x9C, 0x017E:0x9E, 0x0178:0x9F}     

};
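
The question asks for ISO-8859-1, while the transcoder above targets windows-1252. The two differ only in the 0x80-0x9F range, and ISO-8859-1 maps code points U+0000 to U+00FF straight to the byte of the same value, so that case needs no lookup table. A minimal sketch, where the name `encodeLatin1` and the choice to throw on unmappable characters are assumptions for illustration rather than part of the original answer:

```javascript
// Hypothetical helper: encode a JavaScript string as ISO-8859-1 bytes.
function encodeLatin1(str) {
    var bytes = new Uint8Array(str.length);
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        if (code > 0xFF) {
            // No ISO-8859-1 byte exists for this character
            throw new RangeError("Character at index " + i + " is outside ISO-8859-1");
        }
        bytes[i] = code;
    }
    return bytes;
}

var latin1Bytes = encodeLatin1("\u00CEle"); // "Île" becomes bytes 0xCE, 0x6C, 0x65
```

The resulting Uint8Array can be handed to zip.file() in the same way as tenc above.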

Test the following script:

<script type="text/javascript" charset="utf-8">

The best solution for me was posted here and this is my one-liner:

<!-- Required for non-UTF encodings (quite big) -->
<script src="encoding-indexes.js"></script>

<script src="encoding.js"></script>
...
// windows-1252 is just one typical example encoding/transcoding
let transcodedString = new TextDecoder( 'windows-1252' ).decode( 
                         new TextEncoder().encode( someUtf8String ))

or this if the transcoding has to be applied on multiple inputs reusing the encoder and decoder:

let srcArr = [ ... ]  // some UTF-8 string array
let encoder = new TextEncoder()
let decoder = new TextDecoder( 'windows-1252' )
// map() returns the converted array; forEach() would return undefined
let transcodedArr = srcArr.map( s => decoder.decode( encoder.encode( s )) )
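
A caveat that may be worth checking before relying on the snippets above: in the standard Encoding API, `TextEncoder` always emits UTF-8, so encoding a string and then decoding those bytes as windows-1252 reinterprets UTF-8 bytes rather than producing windows-1252 bytes. The sketch below only inspects the bytes, using 'Île' from the comments on the question as a sample; the variable names are illustrative.

```javascript
// TextEncoder is a global in modern browsers and in Node.js.
var sample = "\u00CEle"; // "Île"

// Standard TextEncoder output is always UTF-8: Î (U+00CE) becomes two bytes.
var utf8Bytes = Array.from(new TextEncoder().encode(sample));

// What a windows-1252 or ISO-8859-1 file needs for this sample: one byte per character.
var wantedBytes = Array.from(sample, function (ch) { return ch.charCodeAt(0); });

console.log(utf8Bytes.join(", "));   // 195, 142, 108, 101
console.log(wantedBytes.join(", ")); // 206, 108, 101
```

Reading the four UTF-8 bytes as windows-1252 gives 'ÃŽle', which matches the mojibake described in the comments on the question.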

(A slightly modified version of another answer from a related question:)

This is what I found after a more specific Google search than just "UTF-8 encode/decode". So for those who are looking for a library to convert between encodings, here you go.

github.com/inexorabletash/text-encoding

var uint8array = new TextEncoder().encode(str);
var str = new TextDecoder(encoding).decode(uint8array);

Pasted from the repo README:

All encodings from the Encoding specification are supported:

utf-8 ibm866 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 
iso-8859-7 iso-8859-8 iso-8859-8-i iso-8859-10 iso-8859-13 iso-8859-14 
iso-8859-15 iso-8859-16 koi8-r koi8-u macintosh windows-874 windows-1250 
windows-1251 windows-1252 windows-1253 windows-1254 windows-1255 
windows-1256 windows-1257 windows-1258 x-mac-cyrillic gb18030 hz-gb-2312 
big5 euc-jp iso-2022-jp shift_jis euc-kr replacement utf-16be utf-16le 
x-user-defined

(Some encodings may be supported under other names, e.g. ascii, iso-8859-1, etc. See Encoding for additional labels for each encoding.)
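
Because labels and coverage can differ between native implementations and the library, it may help to probe for a label at runtime before relying on it. This sketch assumes a `TextDecoder` global is available (native or from the library above) and that its constructor throws for unknown labels; `supportsEncoding` is a hypothetical helper name.

```javascript
// Hypothetical helper: report whether the current TextDecoder accepts a label.
function supportsEncoding(label) {
    try {
        new TextDecoder(label); // throws a RangeError for unknown labels
        return true;
    } catch (e) {
        return false;
    }
}

var hasUtf8 = supportsEncoding("utf-8");
var hasBogus = supportsEncoding("not-a-real-encoding");
```

A check like supportsEncoding("windows-1252") could then decide whether to load the larger encoding-indexes.js file at all.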
