*"Efficient" here means small size (to reduce the I/O waiting time) and fast retrieval/deserialization. Storing time is not as important.

I have to store a couple of dozen arrays of integers, each with 1800 values in the range 0-50, in the browser's localStorage -- that is, as a string.

The simplest method is to just JSON.stringify it; however, that adds a lot of unnecessary information, considering that the range of the data is well known. An average size for one of these arrays is then ~5500 bytes.

Here are some other methods I've tried (resultant size, and time to deserialize it 1000 times, are listed at the end):

  • zero-padding the numbers so each was 2 characters long, eg:

    [5, 27, 7, 38] ==> "05270738"
    
  • base 50 encoding it:

    [5, 11, 7, 38] ==> "5b7C"
    
  • just using the value as a character code (adding 32 to avoid the weird control characters at the start):

    [5, 11, 7, 38] ==> "%+'F" (String.fromCharCode(37), String.fromCharCode(43) ...)
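
As a rough illustration of the third method, here is a minimal sketch of a character-code encoder and decoder. The names `CHAR_OFFSET`, `encodeCharCodes` and `decodeCharCodes` are made up for this example and are not the code that was benchmarked; it assumes every value is a non-negative integer small enough that value + 32 is a valid character code.

```javascript
// Offset added to each value so the stored characters are printable
// (32 is the space character, matching the example above).
var CHAR_OFFSET = 32;

// Encode an array of small non-negative integers, one character per value.
function encodeCharCodes(values) {
    var chars = new Array(values.length);
    for (var i = 0; i < values.length; i++) {
        chars[i] = String.fromCharCode(values[i] + CHAR_OFFSET);
    }
    return chars.join('');
}

// Decode the string back into an array of integers.
function decodeCharCodes(text) {
    var values = new Array(text.length);
    for (var i = 0; i < text.length; i++) {
        values[i] = text.charCodeAt(i) - CHAR_OFFSET;
    }
    return values;
}
```

The resulting string can be passed straight to localStorage.setItem, and decoding is a single pass over the string with no parsing.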
    

Here are my results:

                  size     Chrome 18   Firefox 11
-------------------------------------------------
JSON.stringify    5286          60ms         99ms
zero-padded       3600         354ms        703ms
base 50           1800         315ms        400ms
charCodes         1800          21ms        178ms

My question: is there an even better method that I haven't yet considered?

Update
MДΓΓБДLL suggested using compression on the data, so I combined this LZW implementation with the base 50 and charCode data. I also tested aroth's code (packing 4 integers into 3 bytes). I got these results:

                  size     Chrome 18   Firefox 11
-------------------------------------------------
LZW base 50       1103         494ms        999ms
LZW charCodes     1103         194ms        882ms
bitpacking        1350        2395ms        331ms
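
For readers unfamiliar with LZW, here is a minimal textbook sketch. It is not the implementation linked above and was not used for the timings; `lzwCompress` and `lzwDecompress` are hypothetical names. It assumes every input character has a code below 256 (true for the charCodes strings, which use codes 32 to 82) and that the input is short enough for all dictionary codes to stay below 0xD800.

```javascript
// Compress a string; each output code is stored as one UTF-16 character.
function lzwCompress(input) {
    if (input.length === 0) return '';
    var dict = new Map();      // phrase -> code, for phrases of 2+ characters
    var nextCode = 256;        // codes 0..255 stand for single characters
    var phrase = input.charAt(0);
    var out = [];
    for (var i = 1; i < input.length; i++) {
        var ch = input.charAt(i);
        if (dict.has(phrase + ch)) {
            phrase += ch;      // keep extending the current match
        } else {
            out.push(phrase.length > 1 ? dict.get(phrase) : phrase.charCodeAt(0));
            dict.set(phrase + ch, nextCode++);
            phrase = ch;
        }
    }
    out.push(phrase.length > 1 ? dict.get(phrase) : phrase.charCodeAt(0));
    return String.fromCharCode.apply(null, out);
}

// Rebuild the original string from the compressed form.
function lzwDecompress(packed) {
    if (packed.length === 0) return '';
    var dict = new Map();      // code -> phrase
    var nextCode = 256;
    var prev = packed.charAt(0);   // the first code is always a literal
    var out = [prev];
    for (var i = 1; i < packed.length; i++) {
        var code = packed.charCodeAt(i);
        var entry;
        if (code < 256) {
            entry = packed.charAt(i);
        } else if (dict.has(code)) {
            entry = dict.get(code);
        } else {
            // The code was defined by the step that emitted it.
            entry = prev + prev.charAt(0);
        }
        out.push(entry);
        dict.set(nextCode++, prev + entry.charAt(0));
        prev = entry;
    }
    return out.join('');
}
```

The decoder rebuilds the same dictionary as the encoder while it reads, so nothing but the compressed string needs to be stored.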

Asked Apr 12, 2012 at 0:12 by nickf; edited May 23, 2017 at 11:58.
  • Try combining the base 50 or charCodes method with deflate/gzip. I'm not sure that there's even enough data for compression to be worth the overhead, but it can't hurt to check. – Matt Ball, Apr 12, 2012 at 0:16
  • @MДΓΓБДLL I just tried it with LZW compression... updating the results table now – nickf, Apr 12, 2012 at 0:21
  • 61% reduction, I wouldn't have expected that - not bad! – Matt Ball, Apr 12, 2012 at 0:41
  • A single JavaScript character takes up two bytes (per the ECMA specification). Make sure to factor that in when you figure the size of the data. – gilly3, Apr 13, 2012 at 7:55

3 Answers

If your range is 0-50, then you can pack 4 numbers into 3 bytes (6 bits per number). This would allow you to store 1800 numbers using ~1350 bytes. This code should do it:

window._firstChar = 48;

window.decodeArray = function(encodedText) {
    var result = [];
    var temp = [];

    for (var index = 0; index < encodedText.length; index += 3) {
        //skipping bounds checking because the encoded text is assumed to be valid
        var firstChar = encodedText.charAt(index).charCodeAt() - _firstChar;
        var secondChar = encodedText.charAt(index + 1).charCodeAt() - _firstChar;
        var thirdChar = encodedText.charAt(index + 2).charCodeAt() - _firstChar;

        temp.push((firstChar >> 2) & 0x3F);    //6 bits, 'a'
        temp.push(((firstChar & 0x03) << 4) | ((secondChar >> 4) & 0xF));  //2 bits + 4 bits, 'b'
        temp.push(((secondChar & 0x0F) << 2) | ((thirdChar >> 6) & 0x3));  //4 bits + 2 bits, 'c'
        temp.push(thirdChar & 0x3F);  //6 bits, 'd'

    }

    //filter out 'padding' numbers, if present; this is an extremely inefficient way to do it
    for (var index = 0; index < temp.length; index++) {
        if(temp[index] != 63) {
            result.push(temp[index]);
        }            
    }

    return result;
};

window.encodeArray = function(array) {
    var encodedData = [];

    for (var index = 0; index < array.length; index += 4) {
        var num1 = array[index];
        //pad an incomplete final group with 63, which lies outside the 0-50 data range
        var num2 = index + 1 < array.length ? array[index + 1] : 63;
        var num3 = index + 2 < array.length ? array[index + 2] : 63;
        var num4 = index + 3 < array.length ? array[index + 3] : 63;

        encodeSet(num1, num2, num3, num4, encodedData);
    }

    //join the characters into one string, which is what decodeArray and localStorage expect
    return encodedData.join('');
};

window.encodeSet = function(a, b, c, d, outArray) {
    //we can encode 4 numbers in 3 bytes
    var firstChar = ((a & 0x3F) << 2) | ((b >> 4) & 0x03);   //6 bits for 'a', 2 from 'b'
    var secondChar = ((b & 0x0F) << 4) | ((c >> 2) & 0x0F);  //remaining 4 bits from 'b', 4 from 'c'
    var thirdChar = ((c & 0x03) << 6) | (d & 0x3F);          //remaining 2 bits from 'c', 6 bits for 'd'

    //add _firstChar so that all values map to a printable character
    outArray.push(String.fromCharCode(firstChar + _firstChar));
    outArray.push(String.fromCharCode(secondChar + _firstChar));
    outArray.push(String.fromCharCode(thirdChar + _firstChar));
};

Here's a quick example: http://jsfiddle/NWyBx/1

Note that storage size can likely be further reduced by applying gzip compression to the resulting string.

Alternatively, if the ordering of your numbers is not significant, then you can simply do a bucket-sort using 51 buckets (assuming 0-50 includes both 0 and 50 as valid numbers) and store the counts for each bucket instead of the numbers themselves. That would likely give you better compression and efficiency than any other approach.
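
A minimal sketch of the bucket-count idea, assuming the order of the values really does not matter; `toCounts` and `fromCounts` are hypothetical helper names, not part of the code above.

```javascript
// Count how many times each value in 0..maxValue occurs.
function toCounts(values, maxValue) {
    var counts = [];
    for (var v = 0; v <= maxValue; v++) {
        counts.push(0);
    }
    for (var i = 0; i < values.length; i++) {
        counts[values[i]]++;
    }
    return counts;
}

// Rebuild the values in ascending order; the original order is lost.
function fromCounts(counts) {
    var values = [];
    for (var v = 0; v < counts.length; v++) {
        for (var n = 0; n < counts[v]; n++) {
            values.push(v);
        }
    }
    return values;
}
```

For the data in the question this would mean storing 51 counts per array (for example via JSON.stringify(toCounts(values, 50))) instead of 1800 values.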

Assuming (as in your test) that compression takes more time than the size reduction saves you, your char encoding is the smallest you'll get without bitshifting. You're currently using one byte for each number, but if they're guaranteed to be small enough (0-15, so that each fits in 4 bits) you could put two numbers in each byte. That would probably be an over-optimization, unless this is a very hot piece of your code.
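
To make the "two numbers in each byte" idea concrete, here is a rough sketch. It only applies when every value is in the range 0-15; the 0-50 data in the question needs 6 bits per value and would not fit. `PAIR_OFFSET`, `packPairs` and `unpackPairs` are hypothetical names, and the caller is assumed to know the original element count (1800 in the question).

```javascript
// Offset that keeps the stored characters out of the control-character range.
var PAIR_OFFSET = 32;

// Pack two 4-bit values (0-15) into each character.
function packPairs(values) {
    var chars = [];
    for (var i = 0; i < values.length; i += 2) {
        var high = values[i];
        // An odd-length array is padded with 0; unpackPairs drops the padding.
        var low = i + 1 < values.length ? values[i + 1] : 0;
        chars.push(String.fromCharCode(((high << 4) | low) + PAIR_OFFSET));
    }
    return chars.join('');
}

// Unpack the string; count is the number of values originally stored.
function unpackPairs(text, count) {
    var values = [];
    for (var i = 0; i < text.length; i++) {
        var packed = text.charCodeAt(i) - PAIR_OFFSET;
        values.push(packed >> 4);    // upper 4 bits
        values.push(packed & 0x0F);  // lower 4 bits
    }
    values.length = count;           // discard the padding value, if any
    return values;
}
```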

You might want to consider using Uint8Array or ArrayBuffer. This blogpost shows how it's done. Copying his logic, here's an example, assuming you have an existing Uint8Array named arr.

function arrayBufferToBinaryString(buffer, cb) {
    // BlobBuilder has been removed from browsers; the Blob constructor replaces it
    var blob = new Blob([buffer]);
    var reader = new FileReader();
    reader.onload = function (e) {
        // reader.result holds the binary string once the read has finished
        cb(reader.result);
    };
    reader.readAsBinaryString(blob);
}
arrayBufferToBinaryString(arr.buffer, function(s) { 
  // do something with s
});
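
If the asynchronous FileReader round trip is not needed, a synchronous conversion is also possible. This is only a sketch with hypothetical helper names (`bytesToString`, `stringToBytes`), not the approach from the blog post; it stores one byte per character.

```javascript
// Convert a Uint8Array to a string, one character per byte.
function bytesToString(bytes) {
    var chars = [];
    for (var i = 0; i < bytes.length; i++) {
        chars.push(String.fromCharCode(bytes[i]));
    }
    return chars.join('');
}

// Convert such a string back into a Uint8Array.
function stringToBytes(text) {
    var bytes = new Uint8Array(text.length);
    for (var i = 0; i < text.length; i++) {
        bytes[i] = text.charCodeAt(i);
    }
    return bytes;
}
```

The string returned by bytesToString can be stored with localStorage.setItem and turned back into a typed array with stringToBytes after reading it.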

Article tags: html, Most efficient way to store large arrays of integers in localStorage with Javascript, Stack Overflow