admin管理员组文章数量:1334278
I have two arrays with user id's and I want to check different items in them.
arr1 = [123, 456, 789];
arr2 = [123, 456, 789, 098];
The problem is: these arrays can have 10 or 20 millions of items.
I'm trying with underscore.difference()
but it took 10 mins to plete.
Is there any faster way to do this?
I have two arrays with user id's and I want to check different items in them.
arr1 = [123, 456, 789];
arr2 = [123, 456, 789, 098];
The problem is: these arrays can have 10 or 20 millions of items.
I'm trying with underscore.difference()
but it took 10 mins to plete.
Is there any faster way to do this?
Share Improve this question edited May 1, 2014 at 18:25 user2864740 62k15 gold badges157 silver badges227 bronze badges asked May 1, 2014 at 18:19 user3175226user3175226 3,6698 gold badges30 silver badges48 bronze badges 4- Are the arrays sorted? – Cameron Commented May 1, 2014 at 18:20
- 2 if your elements work as object keys, it would probably be quicker to turn one array's elements into object keys with a value of 1, then filter your second array according to the object having or not having the elmement as a key. sorting will be slow with that many items, but object lookup is very fast and avoids the loop-de-loop that underscore's contains() method utilizes. – dandavis Commented May 1, 2014 at 18:22
-
2
Oh, it looks like
_.difference
doesn't even use a set .. so using a set/probe will be much better bounds. See stackoverflow./questions/13147278/… – user2864740 Commented May 1, 2014 at 18:23 - 3 If you're working with 10 million items then maybe you should put your data in a database. – mu is too short Commented May 1, 2014 at 18:49
4 Answers
Reset to default 3How about converting the arrays to object to reduce the sorting plexity:
var arr1 = [123, 456, 789], arr2 = [123, 456, 789, 098];
function toObject(arr){
return arr.reduce(function(o, v, i) {
o[v] = i;
return o;
}, {});
}
var o1 = toObject(arr1), o2 = toObject(arr2), diff = [];
for(var prop in o2){
if(o1[prop] === undefined)
diff.push(prop);
}
console.log(diff);
You would obviously need to start on the biggest set.
http://jsfiddle/sHUw5/
Another thing to consider is to sort your collections and do a binary search to reduce the plexity from (O)N
to (O)log2N
for each array (if I'm thinking straight).
Use native js, rather than a library that tries to acmodate lots of scenarios / inputs.
- Skip all the function calls and scope changes.
- Minimize property lookup/namespace walking
- Don't keep getting the array length on each loop
Simple optimization:
var array1 = [];
var array2 = [];
var difference = [];
for(var i = 0; len = array1.length; i < len; i++)
{
var value = array1[i];
if(value == array2[i])
{
continue;
}
if(array2.indexOf(value) == -1)
{
difference.push(value);
}
}
This considers you do not have the numbers 0 or 1 in the arrays:
var arr1 = [123, 456, 789,3],
arr2 = [123, 456, 789, 098],
has = {},
different=[],
length1=arr1.length,
length2=arr2.length;
for(var i=0;i<length1;i++){
has[arr1[i]]=true;
}
for(var i=0;i<length2;i++){
var val=arr2[i];
if(has[val] === undefined){
has[val]=val;
}
else{
if(has[val]!=val){
has[val]=false;
}
}
}
for(var i in has){
if (has[i]) different.push(i);
}
If you want to check also for 0 and 1:
for(var i=0;i<length1;i++){
has[arr1[i]]=NaN;
}
for(var i=0;i<length2;i++){
var val=arr2[i];
if(has[val] === undefined){
has[val]=null;
}
else{
if(has[val]!=null){
has[val]=true;
}
}
}
for(var i in has){
if (!has[i]) different.push(i);
}
here is a fast way of cheating the nested iteration that _.difference will invoke:
var arr1 = [123, 456, 789],
arr2 = [123, 456, 789, 098],
has = {};
arr1.forEach(function(a){ this[a]=1;}, has);
alert( arr2.filter(function(a){return !this[a]; }, has) );
by using this in the iteration, we hand a pure function to JS that can be executed at the maximum possible speed.
note that this won't work for arrays of objects, or mixed-type arrays like [1, "1"], but it should work for the problem described and demonstrated by the question.
edit: it you want bi-directional pares (like arr1 having, arr2 missing or vice-versa), reverse and repeat the code above. you'll still only be at 40million putations pared to 100trillion that an indexOf()-using method will cost...
本文标签: javascriptdifference of two arrays with 10 million itemsdifference is too slowStack Overflow
版权声明:本文标题:javascript - difference of two arrays with 10 million items - _.difference is too slow - Stack Overflow 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1742371481a2462324.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
发表评论