I need to iterate over a full MongoDB collection with ~2 million documents, so I am using the cursor feature and the eachAsync function. However, I noticed that it is pretty slow: it takes more than 40 minutes. I tried different batchSize values up to 5000 (which would mean just 400 queries against MongoDB).
The application uses very little CPU (0.2% - 1%), and it does not use much RAM or IOPS either. So apparently my code can be optimized to speed up this process.
The code:
    const playerProfileCursor = PlayerProfile.find({}, { tag: 1 }).cursor({ batchSize: 5000 })
    const p2 = new Promise<Array<string>>((resolve, reject) => {
      const playerTags: Array<string> = []
      playerProfileCursor.eachAsync((playerProfile) => {
        playerTags.push(playerProfile.tag)
      }).then(() => {
        resolve(playerTags)
      }).catch((err) => {
        reject(err)
      })
    })
When I set a breakpoint inside the eachAsync callback, it is hit immediately. So nothing is stuck; it is just slow. Is there a way to speed this up?
asked Oct 10, 2017 at 16:13 by kentor
- Have you tried to profile your app, e.g. with github.com/v8/v8/wiki/V8-Profiler? – Alex Blex Commented Oct 10, 2017 at 17:24
- @AlexBlex Well, I get 97.8% "Unaccounted" and I have no idea what that is supposed to be. See here: i.imgur.com/wV9i8cL.png – kentor Commented Oct 11, 2017 at 22:53
- What are you trying to achieve? Get all tags from all documents in PlayerProfile? – Styx Commented Oct 12, 2017 at 17:47
- @Styx Yes, exactly. I need all tags from two different collections and then the union of these tag arrays. – kentor Commented Oct 12, 2017 at 18:11
- Well, it seems like an XY problem to me. There is a better way to get unique values of particular fields of documents than iterating a cursor over all of them. – Styx Commented Oct 12, 2017 at 19:58
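The union step mentioned in the comments is cheap once both tag arrays are in memory. A minimal sketch, assuming the tags are plain strings; `unionTags` is a hypothetical helper name and not part of any library:

```typescript
// Hypothetical helper: returns the union of two tag arrays without duplicates.
// Assumes tags are plain strings; tags keep the order of their first appearance.
function unionTags(a: string[], b: string[]): string[] {
  const seen = new Set<string>(a);
  for (const tag of b) {
    seen.add(tag);
  }
  return Array.from(seen);
}
```

If only unique values are needed, a server-side distinct on the tag field (in Mongoose, `Model.distinct('tag')`) is probably what the last comment alludes to. Whether its result stays within MongoDB's 16 MB document limit for ~2 million tags would need checking.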
1 Answer
That feature was added in version 4.12 (the most recent at the time of writing) and isn't really documented yet.
eachAsync runs with a concurrency of 1 by default, but you can change that with the 'parallel' option (as seen here).
Thus your code could look something like this:
    const playerProfileCursor = PlayerProfile.find({}, { tag: 1 }).cursor({ batchSize: 5000 })
    const p2 = new Promise<Array<string>>((resolve, reject) => {
      const playerTags: Array<string> = []
      playerProfileCursor.eachAsync((playerProfile) => {
        playerTags.push(playerProfile.tag)
      }, { parallel: 50 }).then(() => {
        resolve(playerTags)
      }).catch((err) => {
        reject(err)
      })
    })
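To make the effect of a concurrency limit concrete, here is a minimal, self-contained sketch of the general idea. It is not Mongoose's implementation of eachAsync; `eachWithLimit` and its parameters are hypothetical names, and an in-memory array stands in for the cursor.

```typescript
// Illustration only: NOT Mongoose's eachAsync, just the general technique.
// Runs `fn` over `items` with at most `parallel` calls in flight at once.
async function eachWithLimit<T>(
  items: T[],
  fn: (item: T) => Promise<void> | void,
  parallel: number = 1
): Promise<void> {
  let index = 0;
  // Each worker takes the next unprocessed item as soon as its previous call settles.
  const worker = async (): Promise<void> => {
    while (index < items.length) {
      const item = items[index++];
      await fn(item);
    }
  };
  const workers: Promise<void>[] = [];
  for (let i = 0; i < Math.max(1, parallel); i++) {
    workers.push(worker());
  }
  // Resolves once every worker has run out of items; rejects if any call fails.
  await Promise.all(workers);
}
```

With `parallel` left at 1, each callback must settle before the next item is handed out. A higher value lets that many callbacks overlap, which also means that asynchronous callbacks may finish out of order.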