I need to iterate over a full MongoDB collection with ~2 million documents, so I am using the cursor feature and the eachAsync function. However, I noticed that it's pretty slow (it takes more than 40 minutes). I tried different batch sizes, up to 5000 (which would be just 400 queries against MongoDB).

The application doesn't take much CPU (0.2% - 1%), nor does it use much RAM or IOPS. So apparently my code can be optimized to speed this process up.

The code:

  const playerProfileCursor = PlayerProfile.find({}, { tag: 1 }).cursor({ batchSize: 5000 })
  const p2 = new Promise<Array<string>>((resolve, reject) => {
    const playerTags:Array<string> = []
    playerProfileCursor.eachAsync((playerProfile) => {
      playerTags.push(playerProfile.tag)
    }).then(() => {
      resolve(playerTags)
    }).catch((err) => {
      reject(err)
    })
  })

When I set a breakpoint inside the eachAsync callback body, it is hit immediately, so nothing is stuck; it's just slow. Is there a way to speed this up?

asked Oct 10, 2017 at 16:13 by kentor

  • Have you tried to profile your app? e.g. with github.com/v8/v8/wiki/V8-Profiler – Alex Blex, Oct 10, 2017 at 17:24
  • @AlexBlex Well, I get 97.8% "Unaccounted" and I have no idea what that's supposed to be. See here: i.imgur.com/wV9i8cL.png – kentor, Oct 11, 2017 at 22:53
  • What are you trying to achieve? Get all tags from all documents in PlayerProfile? – Styx, Oct 12, 2017 at 17:47
  • @Styx Yes, exactly. I need all tags from two different collections and then find the union of these tag arrays. – kentor, Oct 12, 2017 at 18:11
  • Well, it seems like an XY problem to me. There is a better way to get the unique values of a particular field across documents than iterating a cursor over all of them. – Styx, Oct 12, 2017 at 19:58
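Following up on the last comment above: if the end goal is only the set of unique tags, it may be simpler to let MongoDB compute the distinct values server-side instead of streaming ~2 million documents through a cursor. A minimal sketch, assuming the PlayerProfile model and the tag field from the question, run inside an async function:

  // Hypothetical alternative based on the comment thread: MongoDB returns
  // the distinct tag values directly, so no client-side iteration is needed.
  const uniqueTags = await PlayerProfile.distinct('tag')

The same could be done for the second collection, and the two arrays merged into a Set to obtain the union.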

1 Answer

This option was added in Mongoose 4.12 (the most recent version at the time of writing) and isn't really documented yet.

eachAsync runs with a concurrency of 1 by default, but you can change that via the parallel option.

Thus your code could look something like this:

const playerProfileCursor = PlayerProfile.find({}, { tag: 1 }).cursor({ batchSize: 5000 })
const p2 = new Promise<Array<string>>((resolve, reject) => {
  const playerTags: Array<string> = []
  playerProfileCursor.eachAsync((playerProfile) => {
    playerTags.push(playerProfile.tag)
  }, { parallel: 50 }).then(() => {
    resolve(playerTags)
  }).catch((err) => {
    reject(err)
  })
})
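Note that parallel only controls how many eachAsync callback invocations run concurrently; batchSize still governs how many documents are fetched from MongoDB per round trip. For completeness, a small sketch of consuming the promise afterwards (assuming an async context; the names follow the snippet above):

  // Await the collected tags once eachAsync has drained the cursor.
  const playerTags = await p2
  console.log(`Collected ${playerTags.length} player tags`)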
