In the last example of Mark Harris's NVIDIA webinar on parallel reduction, I don't understand the indexing that comes before the parallel reduction step. In "Reduction #6", the grid size (the number of blocks/workgroups dispatched) was ceil(N / (2 × blockSize)), where N is the size of the data, blockSize is the workgroup size, and the factor of 2 comes from each thread accessing two items at once.

Now the number of blocks is "as many as necessary". But what is the grid size? It can't be ceil(N / (2 × blockSize)) as before, because then the while loop would exit after its first iteration. How many blocks should the kernel be launched with, then?

本文标签: cudaNVIDIA webinar on parallel reduction gridsizeStack Overflow