I have a really big CUDA kernel that does a lot of work, something like:
    __global__ void bigkernel(args)
    {
        func1();
        func2();
        func3();
        func4();
        func5();
        ....
    }
I want to profile each of those functions and visualize them in Nsight.
When I run this in Nsight, it only shows bigkernel as a whole, not the details of func1() and the rest.
Right now I use the built-in clock64() to time each of the functions, and a structure to keep track of and store the timestamps:
    struct time_stuff
    {
        uint64_t start, end,
                 func1, func2, func3, func4, ...;
    };
To visualize the results I use Python, but I would like to ask whether there is a better method.
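One possible way to handle the Python visualization step is to convert the recorded timestamps into Chrome Trace Event JSON, which timeline viewers such as Perfetto UI or chrome://tracing can open. The sketch below is only an illustration under several assumptions not stated in the question: each `funcN` field is assumed to hold the clock64() value read right after that function returned, the records are assumed to be already copied to the host as dictionaries with hypothetical `block` and `thread` keys, and `clock_khz` is an assumed SM clock rate used to turn cycles into microseconds. Since clock64() reads a per-multiprocessor counter, each record is plotted relative to its own `start` rather than on a shared time axis.

```python
import json

# Hypothetical field order: each entry is assumed to hold the clock64()
# value read right after the corresponding function returned.
FUNC_FIELDS = ["func1", "func2", "func3", "func4"]


def to_trace_events(records, clock_khz):
    """Convert per-thread timestamp records into Chrome trace 'complete' events.

    records   -- list of dicts with keys 'block', 'thread', 'start' and FUNC_FIELDS
    clock_khz -- assumed SM clock rate in kHz (cycles -> microseconds)
    """
    cycles_per_us = clock_khz / 1000.0
    events = []
    for rec in records:
        prev = rec["start"]
        for name in FUNC_FIELDS:
            stamp = rec[name]
            events.append({
                "name": name,
                "ph": "X",  # complete event: start time plus duration
                "ts": (prev - rec["start"]) / cycles_per_us,  # offset from this record's start, in us
                "dur": (stamp - prev) / cycles_per_us,        # time spent in this function, in us
                "pid": rec["block"],   # one timeline group per CUDA block
                "tid": rec["thread"],  # one timeline row per CUDA thread
            })
            prev = stamp
    return events


def write_trace(records, clock_khz, path):
    """Write the events to a JSON file that a trace viewer can load."""
    with open(path, "w") as fh:
        json.dump({"traceEvents": to_trace_events(records, clock_khz)}, fh)
```

Calling `write_trace(records, clock_khz, "bigkernel_trace.json")` and opening the resulting file in a trace viewer would show one row per recorded thread, with func1 to func4 laid out as consecutive bars. The converted durations are approximate, because the actual SM clock can vary while the kernel runs.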
I can use Nsight Compute and Nsight Systems to understand my program and how it affects the individual functions, but using clock64() seems the easiest. This is the Nsight Systems command I run:
    nsys profile --trace=nvtx,cuda --sample=cpu -o cu_trace ./cu_alg /datasets/collisions.txt 15000000 \\s 96 128
Source: profiling - How do I profile the inside of a CUDA kernel? - Stack Overflow