I have a very large CUDA kernel that does a lot of work, for example:

__global__ void bigkernel(args)
{
    func1();
    func2();
    func3();
    func4();
    func5();
    ....
}

I want to profile each of these functions and visualize the results in Nsight. When I run the program under Nsight, it shows only bigkernel as a whole, with no breakdown for func1() and the rest.

Right now I time each function with the built-in clock64() and store the results in a structure:

struct time_stuff
{
    uint64_t start, end,
        func1, func2, func3, func4; // ... one field per timed function
};
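A minimal sketch of the clock64() approach described above, assuming three hypothetical device functions (stage_a, stage_b, stage_c) in place of func1() and friends, and a hypothetical StageTimes record per thread. The kernel name, record layout, and launch configuration are illustrative assumptions, not the original code:

```cuda
#include <cstdio>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

constexpr int kNumStages = 3;  // hypothetical: one slot per timed device function

// Hypothetical stand-ins for func1(), func2(), func3().
__device__ float stage_a(float x) { return x * 1.5f + 1.0f; }
__device__ float stage_b(float x) { return sqrtf(fabsf(x)); }
__device__ float stage_c(float x) { return x * x; }

struct StageTimes {
    long long ticks[kNumStages];  // clock64() deltas, in SM clock cycles
};

__global__ void bigkernel_timed(float *data, StageTimes *times, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = data[i];

    // Read the per-SM cycle counter before and after each stage.
    long long t0 = clock64();
    v = stage_a(v);
    long long t1 = clock64();
    v = stage_b(v);
    long long t2 = clock64();
    v = stage_c(v);
    long long t3 = clock64();

    data[i] = v;  // keep the result live so the stages are not optimized away
    times[i].ticks[0] = t1 - t0;
    times[i].ticks[1] = t2 - t1;
    times[i].ticks[2] = t3 - t2;
}

// Launches the kernel on n elements and returns the per-thread timings.
// The vector is zero-initialized, so it stays all zeros if the launch fails.
std::vector<StageTimes> run_timed(int n)
{
    std::vector<float> h_data(n, 2.0f);
    std::vector<StageTimes> h_times(n);
    float *d_data = nullptr;
    StageTimes *d_times = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return h_times;
    if (cudaMalloc(&d_times, n * sizeof(StageTimes)) != cudaSuccess) {
        cudaFree(d_data);
        return h_times;
    }
    cudaMemcpy(d_data, h_data.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    bigkernel_timed<<<blocks, threads>>>(d_data, d_times, n);

    if (cudaDeviceSynchronize() == cudaSuccess) {
        cudaMemcpy(h_times.data(), d_times, n * sizeof(StageTimes),
                   cudaMemcpyDeviceToHost);
    } else {
        fprintf(stderr, "kernel failed: %s\n",
                cudaGetErrorString(cudaGetLastError()));
    }
    cudaFree(d_data);
    cudaFree(d_times);
    return h_times;
}
```

Calling run_timed(n) from main and writing the returned records to a file gives one row per thread to plot. The deltas are in cycles of the SM the thread ran on, and the compiler may schedule instructions across the clock64() reads, so very short stages are only roughly attributed.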

I visualize the results with Python, but I would like to know whether there is a better method.
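Before plotting, the raw clock64() counts usually need converting to time. A rough sketch, assuming the device's reported peak clock rate is a good enough stand-in for the actual SM clock (it may not be when the GPU is boosting or throttling); the helper names query_clock_khz and ticks_to_us are hypothetical:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns the peak clock rate of the given device in kHz, or 0 on failure.
int query_clock_khz(int device)
{
    int khz = 0;
    if (cudaDeviceGetAttribute(&khz, cudaDevAttrClockRate, device) != cudaSuccess) {
        return 0;
    }
    return khz;
}

// Converts a clock64() delta to microseconds for a clock rate given in kHz.
// cycles / (kHz * 1000 cycles per second) * 1e6 us per second = cycles * 1000 / kHz
double ticks_to_us(long long ticks, int clock_khz)
{
    return static_cast<double>(ticks) * 1000.0 / static_cast<double>(clock_khz);
}

// Prints one CSV row per stage; ticks[] holds per-stage clock64() deltas.
void print_csv(const long long *ticks, int num_stages, int clock_khz)
{
    printf("stage,ticks,microseconds\n");
    for (int s = 0; s < num_stages; ++s) {
        printf("%d,%lld,%.3f\n", s, ticks[s], ticks_to_us(ticks[s], clock_khz));
    }
}
```

The CSV output can be read directly by the Python plotting script. Comparing stages by their share of total cycles avoids depending on the clock rate at all.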

I can use Nsight Compute and Nsight Systems to understand the program as a whole, but for per-function timing inside the kernel, using the clock seems the easiest. This is the Nsight Systems command I run:

nsys profile --trace=nvtx,cuda --sample=cpu -o cu_trace ./cu_alg /datasets/collisions.txt 15000000 \\s 96 128
