I am profiling some code that uses the cppgraphqlgen library, which relies heavily on C++20 coroutines in its internals.
I have profiled an application and found that some called-into methods have a higher hit count than their calling parents.
I have searched for `clone .actor` with reference to profiling and found nothing useful.
For classic synchronous code elsewhere in the same profile, a child's cost is always <= its parent's cost; for the coroutine code that does not hold.
What is `clone .actor` in this context, and why do the "children" cost more than their parents in this case? Is there any way to tell what this operation is actually doing?
For context, this is how I gathered my profiling data:
- Get a profiling dump by running
    LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libprofiler.so.0 CPUPROFILE=./prof.out ./my-program
- Run that through Google Perftools (gperftools) to make a callgrind-style file
    /usr/bin/google-pprof --callgrind "$(realpath ./my-program)" ./prof.out > ./callgrind.out
- Open that file using
    kcachegrind
1 Answer
Please (strongly) consider switching to the much better and more capable Go implementation of pprof (github.com/google/pprof). Sadly, distros continue to ship our old Perl implementation, but the upcoming 2.17 release has already had that pprof implementation amputated. Hopefully that will encourage distros some more.
The `clone` thingy is an artifact of optimizations and demangling. Sometimes compilers create optimized copies of certain functions (e.g. by constant-propagating some arguments). pprof is supposed to remove this detail from the function name. (But sometimes you want to see those details and more, such as template arguments; see the `--symbolize` option for that.)
If or when you see mixed-up parent/child relations, consider checking your stack trace capturing method. Skipping the "first parent" stack frame is a known issue with frame-pointer-based stack trace capturing. See here: https://github.com/gperftools/gperftools/wiki/gperftools'-stacktrace-capturing-methods-and-their-issues#frame-pointers