I was looking for an answer but found nothing definitive. How do I interpret the first line of `perf report` output? It goes like this:
Samples: 173M of event 'cache-misses', Event count (approx.): 461731712088
Practically every tutorial I've seen goes over everything but this first line. This page explains the differences between samples and counts, but I want to be 100% sure I'm not drawing wrong conclusions here: in my example, what do 173M and 461731712088 mean? From what I've read, I'm guessing the second one is the total number of cache misses that occurred during the run, and the first one is the number of cache misses that were recorded and used to display statistics. Is this right, or am I misinterpreting the output?
1 Answer
You're correct. For hardware events (as opposed to software events like page faults and context switches), `perf record` works by programming a hardware counter in the PMU to record a sample every `n` occurrences of the event. ("Recording a sample" means either writing to the PEBS buffer, see footnote 1, or just raising an interrupt on the spot.)
`n` is chosen to give a sample frequency that doesn't cause so many interrupts that it hugely distorts performance, but that still collects a reasonable number of samples for that event over a few seconds to minutes of running whatever you're profiling. That might mean adjusting `n` on the fly, or having different defaults for different events: `instructions` typically happens more than once per clock, while events like `machine_clears.count` are very rare.
The actual PMU hardware gets programmed with `n`, and counts down toward 0 (or maybe up toward `n` and compares).
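To make the "sample every `n` occurrences" idea concrete, here is a minimal sketch of a count-down sampling counter. It is a toy model, not how the kernel or the PMU is actually implemented; the function name `sample_every_n` and the made-up "instruction addresses" are assumptions for illustration only.

```python
def sample_every_n(event_ips, n):
    """Toy model of a sampling counter.

    event_ips: one (made-up) instruction address per occurrence of the event.
    n: the sample period, i.e. how many events pass between samples.
    Counts down from n and records a sample each time the count hits zero.
    """
    samples = []
    remaining = n
    for ip in event_ips:
        remaining -= 1
        if remaining == 0:
            samples.append(ip)   # "record a sample": remember where we were
            remaining = n        # reload the counter for the next period
    return samples


# 1000 hypothetical events spread over 7 hypothetical addresses.
events = [i % 7 for i in range(1000)]
samples = sample_every_n(events, 100)

# Only the samples and the period survive, so the total is an extrapolation.
estimated_total = len(samples) * 100
print(len(samples), estimated_total)
```

Any events after the last sample are invisible to this estimate, which is one reason a count rebuilt from samples is only approximate.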
`perf stat` works by setting a huge `n`, as large as the hardware supports, so it has to interrupt for counter rollover as rarely as possible, and by collecting the final counter value at the end. (Software can read and write the exact count at any time; that's how the kernel virtualizes the counters across context switches.) This might simplify some details, but AFAIK it accurately explains why `perf stat` has basically no overhead and is able to give very precise and repeatable counts.
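The counting mode described above can be sketched the same way. This is a simplified model under stated assumptions: the class name `CountingCounter` is hypothetical, and the 8-bit width is chosen only so that rollovers show up in a short run (real counters are much wider).

```python
class CountingCounter:
    """Toy model of a counter used in counting mode.

    The hardware value wraps around at 2**width_bits. Software only needs
    to step in on a wrap, and can read the exact total at any time.
    """

    def __init__(self, width_bits):
        self.modulus = 1 << width_bits
        self.hw_value = 0    # what the (modelled) hardware register holds
        self.overflows = 0   # number of rollover interrupts taken

    def tick(self, events=1):
        self.hw_value += events
        while self.hw_value >= self.modulus:
            self.hw_value -= self.modulus
            self.overflows += 1

    def read(self):
        # Exact total: wrapped-around events plus the current register value.
        return self.overflows * self.modulus + self.hw_value


counter = CountingCounter(width_bits=8)
for _ in range(1000):
    counter.tick()

total = counter.read()
print(total, counter.overflows)
```

In this model the exact total of 1000 events is recovered with only a handful of rollover interrupts, compared with one interrupt per sample in sampling mode.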
But `perf record`, with a file of samples and data on the `n` used to collect them, can only extrapolate total counts from samples * `n`, hence the "approx."
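Applied to the numbers in the question, the relationship can be turned around to estimate the average `n`. This is only rough arithmetic: "173M" is a rounded figure as printed by `perf report`, and if `n` changed during the run the result is an average rather than a single programmed value.

```python
samples = 173_000_000           # "173M" as printed, so only approximate
event_count = 461_731_712_088   # "Event count (approx.)" from the same line

# event_count is roughly samples * n, so n is roughly event_count / samples.
avg_period = event_count / samples
print(f"roughly {avg_period:.0f} cache misses per recorded sample")
```

So each recorded sample stands in for a few thousand cache misses, which matches the interpretation in the question: the second number estimates the total, and the first counts how many samples back the report.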
Footnote 1: The PEBS buffer is apparently usually very small, like 1 sample, so it doesn't save much on interrupts, if at all. What it does do is attribute the sample precisely to an instruction, rather than to one nearby: see the skids and PEBS section in https://www.brendangregg.com/perf.html . That's great for events like `mem_load_retired.l3_miss`. Events like `cycles` still have to pick one instruction to "blame", and that's usually the one waiting for a slow input, e.g. the instruction trying to use the result of a cache-miss load, not the load itself.
BTW, the `:u` / `:k` filters to count only in user space or kernel space are something the hardware supports, so the kernel doesn't have to manage the counters on every interrupt. (And in fact it doesn't, so if you don't use `:u` or `--all-user`, your profile will include interrupt handlers that ran while your task was `current` on a core.)