To measure how much CPU time a program thread spends (both user and system), I use the API like this:

struct timespec start,
                end;
clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);
// do something here
// ...
// ...
clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end);
const double elapsed = (end.tv_sec-start.tv_sec)*1e9 + (end.tv_nsec-start.tv_nsec);

I was wondering what the minimum measurable elapsed time is, so I tried the following code:

#include <iostream>
#include <time.h>

int main() {
    const int   c_type[] = {CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_THREAD_CPUTIME_ID};
    struct timespec start,
            end;
    const size_t    n_iter = 1024*1024;

    for(size_t i = 0; i < sizeof(c_type)/sizeof(c_type[0]); ++i) {
        double accum = 0.0;
        for(size_t j = 0; j < n_iter; ++j) {
            clock_gettime(c_type[i], &start);
            clock_gettime(c_type[i], &end);

            const double elapsed = (end.tv_sec-start.tv_sec)*1e9 + (end.tv_nsec-start.tv_nsec);
            accum += elapsed;
        }
        std::cout << "[" << i << "] elapsed: " << accum/n_iter << std::endl;
    }
}

To my surprise I get the following timings:

[0] elapsed: 19.8536 // CLOCK_REALTIME
[1] elapsed: 19.8697 // CLOCK_MONOTONIC
[2] elapsed: 88.3246 // CLOCK_THREAD_CPUTIME_ID

This means that the minimum time between two CLOCK_THREAD_CPUTIME_ID calls is approximately 88 ns (these numbers come from a Ryzen 9 9950X3D running Ubuntu 24.04).

Why is this the case? Why is it slower than CLOCK_REALTIME, for example?

I was under the impression that CLOCK_THREAD_CPUTIME_ID was implemented roughly by reading the rdtsc register and applying some adjustments.
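
For reference, here is a minimal sketch (my own addition, not part of the original question) of what a raw TSC read with the compiler intrinsic looks like; note that it returns raw core cycle counts rather than per-thread CPU time in nanoseconds, which is part of why per-thread CPU time needs extra kernel bookkeeping:

#include <cstdint>
#include <cstdio>
#include <x86intrin.h>   // __rdtsc, x86-only

int main() {
    // Two back-to-back reads of the time-stamp counter.
    const std::uint64_t t0 = __rdtsc();
    const std::uint64_t t1 = __rdtsc();
    std::printf("back-to-back rdtsc delta: %llu cycles\n",
                static_cast<unsigned long long>(t1 - t0));
}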

asked Mar 27 at 22:47 by Emanuele (edited Mar 27 at 23:06)
  • I tried this on my Mac M1, and also added CLOCK_PROCESS_CPUTIME_ID. I get 25, 20, 156, 250 as the results. So it's not Linux-specific. – Barmar, Mar 27 at 23:03
  • Have you looked at the relevant libc / vdso / kernel source code? – Nate Eldredge, Mar 27 at 23:51

1 Answer

Looking at the VDSO implementation, the relevant function is __cvdso_clock_gettime_common defined in lib/vdso/gettimeofday.c:

static __always_inline int
__cvdso_clock_gettime_common(const struct vdso_data *vd, clockid_t clock,
                 struct __kernel_timespec *ts)
{
    u32 msk;

    /* Check for negative values or invalid clocks */
    if (unlikely((u32) clock >= MAX_CLOCKS))
        return -1;

    /*
     * Convert the clockid to a bitmask and use it to check which
     * clocks are handled in the VDSO directly.
     */
    msk = 1U << clock;
    if (likely(msk & VDSO_HRES))
        vd = &vd[CS_HRES_COARSE];
    else if (msk & VDSO_COARSE)
        return do_coarse(&vd[CS_HRES_COARSE], clock, ts);
    else if (msk & VDSO_RAW)
        vd = &vd[CS_RAW];
    else
        return -1;

    return do_hres(vd, clock, ts);
}

If -1 is returned, the VDSO can't handle the requested clock and the call falls back to a real clock_gettime system call, which is much slower.
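
As a quick empirical cross-check (my own sketch, not part of the original answer), you can compare the libc clock_gettime() wrapper against a forced syscall(SYS_clock_gettime, ...). For CLOCK_MONOTONIC the wrapper should be far cheaper than the raw syscall (VDSO fast path), while for CLOCK_THREAD_CPUTIME_ID both should cost roughly the same, since the wrapper ends up in the kernel anyway:

#include <cstdio>
#include <time.h>
#include <sys/syscall.h>
#include <unistd.h>

static double avg_ns(clockid_t id, bool force_syscall) {
    struct timespec start, end, tmp;
    const long n_iter = 1 << 20;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < n_iter; ++i) {
        if (force_syscall)
            syscall(SYS_clock_gettime, id, &tmp);   // always enters the kernel
        else
            clock_gettime(id, &tmp);                // may take the VDSO fast path
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    return ((end.tv_sec - start.tv_sec) * 1e9 + (end.tv_nsec - start.tv_nsec)) / n_iter;
}

int main() {
    std::printf("CLOCK_MONOTONIC         wrapper %6.1f ns  raw syscall %6.1f ns\n",
                avg_ns(CLOCK_MONOTONIC, false), avg_ns(CLOCK_MONOTONIC, true));
    std::printf("CLOCK_THREAD_CPUTIME_ID wrapper %6.1f ns  raw syscall %6.1f ns\n",
                avg_ns(CLOCK_THREAD_CPUTIME_ID, false), avg_ns(CLOCK_THREAD_CPUTIME_ID, true));
}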

Elsewhere, in datapage.h, we see the exact list of clocks that those masks cover:

#define VDSO_HRES   (BIT(CLOCK_REALTIME)        | \
             BIT(CLOCK_MONOTONIC)       | \
             BIT(CLOCK_BOOTTIME)        | \
             BIT(CLOCK_TAI))
#define VDSO_COARSE (BIT(CLOCK_REALTIME_COARSE) | \
             BIT(CLOCK_MONOTONIC_COARSE))
#define VDSO_RAW    (BIT(CLOCK_MONOTONIC_RAW))

None of these masks include BIT(CLOCK_THREAD_CPUTIME_ID), so that clock is clearly going to take the slow syscall path.
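
For illustration, here is a small userspace sketch (my own, not kernel code) of the same mask test, built from the Linux CLOCK_* constants in <time.h>; the bit for CLOCK_THREAD_CPUTIME_ID (clockid 3) is not set in any of the three masks, so __cvdso_clock_gettime_common returns -1 for it:

#include <cstdio>
#include <time.h>

int main() {
    // Userspace mirror of the kernel's VDSO_HRES / VDSO_COARSE / VDSO_RAW masks.
    const unsigned vdso_hres   = (1u << CLOCK_REALTIME) | (1u << CLOCK_MONOTONIC) |
                                 (1u << CLOCK_BOOTTIME) | (1u << CLOCK_TAI);
    const unsigned vdso_coarse = (1u << CLOCK_REALTIME_COARSE) | (1u << CLOCK_MONOTONIC_COARSE);
    const unsigned vdso_raw    = (1u << CLOCK_MONOTONIC_RAW);

    const clockid_t clocks[] = {CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_THREAD_CPUTIME_ID};
    for (const clockid_t c : clocks) {
        const unsigned msk = 1u << c;
        std::printf("clockid %2d: %s\n", (int)c,
                    (msk & (vdso_hres | vdso_coarse | vdso_raw)) ? "VDSO fast path"
                                                                 : "falls back to a syscall");
    }
}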

Now, is it possible to implement it in the VDSO? Probably. But nobody has bothered yet (and perhaps it would imply overhead every time the scheduler is invoked?).
