c - Understanding the difference in timing of two functions that increment each element of an integer array - Stack Overflow

In the code pasted below, I am running two functions, f1 and f2, which do exactly the same job: take a number T and an integer array arr (initialized to 0 everywhere) and increment each element of arr T times. Thus, by the end of both f1 and f2, the input 0,0...0 should become T,T...T.

What I don't understand is why f1 runs so much slower than f2 (about 1.75 times slower when T is, say, 1 billion). Here is my output:

➜ Desktop gcc timing-difference.c && ./a.out 1000000000 16

-----> Running f1 <------- Time taken: 15.511 seconds

-----> Running f2 <------- Time taken: 8.887 seconds%

Here T is supplied to the program as argv[1] and the array length as argv[2]. The timing-difference.c file is pasted below at the end of this post.

Basically, f1 is directly incrementing each arr[i], whereas f2 is using a temporary variable tmp for the increment, and then assigning it to arr[i] when done.

#include<stdio.h>
#include<unistd.h>
#include<stdlib.h>
#include<time.h>

/*Directly increment arr[i] for each i*/
void f1(int T, int* arr, int arrlength){

  printf("\n-----> Running f1 <-------\n");
  clock_t end, start;
  double cpu_time_used;

  // For each element of arr, increment it T times.
  start = clock();
  for(int i = 0 ;  i<arrlength ;++i){
    for (int j=0 ; j<T ; ++j){
      arr[i] += 1;
    }
  }
  end = clock();
  cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC; 

  printf("Time taken: %.3f seconds", cpu_time_used);

}

/*increment arr[i] using temporary variable tmp*/
void f2(int T, int* arr, int arrlength){

  printf("\n-----> Running f2 <-------\n");
  clock_t end, start;
  double cpu_time_used;

  // For each element of arr, increment it T times.
  start = clock();
  for(int i = 0 ;  i<arrlength ;++i){

    int tmp=arr[i];
    for (int j=0 ; j<T ; ++j){
      tmp += 1;
    }
    arr[i] = tmp; 
  }

  end = clock();
  cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC; 
  printf("Time taken: %.3f seconds", cpu_time_used);

}

void print_arr(int* arr, int arrlength){
  for(int i = 0 ; i<arrlength ; ++i){
    printf("%d  ", arr[i]);
  }
  printf("\n\n");

}

/*Zero initialize array*/
void initialize_array(int* arr, int arrlength){
  for (int i=0 ; i<arrlength ; ++i){
    arr[i] = 0;
  }

}

int main(int argc, char** argv){

  int T         = atoi(argv[1]);
  int arrlength = atoi(argv[2]);
  int arr[arrlength];
  
  initialize_array(arr, arrlength);
  f1(T,arr,arrlength);  // --> why does this run slower than f2?
  //print_arr(arr, arrlength);
    
  printf("\n\n");

  initialize_array(arr, arrlength);
  f2(T,arr,arrlength); 
  //print_arr(arr, arrlength);
  
  return 0;
}

EDIT: I get the same time measurements when I call f2 first and f1 later or even if I run it multiple times.

With -O2 enabled, both timings come out as 0.000. I was curious about the default settings gcc uses when compiling, and why they lead to such a big difference in performance. As wohlstad suggests, the answer must of course be in the assembly, but unfortunately I cannot read x86 assembly well enough for an informed understanding :-(
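The 0.000 timings with -O2 are consistent with the optimizer replacing the whole inner loop by a single addition, since adding 1 to an element T times has the same effect as adding T once. The sketch below illustrates that idea only; the function names are hypothetical, and the code gcc actually emits at -O2 may differ and would need to be checked in the assembly.

```c
/* The inner loop as written in f1: T separate increments per element. */
void add_by_loop(int T, int *arr, int arrlength) {
    for (int i = 0; i < arrlength; ++i) {
        for (int j = 0; j < T; ++j) {
            arr[i] += 1;
        }
    }
}

/* Hypothetical closed form: one addition per element.
   The guard keeps the behaviour identical when T <= 0,
   where the loop above does not execute at all. */
void add_closed_form(int T, int *arr, int arrlength) {
    if (T <= 0) {
        return;
    }
    for (int i = 0; i < arrlength; ++i) {
        arr[i] += T;
    }
}
```

Both functions leave the array in the same state, but the second does a constant amount of work per element regardless of T, which would make the measured time round to 0.000.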

asked Mar 6 at 16:14 by smilingbuddha, edited Mar 6 at 22:21

Just a speculation: tmp was put in a register, but arr[i] was not. To know for sure you need to inspect the assembly - e.g. on Godbolt. – wohlstad Commented Mar 6 at 16:17
Also, try compiling with optimization enabled. – dbush Commented Mar 6 at 16:22
@dbush Thanks for the suggestion, yes with -O2 enabled I get the timings as 0.000 on both. I was curious about the default settings gcc was using for compiling and why there was such a big difference in performance. The answer as wohlstad suggests must of course be in the assembly, but unfortunately, I cannot read x86 assembly well at all :-( for an informed understanding – smilingbuddha Commented Mar 6 at 16:26
Do you get the same time measurements when you change the order of calling f1 and f2? Or when you run the program with the same input multiple times? – Bodo Commented Mar 6 at 16:35
@Bodo Yes, I get the same time measurements when I call f2 first and f1 later or even if I run it multiple times. Just tried doing both. – smilingbuddha Commented Mar 6 at 16:43
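Checking call order and repeated runs, as discussed in the comments above, can be made less manual with a small timing helper. This is a minimal sketch, not code from the question: increment_fn, time_once, best_time, and increment_direct are hypothetical names, and it assumes the functions under test contain only the loops, with the printf and clock calls of f1 and f2 moved out.

```c
#include <time.h>

/* Signature shared by the functions under test (hypothetical typedef). */
typedef void (*increment_fn)(int T, int *arr, int arrlength);

/* The loop from f1 without its printing and timing code. */
void increment_direct(int T, int *arr, int arrlength) {
    for (int i = 0; i < arrlength; ++i) {
        for (int j = 0; j < T; ++j) {
            arr[i] += 1;
        }
    }
}

/* Zeroes the array, runs fn once, and returns the CPU time in seconds. */
double time_once(increment_fn fn, int T, int *arr, int arrlength) {
    for (int i = 0; i < arrlength; ++i) {
        arr[i] = 0;
    }
    clock_t start = clock();
    fn(T, arr, arrlength);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Runs fn at least once and up to `repeats` times; returns the smallest timing. */
double best_time(increment_fn fn, int T, int *arr, int arrlength, int repeats) {
    double best = time_once(fn, T, arr, arrlength);
    for (int r = 1; r < repeats; ++r) {
        double t = time_once(fn, T, arr, arrlength);
        if (t < best) {
            best = t;
        }
    }
    return best;
}
```

Calling best_time for each variant in either order, with the same T and array, gives numbers that are less sensitive to one-off disturbances than a single run.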

1 Answer

It turns out there's quite a bit going on at the assembly level in f1 for this statement:

arr[i] += 1;

This compiles to:

        mov     eax, DWORD PTR [rbp-4]
        cdqe
        lea     rdx, [0+rax*4]
        mov     rax, QWORD PTR [rbp-48]
        add     rax, rdx
        mov     edx, DWORD PTR [rax]
        mov     eax, DWORD PTR [rbp-4]
        cdqe
        lea     rcx, [0+rax*4]
        mov     rax, QWORD PTR [rbp-48]
        add     rax, rcx
        add     edx, 1
        mov     DWORD PTR [rax], edx

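All of this runs on each iteration of the inner loop. Without optimization, the compiler reloads i and the arr pointer from the stack and recomputes the address of arr[i] twice, once for the load and once for the store.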

While this line in f2:

tmp += 1;

Compiles down to this:

        add     DWORD PTR [rbp-8], 1

And the assignment back:

arr[i] = tmp;

Compiles to this:

        mov     eax, DWORD PTR [rbp-4]
        cdqe
        lea     rdx, [0+rax*4]
        mov     rax, QWORD PTR [rbp-64]
        add     rdx, rax
        mov     eax, DWORD PTR [rbp-8]
        mov     DWORD PTR [rdx], eax

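This runs only once per iteration of the outer loop. So f2's inner loop executes a single instruction on tmp per iteration, while f1's inner loop executes the whole thirteen-instruction sequence above. This explains the difference in runtime.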
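To see how much of f1's cost comes from recomputing the address of arr[i], one could try an in-between variant that computes the address once per outer iteration and increments through a pointer. This is a sketch that is not part of the original post and has not been timed here; the name increment_via_pointer is hypothetical. In an unoptimized build the pointer p still lives on the stack, so each iteration still loads p, then loads, increments, and stores the element. Whether this lands closer to f1 or to f2 has to be measured or checked in the assembly.

```c
/* Variant of f1 that takes the element's address once per outer iteration. */
void increment_via_pointer(int T, int *arr, int arrlength) {
    for (int i = 0; i < arrlength; ++i) {
        int *p = &arr[i];              /* address computed outside the inner loop */
        for (int j = 0; j < T; ++j) {
            *p += 1;                   /* load, add, and store through p each time */
        }
    }
}
```

The result is the same as for f1 and f2: every element ends up increased by T.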
