c++ - Why do OpenMP programs run faster on a single process than on multiple processes?
The task was to implement various matrix multiplication algorithms using OpenMP. It turned out that with num_threads(1), the program runs faster than with any other number of threads. Is this due to the overhead of data transfer taking too much time, or is the issue something else?
#include <omp.h>
#include <stdio.h>
#include <cstdlib>
#include <ctime>

int main() {
    const int n = 1024;
    const int m = 128;
    int** A = new int*[n];
    int** B = new int*[n];
    int** C = new int*[n];
    for (int i = 0; i < n; i++) {
        A[i] = new int[n];
        B[i] = new int[n];
        C[i] = new int[n];
    }
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            A[i][j] = rand() % 10;
            B[i][j] = rand() % 10;
        }
    }

    clock_t start = clock();
    #pragma omp parallel for num_threads(6) schedule(dynamic, m)
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
    }
    clock_t end = clock();
    printf("Time for j-i-k: %f\n", (double)(end - start) / CLOCKS_PER_SEC);

    start = clock();
    #pragma omp parallel for num_threads(6) schedule(dynamic, m)
    for (int k = 0; k < n; k++) {
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++)
                C[i][j] += A[i][k] * B[k][j];
        }
    }
    end = clock();
    printf("Time for k-i-j: %f\n", (double)(end - start) / CLOCKS_PER_SEC);

    for (int i = 0; i < n; i++) {
        delete[] A[i];
        delete[] B[i];
        delete[] C[i];
    }
    delete[] A;
    delete[] B;
    delete[] C;
}
Asked by Илья Анненков
1 Answer
I believe the problem is with your use of clock. OpenMP provides the function omp_get_wtime(), which returns wall-clock time and gives meaningful measurements inside parallel regions. After switching your code to omp_get_wtime() and running it with num_threads(1) and num_threads(6), the time difference is quite drastic.

[Screenshot in original answer: comparison of times]

You can also set the number of threads with OMP_NUM_THREADS=1 or OMP_NUM_THREADS=6 instead of hard-coding it, which makes testing faster.

Also make sure to compile with the -fopenmp flag.
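To illustrate the point, here is a minimal sketch (my own, not code from the answer) of the question's j-i-k kernel timed with omp_get_wtime() instead of clock(); the matrix size and random fill mirror the question, and the thread count is left to the OMP_NUM_THREADS environment variable:

#include <omp.h>
#include <cstdio>
#include <cstdlib>

int main() {
    const int n = 1024;
    int** A = new int*[n];
    int** B = new int*[n];
    int** C = new int*[n];
    for (int i = 0; i < n; i++) {
        A[i] = new int[n];
        B[i] = new int[n];
        C[i] = new int[n]();          // zero-initialize C before accumulating into it
        for (int j = 0; j < n; j++) {
            A[i][j] = rand() % 10;
            B[i][j] = rand() % 10;
        }
    }

    // omp_get_wtime() returns wall-clock seconds, so it falls as threads are added;
    // clock() sums CPU time across all threads and therefore tends to rise instead.
    double start = omp_get_wtime();
    #pragma omp parallel for
    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                C[i][j] += A[i][k] * B[k][j];
    double elapsed = omp_get_wtime() - start;
    printf("j-i-k wall time with %d thread(s): %f s\n", omp_get_max_threads(), elapsed);

    for (int i = 0; i < n; i++) { delete[] A[i]; delete[] B[i]; delete[] C[i]; }
    delete[] A; delete[] B; delete[] C;
}

Compiled with something like g++ -O2 -fopenmp and run once with OMP_NUM_THREADS=1 and once with OMP_NUM_THREADS=6, the wall-clock time should drop as threads are added, whereas the original clock()-based measurement reports the CPU time accumulated by all threads and so appears to grow.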
Comments:

Pepijn Kramer: Consider std::vector<std::vector<int>> (you're already leaking memory). As for using clock to measure performance, I would recommend you use a profiler; that will tell you where your bottlenecks really are.

Jesper Juhl: new/new[] and delete/delete[] don't belong in modern C++. Use smart pointers, RAII, etc. Don't write C++ like we did in 1998.

Jérôme Richard: clock indeed. It measures CPU time, which means it increases with the number of threads running. Note that OpenMP spawns threads, not processes. Also, a dynamic schedule is generally not a good idea, especially for matrix multiplication, where the work is very stable and uniform. Besides, using num_threads(6) is bad practice here; you should really use the environment variable OMP_NUM_THREADS instead.
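Putting the commenters' suggestions together, a reworked kernel might look roughly like the sketch below. This is my illustration, not code from the thread: it uses a flat std::vector per matrix (rather than the nested std::vector<std::vector<int>> mentioned, purely to keep the data contiguous), an i-k-j loop order, a static schedule, and omp_get_wtime() for timing.

#include <omp.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

int main() {
    const int n = 1024;
    // Contiguous storage avoids the pointer-chasing of int** and is freed
    // automatically when the vectors go out of scope (RAII).
    std::vector<int> A(n * n), B(n * n), C(n * n, 0);
    for (int i = 0; i < n * n; i++) {
        A[i] = rand() % 10;
        B[i] = rand() % 10;
    }

    double start = omp_get_wtime();
    // Parallelize over rows of C with a static schedule: the work per row is
    // uniform, so static chunks avoid dynamic-scheduling overhead, and each
    // thread writes a disjoint set of rows, so there is no race on C.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
    printf("i-k-j wall time: %f s\n", omp_get_wtime() - start);
}

As with the earlier sketch, the thread count is left to OMP_NUM_THREADS rather than a num_threads clause, so the same binary can be timed with different thread counts.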