When you split an algorithm/function/whatever to run as separate threads, say I launch 8 threads, you don't know that each thread is going to run on a separate one of my 8 cores; it's the job of the scheduler to decide which threads are given to which cores. Still, if I have 8 cores and I split a job into 8 threads, I pretty much expect that to happen: each of my 8 cores will (roughly) take one eighth of the workload.

In the case of Intel processors that have P and E cores (performance and efficiency cores), the P cores might be clocked at 5.4 GHz and the E cores at 4.2 GHz. Does a processor with two different core types make multithreaded programming more unpredictable or less desirable? The two-tier design is common in other devices, such as smartphones and Apple CPUs, and the same question applies there.

As a programmer, how are you supposed to account for the fact that when you run something on a different thread, whether you spawn a new thread or a thread is waiting in a thread pool for a job, it may run on a performance core or an efficiency core? Do you have any choice?
- I think Intel oneTBB is the most optimal in this regard: you distribute work based on a chunk size, say 1000 work items per chunk, so if you have 1 million items those become 1000 tasks. But it has a lower limit on task size of, say, 500 items, so 1800 items become 2 tasks of 900 items each, and for < 1000 items you get only 1 worker. You control the chunk size (see the sketch after these comments). I have found this chunk-size-based approach much more repeatable than a thread-count-based one, as you can maximize the temporal coherence of the cache. – Ahmed AEK Commented Mar 30 at 21:08
- For most algorithms you can try different chunk sizes and draw a "bathtub curve" to find the optimal range for the chunk size. – Ahmed AEK Commented Mar 30 at 21:20
- It is also not a given that increasing the number of virtual CPUs running in parallel always increases performance. Once you saturate memory bandwidth or disk IO capacity (whichever happens first) adding extra workers can even slow things down and certainly generates more heat. You have to experiment to find the performance sweet spot for your problem. – Martin Brown Commented Mar 31 at 7:48
- @AhmedAEK OK, I hadn't heard of Intel oneTBB. I also hadn't heard of OpenMP. To tell you the truth, I'm tired of learning APIs and libraries, because you have to relearn everything when you change library/API. I would prefer to stick to raw code. Would the equivalent of what you do with Intel oneTBB or OpenMP be doable in plain C++ code? C++ has the facility to start threads and manipulate atomic variables. You can build anything with that, can't you? Also, at this level I think I'm learning something valuable. – Zebrafish Commented Mar 31 at 12:05
- @Zebrafish yes, you can write that in pure C++, but at least learn why those people made something that way and what the pros/cons of their approach are, so you can build something of comparable quality instead of repeating old mistakes. Also, according to SteamDB, Intel oneTBB is among the most used C++ SDKs. It provides concurrent containers, parallel versions of common algorithms like prefix sum and sort, good graph schedulers, and good task distribution. It is the go-to for utilizing multiple cores efficiently in complex applications. – Ahmed AEK Commented Mar 31 at 12:11
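A minimal sketch of the chunk/grain-size approach described in the comments above, using oneTBB's parallel_for. The grain value of 1000 is illustrative, not from the comments; oneTBB splits the range into chunks no smaller than the grain, and idle workers take chunks as they free up, so faster P-cores naturally end up processing more of them:

```cpp
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include <vector>

void scale(std::vector<float>& data, float factor) {
    const std::size_t grain = 1000;  // lower bound on chunk size (tune per workload)
    tbb::parallel_for(
        tbb::blocked_range<std::size_t>(0, data.size(), grain),
        [&](const tbb::blocked_range<std::size_t>& r) {
            // each invocation handles one chunk of the range
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                data[i] *= factor;
        },
        tbb::simple_partitioner());  // honor the grain size strictly
}
```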
1 Answer
If you divide your workload into equal-size parts then the P-cores will finish first. But if you divide it into smaller parts and have threads grab another chunk when they're done with their first, like OpenMP schedule(dynamic) instead of static, you can keep all cores busy until all the work is done.
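A short sketch of that in OpenMP (the chunk size of 1024 is illustrative, and heavy_work is a stand-in for the real per-element work):

```cpp
#include <cmath>

static float heavy_work(float x) { return std::sqrt(x) * 2.0f; }  // placeholder workload

void process(float* data, int n) {
    // dynamic: threads pull 1024-iteration chunks as they become free,
    // so a thread stuck on a slower E-core simply claims fewer chunks
    #pragma omp parallel for schedule(dynamic, 1024)
    for (int i = 0; i < n; ++i)
        data[i] = heavy_work(data[i]);
}
```

With schedule(static), each thread gets a fixed slice up front, so the threads on E-cores become the stragglers that everyone else waits on.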
Or, if there are lots of parallelizable tasks and later ones can start while some threads of the first are still finishing, it's easy to send work to a thread pool.
Dividing your work into 8 equal-sized parts for an 8-core CPU can be bad even on a homogeneous CPU if there's any other load: if some threads are descheduled for a while they won't finish as early as threads that ran the whole time. (Especially if the total time is on the same scale as a scheduling granule, e.g. 10 ms for Linux with HZ=100.)
So there's already reason to divide work into moderate-size chunks for threads to consume, especially if you're using a sophisticated thread-pool system like OpenMP, which can do that for you without a lot of extra code.
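The same chunk-grabbing scheme is doable in plain C++ with an atomic counter, as the question's comments asked. A minimal sketch (parallel_chunks is a hypothetical helper, not a standard API): each thread keeps claiming the next fixed-size chunk until the range is exhausted, so slower cores just end up claiming fewer chunks.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

void parallel_chunks(std::size_t n, std::size_t chunk,
                     const std::function<void(std::size_t, std::size_t)>& body) {
    std::atomic<std::size_t> next{0};
    unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < workers; ++t) {
        threads.emplace_back([&] {
            for (;;) {
                std::size_t begin = next.fetch_add(chunk);  // claim the next chunk
                if (begin >= n) break;                      // all work handed out
                body(begin, std::min(begin + chunk, n));    // half-open [begin, end)
            }
        });
    }
    for (auto& th : threads) th.join();
}
```

For example, `parallel_chunks(data.size(), 1024, [&](std::size_t b, std::size_t e) { /* process [b, e) */ });` gives the same keep-all-cores-busy behavior as the OpenMP dynamic schedule above, with no library dependency.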