
In this article https://www.lighterra.com/papers/modernmicroprocessors it is stated (under Multiple issue - Superscalar) that

the fetch and decode/dispatch stages must be enhanced so they can decode multiple instructions in parallel and send them out to the "execution resources"... Of course, now that there are independent pipelines for each functional unit, they can even have different numbers of stages.

So now, my question is: when we say that a superscalar processor has 14-19 stages (e.g. Intel Skylake), do these execution functional units count as separate stages?


Skylake Core


That is, in this Skylake core, are the INT ALU, INT DIV and so on (in the first functional unit of the EUs) considered separate stages?


asked Mar 27 at 9:33 by Rishi, edited Mar 27 at 10:22
  • I think they're descriptions of (the classes of) the instructions that this particular execution unit can execute. Every instruction uses one of them. This is not related to pipeline stages. – Bergi Commented Mar 27 at 10:05

1 Answer


14-19 stages is a measurement of length. (And it assumes 1-cycle-latency integer instructions like add, so it counts execution as only one stage.)

The number of parallel execution units is a measure of width of the pipeline's execution capabilities.
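The length-vs-width distinction can be sketched with an idealized pipeline model (my own illustrative sketch, not anything Skylake-specific): with no stalls or dependencies, length (stage count) sets the fill latency, while width sets the steady-state completion rate.

```python
import math

def pipeline_cycles(n_instructions, stages, width):
    """Idealized in-order pipeline with no stalls or dependencies:
    the first instruction takes `stages` cycles to drain through,
    then up to `width` instructions complete every cycle after that."""
    return stages + math.ceil(n_instructions / width) - 1

# Width dominates throughput; length barely matters for long runs.
print(pipeline_cycles(1000, 14, 1))  # 1-wide, 14 stages -> 1013 cycles
print(pipeline_cycles(1000, 14, 4))  # 4-wide, 14 stages -> 263 cycles
print(pipeline_cycles(1000, 19, 4))  # 4-wide, 19 stages -> 268 cycles
```

Adding stages costs only a few cycles of fill latency here; quadrupling the width cuts total time almost 4x. (Length matters much more once branch mispredicts force the pipeline to refill, as discussed below.)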

The 14-19 stage variation in length comes from uop-cache hit vs. legacy decode to feed the front-end with decoded instructions (uops). uop-cache hits have better branch-miss latency thanks to the shorter pipeline. See https://www.realworldtech.com/sandy-bridge/3/ (Skylake is a later generation of the Sandy Bridge family, with the same basic design.)
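To see why the shorter uop-cache path helps on branch misses, here is a rough model (my illustrative numbers, not measurements): each mispredict flushes and refills the front-end, so the average cost scales with pipeline depth times mispredict rate.

```python
def effective_cpi(base_cpi, mispredicts_per_kilo_insn, flush_penalty_cycles):
    """Average cycles per instruction once branch-miss refill costs
    are folded in. All inputs are assumed/illustrative values."""
    return base_cpi + (mispredicts_per_kilo_insn / 1000) * flush_penalty_cycles

# Assume 5 mispredicts per 1000 instructions and a refill penalty
# roughly equal to the pipeline depth quoted in the question.
print(effective_cpi(0.25, 5, 14))  # uop-cache hit path (14 stages)
print(effective_cpi(0.25, 5, 19))  # legacy-decode path (19 stages)
```

With branchy code the 5-stage difference between the two front-end paths shows up directly in the effective CPI, which is why uop-cache hits matter beyond just fetch bandwidth.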


The narrowest point in the pipeline is normally the issue/rename stage; on Skylake it's 4 fused-domain uops wide, vs. uop-cache fetch being 6-wide, also in the fused domain, and retirement being 4 per logical core IIRC, so up to 8 per cycle if both hyperthreads are active. In the unfused domain (scheduler and execution ports), Skylake has 8 ports.
With some work backed up in the scheduler after a slow instruction completes, all 8 ports can be busy every cycle for a while, but the highest sustained unfused-domain throughput possible on Skylake is 7 uops/cycle (https://www.agner.org/optimize/blog/read.php?i=581#857), with a loop that's 4 fused-domain uops: two load+ALU uops, one store, and the macro-fused loop branch.
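The fused- vs. unfused-domain accounting for that loop can be tallied explicitly. This is my own breakdown, assuming the usual Skylake fusion rules: a micro-fused load+ALU instruction is 1 fused-domain uop but 2 unfused-domain uops, a store splits into store-address + store-data in the unfused domain, and the loop's dec/jnz pair macro-fuses into a single uop in both domains.

```python
# (name, fused-domain uops, unfused-domain uops) per loop iteration
loop = [
    ("load+ALU", 1, 2),  # micro-fused: load uop + ALU uop at the ports
    ("load+ALU", 1, 2),
    ("store",    1, 2),  # store-address uop + store-data uop
    ("dec/jnz",  1, 1),  # macro-fused loop branch (assumed)
]

fused   = sum(f for _, f, _ in loop)
unfused = sum(u for _, _, u in loop)
print(fused, unfused)  # -> 4 7
```

So at one iteration per cycle, the 4-wide fused-domain issue stage sustains 7 uops/cycle of work at the execution ports, matching the measured maximum.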

Tags: cpu · Execution stages in a superscalar microarchitecture · Stack Overflow