Senior Intel Engineer Explains the Radical Shift in CPU Design
Posted by Nitrozzy7@reddit | hardware | View on Reddit | 35 comments
Geddagod@reddit
Such a bummer that Intel backed out of their commitment for a LNL reddit Q&A on the r/intel subreddit. They used to do it for previous launches, such as RKL, but it seems like they no longer will. This interview seems great though.
Something interesting they mentioned is that by moving to large partitions they were able to increase cell utilization and area efficiency a good bit, area efficiency being a large problem for previous Intel cores.
Exist50@reddit
This is otherwise known as what everyone else, including Intel's own Atom team, has been doing for 10-20 years prior.
Geddagod@reddit
The Intel engineer in the interview was insistent that no one did that years ago, at least no one running their cores at frequencies as high as Intel's. He claims that no one had the design tools to create partitions that big while still hitting the same frequencies and high voltages they were getting, even 10 years ago.
The point you mentioned was brought up by KitGuru in his question about it as well (15:48).
Exist50@reddit
If that's the case, it's only by a technicality of no one else having hit such high frequencies. But if you ignore frequency and look at the design methodology for other high perf CPUs (including ones that outperform prior Intel P-cores), then yeah, it's abundantly clear that they were simply behind the times.
And it's doubly ironic given the P-Core team didn't want to update their design methodology to begin with. Keller forced them to.
jaaval@reddit
10 years ago was right when Skylake launched. AMD's competing parts ran even higher frequencies, and I don't think any core outperformed it. It would be about the time AMD was laying out Zen 1. Apple was designing 2GHz chips at the time, which were already impressively good but not high frequency by the standards of the day.
braiam@reddit
It's probably a case of "I haven't heard of anybody doing it, so it isn't happening". CPU design can be a bit of an information silo, where things that are commonplace at one company are never shared in ways competitors are aware of.
BrightCandle@reddit
It makes sense that once you get more and more cores, the single-threaded part starts to dominate performance; Amdahl's law always applies. What I think a lot of people haven't realised is that SMT costs single-threaded performance as well, because it makes the core bigger and more power hungry, and you could use that transistor and power budget to make the single-threaded part go faster.
So at a certain point, where the core count is quite high, SMT/hyperthreading stops being a big 30% win and instead becomes an overall loss. I am not surprised to see that happen at about 16 cores, and in a lot of games they are already avoiding SMT with thread affinities, so for a lot of gaming workloads it's actually a negative impact.
I think Intel is right here, and I think AMD is going to be wrong and should consider removing SMT as soon as they can. It was a great cheap way to add extra threads and utilise ports better in the past, but once you have a lot of cores you actually want to spend those resources on either more full cores or faster cores for the serial part to run on.
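The tradeoff in the comment above can be sketched with Amdahl's law. The 30% SMT uplift is the figure quoted in the comment; the 10% serial fraction and the assumed 10% single-thread gain from dropping SMT are hypothetical numbers for illustration only:

```python
def amdahl_speedup(serial_frac, serial_speed, n_threads, thread_speed):
    """Speedup over one baseline thread, via Amdahl's law, with
    distinct speeds for the serial phase and each parallel thread."""
    t = serial_frac / serial_speed + (1.0 - serial_frac) / (n_threads * thread_speed)
    return 1.0 / t

# Hypothetical workload: 10% serial (assumed figure).
# No-SMT: 16 cores, each thread 10% faster (assumed single-thread gain).
no_smt = amdahl_speedup(0.10, 1.10, 16, 1.10)

# SMT: 32 threads with a ~30% combined per-core throughput win (the
# figure from the comment), so each sibling runs at ~0.65x; the serial
# thread is assumed to get a core to itself at full speed.
smt = amdahl_speedup(0.10, 1.00, 32, 0.65)

print(f"no-SMT: {no_smt:.2f}x  SMT: {smt:.2f}x")
```

With these made-up numbers the SMT-less configuration edges ahead (~7.0x vs ~6.9x) despite running half as many threads, which is the shape of the argument: past a certain core count, the serial phase dominates and per-thread speed beats extra threads.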
VenditatioDelendaEst@reddit
I didn't follow this argument, except the weak form that when concurrency < core count, you'd rather have faster SMT-less cores.
Otherwise, it seems like it's implicitly assuming near-ideal scheduling, where you know which thread(s) are on the serialized critical path, and put it/them on the fastest core(s). Possible in theory for cyclical workloads like gaming -- each frame should be a lot like the last (ignoring asset loading, etc.) -- but in the general case it's the halting problem. The ninja build system has built its reputation on performance, and even getting it to start critical-path jobs first has been a decade-plus bikeshed.
Intentionally idling the SMT siblings of threads on the critical path is a thing a Sufficiently Advanced Scheduler could do as well.
cyperalien@reddit
he said the removal of SMT allowed them to iterate faster on per thread performance so we'll see how that pans out.
Flaimbot@reddit
unless their SMT implementation contains some elvish runes and dark magic, SMT is just an additional program counter, so i don't get what he's even talking about
EmergencyCucumber905@reddit
Not just program counter. It's an additional register file, additional instruction queue, and probably a bunch of other things.
symmetry81@reddit
Worse: a single big physical register file being mapped to the architectural registers of two different threads, plus all the in-flight state of the out-of-order machinery. My prof once said the hardest hardware bug he'd ever had to solve was a register leak in this scenario: the chip lost track of who was supposed to own a particular physical register, so the number of registers available to the threads slowly decreased over time.
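The leak described above can be shown with a toy renamer (a hypothetical simplification, not any real design): both SMT threads allocate physical registers from one shared free list, and a squash path that forgets to return a register starves both threads permanently.

```python
# Toy register renamer shared by two SMT threads (illustrative only).
class Renamer:
    def __init__(self, n_phys):
        self.free = list(range(n_phys))  # shared physical-register free list
        self.owner = {}                  # phys reg -> (thread, arch reg)

    def rename(self, thread, arch_reg):
        phys = self.free.pop()           # allocate a physical register
        # The previous phys reg mapped to arch_reg must eventually be
        # returned to self.free (on commit or squash). If one thread's
        # squash path has a bug and never returns it, the register
        # "leaks": neither thread can ever allocate it again.
        self.owner[phys] = (thread, arch_reg)
        return phys

    def release(self, phys):
        del self.owner[phys]
        self.free.append(phys)

r = Renamer(n_phys=8)
p0 = r.rename(0, "rax")
p1 = r.rename(1, "rax")  # same arch reg, different thread, distinct phys reg
r.release(p0)            # thread 0's squash correctly frees its register
# Suppose thread 1's squash path is buggy and never calls release(p1):
# one of the 8 registers is now lost to both threads, forever.
```

Repeat that a few million times a second and the shared pool quietly shrinks until the core stalls, which is why this class of bug is so painful to find.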
Flaimbot@reddit
the answer i was looking for. thanks!
VenditatioDelendaEst@reddit
There was a whole bit about how hard it is to verify that there aren't any information leaks between threads.
Cortisol-Junkie@reddit
Thinking that anything in a modern CPU is "just" [something] means that you don't know enough about CPU design.
Plank_With_A_Nail_In@reddit
Can you explain how it's not just a counter, instead of being an elitist asshat?
VenditatioDelendaEst@reddit
Read again what it was a reply to. Derogatory elitism gets derogatory elitism in return. Especially when the first guy is confidently wrong.
jaaval@reddit
It’s also at least splitting the register file and a system for how to dynamically split OoO resources.
Die4Ever@reddit
and security testing/validations/fixes/mitigations 😱
Cortisol-Junkie@reddit
You can look into chapter 5 of this book, which talks about some of the changes you need to make to add SMT to MIPS R10K, which is almost 30 years old.
1600vam@reddit
Senior Principal Engineer vs Some Dude on Reddit. Just because you don't understand doesn't mean he's wrong. There's a lot of hardware involved in supporting SMT, and it's a nightmare for side channel vulnerabilities.
Flaimbot@reddit
i never said he's wrong. i specifically said i don't understand his claim about being able to iterate faster, since my understanding goes as far as SMT being just an additional program counter, which wouldn't be a major slowdown if that were the case.
mrgorilla111@reddit
That is certainly not how you came across lol. In the original comment you sound like you're calling BS on his claims.
hackenclaw@reddit
It simplifies the software scheduling too.
Now we've only got big-core threads and E-core threads.
I like the direction we're going.
ConsistencyWelder@reddit
I love how this sub seems to be the only sub left on Reddit keeping the dream of Intel alive. Not even r/intel believes in the company as much as r/hardware does.
BrightCandle@reddit
There is no doubt Intel engineers know what they're doing; they have been the top or second-best CPU designer and manufacturer for nearly 50 years. They have missed big moves a few times, as something about how AMD sees things means AMD turns up with giant leaps in performance that take Intel many years to respond to (since they are working 3-4 generations ahead of the consumer market), but they adapt and come back.
They made some bad calls in how they tied manufacturing to design, and that was corrected years ago. I never rule Intel out; they still sell an absurd amount of silicon even when they are doing badly.
imaginary_num6er@reddit
Why are they even still at Intel? Like, they got coffee and fruit removed while working in Silicon Valley, and that level of disrespect would have them walking next door to Qualcomm, AMD, or Nvidia.
mrgorilla111@reddit
Intel brought coffee back. All they’re missing out on is bananas and very mid apples
Not everyone works in Silicon Valley lol.
The hardware office life/culture is not nearly as glamorous as software companies.
BrightCandle@reddit
If you have ever been in an organisation where the board and investors keep choosing bean-counter idiots to run it, a lot of it is knowing you'll outlast their stupid arse; the coffee and fruit will be back. This sort of work isn't that common, so many will have looked elsewhere, but a lot of what makes work good or bad is down to your immediate team rather than the wider organisation.
auradragon1@reddit
This sub doesn't believe in Intel. This sub is full of gamers, who desire competition to AMD & Nvidia in order to lower $/fps.
LickIt69696969696969@reddit
Wait until they discover photonic computing, any decade now ...
Scary-Mode-387@reddit
Intel photonics will go into some future Xeons, I think Diamond Rapids itself. At least a test chip has it, I think.
steak4take@reddit
What are you talking about?
https://www.intel.com/content/www/us/en/research/integrated-photonics.html
https://community.intel.com/t5/Blogs/Tech-Innovation/Data-Center/Intel-Labs-Researcher-Spotlight-James-Jaussi-and-Integrated/post/1541580
https://download.intel.com/newsroom/archive/2025/en-us-2021-12-08-intel-launches-integrated-photonics-research-center.pdf
GTS81@reddit
Sure is fun when convergence happens with 6 partitions vs 300 FUBs, but once an ECO (Engineering Change Order) hits, or a bug requires just 50 gates to be edited, you're touching a partition with 5M std cells. You don't want to be that person 2 weeks before base-layer tape-in.
makistsa@reddit
Great video. He is not afraid to talk about how they work.