Why aren't new consumer CPU generations getting physically bigger?
Posted by JL-gLimpse@reddit | hardware | View on Reddit | 43 comments
[removed]
teutorix_aleria@reddit
3D VCache is manufactured as a separate die. It is bonded to the top (bottom on the 9000 series) of the CCD and connected with TSVs to minimize latency.
That completely negates the point of a large L3 cache: if it sits off package, or as a separate die somewhere else on the same package, the latency makes it completely ineffective as an L3 cache.
Enterprise chips don't have decreased density, they just have way more hardware on board and are therefore larger.
HEDT systems like Threadripper already exist, with coolers designed for them.
yakovlevtx@reddit
I design enterprise chips for a living. There are plenty of machines with off-die cache that isn't stacked. You're right that the latency is higher, but that just means the cache has to be larger to be useful, not that it's useless (especially as a victim cache, or possibly as a snoop filter in a large SMP). Chip stacking allows a bigger performance boost from a smaller cache die, but off-chip caches without stacking have been around for decades.
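A minimal sketch of that tradeoff using a simple average-memory-access-time formula; every latency and hit rate below is an assumed round number for illustration, not a measurement of any real part:

```python
# Rough average-memory-access-time sketch with made-up latencies and hit
# rates, just to illustrate the tradeoff: a slower off-die cache can still
# pay off if it is big enough to raise the hit rate.

def amat(hit_rate, cache_latency_ns, dram_latency_ns):
    """Average latency for requests that reach this cache level (illustrative)."""
    return hit_rate * cache_latency_ns + (1 - hit_rate) * dram_latency_ns

# Small, fast on-die L3 (assumed numbers)
on_die = amat(hit_rate=0.60, cache_latency_ns=10, dram_latency_ns=90)
# Much larger but slower off-die cache (assumed numbers)
off_die = amat(hit_rate=0.85, cache_latency_ns=25, dram_latency_ns=90)

print(f"small fast cache: ~{on_die:.1f} ns")   # ~42 ns
print(f"large slow cache: ~{off_die:.1f} ns")  # ~35 ns
```

The only point is that a higher hit rate can buy back a higher hit latency; the specific numbers don't apply to any particular product.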
Pristine-Woodpecker@reddit
Is this for stuff like ccNUMA clusters?
yakovlevtx@reddit
It's far more mundane than that. Just wanting more cache than fits on a die so adding a separate cache chip. You're right that having a local cache can make a bigger difference if memory is farther away.
That said, ccNUMA isn't that uncommon in server platforms, and on some systems designed for large single image workloads like IBM POWER the cost to go to another node isn't necessarily that high.
Strazdas1@reddit
As long as it's big enough and lower latency than memory, it will be useful. How useful per dollar is another question.
IOnlyPostIronically@reddit
The further data needs to travel, the slower the chips are as well.
Strazdas1@reddit
In-core latency isn't that big of an issue compared to the benefits fatter cores can give you. See: Apple M chips.
Giggleplex@reddit
Larger dies are more expensive.
COMPUTER1313@reddit
Also worse yields, which was why AMD went with the chiplet strategy almost a decade ago, and now Intel is following suit.
The only significant company that took the opposite approach of using an entire wafer is Cerebras, which specifically designed its wafers to be very fault tolerant of the inevitable random defects.
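A toy yield model makes the chiplet logic concrete; the defect density here is an assumed round number, and the simple exponential form is only a sketch of how real foundry yield models (Murphy, negative binomial, etc.) behave:

```python
import math

# Toy yield model: yield = exp(-defect_density * die_area).
# 0.1 defects/cm^2 (0.001 per mm^2) is an assumed figure, not a real foundry number.
D = 0.001  # defects per mm^2

def die_yield(area_mm2):
    return math.exp(-D * area_mm2)

print(f"600 mm^2 monolithic die: {die_yield(600):.0%} yield")  # ~55%
print(f"75 mm^2 chiplet:         {die_yield(75):.0%} yield")   # ~93%
# Defective chiplets are discarded individually, so the same wafer area
# delivers far more sellable silicon when it is cut into small dies.
```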
Strazdas1@reddit
Cerebras also have a very specific use case for their wafer-chips.
Plank_With_A_Nail_In@reddit
worse yields/expensive are the same picture.
JL-gLimpse@reddit (OP)
Enterprise CPUs are huge in comparison to the consumer ones. Those server CPUs (while not necessarily pushing the same speeds) need to be a lot more reliable! There must be a middle ground.
Nicholas-Steel@reddit
There is/was a middle ground: AMD's Threadripper CPUs.
Just_Maintenance@reddit
Those CPUs are also very expensive. If you're up for buying an $18k CPU, you can right now.
COMPUTER1313@reddit
If you look at AMD's Epyc server CPUs, they use rows of the same compute dies as the desktop CPUs. The only "big" component is the I/O die, which is fabbed on an older and cheaper process.
the_dude_that_faps@reddit
Let me give you three reasons.
First, connecting dies through package traces, like AMD does with its chiplets, costs power and has bandwidth limitations. Intel had a big cache die connected this way back when Broadwell launched. That connection had limitations that stacking does not, making an L3 cache in that configuration unlikely to perform as needed.
Stacking dies is a way to eliminate both of these issues at the expense of increasing thermal density. You could connect these with other means like an interposer or a bridge of some kind, but that increases costs too.
Secondly, adding dies increases costs, though not as much as actually packing more transistors into a single die: costs in packaging, and also costs due to failures in packaging. It's an added step in the process that will not be executed perfectly every time (rough numbers after this comment).
Lastly, there's a limit to how physically large a single die can be, and those big dies are usually very expensive because they are very likely to have defects. Defects are distributed over the surface of the silicon wafer, so if the dies are small, each defect affects a smaller proportion of the output.
So, in essence, while it could theoretically be possible, the issue stems from the fact that all of it is costly. AMD gets away with it because they reuse the I/O die across multiple generations and because they reuse the CCD across segments. They have not been able to scale this down to mobile parts, which is where you can see the trade-offs they're making.
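To put hedged numbers on the packaging point: even if every die is tested good before assembly, each attach step has its own yield, and those yields multiply. The 99% per-attach figure below is an assumption for illustration, not a real packaging-line number.

```python
# Back-of-the-envelope: package yield when every attached die must be
# bonded successfully. 0.99 per attach is an assumed value.
bond_yield_per_die = 0.99

for n_dies in (2, 4, 9, 13):  # e.g. 1 CCD + IOD up to a many-chiplet server part
    package_yield = bond_yield_per_die ** n_dies
    print(f"{n_dies:>2} dies attached -> package yield ~{package_yield:.1%}")
```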
CummingDownFromSpace@reddit
>why aren't the chips increased in size to accommodate more tech?
They are. Every CPU generation packs in more transistors (more 'tech', including new instructions) than the last. See this graph of transistor count per chip:
https://en.wikipedia.org/wiki/Transistor_count#/media/File:Moore's_Law_Transistor_Count_1970-2020.png
In terms of physical size, CPUs are limited by the 'reticle limit'. This is the limit of the optics used to project the CPU image onto the silicon: any bigger, and the image starts to get curved/distorted and the traces won't work. Current EUV scanners have a reticle limit of about 26mm x 33mm (~858 mm²), and High-NA EUV halves that to 26mm x 16.5mm. CPU designers have to work within these limits when designing chips.
> decrease chip density allowing hugely improved cooling.
Density is a function of the process node (5nm, 3nm). If you somehow did double-space your gates, you'd hit the reticle limit twice as quickly, so your CPU now has half as many transistors. You also wouldn't get a straight 50% reduction in heat/power use, as the traces between the transistors carry current and generate heat too; if they are twice as long, they use roughly twice as much current.
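A first-order sketch of that last claim, treating switching power as activity × C × V² × f with wire capacitance roughly proportional to wire length; every constant below is an assumed round number:

```python
# First-order dynamic-power sketch: switching power ~ activity * C * V^2 * f,
# and wire capacitance grows about linearly with wire length.
def dynamic_power_w(cap_f, vdd_v=1.0, freq_hz=4e9, activity=0.1):
    return activity * cap_f * vdd_v**2 * freq_hz

CAP_PER_MM = 0.2e-12  # ~0.2 pF per mm of on-die wire (assumed)

p_1mm = dynamic_power_w(1.0 * CAP_PER_MM)  # original trace length
p_2mm = dynamic_power_w(2.0 * CAP_PER_MM)  # same trace at double spacing

print(f"1 mm trace: {p_1mm * 1e3:.2f} mW")  # ~0.08 mW
print(f"2 mm trace: {p_2mm * 1e3:.2f} mW")  # ~0.16 mW, i.e. double
```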
> Or a chip where some of the die's are no longer stacked, to decrease chip density allowing hugely improved cooling to push the limits of the die's even further, just like what is done on a lot of enterprise chips.
Stacking is done to reduce latency and increase performance. Enterprise chips are built for specific tasks that have different performance metrics than most consumer tasks. Gaming, for example, requires a high-frequency, low-latency CPU core, whereas enterprise DB tasks can be massively multi-threaded and require more RAM and not as much cache, so stacking doesn't benefit them. CPUs can do all tasks, but some CPUs are built to do different tasks better. It's a trade-off in the design.
3G6A5W338E@reddit
As die size increases linearly, the probability of getting a fault-free die drops exponentially.
Or, same thing, yield drops and cost skyrockets.
markhachman@reddit
Which is why some Intel CPUs have integrated graphics and some "don't"
SignificantEarth814@reddit
Also why some are suitable for overclocking and others are better for mobile.
ExtendedDeadline@reddit
What
Grodd@reddit
Not OP, but I assume they're talking about competitive overclocking.
The folks that do it will buy a dozen of the same CPU and check them all to find the ones with the fewest faults so they can handle the load.
ExtendedDeadline@reddit
For sure, but the person I responded to is wrong.
The iGPU parent comment refers to -F SKUs where the iGPU is disabled, mostly for binning/defects.
The equivalent for the OP's comment would be -K/S SKUs vs non-K SKUs when discussing overclocking potential. Intel isn't just taking their defective/unsatisfactory -K SKU chips and putting them in laptops en masse...
kyralfie@reddit
High leakage dies clock better but are less efficient and consume more energy. Those go into desktop and K SKUs. Low leakage dies clock worse but are more efficient. Those generally go into laptop SKUs.
ExtendedDeadline@reddit
What
DonTaddeo@reddit
Aside from the cost of larger dies, there is the issue of keeping electrical interconnections short to manage signal losses and timing skews. I remember the dual in-line packages that were popular in the 1970s for TTL logic chips. Even with the much slower technology of the time, there were problems maintaining the integrity of power voltages and logic signals. In particular, there were some octal register chips in 20-pin packages that suffered data corruption under some conditions.
Concillian@reddit
AMD desktop CPUs must share dies with server or laptop CPUs for AMD's business model to work. Server and laptop want low-leakage, efficient dies, but desktop is usually fine with high-leakage dies, which usually clock higher. So AMD's business model bakes in that commonality so they can save on scrap and get better binning for each application. Also, desktop margins are lower, so they prioritize their higher-margin markets and desktop kind of gets what's left over.
They'll never make any kind of specialty desktop CPU under current management. The best you can hope for is X3D-style parts where they modify existing dies. The unfortunate reality is that desktop is neither high enough volume nor margin to warrant anything exclusive.
Elegant_Hearing3003@reddit
Why on earth would they need to? What are you, or anyone outside a very specific profession, even doing that requires that?
greggm2000@reddit
The basic reason is the cost to make the die vs. what people will pay for it in a finished product. This is why GPUs are physically a lot bigger usually than CPUs, since people have shown they'll pay more for GPUs than CPUs, enough more to make it worthwhile.
I do see reasons why the CPU size might get a lot bigger in future generations. One would be if AMD or Intel decide to incorporate a lot more "GPU" as integrated graphics, basically making APUs a desktop offering. Another would be if they get serious about having an NPU on their CPUs; Intel already does this with Arrow Lake on the desktop, but its NPU is really weak. Or maybe it turns out that a lot of extra cache (perhaps more SRAM) is the way to extract a lot more performance. Additional/better IO could be another factor that increases overall CPU size. My point is that while the CPU dies may or may not get bigger, the overall CPU package (composed of chiplets and other components) probably will.
karatekid430@reddit
Yield is roughly an inverse exponential function of the number of transistors in a die.
SJGucky@reddit
CPUs are already using more power per mm² than even the 4090.
pceimpulsive@reddit
You can't put the V-Cache on a separate die (well, it already is a separate die).
If you move it away from being tied directly to the CCD, you add too much latency for it to be worth anything...
Adding an NPU has already been done in some laptop chips. Once NPUs become standard and have actual real uses we will surely see them slip in, I'd guess on the IOD rather than another compute die... let's see though, yeah?
Simone1998@reddit
Bigger dies mean fewer dies per wafer. Bigger dies mean more defects per die, which means even fewer working dies per wafer. Also, modern processes have a reticle limit of about 800 mm^2, and getting even close to it is difficult. Bigger dies are disproportionately more expensive (IIRC, cost goes roughly as area^4).
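A hedged back-of-the-envelope combining those two effects (fewer candidate dies per wafer and lower yield per die); the wafer price, defect density, and dies-per-wafer approximation are all assumptions, and the point is only that cost per good die grows much faster than die area:

```python
import math

# Rough cost-per-good-die sketch. All constants are assumed round numbers.
WAFER_COST_USD = 17000   # assumed leading-edge 300 mm wafer price
WAFER_DIAM_MM = 300
D = 0.001                # defects per mm^2 (assumed)

def dies_per_wafer(area_mm2):
    # Common approximation: gross dies minus an edge-loss term.
    return math.floor(math.pi * (WAFER_DIAM_MM / 2) ** 2 / area_mm2
                      - math.pi * WAFER_DIAM_MM / math.sqrt(2 * area_mm2))

def cost_per_good_die(area_mm2):
    good_dies = dies_per_wafer(area_mm2) * math.exp(-D * area_mm2)
    return WAFER_COST_USD / good_dies

for area in (80, 200, 400, 700):
    print(f"{area:>3} mm^2 die: ~${cost_per_good_die(area):,.0f} per good die")
```

With these assumptions, cost per good die climbs roughly twenty-fold while die area grows less than nine-fold.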
SERIVUBSEV@reddit
Lack of demand.
Even heavy workloads like 3D rendering can be offloaded to some service or script to run on a server somewhere. So most people get slim laptops and devices with better thermals.
It doesn't benefit gaming either, because games need faster cores more than more cores. Threadrippers get the same fps as 8-16 core chips.
On servers the demand is massive, considering every individual core can be virtualized and sold separately. And colo rent is charged per rack unit, so if they can fit the whole thing in 4U instead of a whole rack, it's a big win.
AMv8-1day@reddit
Die sizes are already maxing out yield limits. Why do you think AMD, and now Intel, are shifting to multi-die "glued together" CPUs?
stonktraders@reddit
You are not wrong, but are you willing to pay a price similar to Apple's Max and Ultra SoCs?
kyralfie@reddit
I'm picturing it. Non-stacked means it will be much higher latency: no longer L3-level, and less beneficial for gaming.
Aggrokid@reddit
CPU tasks are usually latency sensitive and size = latency.
nanonan@reddit
They are getting bigger in the sense that they use more transistors, but they also get smaller as node technology advances. Stacking dies has advantages, mostly from keeping everything close together, that can outweigh any thermal or other advantage from spreading them out.
ResponsibleJudge3172@reddit
Money.
A CPU gets to cost like a 4070 with a 4060-sized die, with customers feeling it's worth it, while no onboard RAM, PCB, etc. are included. Heck, not even a fan sometimes. This means a CPU at a given die size has MUCH better margins than a GPU.
Now, adding all sorts of things that increase heat, cooling complexity and so on, without the corresponding price increase (because no consumer would buy the CPU just on the assumption that it's data-center grade), is how you flush billions down the drain.
This is where V-Cache comes in. It stacks on top of, or below, the already existing structure. Its cost to include translates well into the value customers perceive in it, so an 8-core CPU at that price is shrugged off because it's good.
Not_Yet_Italian_1990@reddit
Probably because it would require a lot more power and produce more heat.
I think they will actually get bigger eventually, once YoY improvements slow down even more.
Sluipslaper@reddit
Bigger CPU, then bigger IHS and better thermals? I have no idea.