AMD Threadripper 9980X + 9970X Linux Benchmarks: Incredible Workstation Performance
Posted by Kryohi@reddit | hardware | 93 comments
makistsa@reddit
u/michaellarabel
The llama.cpp tests on the 285K need fixing. Prompt processing is at least 6 times faster with Llama 1B. Inference is faster too.
I got the same results as you after compiling llama.cpp on an AMD system. Compiling it on the Arrow Lake system without any special flags fixed the issue.
Artoriuz@reddit
Incredible performance, as expected.
Recently, I've been thinking about how desktop CPUs seem to be lagging behind when it comes to core count. Strix Halo ships with up to 16 cores (same as Granite Ridge), and mobile Arrow Lake-HX goes up to 8+16 (same as desktop Arrow Lake-S)...
It's nice to see AMD keeping HEDT alive. "Normal" consumer CPUs have gotten so small when compared to consumer GPUs they're almost funny to look at.
No-Relationship8261@reddit
It's still only 64 cores.
Since Intel is no longer competition, AMD stopped caring and increasing margins as well.
It seems 16 is the new 4 cores. And 64 is the new 12.
SirActionhaHAA@reddit
Ya know that the 96-core TR exists, right? It's on the Pro octa-channel platform because memory bandwidth is holding them back. This is why AMD is going to 16 channels with Venice.
No-Relationship8261@reddit
If I wrote this message about Intel back in the day you would be so mad.
You know Xeons with more cores exist, right......
I can't be bothered, continue living in your own bubble
Helpdesk_Guy@reddit
You know that Xeons weren't available for Joe Average, right?
Yes, Intel had way more cores in the server-space, yet limited the desktop effectively to 4 cores only.
No-Relationship8261@reddit
You could buy it and use it?
I have done so, and many people I knew also did.
What do you mean, not available to the Average Joe?
They just needed a different motherboard, just like Threadripper does.
Helpdesk_Guy@reddit
What then? The often-picked quad-core Xeon-E 1234?
No, most higher Xeons of that era needed actually incredibly expensive SERVER boards with matching sockets and hardware, which came mostly in rack form factors only. So no.
Everyone can buy a Threadripper, as it's workstation-class CPU and hardware, which is freely available.
How do you NOT know what that phrase means?! These parts were NOT freely available. Period.
Anything above quad-core chips was so ridiculously priced that it was unaffordable for 98% of the market.
996forever@reddit
The 6-core i7-5820K was $390 three years before first-gen Ryzen arrived, with quad-channel memory and 28 PCIe lanes, at a time when the 4790K had 16 lanes.
You people have selective memory.
Helpdesk_Guy@reddit
Yes, so? Am I wrong with my assessment? No, since my former statement is true nonetheless. Pay-walled, intentionally.
The CPU itself may have been "rather" cheap, yet it was still effectively pay-walled behind an overly expensive HEDT platform of LGA 2011-3 mainboards with outrageous price tags for the time – $250–$450 was not seldom.
996forever@reddit
And the exact same can be said of Threadripper, except the entry level is priced far higher still, in both CPU and board price.
Helpdesk_Guy@reddit
You seem to forget that even AMD's mainstream, topping out at 16C/32T, is already more than enough for 99% of normal people using PCs, and will remain future-proof for easily the next 5 years if not more (as software evolves way slower in taking advantage of increased core counts).
So this time, HEDT is really for actual professionals and businesses actually *needing* it, so the significance of a paywall is way smaller today to begin with – AMD already pushed mainstream desktop well into the realm of what was once HEDT. Back then, Intel's HEDT existed only for the sake of keeping the desktop on quad-cores.
996forever@reddit
Quad core eight threads absolutely, absolutely WAS "more than enough for 99% of normal people using PCs anyway" in the early to mid 2010s.
No-Relationship8261@reddit
You are right about the price. But you can easily buy an Epyc CPU today, and back in the day it wasn't different.
I certainly didn't pay $5,000 for a CPU like this Threadripper, but I remember there were options for even more.
Helpdesk_Guy@reddit
Geez, are you constantly misunderstanding and mixing up things on purpose?!
With availability, I was talking about Xeons you clown! Not today's offerings.
Back then, you couldn't get a Xeon, even if you had the money
No-Relationship8261@reddit
And I am telling you, that is not correct. I bought and used single-Xeon systems back when the 2770k was around.
Sure, it was an insane price, but it was also what companies paid for it (I didn't pay a premium).
BleaaelBa@reddit
People did make that comment back then.
Helpdesk_Guy@reddit
Here's some data on actual carelessness, Intel vs AMD …
nauxiv@reddit
Why are you counting Threadripper for AMD but not X58-X299 for Intel? Intel offered many higher core count CPU options on HEDT.
Helpdesk_Guy@reddit
For a start, Intel deliberately stalled advancements on the desktop for a decade – that's what people usually mean when talking about the mandated stagnation of "quad-cores for a decade".
Secondly, yes, I compared Intel's common desktop offerings (instead of HEDT) for the reasons above, putting into perspective the pretty nonsensical take of u/No-Relationship8261 (which I actually replied to!) that AMD allegedly "stopped caring" – nonsense, when you consider that Intel didn't increase core counts for a decade (on desktop, while locking everything beyond quad-core behind a paywall), and look at the levels to which AMD increased core counts in even less time.
So the whole table just puts core-count increases (of Intel vs AMD, over time) into perspective – and aimed for nothing else, really – just to show how laughable his take was that AMD "stopped caring" …
Yes, you're absolutely right that it *would* be insincere to compare Intel desktop vs AMD HEDT, but that was NOT what I was trying to do; I compared core-count increments over time, solely to put Intel's ten years of mandated stagnation, where they intentionally kept the desktop at just 4 cores, into perspective …
… against a comparable time frame in which AMD allegedly "stopped caring", yet evidently increased core counts tremendously.
Also, the X58 platform you bring up (or X299, for that matter) only underlines the stark contrast here, as Intel locked even effing six-cores behind a paywall: the first Intel hexa-core, the i7-990X (Gulftown on LGA 1366), had a price tag of no less than $999! That's +50% cores for a 4× price increase, when the common Intel quad-cores were around ~$250 – just +2 cores for +$750!
So when enthusiasts were rightfully complaining about the blatant stagnation from Intel, Intel reacted halfway through that decade, in 2011, in typical Intel fashion: they erected their costly paywall for everything above quad-cores at $999 and even *increased* it over a ~5-year span to ~$1,600–$1,800 (i7-6950X) by 2016.
Remember the ludicrous joke of Skylake-X (7980XE at $1,999), which AMD undercut by half at $999.
u01728@reddit
Are you even measuring the increase in core count over time of the two companies? That Intel was stagnant on desktop core counts from Kentsfield to Kaby Lake does not negate the increasing core counts on their HEDT/workstation models.
In addition, the TR 1950X (2017) has 16 cores, and Intel didn't have (non-HEDT) desktop quad-cores in 2006 (Kentsfield was Jan '07; Kentsfield XE was Nov '06).
If you want to demonstrate the stagnation in core count in Intel's mainstream desktop segment, AMD's mainstream desktop segment would've been the relatively like-for-like comparison. The 9995WX with its 96 cores is not in the same segment as the 1700X.
I disagree with the statement that AMD stopped caring: core count isn't everything anyway. Even then, that comparison was blatantly unfair.
No-Relationship8261@reddit
Can you tune down your bias a bit.
2017: 1950X, 16 cores. 2025: 9950X, 16 cores. One is a Threadripper, the other is not, you say?
2020: 3990X, 64 cores. 2025: 9980X, 64 cores.
Let's not talk about the fact that prices just keep rising way above inflation as well.
AMD is already the new Intel.
Helpdesk_Guy@reddit
There's no bias, if you actually read it CORRECTLY for a change!
The whole damn table just puts core-count increases over time – REGARDLESS of platform, market segment or price tag – of Intel vs AMD into perspective, and aimed for nothing else really, just to show how laughable your take was that AMD "stopped caring" …
No offense, but if you're just too incompetent to effing read a damn table, that's NOT my fault!
No-Relationship8261@reddit
Your damn table is wrong. 1950x had 16 cores in 2017.
So start by not making up stuff if you don't want people calling you out on your bs.
In fact nothing in your damn table is correct regardless on how you look at it.
So please entertain me and explain how you arrived at it. Honestly, this is 2×2=15 levels of stupid, so I can't even fathom your thought process in creating this table.
Where have you gone so wrong?
Helpdesk_Guy@reddit
No, it isn't. Just because YOU fail to get what the table was meant to represent doesn't make it wrong.
I already did, twice. Yet it looks like you have a very hard time actually reading and especially comprehending things written by others replying – you might as well just be pretending to, to bother people.
No-Relationship8261@reddit
I have already proved you wrong.
You are just tripling down.
AMD already had 16 core cpus in 2017, your table implies it was only 8 cores.
Go fix that and come back. I will teach you step by step.
You are too prideful to take it all at once.
Helpdesk_Guy@reddit
Well, there it is.
soggybiscuit93@reddit
There are just economic realities that make this more difficult than "add more cores!"
AM5 Zen5 is already memory bandwidth constrained at 16 cores. Zen 6 is introducing a new IOD/MC to improve bandwidth to allow for 24 cores - and that'll likely also be somewhat memory bottlenecked with DDR5.
We can say "well, move to 256b CPUs in consumer" but that raises the price of the entire platform, across the board, which hurts the volume market who now need to accommodate "quad" channel.
And core count limits are also just a function of node improvements slowing down. Cost per transistor is barely improving. Density improvements are taking longer. New nodes are substantially more expensive than the last.
Intel/AMD just literally can't increase core counts substantially at the same prices due to these two reasons.
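The bandwidth argument above, as back-of-envelope arithmetic. The channel counts and DDR5 speeds below are illustrative assumptions (dual-channel DDR5-6000 for AM5, octa-channel DDR5-6400 for the TR Pro platform), not measured figures:

```python
# Rough theoretical peak bandwidth per core for a few configurations.
# All speeds/channel counts are assumptions for illustration.

def peak_gbps(channels: int, mt_per_s: int, bus_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s for a DDR memory configuration."""
    return channels * mt_per_s * (bus_bits / 8) / 1000

am5 = peak_gbps(channels=2, mt_per_s=6000)     # dual-channel DDR5-6000
tr_pro = peak_gbps(channels=8, mt_per_s=6400)  # octa-channel DDR5-6400

for cores, bw in [(16, am5), (24, am5), (96, tr_pro)]:
    print(f"{cores:>2} cores sharing {bw:6.1f} GB/s -> {bw / cores:.1f} GB/s per core")
```

Under these assumptions, a 24-core AM5 part would actually have less peak bandwidth per core than a 96-core part on the octa-channel platform, which is the crux of the "add more cores" problem.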
No-Relationship8261@reddit
Finally a proper answer.
This was the case for Intel and 4 cores as well, BTW.
There wasn't enough bandwidth for it with DDR3 and DDR2.
Their mistake was sticking to it even after the bandwidth was there. Whether AMD repeats that, we don't know yet.
But AMD has been increasing margins quite a bit. We certainly started paying monopoly tax and that is despite still only making 50% of sales.
I really hate how many monopolies are there in semiconductors. We just can't seem to have competition.
Helpdesk_Guy@reddit
Yeah, let's pretend as if software even these days would remotely take advantage of moar cores.
Just look how long it took to get away from the mantra of game-fueled single-thread-sh!t!
Even when Ryzen came to up the ante on cores and AMD kicked off the ~~Corean War~~ War on Cores™ with four/eight cores as the minimum for the desktop, most software was still heavily single-threaded.
Ryzen came pretty much ten years after dual-cores (2006–2016), yet even by 2017 – basically a full decade later – more than one thread was still seldom used. That hasn't changed much even today.
Now we're virtually two full decades on, yet most software STILL doesn't give a flying f—k about multi-threading.
No-Relationship8261@reddit
So you are saying that the Intel CEO was right and no consumer needs more than 4 cores?
I never saw an app that uses exactly 16 cores or 8 cores and no more.
They are either single-threaded, dual-threaded, or consume as many threads as there are.
The next stop seems to be NUMA zones.
SoTOP@reddit
Impressively wrong.
VenditatioDelendaEst@reddit
It's closer to the truth than the idea that programs are written "for x number of cores".
Single thread: duh.
Dual thread: a buffered pipeline with a CPU-intensive limiting step that uses at least half the total CPU time.
As many as there are: `find | xargs`, `make -j $(nproc)`.
Scaling of the last runs out at the width of the dependency graph, and there are counterexamples involving parallel algorithms with lots of all-to-all communication, but I bet you could come up with a pretty darn good predictive model of CPU performance using only 1T, 2T, and nT benchmarks.
SoTOP@reddit
All it would take is watching one CPU review from the past 5 years to know that most programs are in the middle between 2T and nT, something that u/No-Relationship8261 claims does not exist. Even with a pretty basic program it's not too difficult to parallelize the workload onto more than 2 threads, while it's extremely complex to have programs use all available threads.
VenditatioDelendaEst@reddit
When something is easily parallelized, the default obvious thing is to use all available threads.
If you are manually identifying non-dependent subtasks and running them concurrently, that is both harder and feels like "using more than 2 threads", but in the usual case one of the subtasks is at least as heavy as everything else combined, so it's functionally equivalent to 2T. You could schedule the heavy thread on core 1 and all the others on cores 2-n, and the run time would not be any shorter with 4 cores than with 2.
If a workload has some 1T parts and some nT parts, and all you have to go on is average CPU utilization and benchmarks from machines with different core counts, that can look kind of like a workload that uses more than 2 and less than n cores, but it isn't. You have to actually sample the number of cores awake at the same time and plot the histogram (and make sure you're only counting the one app, not uncorrelated OS background noise that isn't part of the workload).
It's kind of like how a 5-wide CPU is faster than a 4-wide one, even though it's ludicrously rare for code to sustain 4+ IPC.
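The 1T/2T/nT taxonomy from this subthread can be turned into a toy predictor (a sketch with idealized scaling rules; real workloads are messier):

```python
# Toy model: predict a workload's runtime on k cores from its single-thread
# time plus a coarse 1T / 2T / nT classification. Illustrative only.

def predicted_time(t1: float, k: int, kind: str) -> float:
    if kind == "1T":
        return t1              # no parallel scaling at all
    if kind == "2T":
        return t1 / min(k, 2)  # saturates at two cores
    if kind == "nT":
        return t1 / k          # embarrassingly parallel, scales with k
    raise ValueError(f"unknown kind: {kind}")

# A pipeline whose heavy stage uses half the total CPU time behaves like 2T:
# four cores buy nothing over two.
assert predicted_time(100.0, 2, "2T") == predicted_time(100.0, 4, "2T") == 50.0
# make -j style work scales with cores until the dependency graph runs out.
assert predicted_time(100.0, 8, "nT") == 12.5
```

This is exactly the "functionally equivalent to 2T" point: once the heaviest subtask dominates, extra cores sit idle no matter how the scheduler places threads.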
No-Relationship8261@reddit
Impressively wrong
Helpdesk_Guy@reddit
What?! No, of course not! I meant the exact contrary of that, naturally.
Intel is the main reason WHY the whole industry was concentrating only to single-thread.
That's what I'm saying, most software even released today, is still single-threaded.
The only widespread, notable exception to that rule is browsers, with Google's Blink.
… and if it weren't for outlets' reviews basically slam-dunking every game post-Ryzen in 2017 that couldn't use more than 1–2 threads and was severely performance-limited DESPITE lots of unused cores at hand (and with that, directly affecting $$$ through sales!), most game engines today still wouldn't utilize more than 1–2 threads, or 4 at most.
SoTOP@reddit
Nonsense, most stuff released today uses more than one thread. Performance is single-thread dependent, but that is a different thing from being single-threaded. Lots of modern games wouldn't even launch on a CPU with 2 threads.
Nice fairy tale – executives at gaming companies all over the world watched CPU reviewers complaining about Ryzen being underutilized and because of that told devs to make games multithreaded /s.
In reality, multicore consoles are the most apparent reason why PC games started using more threads. The PC version of GTA4 from 2008 already used 3 threads, while most PCs were at best 2C/2T, simply because that's how many cores the Xbox 360 had. The PS4 generation had 8 very weak cores, and when games made to push everything from those systems started releasing in the latter half of the console generation, even much faster 4C/4T CPUs started getting left behind.
No-Relationship8261@reddit
If there's any point to 16 cores, there's a point to more cores.
I am not seeing how your statement disagrees with this, but your first comment makes me think otherwise.
Helpdesk_Guy@reddit
Yet here we are, with plenty of cores still not really being used by much, since most coders out there are effing lazy and just don't care. (Yes, I know about the difficulties of threading/scheduling.)
My first sentence in my initial comment, "Yeah, let's pretend…", was meant ironically and sarcastically – hence the polar opposite was meant, obviously.
tagubro@reddit
The AMD glazing on Reddit is rampant. I’d almost say the techtubers space is even worse.
puffz0r@reddit
Boohoo?
Kryohi@reddit (OP)
Not sure there is a reason to complain about the number of cores if the performance increase is good regardless, as shown here.
Moreover, we know the next gen is the one with an increase in the number of cores per chiplet and better memory controllers, so both Ryzen and Threadripper will presumably have more cores.
EloquentPinguin@reddit
Zen 6 is expected with 24 Core Desktop but who knows...
future_lard@reddit
And 3 pcie lanes? ;)
surf_greatriver_v4@reddit
The rumour is Zen 6 will have 12-core CCDs, but it feels like it's been a long time coming.
mduell@reddit
Will the 12 core CCD be Zen6 or Zen6C?
I thought I saw a rumor the higher core count CCDs would come with gimped cores.
wintrmt3@reddit
The C cores aren't gimped; they are full Zen cores with all the features, just synthesized for small area, paying with lower maximum clocks.
mduell@reddit
Right, 6C is gimped.
But rereading the rumors it looks like 12 core Z6 and 16 core Z6C.
masterfultechgeek@reddit
For non-cache sensitive workloads not really.
If you have 100ish cores on a package, your clock speed is limited by thermals.
Designing a smaller, cheaper core that uses less power but isn't optimized for TOP SPEEDS could actually get you slightly more clock speed if you're thermally limited.
Don't tell me that the 7995WX isn't limited by power/thermals in nearly every real world deployment.
mduell@reddit
At 100 cores, sure.
But the roadmap rumors include single CCD parts.
masterfultechgeek@reddit
I mean... in practice current Zen desktop parts start to throttle with just two CCDs in them...
The amount of "gimping" is pretty minimal. Keep in mind Zen 5 has something like 2-3x the IPC and about 2x the clock speed of cores from 20ish years ago.
That isn't to say that there aren't use cases for the bigger, fatter versions of the cores. I suspect that it's EASIER to design these, which helps with iteration speed (aka time to market). It's also useful for a handful of workloads that rely on cache OR are lightly threaded.
In practice we're talking VERY minor performance differences, per core.
mduell@reddit
If that was the case, why are they doing both?
masterfultechgeek@reddit
A nearly logically equivalent question to yours would have been "why did AMD do Zen when they could have done Zen+?", or "why did AMD do Zen 2 when they could have done Zen 3?", or "why did Intel release the 386 when they could have made Pentiums?"
It takes time to design stuff, and taking a first shot at an architecture while being LESS concerned about density can be a winning approach.
Geddagod@reddit
It's not the cores themselves that make something compatible with 3D V-Cache.
Not really. The dense cores have far lower Fmax than the classic cores; the classic cores still easily have a large and necessary role in AMD's lineup.
masterfultechgeek@reddit
The "cheap" compact cores don't have the TSVs in them. This is presumably a die-area saving measure... which enables MOAR COARS.
Cache doesn't really matter for most use cases and on balance the more highly threaded the use case, the less cache matters.
>Not really. The dense cores have far lower Fmax than the classic cores; the classic cores still easily have a large and necessary role in AMD's lineup.
You're not going to hit the Fmax for any reasonable time span if you have ~100ish cores. The higher Fmax only really matters for "low end" desktop products.
Pretty much the only use cases for the "big cores" are things like HFT, fluid simulations and gaming. The first two are a relatively small chunk of the market and the latter one is chasing after a bunch of small purchases, which is generally NOT the way to go when you could be going after higher margin, $1M+ POs from the enterprise.
Geddagod@reddit
With Zen 3, the TSVs are nowhere in the cores, and with Zen 4, due to area constraints, some of them got moved onto the L2 block – but clearly the location of the TSVs is flexible to an extent.
If AMD wanted to create a 3D V-cache sku with all dense cores, there's nothing stopping them.
This is a bold generalization lol. Bad cache hierarchies have sunk products and performance before. Cache capacity and hierarchy is a major part of a products architecture.
What I suspect you mean, however, is that the halving of L3 per core isn't a big deal for Zen dense cores. To which... maybe? Halving the L3 causes an ~10% drop in IPC in specint2017 for Zen 4.
And here's an IT company buying server parts demonstrating that they explicitly benefit from more cache per core (with Genoa-X), and claiming that's why they chose it rather than Genoa or Bergamo.
It is pretty interesting that Zen 6C in Venice Dense is rumored to bring the L3 cache capacity per core back to par with the standard variants, though.
Another problem is the decrease in memory bandwidth and capacity per core.
People love to downplay the client market for some reason. It's weird.
Check out this comment to highlight the strength of client. Note I'm referencing margins, operating income, and revenue.
All of client benefits from the much better ST performance of the standard cores. And much of server does too, stronger per core and vectorized perf are two of the strongest keys locking in x86 server CPUs from being completely phased out by home-grown ARM CPUs from hyperscalers.
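Purely as toy arithmetic (not a measurement), compounding the two figures quoted in this thread – an ~10% IPC hit from halved L3 and an ~30% Fmax deficit for the dense cores – gives a feel for the per-core gap, under the crude assumption that per-core performance scales as IPC × clock:

```python
# Toy estimate of dense-core vs classic-core per-core performance.
# The 10% and 30% figures are taken from the discussion above; the
# multiplicative model is a simplifying assumption, not AMD data.
classic = 1.0 * 1.0                     # normalized IPC * normalized clock
dense = (1.0 - 0.10) * (1.0 - 0.30)     # halved-L3 IPC hit * lower Fmax

print(f"dense core per-core perf: {dense / classic:.0%} of classic")
```

Under those assumptions a dense core lands around two-thirds of a classic core per-thread, which is why the classic cores keep their role for lightly threaded work.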
masterfultechgeek@reddit
Touching on halving cache... going from "large" laptop cores to "C" cores in Zen 5 there's a bunch of use cases where IPC is basically tied - the desktop variant has its own strengths (also 4x the cache)
https://chipsandcheese.com/?attachment_id=31144
https://chipsandcheese.com/p/zen-5-variants-and-more-clock-for-clock <- bigger article. Most of the benchmarks have the clock speed capped on each CPU for IPC comparisons.
----
I will argue that the "best" solution is going to be invariably having a handful of higher clocking cores with more cache and then a bunch of "small" cores spammed. Which is generally what is done on laptops. It works pretty well. I say this as someone with a Strix Point CPU. This is also how it's done in phones... desktop/laptop OSes just need to catch up a bit... and even without a bunch of scheduler improvements it's STILL solid.
I kind of suspect that Zen 6 will have more of this, potentially in standard desktop parts. I'd LOVE the option for 12 "performance" cores and 24-36 "c" cores. Best of all worlds.
I'm also VERY amenable to a Zen 6c part with 3d-vcache.
There's also rumors of a future zen that has NO L3 cache and any extra is bolted on.
Geddagod@reddit
Current 2-CCD Zen parts are hitting all-core turbos above 5GHz – only something like 10% below Fmax.
The highest Zen 4C boosts to, when OC'd on desktop, is ~4GHz. That's still ~30% slower than a regular Zen 4 core. I would hardly call that pretty minimal.
Zen 5C is only 3.5GHz in retail products, btw, but I feel like judging it by that is unfair, since it can't be OC'd and those are mobile products, likely power-limited.
Why is the comparison to cores 20 years ago and not the classic variant of the core itself?
The difference here is likely very minimal.
This isn't a handful of workloads, this is most workloads for client, and many workloads in server too.
masterfultechgeek@reddit
The Zen 5C parts are getting "close enough" in clock speed.
Peak speeds aren't sustained for periods measured in minutes.
Consumer/client CPUs are low margin and BARELY matter.
the non-C parts are in some sense AMD's sloppy seconds for consumers. They're "rushed to market" and don't get the extra work to get more cores.
They also don't land on the more expensive, premium nodes.
They're basically the "poor person" parts.
Geddagod@reddit
Except the gap would be larger than 30%, from what we have seen. There's nothing, afaik, indicating that Zen 5C is closing the Fmax gap vs Zen 4C.
They are though. Check out 8:59.
So my previous comment in the other thread should explain why this is false. Near the bottom of my comment.
Except that the cores are very clearly designed differently. Where's the sloppy seconds in that?
How?
There are physical design differences and extra tuning to get the cores to clock that fast. AMD talks about how they optimized the critical path, targeted use of low vt gates, custom cells and cell variants, and even a specialized HPC focused node developed with TSMC in order to explicitly hit higher frequencies in desktop products. You can check it out in AMD's Zen 4 IEEE presentation.
Now ofc, Zen 4C has their own specializations. But the point is that AMD put a bunch of effort into both cores.
Funnily enough this only appears to be a Zen 5 thing. Wasn't the case with Zen 4, and isn't rumored to be the case with Zen 6.
While the dense server market prob does necessitate a more expensive node, Zen 5C exists in client with only N4 too.
ResponsibleJudge3172@reddit
They are gimped. They can't perform the same, so they are gimped
Vb_33@reddit
Zen 6.
steinfg@reddit
nah, usual cores
mduell@reddit
At gimped clocks.
Artoriuz@reddit
Intel is rumoured to go up to 16P+32E+4LPE with Nova Lake, but I'll only believe it when I see it.
Vb_33@reddit
No big last level cache for this SKU tho.
Muck113@reddit
Holy mother of multicore. I use software that uses all cores (the Autodesk suite). This will be a game changer for our team.
Right now one Revit file takes 2 mins to open on a Gen 4 NVMe with a 10th-gen i7. I want to reduce it to 5 seconds.
Plank_With_A_Nail_In@reddit
I want more PCIe lanes more than I want more cores.
fastheadcrab@reddit
Chipmakers have learned over a decade ago to keep it locked to HEDT. The issue is that Intel abandoned HEDT and even barely keeps workstation alive because their products were completely uncompetitive.
So AMD just increases their prices on HEDT with no competition. Intel kept Xeon-W prices insanely high because they had users by the balls and decided to never release consumer XE parts.
masterfultechgeek@reddit
What I want:
x16 for GPU
x4 for storage 1
x4 for storage 2
x4 for storage 3
x1/x4 for NIC
I'd be willing to live with nvme slots UNDER the motherboard and above the CPU if it means better performance. ATX was made for an era where the chipset was closer to the center of the board and these days the "chipset" is near the top, integrated into the CPU.
nauxiv@reddit
You have this already.
AM5 is 28 PCIe lanes.
16x GPU
4x M2
4x M2
4x -> chipset -> M2
1x for NIC etc. can be shared on the chipset.
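As a sanity check, that breakdown does add up to AM5's 28 CPU lanes, counting the chipset uplink once (a toy tally; slot naming is just for illustration):

```python
# AM5 CPU PCIe lane budget: the listed allocations sum to 28 lanes.
lanes = {
    "GPU slot (x16)": 16,
    "M.2 #1 (x4)": 4,
    "M.2 #2 (x4)": 4,
    "chipset uplink (x4)": 4,  # downstream M.2 / NIC hang off the chipset
}
print(f"total CPU lanes: {sum(lanes.values())}")
```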
Alive_Worth_2032@reddit
And Intel has 32: 16x + 4x + 4x, and DMI is 8x.
While AMD's double chipsets are roughly equal to Z890 on paper coming out of the chipset, what they lack is the doubled upstream bandwidth that Intel has.
Plank_With_A_Nail_In@reddit
I want x16 in two slots, none of this dual-x8 on top-tier AM5 motherboards.
vlakreeh@reddit
Not that these chips are bad or that the code compilation benchmarks here are totally pointless, but I wish people did more realistic benchmarking of developer workloads. Most developers aren't doing tons of release builds with empty caches all day, something that disproportionately benefits huge, expensive, high-core-count CPUs. Most developers work in a cycle of making changes, doing an incremental debug build, and then running the test suite, over and over. For most of that cycle, a dozen high-performance cores will typically outperform a huge CPU that doesn't have the same per-thread performance.
Unfortunately pretty much every publication focuses on the time to do a release build with empty caches, but ever since CI/CD became commonplace, most professional developers don't bother doing release builds locally for large applications.
Caffdy@reddit
can you expand on this? sounds interesting
vlakreeh@reddit
Nowadays developer workflows will typically look like this: You want to make a change to something so you go write a test that fails if the desired outcome does not happen, you then go try and implement that change, you run your tests and they inevitably fail, you go make a change and re-run the tests until your software passes the test.
When you have tested your change, you submit it for review by a coworker and for additional automated testing in CI (continuous integration). In CI you typically run tests and various verification tools on submitted code changes, to ensure you don't have regressions and that someone can't merge in a change that only works on their machine instead of in the reproducible CI environment.
Once your changes have been approved and merged in you typically want to create a release, this will be a process similar to CI where you have CD (continuous deployment). CD is a reproducible environment where you can run a series of steps to build your software from a known state (instead of whatever the file system of an engineer’s laptop is), CD then uploads your software at the end for you to distribute or automatically uploads to some distribution platform.
During this entire loop, developers are typically not doing release builds of their software and are instead building debug builds where there’s more information (and less optimizations) inside the executable to make it easier to find out why the software is not behaving as expected.
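The red-green loop described above, in miniature (the function and its behavior are invented purely for illustration):

```python
# Minimal test-driven cycle: the test exists before the implementation.
def test_slugify():
    assert slugify("Hello World") == "hello-world"

# A first run would fail with NameError - slugify doesn't exist yet ("red").
# You then implement just enough to make the test pass ("green"):
def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated slug."""
    return title.strip().lower().replace(" ", "-")

test_slugify()  # now passes; in practice a test runner like pytest does this
```

Each iteration of this loop is an incremental debug build plus a test run – exactly the workload where per-thread speed matters more than raw core count.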
VariousAd2179@reddit
For the Qwen2.5-14B-Instruct localscore benchmark where the 9980X scores 105, it's worth noting that an RTX 3060 12GB scores 222. Granted, if you're going to be using an RTX 3060 for your local AI inference, you have to factor in the price of the CPU needed to drive it.
Still, it might be a good idea to wait for Threadripper AI if AI is your thing.
Caffdy@reddit
what's that?
ElementII5@reddit
I really wish Michael would include Performance per Watt graphs. I find them very informative.
Some observations:
The new Threadrippers don't just have a solid performance upgrade; more importantly, they didn't achieve it by just sucking up more power. They do have better performance per watt.
It is a nice generational gain in architectural efficiency. In the case of the 9980X vs the 7980X: 1.4x better performance per watt!
The IOD is made for a lot more chiplets. This makes the 64-core part really shine, not just in performance but also in performance per watt.
What happened to the Core Ultra 9 285K? When it came out it was on par with the 9950X in applications. Now worse performance and performance per watt?! The 9950X has 23% better perf/watt!
michaellarabel@reddit
Hmm? There are perf-per-Watt graphs in there. I don't typically include them for every single result in the article itself since then it just becomes rather redundant and people complain of too much data, etc.
Pro tip: If you really want more perf per Watt graphs... last page of article -> click the result link ( https://openbenchmarking.org/result/2507290-PTS-THREADRI83&sgm=1&asm=1&ppw=1&hgv=Threadripper%2B9970X%2CThreadripper%2B9980X&sor#results ) -> click on the power consumption and efficiency orange 'tabs' above each graph.
Caffdy@reddit
there is no such thing as too much data, and much less on a technical-oriented website
Helpdesk_Guy@reddit
People are effing stoop!d, so never apologize for, or refrain from, informing future bright bulbs with additional data sets!
VenditatioDelendaEst@reddit
He essentially always includes the link to the full data on openbenchmarking.org at the end of the article.
Brevity is a virtue.
Helpdesk_Guy@reddit
Not even that. People are really so incredibly dumb these days as to often complain about too MUCH data to form a solid opinion upon! Like, what the heck?
Instead of just ignoring it and letting other, competent people deal with it (like people actually interested in the stuff, with compiled data figures to pore over), the dumb ones always try to ruin it for everyone involved …
I'll never forget the moment when I had compiled an extremely important thesis paper (one of the most crucial ones of my career), and the first reaction from someone outside the scope skimming over it was;
michaellarabel@reddit
That's part of the reason at the end of each article I typically include my OB link for the full/raw data set in full for all my collected benchmarks and power metrics, etc.
Helpdesk_Guy@reddit
That's what I'm saying, you can never have enough data on every possible metric from every possible perspective.
Just think about how much actual data was missed until FCAT, OCAT, CapFrameX and others came along.
makistsa@reddit
/u/michaellarabel something is definitely wrong with the llama.cpp tests. I have a Raptor Lake with DDR4, and I've often used a 265K that is twice as fast as mine.
I tested Llama 3.2 1B Q4 on my 13600K with DDR4, using 6 threads. The first token was instant (<1s? I can't tell), prompt processing was 1350 t/s (~1000-token prompt), and it generated 1200 tokens at 42 t/s.
Prompt processing at 280 t/s compared to my 1350 t/s doesn't make sense. I haven't tested the 265K with such a small model, but with bigger ones it's a lot faster than mine.
Kamishini_No_Yari_@reddit
Those compile times are delicious.
VariousAd2179@reddit
People still compile big projects on their own machines? (just asking -- I don't know the answer)
dagmx@reddit
Yes of course. When you’re iterating on code, you don’t want to keep sending it to a build server.
Oxire@reddit
9950 ddr5 6000 cl28
285k ddr5 6400 cl38
Jumpy_Equipment_889@reddit
https://youtube.com/shorts/_sRoVr3-RKA?feature=share