AMD Now Has More Compute On The Top500 Than Nvidia
Posted by 3G6A5W338E@reddit | hardware | View on Reddit | 43 comments
koopahermit@reddit
So the majority of the machines use Nvidia, but the few that use AMD Instinct use a heaping lot of them. Helps that the #1 and #2 supercomputer spots are both all-AMD systems.
animealt46@reddit
Top tier supercomputing is a strange world where being at the very top matters a ton since the dropoff is so steep.
puffz0r@reddit
dropoff in what?
Brisngr368@reddit
Compute
mikaturk@reddit
Easier to justify writing different code when you’re saving a lot more money with the bigger clusters
SERIVUBSEV@reddit
Supercomputers aren't for "AI", they are for normal deterministic compute that does chemical simulation, weather pattern analysis, etc.
It works with standard drivers and doesn't really need a code rewrite.
jnRven@reddit
What does this even mean? AI models are deterministic at their core? AI models are also used in both chemical simulation and weather analysis?
XelNika@reddit
Yeah, the rank 21 owner was revealed as an AI supercomputer. "AI" is currently machine learning and LLMs.
EmergencyCucumber905@reddit
Not re-written. The HIP API matches the CUDA API 1:1. You can literally replace all "cuda" with "hip" and be 99% of the way there.
The big scientific packages already have both CUDA and HIP backends.
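A minimal sketch of what that textual port looks like. (Real ports would use AMD's hipify tooling; this just illustrates the 1:1 naming correspondence between, e.g., `cudaMalloc`/`hipMalloc` and `cudaMemcpy`/`hipMemcpy`. The snippet string is made up for illustration.)

```python
def naive_hipify(source: str) -> str:
    """Case-preserving rename of the API prefix: cudaMalloc -> hipMalloc, etc."""
    source = source.replace("cuda", "hip")
    source = source.replace("Cuda", "Hip")
    return source.replace("CUDA", "HIP")

cuda_snippet = "cudaMalloc(&buf, n); cudaMemcpy(buf, host, n, cudaMemcpyHostToDevice);"
print(naive_hipify(cuda_snippet))
# -> hipMalloc(&buf, n); hipMemcpy(buf, host, n, hipMemcpyHostToDevice);
```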
From-UoM@reddit
Nvidia and AMD went in different directions for data centre GPUs.
AMD went all-in on FP64 with CDNA, which is great for Data Science, Simulation and HPC. FP64 is the figure used for the TOP500.
Nvidia went all-in on low-precision FP16 and FP8 for AI performance, starting with Volta.
So while AMD got the lead in HPC, Nvidia pretty much monopolized the AI GPU market, making them the most valuable company in the world.
Strazdas1@reddit
Yeah. AMD bet on scientific precision and models. Nvidia bet on lower precision with broader application. Nvidia ended up winning out, but back 20 years ago when these directions started, no one knew how it would end. Although AMD was really arrogant about it, saying that Nvidia's choice would bankrupt them and whatnot.
b3081a@reddit
AMD wasn't quite all in FP64 in MI300X generation. They've actually surpassed H100 in terms of FP16/FP8 matrix throughput, so they definitely wanted both markets. That's why they can sell those MI300X to AI companies these days. It's just a lot more expensive to make when they want both in one package, so not as cost efficient as H100 for AI.
AtmosphericDepressed@reddit
Supercomputers typically need 64 bit float and NVIDIA is optimising hard for fp4, for AI.
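The fp64-vs-low-precision trade-off the thread keeps coming back to can be seen directly from Python's stdlib, which can round-trip values through IEEE 754 half precision via `struct`'s `'e'` format (a rough stand-in for fp16 hardware behavior):

```python
import struct

def to_fp16(x: float) -> float:
    # Round-trip a Python float (fp64) through IEEE 754 half precision.
    return struct.unpack('e', struct.pack('e', x))[0]

# fp16 carries ~3 decimal digits of mantissa; fp64 carries ~15-16.
x = 3.14159265358979
print(to_fp16(x))   # 3.140625 -- noticeably rounded
print(x)            # full fp64 value survives

# Accumulation error: adding 1.0 to 2048.0 in fp16 does nothing, because
# the gap between adjacent fp16 values at that magnitude is 2.
print(to_fp16(to_fp16(2048.0) + 1.0))  # 2048.0
```

This kind of absorbed update is harmless for many AI workloads but fatal for long iterative scientific solvers, which is why HPC stays on FP64.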
gumol@reddit
I wish Top500 was still relevant. It’s such a cool concept.
So many clusters are not added to Top500, because if you paid hundreds of millions of dollars for a cluster, every day is very expensive, and why waste time on hero runs?
Moreover, Top500 focuses on double precision, while the industry is going towards smaller data sizes.
Brisngr368@reddit
Top500 is for HPC, which is focused on FP64; lower precision isn't useful for HPC, so it isn't really a big focus.
Jlocke98@reddit
Don't forget all the secret clusters run by state actors
puffz0r@reddit
It's actually kind of terrifying that there are dozens if not hundreds of private/secret supercomputing clusters, drawing dozens of megawatts each, probably being used to spy on civilians. Meanwhile, global warming.
996forever@reddit
Can’t be all that secret if the hardware has to come from somewhere.
Strazdas1@reddit
The hardware can be under an NDA contract. As in, you sign a contract in which you agree not to tell anyone that you made and sold the hardware to whatever agency. You can get jail time and even treason charges for breaking it. This is why some companies use canary protocols. Basically, while everything is fine, you see something innocuous being updated on a site; if the updating stops, you know there's a problem, even though the company never actually informed anyone, so they never broke the NDA.
996forever@reddit
The logistics required for that amount of compute power, literally 100k+ Nvidia data centre GPUs, would be insane.
Most of the components will have to be internationally sourced, so it can hardly be a total secret within any country boundary. It’s not the 1930s where a secret bomber could be manufactured with mostly domestic parts and labour.
Manufacturing of computer hardware is extremely limited to a handful of nations on Earth, and the sheer quantity required for those numbers is extremely hard to hide.
Strazdas1@reddit
The international suppliers wouldn't be able to tell the difference between Nvidia ordering those parts for GPUs sold to Microsoft and for GPUs sold to the NSA, though. And they are certainly not privy to individual contract sale numbers.
Sure, it would be hard to do with custom hardware. Not so hard when using the same hardware that's already made for other purposes.
Strazdas1@reddit
I remember back in the day when the NSA leaks happened. It would be physically impossible for humans to listen to all phone calls, so they had a supercomputer do it: find keywords and flag for human review. And I'm sure it's a lot more advanced now.
Strazdas1@reddit
This is true, but unfortunate. I prefer slower but better precision options, but for a homelab that's... not really viable anymore.
Tommy7373@reddit
it's absolutely still relevant, just not for ai, and was never designed for ai. top500 and hpl/hpcg will never change from fp64 performance and testing scientific workloads, and is always aimed at scientific/research systems rather than private sector clusters and supercomputers wanting non-fp64 compute.
ResponsibleJudge3172@reddit
AI has encroached on traditional supercomputing already. Climate simulations and fluid dynamics, for example.
Qesa@reddit
It's not for AI, but H100 isn't a slouch at fp64 either. We know multiple clusters of 100k+ H100s exist, which would hit ~4 EF in HPL. They just haven't bothered submitting a run
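A back-of-envelope check on that ~4 EF figure, using assumed (not measured) numbers: roughly 67 TFLOPS peak FP64 tensor throughput per H100 SXM, and an HPL efficiency of around 60% of peak at that scale:

```python
# All inputs are rough assumptions, not benchmark results.
gpus = 100_000
peak_fp64_per_gpu = 67e12      # FLOP/s per H100 SXM (FP64 tensor), assumed
hpl_efficiency = 0.60          # fraction of peak HPL typically achieves, assumed

rmax = gpus * peak_fp64_per_gpu * hpl_efficiency
print(f"{rmax / 1e18:.1f} EF")  # ~4.0 EF
```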
996forever@reddit
Which ones would hit 4 exaflops?
Qesa@reddit
Facebook/meta and twitter/eggs
cherryfree2@reddit
xAI's Colossus Cluster in Memphis.
auradragon1@reddit
It's not really relevant anymore. Some countries don't even submit their supercomputers to the list.
EJ19876@reddit
AI clusters have phenomenal FP64 performance. They're all using H100s, after all. Meta and xAI definitely have clusters with performance well in excess of #1 on the Top500, as they're both known to have 100,000x H100 clusters operating.
Computers don’t appear on the top500 unless the owner runs the LINPACK benchmark provided by top500. Maybe Musk could be convinced to run it on the xAI cluster? He likes that sort of stuff.
Numerlor@reddit
Yeah the headline sounds impressive until you realise Meta and other companies with their own big ass clusters have Nvidia GPUs at multiples of the rank 1 there
noiserr@reddit
Both #1 and #2 spots occupied by full AMD systems. Impressive.
Strazdas1@reddit
But that's because there are systems missing from the list. Meta's training cluster is larger than number 1, for example, but it's not listed because it never ran the test.
Brisngr368@reddit
How big is meta's training cluster?
noiserr@reddit
We will never know I guess.
Strazdas1@reddit
Yes, there are many systems that will never get listed here and many systems that are flat out secret and you'll never even hear about.
Psyclist80@reddit
AMD is starting to stretch its legs, with dominant performance in CPU and GPU HPC workloads, and pivoting toward lower precision workloads with the MI355X to compete better in that market. They are running a lot of successful products currently! Ha, only wish the gaming segment would see some love, hopefully soon with the RDNA4 launch.
Strazdas1@reddit
RDNA4 will be bad. They won't compete in the high end because they can't. There are all the indicators that something in RDNA4 didn't work and they didn't bother fixing it. Instead they focused on the next architecture and the datacenter.
gluon-free@reddit
FP64 performance of supercomputers is now on the red side, because green is focusing on AI.
512165381@reddit
🤑
XYHopGuy@reddit
Nvidia doesn't try to compete in the fp64 space