AI startup Cohere found that Amazon's Trainium 1 and 2 chips were "underperforming" Nvidia's H100 GPUs, according to an internal "confidential" Amazon document
Posted by nohup_me@reddit | hardware | View on Reddit | 60 comments
Kryohi@reddit
Kinda expected since you can't design chips like this in a couple years and expect to be competitive with the best. It took Google quite some time to make their TPUs good for training, same with AMD which will only reach complete parity with Nvidia with the MI400 next year.
And for anyone screaming software, no this has nothing to do with software. If these accelerators were fast enough they would be used at least by big companies, and you wouldn't see this article.
a5ehren@reddit
AMD marketing says MI400 will have parity. It won’t.
SailorBob74133@reddit
OpenAI and a bunch of other similar-sized customers lining up for the MI45Xx seem to indicate otherwise.
lostdeveloper0sass@reddit
AMD already has parity in a lot of workloads. I actually run some of these workloads, like gpt-oss:120B, on MI300X for my startup.
Go check out InferenceMAX by SemiAnalysis. All AMD lacks now is a rack-scale solution, which comes with MI400.
Also, MI400 is going to be 2nm while VR is going to be 3nm, so it might have some power advantage as well.
AMD lacks some important networking pieces, for which it seems it's going to rely on Broadcom, but MI400 looks to compete head-on with VR200 NVL.
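For context, serving that kind of model on MI300X doesn't require anything exotic these days. A minimal sketch, assuming a ROCm build of vLLM on an 8x MI300X node (the model id and parallelism settings are illustrative, not a verified config):

```python
# Minimal sketch: serving gpt-oss-120B on MI300X via vLLM's offline API.
# Assumes a ROCm build of vLLM; model id and tensor_parallel_size are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="openai/gpt-oss-120b",  # assumed Hugging Face model id
    tensor_parallel_size=8,       # e.g. shard across one 8x MI300X node
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize why rack-scale networking matters."], params)
print(outputs[0].outputs[0].text)
```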
xternocleidomastoide@reddit
That's an understatement ;-)
ked913@reddit
You guys do know AMD owns Solarflare (the ultra-low-latency leaders) and Pensando, right?
xternocleidomastoide@reddit
Yeah, and? Neither Solarflare nor Pensando addresses the "important networking pieces" that are missing.
SailorBob74133@reddit
AMD has all the pieces. MI45Xx will use the Broadcom Tomahawk Ultra, which supports UALink over Ethernet. The solution is comparable to Nvidia-based solutions but emphasizes different things. Grok gave a good writeup: https://grok.com/share/bGVnYWN5LWNvcHk%3D_cae9a106-cdd4-461c-a584-51f10a35872b
lostdeveloper0sass@reddit
I'm fully aware. But they do lack SerDes IP, nothing they can't source externally or license from others.
State_of_Affairs@reddit
AMD also has a partnership with Marvell for UALink components.
a5ehren@reddit
UALink was killed in the cradle by NVLink Fusion.
Thistlemanizzle@reddit
Can you elaborate?
I was hopeful AMD might catch up, but skeptical too. It's not far-fetched that they are still a few years away. I'd like to understand what you've seen that makes you believe that.
a5ehren@reddit
If I knew for sure I'd be covered by like 300 NDAs. But AMD has been saying the same thing for a decade and it's never been true.
SirActionhaHAA@reddit
There's no reason to believe either.
State_of_Affairs@reddit
That "random redditor" provided his source. Here is the link.
Exist50@reddit
No one linked that before, nor does it include any results for MI400. The author of that blog isn't really reputable to begin with.
jv9mmm@reddit
Well AMD marketing has claimed it will achieve parity with Nvidia every year for the last 15 years. At some point we should start disregarding their claims of parity.
BarKnight@reddit
Poor Volta
mark_mt@reddit
No! MI400 will be better than NVDA by quite a bit: 2nm vs 3nm, and it packs more compute units! Laws of physics/semiconductors... now you're gonna claim CUDA makes it faster? Nonsense!
imaginary_num6er@reddit
Yeah, if it was easy, Pat wouldn't have been fired from Intel.
shadowtheimpure@reddit
It could also be a question of the models being optimized for Nvidia's architecture rather than Amazon's.
_Lucille_@reddit
It is really just a price issue.
Chips like Trainium are supposed to offer a better ratio, whereas if you want raw performance (low latency), you can still use Nvidia.
Amazon can get people on board by cutting the price by enough that it's clear they win on price:performance once again.
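A rough sketch of that break-even arithmetic (all numbers are hypothetical placeholders, not real AWS pricing):

```python
# Hypothetical break-even calculation for price:performance.
# Numbers are placeholders, not actual AWS or Nvidia pricing.
h100_price_per_hour = 4.00     # $/GPU-hour, hypothetical
trainium_relative_perf = 0.70  # fraction of H100 throughput, hypothetical

# Trainium only wins on price:performance if its hourly price is below this:
breakeven = h100_price_per_hour * trainium_relative_perf
print(f"Trainium must rent for under ${breakeven:.2f}/hour to beat the H100 on perf per dollar")
```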
entarko@reddit
And even then, you are saying "which will"; there's no guarantee of that.
Revolutionary_Tax546@reddit
That's great! I always like buying 2nd-rate hardware that still does the job for a much lower price.
saboglitched@reddit
By 2nd-rate hardware do you mean used H100s? Which are cheaper now. Also, Trainium doesn't seem to "do the job" for cheaper in terms of price/perf, and it lacks the software stack.
FlyingBishop@reddit
I mean, maybe? The article kind of seems like a low-effort hit piece. Everyone knows that H100s are the best GPUs for training, it's why they're so expensive. Without figures and a comparison between H100/AWS Trainium/Google TPUs/AMD MI300X it just seems like a hit piece.
It's also something where I would want to hear the relative magnitudes. If AWS has a total of 100k H100s and 5k Trainiums, then this is really an "AWS has not yet begun large-scale deployment of Trainium and still mostly just offers H100s" story.
The article says Trainium is oversubscribed, which makes me think that for training purposes you can't get enough H100s, so Trainium exists and it's something you can use; there are no used H100s to rent when you need hundreds of them.
saboglitched@reddit
What? Nvidia has released multiple product lines and improved refreshes since the original 80GB H100 was released over 3 years ago. The current AI-optimized GB300s are multiple times better than the original H100, which wasn't even primarily focused on LLM training. The article does bring up some points that aren't easily dismissable: AWS can't offer any chip of their own that can even match the H100 now; Anthropic, a major AI player that was using AWS, announced a large partnership with Google; and AWS announced a future $38B partnership with OpenAI that exclusively runs Nvidia GPUs, with no Trainium use planned, which suggests they are basically unviable for any AI workload. The only good thing the article says about Trainium is the Amazon CEO boasting that it is a rapidly growing "multibillion business", but I wouldn't trust that given the evidence, and all cloud providers are basically manipulating numbers to show AI growth while hiding losses everywhere to fool investors.
Revolutionary_Tax546@reddit
AI is failing, because it isn't really AI. It's not there yet. That's why Ukraine uses FPV drones to target Russian tanks.
Revolutionary_Tax546@reddit
No ... I mean using what you need, not buying the latest and greatest to play PAC-MAN. YouTube doesn't need an ultra-fast PC either; the money can be used on a fast network connection instead. ... I know the big stores & companies want you to spend more time on changing your hardware every two years, rather than using it.
From-UoM@reddit
Getting into CUDA and the latest Nvidia architecture is very, very cheap and easy. For example, an RTX 5050 has the same Blackwell tensor cores as the B200.
So people have an extremely cheap and easy gateway here. Nobody else has an entry point this cheap and also local.
If you want to go higher there are the higher-end RTX and RTX Pro series. There is also DGX Spark, which is in line with GB200 and even comes with the same networking hardware used in data centres. Many universities also offer classes and courses on CUDA for students, so that's another bonus.
This understanding and familiarity are carried to the data centre.
AMD doesn't have CDNA on client GPUs; Google and Amazon don't even have client options. Apple is good locally but they don't have data centre GPUs.
Maybe Intel might with Arc? But who knows if those will even last with the Intel-Nvidia deal.
Maybe AMD in the future with UDNA? But we have no idea what parts of the data centre hardware they will bring and whether it will be the latest or not.
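To illustrate the "cheap local entry point" part: the same high-level code hits the tensor cores whether it lands on a consumer RTX card or a data-centre B200. A minimal sketch going through PyTorch rather than raw CUDA C++ for brevity (device names are just examples):

```python
# Minimal sketch: identical code runs on a consumer Blackwell GPU or a B200;
# only the device it lands on differs. Requires a CUDA build of PyTorch.
import torch

assert torch.cuda.is_available()
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 5050" or "NVIDIA B200"

# bf16 matmul is dispatched to the tensor cores by cuBLAS on either card
a = torch.randn(4096, 4096, device="cuda", dtype=torch.bfloat16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.bfloat16)
c = a @ b
torch.cuda.synchronize()
print(c.shape)
```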
nohup_me@reddit (OP)
I think the advantage of custom chips is the software: if you're Amazon or Apple or Google you can write your code optimized for these chips; a small startup, instead, can't take full advantage of them.
DuranteA@reddit
I'd say the exact opposite is generally the case. The biggest disadvantage of custom chips is the software.
This simple fact is what has basically been driving HPC hardware development and procurement since the 80s.
a5ehren@reddit
Yeah. Writing non-portable code is a waste of time
max123246@reddit
Though it's actually pretty standard practice for GPUs, since they've changed so much from generation to generation. PTX is no longer backwards compatible when it comes to tensor core instructions.
nohup_me@reddit (OP)
It's an advantage, see Apple's M processors… because software written only for custom hardware is way more efficient, but it has to be written almost from scratch.
hanotak@reddit
You know absolutely nothing about processor architecture or how software works, do you.
elkond@reddit
You significantly underestimate the effort required to write low-level optimizations for low-latency/high-throughput workloads that need high reliability as a cherry on top.
I worked on software like that, and you need actual wizards to pull it off. Even then, it's hundreds of people working multiple years to get code that's as easy to work with as writing for CUDA-enabled hardware.
nohup_me@reddit (OP)
No, I don't underestimate it. This is why custom chips with custom code are better and more efficient, but it requires lots of effort, and that's why startups can't afford all of that.
That's what I've been writing since the beginning.
elkond@reddit
It's not an advantage; you don't write code with the shelf life of unpasteurized milk unless you are Apple.
Earthborn92@reddit
Apple is probably the only American company that could do this, since they had their whole integrated walled garden in place before they started co-developing hardware for it.
From-UoM@reddit
Problem is, how do you teach developers and give them the environments to learn how to write this code in the first place?
There is currently no way to take the latest Google TPUs and give them to students and devs to use on their laptops or desktops.
Kryohi@reddit
This might be a problem for small companies or universities, not for the big ones. They can afford good developers who are not scared away the moment they see non-Python code.
From-UoM@reddit
It only works well for internal devs who basically have local access to the GPUs and are paid to use them. Outside devs? Not so much.
There is a reason why Amazon and Google still have to offer GB200 servers on their cloud services despite their own chips.
People learn CUDA on the outside, then prefer to use CUDA in the data centres.
Kryohi@reddit
I agree, but again, it's also a matter of size and commitment. Depending on the company and what deal they get offered, it might very well be worth it to, say, switch to Google's TPUs, or even take the drastic step of developing their own chips. Then you pay a good team to learn and use the new hardware, whether it's yours or from another provider.
From-UoM@reddit
Time is extremely important now.
You can always make back money. You can never get back time.
External devs can start on CUDA right now. For TPUs they have to spend time to learn, which is time lost and falling behind competitors who will use CUDA. And that's if it even works.
DeepSeek learned it the hard way. They tried Huawei's GPUs and failed multiple times.
https://www.ft.com/content/eb984646-6320-4bfe-a78d-a1da2274b092
Salt_Construction681@reddit
Got it, the key to success is bravery against non-Python code. Thank you for enlightening us idiots.
Kryohi@reddit
https://en.wikipedia.org/wiki/Hyperbole https://en.wikipedia.org/wiki/Metaphor
ShadowsSheddingSkin@reddit
https://en.wikipedia.org/wiki/Asshole
We're all aware what you actually meant, and we're exactly as offended by it as by the actual shitty words you used to convey it in an exaggerated manner. The fact is that this represents a major problem in acquiring and retaining sufficient numbers of people with the requisite skills. Throwing out "that's only a problem if you're not rich enough to just hire the best possible developers, who can easily familiarize themselves with a totally different model of low-level massively parallel computing that exists nowhere else and then build an entire software ecosystem themselves, in-house" as an answer is exactly as stupid as what you actually said; it's just less likely to irritate a significant subset of programmers.
Kryohi@reddit
Wtf.
I use python too, you know.
nohup_me@reddit (OP)
Yes... this is the issue. Small startups can't afford the resources of Amazon, and Amazon is probably only giving out some information, not full access to the low-level details of its custom hardware.
Talon-ACS@reddit
Watching AWS get caught completely flat-footed this computing gen after it was comfortably in first for over a decade has been entertaining.
sylfy@reddit
They haven’t been caught flat footed, they can purchase plenty of Nvidia GPUs, and their customers are happy to pay for those. They simply want to cut costs, and they’re trying to push customers towards their own homemade solution that nobody wants.
Mrseedr@reddit
op is ignorant.... don't waste time on this thread
iBoMbY@reddit
The thing is, they also cost them about 10x less than NVidia GPUs.
Balance-@reddit
No bad products. Only bad prices.
DisjointedHuntsville@reddit
The headache with a fully custom ASIC approach is, unless you're Google with an entire country's worth of scientists and literal Nobel laureates as employees... that silicon is as good as coal. Burn it all you want to keep yourself warm, but it's mostly smoke at the end of the day.
This year is when the decision by Nvidia to go to an annual cadence kicks in. The models coming from the Blackwell generation (Grok 4.2 etc) are going to really show how wide the gap is.
jv9mmm@reddit
The Trainium GPUs are a response to the Nvidia chip shortages. Those shortages are no longer the bottleneck they once were; now the issue is deeper in the supply chain, with things like HBM, and good luck beating Nvidia on that.
Nvidia has significantly more engineers for both hardware and software; the idea that a company can build a whole new product from scratch with a fraction of the R&D is questionable at best.
Their goal was: if we can make something 80% as good, but we don't need to pay Nvidia's 80% margin, the development will pay for itself. And so far it has not.
shopchin@reddit
I didn't need them to tell me that
MoreGranularity@reddit