Why is NVDA leading in AI when the hyperscalers have their own ASICs?
Posted by StoatStonksNow@reddit | hardware | 65 comments
Question from someone who doesn’t know much about hardware…what is the reason that NVDA GPUs are so in demand if the hyperscalers all have homegrown AI ASICs that perform the same operations more cheaply?
ET3D@reddit
The AI chips are meant for inference. Training is still done on NVIDIA chips.
phil151515@reddit
Apple just announced they did training on Google TPUs.
dweller_12@reddit
They don't. The costs associated with developing an AI chip from scratch are massive. Years of development, tens of millions in cost, and may or may not be a total waste of money in the end if the chip does not meet expectations. A hyperscaler needs a very good reason to develop a specialized chip for it to make any sense financially. It is not as simple as snapping your fingers and beating NVIDIA.
With NVIDIA you go and buy an already functioning product, which arrives at a somewhat guaranteed date, with an existing highly supported software ecosystem.
capn_hector@reddit
wrong
amazon has trainium, google has Trillium, Alibaba has ACCEL, IBM has Northpole, Meta has Artemis... you can pretty much list off the hyperscalers that don't have their own chips and it's a much shorter list.
phil151515@reddit
Apple recently used Google TPUs to do training.
auradragon1@reddit
I think it's foundational model competition. Foundational models can cost hundreds of millions, and soon billions, of dollars to train. What's the point in cheaping out on the training hardware when it costs so much money to train? Not to mention that right now there is a race to AGI. Training takes 6 months for a foundational model. If your training doesn't work because you went with second-rate training hardware such as AWS Trainium while your competition used Nvidia hardware, it could be the death of your company. You'll have no model, or your model will be far behind the leading model.
There is a huge race right now towards ever more intelligent foundational models.
Klinky1984@reddit
A lot of them suck though. Maybe a more accurate statement is "they're trying, but doing a poor job of it".
norcalnatv@reddit
Great link and post.
StoatStonksNow@reddit (OP)
Right, but Google did do that - they have the TPU line, and it supposedly outperforms the H100. Can they not produce it at scale?
djm07231@reddit
They do use it a lot internally; they have just been pretty slow in marketing it.
If you use TPUs you are locked into Google's XLA stack of TensorFlow/JAX, which is different from the more common PyTorch.
But when it comes to scalability and reliability, TPUs are probably better than Nvidia's offering in that regard.
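To make that lock-in concrete, here is a minimal JAX sketch (purely illustrative, not from any Google codebase): the jit-compiled function is lowered through XLA and runs unchanged on CPU, GPU, or TPU, but you are writing JAX rather than the more common PyTorch.

```python
# Minimal JAX/XLA sketch (illustrative): XLA compiles the function once and it
# runs on whatever accelerator JAX sees -- CPU locally, TPUs on a Cloud TPU VM.
import jax
import jax.numpy as jnp

@jax.jit  # compiled by XLA
def predict(params, x):
    w, b = params
    return jnp.dot(x, w) + b

params = (jnp.ones((8, 4)), jnp.zeros(4))
x = jnp.ones((2, 8))
print(jax.devices())             # e.g. [CpuDevice(id=0)] locally, TPU devices in the cloud
print(predict(params, x).shape)  # (2, 4)
```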
djm07231@reddit
If you look at Meta's Llama 3 paper (https://arxiv.org/pdf/2407.21783), particularly Table 5, you can see the various assortment of failure modes which can interfere with your 10,000+ GPU run.
The whole job is synchronized so even a single GPU failure can require a restart.
I believe Google has a much more mature system for redundancy and backup, so that hardware failures are not exposed and new hardware can take over more easily. So Google's infrastructure is more reliable and established, whereas anyone working with Nvidia's GPUs needs to write their own stack from scratch and address these kinds of reliability concerns.
TPUs were designed from the start to be immensely scalable and reliable. Nvidia is a bit behind in this regard.
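As a rough illustration of the "write your own reliability stack" point above, here is a hedged sketch of the checkpoint-and-resume logic a team training on GPUs typically ends up building themselves (the file name, model, and loop are hypothetical placeholders):

```python
# Hedged sketch: periodic checkpointing so a synchronized job can resume after
# a hardware failure instead of restarting from step 0. Names are illustrative.
import os
import torch

CKPT_PATH = "ckpt.pt"  # hypothetical checkpoint location

def save_checkpoint(step, model, optimizer):
    torch.save({"step": step,
                "model": model.state_dict(),
                "optim": optimizer.state_dict()}, CKPT_PATH)

def load_checkpoint(model, optimizer):
    if not os.path.exists(CKPT_PATH):
        return 0
    state = torch.load(CKPT_PATH)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optim"])
    return state["step"] + 1

model = torch.nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
start_step = load_checkpoint(model, optimizer)  # after a node failure, resume here

for step in range(start_step, 1000):
    loss = model(torch.randn(4, 16)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        save_checkpoint(step, model, optimizer)
```

At 10,000+ GPU scale this also has to cover distributed barriers, node replacement, and data-loader state, which is exactly the machinery the comment says Google's TPU infrastructure bundles in.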
norcalnatv@reddit
Google used to submit to MLPerf. They don't meaningfully participate any longer. They also used to come out with a whitepaper ~18 months after the fact, showing how their latest TPU was faster than Nvidia's earlier generation. They didn't seem to want to compare to the current generation and took some grief, so they don't do that any more.
With Google being the place a lot of people start looking for answers, it was pretty easy for them to create a perception about their performance. I don't think it's accurate, but happy to see contrary data.
The other issue, mentioned elsewhere, is the flexibility of Nvidia's platform vs. the TPU.
StoatStonksNow@reddit (OP)
I think this is the last piece of the puzzle I was missing. Thanks a ton.
Deshke@reddit
The question would be: what generates more revenue? Renting your own chip out, or trying to sell it without the infrastructure around it?
norcalnatv@reddit
Nvidia showed numbers last year that for every $1 invested in A100, $4 were returned, and for every $1 invested in H100 $7 would be returned in terms of the instance rental business. Blackwell looks likely to improve those numbers.
quildtide@reddit
Google's TPUs only work with TensorFlow, and iirc they were designed specifically for CNNs and don't outperform Nvidia GPUs on arbitrary AI architectures.
XYHopGuy@reddit
They work on TensorFlow and JAX (which has a lot of advantages over PyTorch) and barely on PyTorch. They work great for transformers, not just CNNs.
Internally they almost entirely use TPUs; check out the research publications coming out if you need evidence. Google buys GPUs to sell on GCP, which honestly is the bigger reason GPUs are dominant: one software stack available via RTX, cloud, academic clusters, etc. vs. a fragmented ecosystem of ASIC accelerators.
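For reference, PyTorch can reach TPUs, but only through the torch_xla bridge, which is the "barely on PyTorch" part; a hedged sketch (assuming a Cloud TPU VM with torch_xla installed, using the long-standing xm.xla_device() API) looks roughly like this:

```python
# Hedged sketch of the PyTorch-on-TPU path: tensors live on an XLA device and
# work is lazily recorded into an XLA graph that only executes at mark_step().
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                    # a TPU/XLA device instead of "cuda"
model = torch.nn.Linear(128, 10).to(device)
x = torch.randn(32, 128, device=device)
loss = model(x).sum()
loss.backward()
xm.mark_step()                              # flush the pending graph to the TPU
```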
ArnoF7@reddit
I don't know about Google's product team, but within Google Research and DeepMind a lot of researchers don't use TPUs at all, and there is no pressure to force them to use TPUs exclusively. So while the TPU is a thing, Google itself isn't all-in on it.
Source: I have a few friends and labmates who work for Google and I asked them directly
YouMissedNVDA@reddit
Ask yourself why ChatGPT was found on Nvidia GPUs and not TPUs.
Ask yourself why nearly every new robotics firm is on NVDA robotics platforms.
Ask yourself why Apple ever sold more devices than competitors when the price/performance was never optimal.
The job to be done is not compute efficiency. The job to be done is accelerating research utilizing accelerated computing.
From-UoM@reddit
Should also add how easy it is to start on Nvidia.
Buy any RTX card and you get access to libraries and software that's also on the top data centre GPUs. There are some limits of scale, but basically you get access to almost everything.
Cost of entry is stupidly cheap.
Then those people get hired by companies and they can start on the H100 in a very short time.
Meanwhile, on ASICs, they need time to learn and adjust.
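A tiny sketch of that point (assuming any CUDA-capable GeForce card and a recent PyTorch install): the same code path runs on a desktop RTX card and, unchanged, on an H100 in the data centre.

```python
# Minimal sketch: identical PyTorch/CUDA code on a GeForce RTX card or an H100;
# only the device it lands on changes.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Sequential(torch.nn.Linear(128, 64),
                            torch.nn.ReLU(),
                            torch.nn.Linear(64, 10)).to(device)
x = torch.randn(32, 128, device=device)
print(torch.cuda.get_device_name(0) if device.type == "cuda" else "CPU only")
print(model(x).shape)  # torch.Size([32, 10]) on either card
```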
Rain08@reddit
I remember back in 2018 when I was helping one of my friends with their thesis. They had a classifier that they were training on a GTX 750 Ti, but they wanted to speed up the training, so I helped them by running the classifier on my GTX 1060. I just had to install the required programs and libraries to run the model and I was all set.
The fact that you can start small and scale it up effectively with no additional work is great. CUDA being well-documented too was very nice since there wouldn't be much head scratching. More training time, less configuring time.
And from what I noticed in my batch, everyone who didn't have a PC, or who wanted an upgrade for their ML-related thesis, went with an Nvidia card. Though a small portion opted for Google Colab.
darknecross@reddit
And more than the upfront cost is the recurring cost. You need to keep developing new chips that outperform the competition faster than they do. Otherwise you hit an inflection point in a few years where it no longer makes sense to build your own.
DueRequirement6292@reddit
Tens of billions not millions
StoatStonksNow@reddit (OP)
But the Google TPU v5 supposedly outperformed the Nvidia H100 by a wide margin. Are they not able to produce it at scale?
Strazdas1@reddit
Because:
Hyperscaler ASICs are workflow-specific; you have to make a new chip for a different model to be efficient.
Until recently, hyperscaler ASICs were rare.
Nvidia comes with a software stack and 24/7 software support if you are a big enough client.
sittingmongoose@reddit
A huge portion of it is software. Nvidia is the king of software support. Nvidia's solutions also encompass more than just AI accelerators; they have strong CPUs, GPUs, and VERY advanced networking solutions. It's an all-in-one hardware and software solution that can scale all the way up to data center deployments.
On top of that, ASICs are not easy to make: you need to get them designed and fabbed, they are expensive to change, and they still need all the supporting hardware around them.
Companies are heading the way of their own bespoke solutions. Apple is a good example of this. But it will take time to build up software support, iterate on designs, and the slowest part, test them at scale.
StoatStonksNow@reddit (OP)
That makes sense. So Google has the TPU line, but it isn’t as easy for customers to use?
the_dude_that_faps@reddit
Aren't the TPUs for inference? Nvidia GPUs are good for inference of LLMs, but they can also train the models. I'm not aware of many alternatives for training large models aside from AMD's Instinct line.
Patient_Stable_5954@reddit
Many use multicloud. Having common hardware across cloud environments is necessary, so TPUs are reserved for specific tasks rather than used as the standard.
Acrobatic_Age6937@reddit
IMHO things don't add up. If Google has their own (better) TPU as claimed, they should be ahead of the curve, as they have enough devs working on it that the hardware should pull them ahead, but that's not what we see. They are barely competing.
Edenz_@reddit
The TPUs are certainly real; you can read about them online, and they’ve been presented at Hot Chips a few times. They’re bespoke inference ASICs for Google’s internal ML requirements. I’m not sure that they’re built to do training work like the V100/A100/H100/B100 are.
Vushivushi@reddit
TPUs help Google's cost structure, but Google also has to produce a good quality model, too.
Gemini 2 isn't out yet, so we'll see if they can leapfrog the competition.
mer_mer@reddit
At my work we use local dev machines for prototyping. You can't do that with TPUs, so you'd have to do your local dev work on one software stack and then deploy onto another, which is scary. We're also currently on AWS so to use TPUs we'd have to switch to GCP. Amazon has their Inferentia chips, but they have the same issue with local development plus their software stack isn't as mature. We looked into switching over and we would have had to switch a bunch of library versions around. The savings weren't enough to justify the engineering investment at our scale.
StoatStonksNow@reddit (OP)
This is a great answer! Thank you
sylfy@reddit
Every PhD student working in ML and adjacent fields knows how to set up CUDA and the related software stack these days. A good number know how to troubleshoot when necessary. A handful even know how to write CUDA code.
When you go to TPUs, they’re starting from scratch.
sittingmongoose@reddit
Your code needs to not just run on it; it has to leverage the accelerator efficiently, or it won’t run fast.
Think of it like the PS3 vs. the X360. The PS3 had ASICs to accelerate graphics (the SPUs). When a developer used them, the games ran well and looked insane. However, they required a ton of work and time to get right. So what happened is developers did it halfway, and you were left with games that ran and looked better on the X360, because it was too much effort to get them to work better on the PS3.
I’m kinda over simplifying it but it’s the best analogy I could think of while I’m on a stair master lol
AntLive9218@reddit
I'd argue that it's mostly just convenient development as long as you fully commit to their proprietary ecosystem, because outside of that they are really bad with support.
Ever since it became obvious that CUDA was starting to have a comfortable monopoly, Nvidia's support for competing standards turned horribly bad.
MewKazami@reddit
The corporate world is much like the dumb consumer. The best thing is never the "best": you need a package that's easy to use and hassle-free, that doesn't break the bank, but that also gives something close to the best performance. Versatility and scalability are the name of the game.
The "Cloud" exists mostly because corporations are happy to pay a premium to have servers instead of having infrastructure of their own with people who need to maintain it. What if the people quit (we need to find someone good and pay them a decent wage)? What if something goes wrong (we can shift blame)?
The same goes for AI. NVIDIA provides a solution that JUST WERKS. And that's enough for most corporations. Could custom chips do better in almost every case if they were made for those cases? Yes. But who's going to finance that?
FragrantMatch124@reddit
You mean the ASICs which most of the hyperscalers just recently started to develop?
Those ASICs are in development or brand new and only available in small quantities. Nvidia AI chips are broadly available right now, and demand is extremely high.
capn_hector@reddit
tesla hired jim keller to make Dojo in 2016 and he left in 2018.
amazon launched trainium to the public in 2020.
starting to get the idea that people don't really understand that hyperscalers do have their own training chips and have for a long time.
the fact that they continue to choose NVIDIA is both notable and interesting!
StoatStonksNow@reddit (OP)
One of the main things I learned from this thread is that hyperscalers' ASICs are such bit players that most people don't even know they exist, or that they have existed for ten years.
norcalnatv@reddit
To take the contrary side for a moment, there is a faction that believes the ASIC business is only just getting started and will be quite large in time. Broadcom/AVGO released earnings yesterday; they assist 3 major CSPs with their ASICs: Goog, Meta and (?) AWS. This is about an $8B annual business for them. Interesting that CEO Hock Tan wouldn't comment on 2025 revenue beyond a flattish type of run rate.
Primarily, Nvidia has the training side of AI locked up; few ASICs, except the TPU, can really do this well, and I don't believe too many folks are trying to compete here, including AMD.
Inferencing is a different story. Nvidia claims 40% of their DC GPU sales ($11-12B last quarter, GPU plus adjacent technology) are going toward inferencing workloads. And 46% of Nvidia's business goes to the top CSPs, so the CSP guys could potentially displace, idk, $5-6B in inferencing workloads if they went with 100% DIY designs.
All the LLMs are coming from these hyperscalers; no one else has the scale to run these. So what is the future of ASICs serving LLM inferencing? It's hard to say. Don't the training GPUs have utility as inferencing servers too? Sure they do. So I think there will be a balance of GPUs to ASICs over time. But they will definitely be a larger market than they are today.
capn_hector@reddit
personally my opinion is that this is going to be a god-of-the-gaps situation. When the pace of innovation picks up, NVIDIA is in the driver's seat, both because they've got a fully general architecture (and one of the most advanced ones at that) and because they've got the organic ecosystem/userbase. Fixed-function is actually a liability when things are changing rapidly and NVIDIA not only has the ability to just execute arbitrary code if needed but also they are where the research takes place (at least for now). We will see if they can hold onto that organic ecosystem lead over time of course.
I think over time training will also be driven significantly lower, plus it's a very difficult market in the sense that you have to be a hyperscale player (or backed by a hyperscaler/VC - which will dry up eventually) to sustain the costs of developing a model. Like who are you going to sell training chips to that doesn't have the financial capability to make their own? And does that customer actually have enough money to use the chips and get to market? If not they won't get that VC funding.
And yes of course inferencing will be driven to absolute efficiency/commoditization. Cost will be driven as close to zero as sustainable and it's in no way a distinguishing factor or moat, on the 5-10 year window everyone will have them.
But for training, it really depends on how much things continue to shake up. The problem to date has really been that competitors can't keep up with the pace at which the research is happening - by the time they've brought the chips to market there is something else they need to incorporate to be relevant. Plus obviously the software ecosystem (although I don't think that itself is a durable moat). When things slow down, ASICs will eventually win out. When someone comes up with the next big innovation, ASICs will fall behind again. That's the problem with fixed-function hardware vs NVIDIA's (and AMD, Intel, etc) generalist GPGPU-style architectures.
norcalnatv@reddit
In general agree with your post, it's an early, evolving market and GPUs do have a flexibility advantage at this point in the game.
Question (maybe a bad topic for this sub, but) if Nvidia keeps raising the bar, a tactic well understood from PC gaming, how do ASICs ever catch up? Does the work load stop evolving?
I think the pace of innovation eventually becomes exhausting. Nvidia has the software and developer advantage (sounds familiar . . ), they obviously have a top architecture team, they've clearly shown an execution pace that's hard to match, the market is still in its early innings, and new features are always in demand from developers. I mean, eventually could be a pretty long time.
capn_hector@reddit
I generally think a lot of the technical underpinnings will settle down eventually. You don't need to invent 6 new datatypes every generation. In 5-10 years there will be a good understanding of what a "typical" model "shape" is. This isn't to say that innovation will stop, but the innovations will be the stuff you do with the model/to the model. And that's probably an easier car to chase.
The wildcard would be if someone comes up with something better than transformer architecture, that throws gasoline on the R&D fire again. But if things stay transformers, eventually there will come a point (even if it's a decade) where the foundational stuff is understood and that will be where ASICs can get good traction.
I also frankly expect hyperscalers to start turning their ASICs into GPGPU-lite. The writing is on the wall that CPU alone (at least the current "slap an A77 on it" approach) isn't good enough. If hyperscalers start rolling out GPGPU-lite, that's an existential threat to NVIDIA's moat. And I think NVIDIA understands the importance of their moats, and the importance of market access/platform reach, I really don't think they are anywhere near as eager to leave the graphics market as people think (the famous "nvidia is an AI company now" was like 2015).
Beyond that, it's really, really hard to say, and it depends massively on how the chess board looks at that point in time. Jensen is a business savant and NVIDIA is an incredibly lean company with 100% of the staff being top talent, and now he has infinite money. There is a huge amount of untapped revenue in using ML for optimization problems, just like DLSS (sample weighting). These things tend not to have the hallucination problems of LLMs and so on. You don't need a text scratchpad to get DLSS into the right semantic space to generate a valid answer etc.
So big picture, even if "AI" collapses, if Jensen can make money with it himself, he may stay in the market anyway. And between that stuff and the Mediatek partnership/pivoting to laptop+desktop+mobile, even a pop might not be as bad as people think. Jensen has demonstrated an amazing ability to roll with the punches, and NVIDIA might conceivably be back to peak-bubble revenue within 1-3 years (wild-ass guess).
You're exactly right that NVIDIA's going to try and leverage their developers, their partners, their ecosystem, and their platform/market penetration no matter what. But I think the exact form of that pivot depends on the circumstances of the pop - the market conditions, technical conditions, and competitive conditions across various segments. And I doubt even Jensen could give you a roadmap 5 years out for that. He just does what seems good at the moment - what puts him in a place where he can make money now while he works toward where he wants to be in 5-10 years. And that will depend on lots of things, and jensen's reading of those etc.
norcalnatv@reddit
Thanks, appreciate your thoughts and am aligned with most of them.
The "loyal to the graphics market" idea stuck out to me; it was mentioned a couple of times. Nvidia are dominant here, so I think the risk of abandonment or de-prioritizing is near non-existent.
In the very broad perspective, Nvidia's goal is to establish a new processor architecture and ecosystem, then make that as relevant as possible. That strategy has allowed, and will continue to allow, growth until something better comes along. PC graphics was just the first use case. GPGPU (Black-Scholes modeling and Monte Carlo simulations, for example) in the late aughts established extensibility outside graphics, then came crypto mining and blockchain, and then ML in 2012. Today there are 5M developers working on the platform, and the number grows by the day. Of course the problem is accelerated computing isn't general purpose and so necessarily requires bespoke software. They've done a great job establishing tools and support to make that happen, but ultimately it's simply about creating a ubiquitous, highly performant platform.
Whether that holds off ASICs in ML is an open question. My view is Nvidia are continuing to extend that ML platform into the enterprise, and so there will be a wide coexistence with CUDA even if ASICs 100% take over hyperscale (which I doubt). And that is the ASIC strategy's biggest pitfall: 5 different efforts create 5 different ISAs, so none of them ever gains the momentum to take over the broader market.
Nvidia will continue extending CUDA and the platform: Automotive, Materials Science, Biological Science, Robotics, and Omniverse -- which I think has zero competition and the biggest potential next to AI/ML.
My last thought is on your topic of Nvidia's non-player status in mobile and cell phones. Don't you think this is being established with the partnership with Mediatek?
I don't know what they're doing, but cutting down Orin SoC and licensing a low cost guy like Mediatek to build a low power mobile family could be a really great pivot into this market. Leveraging AI know-how could blow everything else out of the water in terms of performance and software ecosystem. I think they have a great chance here.
capn_hector@reddit
That's exactly what I think. That's what I meant by "NVIDIA is partnering with Mediatek" and that being a significant thing. NVIDIA is pivoting away from the dGPU trap (in the sense of x86 APUs increasingly eroding a market previously populated by low-end dGPUs), on ARM they are an equal player with everyone else and they can sell SOCs or laptops or license IP or whatever else. Getting into Mediatek both gives them a vehicle for delivering their own ideas, and also getting put into random mediatek SOCs that everyone uses.
I think previously that's been somewhat difficult for them because they don't want to hire a bunch of people and dilute their "skunkworks"/"startup" nature, so to speak, which means they have probably been leaving behind ventures that just weren't worth the manpower. Project Denver/Tegra (on phones) is probably a good example. So is the G-Sync FPGA being used forever. Like I literally have been asking when they were going to spin an ASIC since probably 2018-2019, the writing was on the wall after GSync Compatible.
Probably spending a bunch of money and engineering time wasn't worth it, especially if you end up just another competitor in a cut-throat commodity market. But that's kinda what Mediatek does, all the time. Partnering with Mediatek will dramatically expand their ability to hit some of those smaller markets. It's also going to let them do things like build their laptop+desktop market chips etc.
I heard murmurs of the Mediatek automotive partnership (NVIDIA licensing IP to Mediatek) like a year or two ago, a while after the arm merger got shot down. And people were skeptical, NVIDIA doesn't license IP blah blah. Then later, mediatek flagship phone chip with geforce inside? And then the automotive thing got announced. The gsync pulsar chip didn't surprise me in general (as an example of things Mediatek can do for them), let alone the announcement of more to come (and the laptop/desktop chips, etc...). I don't see how people aren't making more out of the overall pattern there.
I think they probably also can do some more interesting things with custom display stream protocols too. They don't have to follow DisplayPort line coding etc. It's interesting to see they're back in that game. Remember, OLEDs can scan out pixels/regions arbitrarily lol, and that can be something that they tie into DLSS to optimize specific important/high-motion parts of the image etc. There's lots of interesting things you could do with DLSS being variable-rate temporal and spatial sampling, and not rendering every region equally well or equally often. Maybe do async timewarp/async pixel warp/whatever on a per-region basis. That's an interesting direction to see them be interested in; they are very aware of perceptual delivery (and Tom Petersen will never not be all about frame pacing wherever he is).
NVIDIA isn't really living large on the revenue here. They are doing huge stock buybacks (which are, notionally, a "we can't deploy this cash effectively" signal). They are ramping some R&D spending etc, definitely a large increase etc. Plus I think they are attempting to rapidly diversify away from AI being their entire revenue, most likely. I see the Mediatek thing as a significant vector for that.
I really don't know how to feel about automotive and tegra in general, post-phones+tablets. Jetsons are fine. I suppose like anything the draw is having access to CUDA (which can do quite a bit of work per watt...). I'm sure the optical flow stuff is very important to automotive engineers etc, but then there's also mundane "we use it to render the instrument cluster" shit??? I have no real idea if any of it is competitive in perf/w or anything, other than the access to cuda. The timelines are often weird and it's all quite expensive too (makes sense, it's automotive/industrial and marketed there). Like I guess it's great for nintendo but it's never been that impressive a product other than it being a place to run your cuda.
It's been an interesting proving ground for some systems-design stuff maybe. NVLink and stuff (although often the cpu-gpu is just pcie). Bluefield is cool (my NIC is a NVMe server that talks to drives via RDMA...). Certainly that does show not everything they do is wildly successful etc. But they have been paying attention to engineering and scaling the system very early, the first nvswitch was like 2012 or something and it was big (about the same as a Xeon 1650v1 in transistor count).
I just imagine this is probably going to be another thing where people are sure that they're finally going to get that wascally wabbit and he just wriggles right out of the trap yet again. Jensen is supremely good at the pivot. He's already setting up what I see as escape routes from being tied to the AI market, and getting out of the APU pinch.
Because that is a strategic threat, that the low-end laptop market is imminently being eaten by bigger apus like hawk point, strix point, strix halo, lunar lake, etc. NVIDIA already contorted themselves to reduce the size of their product in some ways, supposedly to make the mobo footprint and case volume (most ultrathins are not at FAA limits) smaller because OEMs were realizing that if they yanked the dGPU the extra battery and lower power let them compete with apple. That is part of the context of Ada's design too - and AMD kinda did the opposite and spent more silicon to talk to more memory chips etc. Great for a different market but not a mystery why the laptop market laughed at the 7900M. Well, AMD has the last laugh because Strix Point-style products are coming for 4060 style laptops etc, and you could see strix halo-style products displacing 4070 type products even. So NVIDIA needs an exit strategy that lets them hit the market (or similar markets) in other ways. NVIDIA-branded chips or NVIDIA-licensed IP etc. As I said above, I always thought the ARM acquisition was about getting geforce as the default graphics IP for the lowest-tier ARM partners. If the gpu is there, hey, so is CUDA! And Mediatek lets them get a lot of the squeeze, if they can get adoption on those SKUs etc (surely it won't be the default).
Jensen has always been all about the platform. CUDA's support model is all about making a well-supported platform that you can reasonably write code against etc, and even retro-compile code back for older uarchs etc. And you make this product that is easy to use and you get it into everyone's computer, and you give them away to universities so students can access them. And then people start to do stuff with it. And you find the ones who are doing good stuff, and you hire them or pay them a stipend to keep doing their interesting thing and publishing it. Absolutely devious shit, quite heinous. Literally nobody else except maybe Apple has passed the bar of being a reasonable thing that you can actually just write code against instead of fighting the driver. It's the Baldurs Gate 3 "ok so [CUDA] is great and all, but you can't possibly expect the rest of the [hardware] industry to meet that level of quality..." and that's just been where NVIDIA has been quietly empirebuilding for 20 years.
StoatStonksNow@reddit (OP)
Thanks! This is great insight
capn_hector@reddit
I think there is also some confusion between hyperscaler AI ASICs existing and the never-ending treadmill of innovations over the last 2-3 years. People may have glommed onto "they're buying NVIDIA" as implying that hyperscaler ASICs don't exist at all, plus the press reports of "Google working on Trillium" and so on sound like Google doesn't have chips, when they're actually about next-gen stuff.
pianobench007@reddit
So far, here are the top upvoted comments on why NVIDIA still leads in the AI race.
Here is my only response and answer as to WHY the hyperscalers (Google, SAP, Equinix, AWS, Meta, Microsoft, Adobe, Twitter, IBM, Oracle, and others) still buy NVIDIA.
My answer is that no ONE company YET has an actual product that is the one mainstream AI winner. Even OpenAI, which looks like the current frontrunner, has yet to turn its models into a viable product that it can sell, market, and profit off of with the regular consumer.
Just as an example, Microsoft hopes that GPT-4 can overthrow Google in search and thus give Microsoft a way into the very lucrative advertising market. Today, we don't know if that has happened yet. But all signs point to people still generally using Google search, and advertising still looks to be firmly in Google's grasp.
Although of course this can change at a moment's notice. I don't know anybody who has switched from Google entirely to Bing. I myself have tried. It's alright. I went straight back to the safety of Google afterwards. It is just very difficult to get off of Google because of Maps, e-mail, and YouTube. I also search a lot for my work, so I am used to sifting through information faster than the AI.
I.e., I would rather look through forum/tech support posts by users and the answers of other experienced users than rely on an AI to summarize the answer for me.
So that is my response. Currently, NVIDIA is still king because there isn't a clear product/winner out there yet. So since NVIDIA was the first company to pass out the shovels, it is in the lead in this new ~~Gold~~ (AI) Rush.
Until there is a clear product with universal adoption and pathway to money, it is just NVIDIA products being bought and built in new datacenters.
RaspberryFirehawk@reddit
Nvidia's days are numbered. Chips like AWS Trainium are far ahead silicon-wise, as they are optimized for training, while Nvidia GPUs were originally made for gaming. Once the industry takes the time to take advantage of better hardware, Nvidia is done.
StoatStonksNow@reddit (OP)
I don’t know. My experience in business is that no one thinks it is worth it to use a more difficult but cheaper and more performant thing, and they tend to be right.
If what other people are saying here is true, Trainium won’t take off until their software makes it as easy to use as CUDA makes NVDA chips. Are they close to that?
djm07231@reddit
With the exception of Google’s TPUs all of their other efforts are 1st or 2nd generation products whose software isn’t ready. Trainium/Inferentia is the most mature one besides TPUs but Amazon’s AI offerings have been pretty lackluster.
Even Google’s first generation TPUs were inference only so it takes time to get enough experience.
I think in 5-10 years hyperscalers could run a lot of inference workloads on custom ASICs.
Pristine-Woodpecker@reddit
A CUDA stack can be developed and tested using a laptop with an RTX card.
Try to move to TPUs, and suddenly your software support gets more finicky, and all the testing for the port has to be done in the cloud on a rented system, for potentially (not guaranteed) running your algorithm at a lower cost. That trade-off works if you're at a very large scale, where running costs are more important than developer costs. For everything else...
EricOrrDev@reddit
Because CUDA and by extension the Bend programming language, fucking rock.
asenz@reddit
Mainly because of CUDA and the software ecosystem built around it. It's the most mature on the market and integrating it with other hardware doesn't seem to be much of a chore, compared with the competition.
Evilbred@reddit
ASICs are not flexible and AI is a growing and developing field.
If you build an ASIC system, it will likely be more efficient than a GPGPU-based system, but what if there's a breakthrough in AI architecture? That is likely, since it's a new and emerging field.
The ASIC system could be rendered obsolete, while the GPGPU system is flexible and can likely be repurposed.
Maybe the largest companies can afford this scale and risk, but a lot of the companies at the end of AI development are smaller.
akshayprogrammer@reddit
It takes a long time to design an ASIC, and you also need to book fab capacity in advance, so even if you design one you could have too few units, and NVDA still gets the majority of the market share. Also, NVDA probably has a better design team, so your ASIC probably won't have as big gains as you think.
ASICs are less flexible, and you need to predict what the future architecture will be, so if you bet wrong (say, on deep neural networks instead of transformers) your ASIC isn't very useful. The more general you make your chip, the smaller its performance and efficiency gains over general-purpose chips like NVDA's, while it is still less flexible. Then you also have to make good software so customers can use it easily.
Google technically beat Nvidia to AI ASICs: TPUs were announced in 2016, and Google said they had been using them for over a year, so they were deployed around 2015, while Volta, which introduced tensor cores, was released in December 2017. However, since most software was targeted at NVDA, they became the go-to. Anthropic uses Google TPUs for Claude, which is the current best model, so the new AI ASICs from hyperscalers aren't very far behind.
If a new architecture comes out, NVDA is in a much better position than the AI ASICs, because making a new one can take years and the old ones won't be very useful.
scytheavatar@reddit
Cause the speed at which AI algorithms are developing means ASICs become obsolete very quickly. GPUs have the advantage of being general purpose and lasting a long time for hyperscalers.
StoatStonksNow@reddit (OP)
That’s a good point. Thanks!
dagmx@reddit
I think a lot of the answers are missing out on training vs inference , and providing hardware to customers.
A lot of the custom chips are geared for inference throughput. Very few are intended for training. NVIDIA has a ton of horsepower which is great for training but gets wasted for inference. So even if they have their own chips, they still use NVIDIA for half their pipeline.
And then there’s customer support. They often have third party customers on their cloud offerings who use NVIDIA due to software compatibility. So even if first party use is on custom chips, they need capacity for third party use
StoatStonksNow@reddit (OP)
That makes sense. Thanks!
HyperSpazdik@reddit
Homebrew ASICs take time and money to develop. Nvidia's GPUs are readily available, much more flexible, have a robust software ecosystem (CUDA), offer better or comparable performance to ASICs, and are continuously improving. Nvidia also has decades of experience in building chips; new players coming to market now are already leagues behind.
PostExtreme7699@reddit
Cause it's a lie, as always. It's all planned in order to make fewer GPUs and sell them at triple the profit.
I'm still waiting for all the tons of GPUs from the alleged miner boom of 2020 on AliExpress. Where are they? Where are the 3070, 6800, 3060... from sellers with 10,000 units available at a price of 100 a piece? They're not there, because they never existed; it was all lies from Nvidia.