TheaterFire

Did Mark just casually drop that they have a 100,000+ GPU datacenter for llama4 training?

Posted by jd_3d@reddit | LocalLLaMA | View on Reddit | 174 comments

174 Comments

Dizzy_Season_9270@reddit

llama 4 is still not leaps and bounds ahead of the previous iterations
View on Reddit #61175758

sebramirez4@reddit

Wasn’t it already public knowledge that they bought like 15,000 H100s? Of course they’d have a big datacenter
View on Reddit #36554531

jd_3d@reddit (OP)

Yes, it's public knowledge that they will have 600,000 H100 equivalents by the end of the year. However, having that many GPUs is not the same as efficiently networking 100,000 into a single cluster capable of training a frontier model. In May they announced their dual 25k H100 clusters, but there have been no other official announcements. The power requirements alone are a big hurdle. Elon's 100K cluster had to resort to, I think, 12 massive portable gas generators to get enough power.
View on Reddit #36562248

JamesJackL@reddit

Just curious: why is it so hard to build a 100k GPU cluster, and how was xAI able to do so? And why did people think that making a cluster bigger than 30k was impossible? Last question: how will Elon make the 1 million GPU cluster?
View on Reddit #42479659

Atupis@reddit

It is kinda weird that Facebook does not launch their own cloud.
View on Reddit #36567123

virtualmnemonic@reddit

Seriously. What the fuck are they doing with that much compute?
View on Reddit #36572144

uhuge@reddit

AR for Messenger calls.. and a recommendation here and there.
View on Reddit #36631137

Chongo4684@reddit

Signaling the lizard planet.
View on Reddit #36588145

umarmnaq@reddit

[https://huggingface.co/collections/meta-llama/llama-32-66f448ffc8c32f949b04c8cf](https://huggingface.co/collections/meta-llama/llama-32-66f448ffc8c32f949b04c8cf)
View on Reddit #36573878

tecedu@reddit

Cloud can only be popular with incentives or killer products, meta unfortunately has neither in infrastructure
View on Reddit #36575781

progReceivedSIGSEGV@reddit

It's all about profit margins. Meta ads is a literal money printer. There is way less margin in public cloud. If they were to pivot into that, they'd need to spend years generalizing as internal infra is incredibly Meta-specific. And, they'd need to take compute away from the giant clusters they're building...
View on Reddit #36573737

carnyzzle@reddit

Llama 4 coming soon
View on Reddit #36553030

ANONYMOUSEJR@reddit

Llama 3.1 feels like it came out just yesterday, damn this field is going at light speed. Any conjecture as to when or whereabouts Llama 4 might drop? I'm really excited to see the storytelling finetunes that will come out after...
View on Reddit #36553230

ThinkExtension2328@reddit

Bro, Llama 3.2 did just come out yesterday 🙃
View on Reddit #36556809

Fusseldieb@reddit

We have llama 3.2???
View on Reddit #36568803

roselan@reddit

You guys have llama 3.1???
View on Reddit #36588627

CapitalNobody6687@reddit

Wait, what? Why am I still using Llama-2?
View on Reddit #36589646

Neither-Level1373@reddit

Wait. We have llama-2? I’m literally using a Llama with 4 legs.
View on Reddit #37182568

harrro@reddit

Because Miqu model is still fantastic
View on Reddit #36595542

Pvt_Twinkietoes@reddit

Yeah. 90B and 8B I think.
View on Reddit #36592663

ANONYMOUSEJR@reddit

Ah, misinput lol
View on Reddit #36556849

05032-MendicantBias@reddit

I swear, progress is so fast I get left behind weekly...
View on Reddit #36586594

Heavy-Horse3559@reddit

I don't think so...
View on Reddit #36565988

holchansg@reddit

As soon as they put their hands on a new batch of GPUs.
View on Reddit #36554135

gelatinous_pellicle@reddit

Gates said something about how datacenters used to be measured by processors and now they are measured by megawatts.
View on Reddit #36551910

holchansg@reddit

People are saying AI is a bubble, yet we are talking about the same power input as entire countries in the future.
View on Reddit #36554088

BigBasket9778@reddit

Yes but bubble is about output, not input. No one can say that the big tech aren’t buying loads of accelerators and training impressive models. The question is, will that flow through into real economic benefit: more bread, more corn, more t-shirts, or real digital goods people will pay for, and will the amount people are willing to pay for them exceed the cost to train and run the models? If it doesn’t, this IS a bubble. It’s just that the big techs are part of the “bag holders” this time. They’ll be stuck with a huge amount of AI and no way to turn it into $$.
View on Reddit #36575177

Hunting-Succcubus@reddit

So movies, video games, and music are bubbles because they are not physical goods. Great
View on Reddit #36689566

BigBasket9778@reddit

No, they’re real goods, that people will pay for. Most gen.AI agents I am seeing are just being “baked in” for free, or the cost is so high it’s not worth it.
View on Reddit #37034146

bwjxjelsbd@reddit

well at least they get more "content" on their platform now that people can easily run no-face AI Tiktok/YT channel
View on Reddit #36790548

holchansg@reddit

Exactly, we are already seeing AI everywhere.
View on Reddit #36605287

BigBasket9778@reddit

Yes, but that doesn’t matter. That’s like the Peloton story x 100. It’s so exciting!! Everyone wants a Peloton! But will you really lock in customers who will pay a monthly bill for four years to cover your costs? We know how that one turned out.
I think Anthropic, with their Sonnet 3.5 model, are charging what is as close as possible to the cost of inference. They’ve basically written off training and are just charging for production, like what US pharmaceutical manufacturers do in third world countries. Even then, the top complaint on Claude is usage limits. People are fundamentally underestimating how expensive this tech is. Yes, it can do the job of a junior financial analyst. But the FA is 55k a year and the AI is 280k a year in inference and god knows what in training (depends on adoption). A product does not replace an existing product at the same quality until it is cheaper.
Where this is headed is wage reduction for knowledge workers to stay below the AI line, followed by a huge surplus of GPU compute, followed by a very short (5 year) AI winter, then a refocus on what AI is great at: prediction. The companies that have the capital to push through (Amazon, Google, Meta, Apple, Microsoft), and parallel industries to make products from generative AI (Meta and Apple), will make bank. So many will be crushed.
And the standard blah blah blah exponential growth: if we expected compute to continue at Moore’s law, we wouldn’t be talking about building fusion power plants and using a real percentage of the earth’s energy on new data centres. All of Nvidia’s hand-wavy faster-than-Moore’s-law conversation is them just reducing precision.
View on Reddit #36647809

Ilforte@reddit

> I think Anthropic, with their sonnet 3.5 model, are charging what is as close as possible to the cost of inference

Well, I think they're serving it at 90% markup. Gemini models and DeepSeek show that you can come pretty close with very cheap inference. Sonnet is a good model but it almost certainly doesn't cost more than, like, $5 to generate 1M tokens.
View on Reddit #36651268

montdawgg@reddit

MORE DIGITAL CORNBREAD T-SHIRTS!
View on Reddit #36596557

CapitalNobody6687@reddit

Keep in mind that we're one disruptive innovation away from the bubble popping. If someone figures out a super innovative way to get the same performance on drastically less compute (e.g. CPUs or a dedicated ASIC that becomes commodity), it's going to be a rough time for Nvidia stock.
View on Reddit #36589777

bwjxjelsbd@reddit

Nvidia knows this, and that's why they're trying to lock in customers. But I do think it's inevitable, and it will first start with big tech developing their own chip. Heck, Google and Amazon already have their own in-house chips for both training and inference. Apple also uses Google's TPU to train its models and doesn’t buy Nvidia chips in bulk. Only Meta and Twitter seem like the ones that are buying a boatload of A100s to train AI. I'm pretty sure Meta is also planning, if not already working on, its own chip.
View on Reddit #36790958

AdagioCareless8294@reddit

Which bubble are you popping? Dramatically reducing the cost of training and inference will likely create more use cases where it was not economically feasible before.
View on Reddit #36709050

kurtcop101@reddit

That's not exactly correct - it would have to both reduce the amount of compute needed drastically, *and* not scale. Because otherwise, they would take the same compute and the training advantages and take their X% increase in efficiency. It seems pretty logarithmic in terms of efficiency, so if it's, say, 10% compute, they could train on the same effectiveness as 10x their current compute. It would just generally be a boon, but for Nvidia to fall a really good competitor in hardware needs to be made that isn't relying on tsmc. It could happen if the equivalent efficiency ended up quite a bit better on a different type of hardware entirely, true, but that's highly unlikely.
View on Reddit #36622704

holchansg@reddit

Unsloth already uses up to 90% less VRAM. Yet we keep needing more GPUs and more raw power.
View on Reddit #36605211

AwesomeDragon97@reddit

Crypto energy usage was also compared to the amount used by countries.
View on Reddit #36557272

erm_what_@reddit

We only have all this AI explosion now because crypto crashed and left a load of spare GPUs
View on Reddit #36572534

bwjxjelsbd@reddit

Not true lol. BTC needed ASIC miners to be profitable, and ETH stopped being PoW before the market crash.
View on Reddit #36783602

dysmetric@reddit

[Yeah Meta and Google are buying up all the second hand GPUs](http://gifrific.com/wp-content/uploads/2018/10/macruber-will-forte-head-nod.gif)
View on Reddit #36572898

MikeFromTheVineyard@reddit

Meta was able to build their cluster cheap because NVidia dramatically increased production volume right when crypto crashed. They’re not secondhand, but they were discounted thanks to crypto
View on Reddit #36573428

softmaximum02@reddit

That's really interesting! So, Meta got lucky with timing then. Do you think the market will stabilize now that the hype around AI is so high?
View on Reddit #36608057

dysmetric@reddit

Or they increased volume because AI allowed them to scale. AI optimised chips like H100s aren't well optimised for crypto.
View on Reddit #36573895

MikeFromTheVineyard@reddit

This was in 2022. Look at NVidias stock during that time. Meta announced a massive deal to buy a ton of GPUs. They just wrote off a bunch of money on their filings, they advertised 3D modeling and VR “omniverse” stuff. https://www.sec.gov/Archives/edgar/data/1045810/000104581022000008/q4fy22pr.htm https://www.nextplatform.com/2022/01/24/meta-buys-rather-than-builds-and-opens-its-massive-ai-supercomputer/
View on Reddit #36577246

OneSmallStepForLambo@reddit

>This, of course, happened before the AI explosion that kicked off in Nov 2022. To your point, Meta purchased the GPUs then for Reels. [Here's him talking about it with Dwarkesh Patel](https://youtu.be/bc6uFV9CJGg?t=296)
View on Reddit #36584384

dysmetric@reddit

That AI cluster is A100s
View on Reddit #36577569

MikeFromTheVineyard@reddit

Yea this happened years ago. We’re talking about the crypto boom crashing.
View on Reddit #36578595

dysmetric@reddit

Nothing in the links in your previous comment related to crypto, and you'll have to be more explicit about what you are trying to get me to infer because I'm not seeing the connections. I'm saying that the crypto market performance was spuriously correlated with NVIDIA's behaviour, not causally related. I don't think NVIDIA would have scaled manufacturing based on crypto because of the huge risk, and I see no evidence NVIDIA was upscaling manufacture of efficient crypto-friendly chips.
View on Reddit #36578902

MikeFromTheVineyard@reddit

Crypto wasn’t just *market performance*; there were obviously GPUs actually involved. Nvidia made a lot of those GPUs, and since there were shortages, obviously they wanted to make *more*, to capture that demand. Unfortunately, the perfect storm was that Nvidia switching suppliers from 2021-2022 (Samsung to TSMC) meant they were at the back of the queue for TSMC production. Don’t forget the industry-wide chip shortage at the time. This impacted Nvidia’s ability to quickly scale and meet demand, and explains why they missed it: they had to buy farther out and pre-commit more.
To connect it to crypto directly: they mostly built increased capacity for gamers, a core market at the time, who were spurned trying to buy a GPU and being outbid during shortages by crypto miners. But, as fate would have it, when crypto crashed everyone dumped used GPUs, and growth slowed tremendously in that market (the gaming market). Then they delayed the Ada architecture because of the oversupply, and repriced products *down* at that time to compete with the used market. Let me repeat that last part again: Nvidia lowered the price of their silicon (coincidentally) at the same time as ChatGPT was announced.
For their entire history, through 2022, the portion of their revenue from gaming was much higher than it is today in 2024. They’re a public company, so this is easy to audit (say, from the SEC link I shared). Data center sales didn’t outpace gaming until 2023, after the ChatGPT craze started.
To connect it to cheap gaming GPUs: they had gaming GPUs they couldn’t sell, even when discounted, their stock was down 50%, and they even delayed their next-gen chips due to low demand. They desperately needed (1) a way to juice revenue numbers, and (2) something to do with the TSMC capacity they purchased during the height of the global chip shortages. Then Zuck comes around and allocates >$1.xB in data center sales.
Zuck needed some ML compute to improve recommendation algorithms after Apple turned off tracking (App Tracking Transparency), wiping out billions in Meta's revenue. Facebook was ~25% of their data center sales from that SEC filing, and roughly 4% of their entire 2022 revenue, enough to bring 2022 sales from a down year to a flat year. Nvidia stock was down a ton over 2022, as sales growth collapsed. They were growing >50% YoY for years, but then grew ~0% in 2022 (with Meta's help). Meta basically pre-purchased a huge chunk of 2023 sales at Nvidia’s weak moment.
You mentioned Nvidia not making a “risky bet” on crypto… but their entire history has been trend chasing. If you look at Nvidia in 2022, it was all metaverse and VR. 2023 saw a hard pivot in their marketing to AI/LLMs. https://www.statista.com/statistics/1120484/nvidia-quarterly-revenue-by-specialized-market/
View on Reddit #36580561

StevenSamAI@reddit

Did this line up with one of Meta's big GPU purchases? I recall seeing Zuck in an interview stating they were fortunate to have huge volumes of GPUs set up (or ordered), which reduced lead time on them jumping into Llama development. He said they were probably going to be used for the metaverse, but that it was sort of a speculative purchase. Basically, he knew they would need a shit load of GPUs, but wasn't entirely sure what for. I guess it would make sense if the crypto crash caused a price drop.
View on Reddit #36581261

erm_what_@reddit

The AI boom came immediately after the crypto crash. ML needs a ton of GPU compute, and data centres full of GPUs were underutilised and relatively cheap due to low demand. Current systems are using a lot of new GPUs because the demand has outstripped the available resources, but they're also still using a lot of mining compute that's hanging around. Crypto wasn't just people with 50 GPUs in a basement. Some data centres went all in with thousands in professional configurations. Google and Meta aren't buying second hand GPUs on Facebook, but OpenAI were definitely using cheap GPU compute to train GPT2/3 when it was available.
View on Reddit #36574111

dysmetric@reddit

You'll have to demonstrate the timeline in nVidia scaling manufacturing was unrelated to AI, because you're arguing they were scaling for crypto before crypto crashed... if that were the case, why not scale manufacturing earlier? Why did they scale with AI optimised chips, and not crypto-optimized chips? The scaling in manufacturing is also related to AI in another way via AI improving their manufacturing efficiency.
View on Reddit #36574714

dont--panic@reddit

They scaled up for crypto, then crypto crashed which led to a brief period in 2022 where it looked like Nvidia had over extended themselves and was going to end up making too many GPUs. However things quickly shifted as AI took off and since then they've scaled up even more for AI, and have also shifted production towards AI specific products because TSMC can't scale fast enough for them.
View on Reddit #36575203

dysmetric@reddit

The A100 was announced in 2020 though. And that article only mentions gaming demand, whereas crypto wants the efficiency of the 3060, which still seemed undersupplied at the time... if NVIDIA was scaling for crypto it would have scaled manufacturing of its most efficient products, not its most powerful. It still reads like a spurious correlation to me. Tempting to assume causation but it doesn't seem sound in the details.
View on Reddit #36575807

Tartooth@reddit

I like how people are acting like GPUs weren't already training models en masse. Machine learning has been a buzzword forever.
View on Reddit #36579617

I_PING_8-8-8-8@reddit

That's nonsense. Bitcoin stopped being profitable on GPU's in 2011, so like 99% of GPU mining was Ethereum. That did not stop because Ethereum crashed, it stopped because Ethereum moved to proof of stake.
View on Reddit #36581033

erm_what_@reddit

Ethereum took a big dive in 2022, at the time it went PoS. As did most of the coins linked to it. That was about the time GPT3 was being trained. There was suddenly a lot more datacentre GPU capacity available, meaning training models was cheaper, meaning GPT3 could be trained better for the same cost, meaning ChatGPT was really good when it came out (and worth sinking a lot of marketing into), meaning people took notice of it.
View on Reddit #36586505

I_PING_8-8-8-8@reddit

>Ethereum took a big dive in 2022, at the time it went PoS.

Yes, but 2 years later it came back up. But GPU mining never returned because ETH was no longer minable and no other minable coins have grown as big as ETH since.
View on Reddit #36587867

erm_what_@reddit

It did, but it doesn't really matter. Training LLMs isn't tied to crypto other than the fact they both used GPU compute and cheap GPU access at the right time helped LLMs to take off faster than they would have without it. The GPUs freed up by both the general dip across all crypto and the ETH PoS kick-started the LLM boom. After it got going there's been plenty of investment.
View on Reddit #36598104

AIPornCollector@reddit

To be fair some of these large AI companies have more revenue than entire countries.
View on Reddit #36556002

ToHallowMySleep@reddit

No "to be fair" about it. A country and a company are not comparable, just because they have a similar amount of money sloshing around. May as well say a diamond ring is as good as a car.
View on Reddit #36582361

WeArePandey@reddit

Analogies are never perfect, but it’s valid to say that the resources and capital that Meta has allows it to do some things that some countries cannot. Of course Meta can’t join the UN or start wars like a small country can.
View on Reddit #36716277

Hunting-Succcubus@reddit

Comparing great companies to random countries is like comparing small amount of gold to large amount of pebbles
View on Reddit #36689481

Future_Calligrapher2@reddit

Companies and countries are directly comparable. There’s an entire field of academia devoted to studying exactly that. 
View on Reddit #36664600

redballooon@reddit

To be fair, a diamond ring is only good if you already have a car.
View on Reddit #36654683

AuggieKC@reddit

A diamond ring is better than a car for certain scenarios, what's your point?
View on Reddit #36598090

ToHallowMySleep@reddit

My god, you rolled a critical failure when trying to understand something. Try again next year.
View on Reddit #36598762

bearbarebere@reddit

That’s literally their entire point.
View on Reddit #36557187

Hunting-Succcubus@reddit

By Entire country you mean like USA,China,Russia right? So much electricity ⚡️
View on Reddit #36689286

fuulhardy@reddit

The most obvious sign that AI is a bubble (or will be given current tech) is that the main source of improvements *is* to use the power input of entire countries. If AI hypothetically goes far beyond where it is now, it won’t be through throwing more power and vram at it.
View on Reddit #36591806

holchansg@reddit

It will. Mark talked about that, Sam talked about that, Huang talked about that... We are using AI to have more powerful AI's(agents), and more agents to have yet more agents... We are limited by power.
View on Reddit #36593652

fuulhardy@reddit

They talked about it because they need people investing in that infrastructure, not because there won't or shouldn't be advancements in the actual techniques used to train models that could downscale the amount of raw power needed. If machine learning techniques advance in a meaningful way in the next decade, then in twenty years we'll look back on these gigantic datacenters the way we look at "super computers" from the 70s today.
View on Reddit #36604008

holchansg@reddit

>They talked about it because they need people investing in that infrastructure

And what's backing this claim? Do the numbers show that?
View on Reddit #36604859

fuulhardy@reddit

The GPT transformer model that revolutionized LLM training had nothing to do with using more electricity. It was a fundamental improvement of the training process using the same hardware. Are you under the impression that computational linguists and machine learning researchers only spend their time sourcing more electricity and buying Nvidia GPUs to run the same training methods we have today? That would be ridiculous. My claim was that they need investors to build more infrastructure. They want to build more infrastructure to power more GPUs to train more models right? Then they need money to do it. So they need investors. That’s just how that works. I don’t know what numbers you need when they all say that outright. And yes we have needed less energy to do the same or more workload with computers, that’s one of the main improvements CPU engineers work on every day. See? https://gamersnexus.net/megacharts/cpu-power#efficiency-chart
View on Reddit #36646867

ayyndrew@reddit

I'm not saying it's a bubble but those two things aren't mutually exclusive
View on Reddit #36567383

fuulhardy@reddit

The overlap is unfortunately pretty big too
View on Reddit #36591872

GoodNewsDude@reddit

I am saying it
View on Reddit #36571781

holchansg@reddit

You right, you have Tesla :)
View on Reddit #36568156

NullHypothesisCicada@reddit

It’s far better than mining though, at least AI makes life easier for everyone.
View on Reddit #36568237

holchansg@reddit

Well, it has way more fields, uses and prospects. Cant compare these two.
View on Reddit #36568309

NullHypothesisCicada@reddit

I’m just saying that the power consumed by GPU computation can go toward different outcomes, and I think training an AI model is a far better use of that power than mining crypto.
View on Reddit #36568504

Tartooth@reddit

Why not both? Get crypto for doing AI training
View on Reddit #36584388

holchansg@reddit

Oh for sure, way more noble.
View on Reddit #36569295

battlesubie1@reddit

It makes certain tasks easier - not life easier for everyone. In fact I would argue this is only going to benefit large corporations and the wealthy investor class over any benefits to average people.
View on Reddit #36572183

drdaeman@reddit

So many things, good and bad, going on. I guess I wouldn’t mind living to see humanity building a Dyson sphere or something, powering some really beefy number crunchers to draw extremely detailed waifus… just kidding. :)
View on Reddit #36573231

Literature-South@reddit

Crypto was also discussed in those terms and had bubbles. We’ll see what happens.
View on Reddit #36569290

holchansg@reddit

Care to explain more the correlation?
View on Reddit #36569416

Literature-South@reddit

My point is that power input does not mean it’s not a bubble. We’ve seen similar power inputs to other tech projects that are bubbles. In fact, there’s a similarity here. The cost per query in AI is a similar problem to the cost per block in blockchain-based cryptos. The big difference, I suppose, is that the incentive for AI is to lower that cost, but for crypto it was a core feature. Bottom line, I’m pointing out that a large power input to the project doesn’t have anything to do with it being or not being a bubble.
View on Reddit #36569544

jerryfappington@reddit

So what? The same thing happened with crypto lol.
View on Reddit #36565696

holchansg@reddit

oh yeah, totally the same thing.
View on Reddit #36566611

jerryfappington@reddit

Yes it is the same thing. Power as a positive signal that AI isnt a bubble is a ridiculous thing to say lmao
View on Reddit #36567002

holchansg@reddit

One of.
View on Reddit #36568204

auziFolf@reddit

In the future?
View on Reddit #36557128

bwjxjelsbd@reddit

With AI imposing such significant constraints on grid capacity, it’s surprising that more big tech companies don’t invest heavily in nuclear power to complement renewable energy sources. The current 20% efficiency of solar panels is indeed a limitation, and I hope we’ll see more emphasis on hybrid solutions like this in the future
View on Reddit #36783715

bchertel@reddit

*Gigawatts
View on Reddit #36596040

gelatinous_pellicle@reddit

The human brain runs on 20 watts. I'm not so sure intelligence will keep requiring the scale of power we are on with ai for the moment. Maybe, just something people should keep in mind.
View on Reddit #36596796

reefine@reddit

Especially true with how cheap tokens have gotten with Open AI
View on Reddit #36648442

s101c@reddit

Semi-Automatic Ground Environment (SAGE) would like to have a word. https://en.wikipedia.org/wiki/AN/FSQ-7_Combat_Direction_Central
View on Reddit #36568898

CapitalNobody6687@reddit

Exactly. Everyone is talking about the Meta and xAI clusters right now. No one is talking about the massive GPU clusters the DoD is likely building right now. Keep in mind the US DoD can produce a few fewer tanks and jets in order to throw a billion dollars at something and not blink an eye. The Title 10 budgets are hamstrung by the POM cycle, but the black budgets often aren't. Can't wait to start hearing about what gets built at a national scale...
View on Reddit #36589581

05032-MendicantBias@reddit

It's true, they are limited by access to the grid and cooling. [One B200 server rack runs you half a megawatt.](https://docs.nvidia.com/https:/docs.nvidia.com/nvidia-dgx-superpod-data-center-best-practices-with-dgx-b200.pdf)
View on Reddit #36586541

xXWarMachineRoXx@reddit

That’s a damn good quote
View on Reddit #36567079

Expensive-Paint-9490@reddit

But can it run Crysis?
View on Reddit #36590068

UnkleRinkus@reddit

Yes, but it's slow.
View on Reddit #36853634

SeiryokuZenyo@reddit

I was at a conference 6 months ago where a guy from Meta talked about how they had ordered a crapload (200k?) of GPUs for the whole Metaverse thing, and Zuck ordered them repurposed for AI when that path opened up. Apparently he had ordered way more than they needed to allow for growth. He was either extremely smart or lucky, tbh probably some of both.
View on Reddit #36736938

PixarCEO@reddit

no he didnt "drop"
View on Reddit #36657039

matali@reddit

Well known by now, yes
View on Reddit #36646167

denyicz@reddit

damn iam still at llama2 era
View on Reddit #36556828

uhuge@reddit

gotta distill up a bit!')
View on Reddit #36631640

2smart4u@reddit

At the level of compute we're using to train models, it seems absurd that these companies aren't just investing more into quantum computer R&D
View on Reddit #36562107

NunyaBuzor@reddit

adding quantum in front of the word computer doesn't make it faster.
View on Reddit #36566352

2smart4u@reddit

I'm not talking about fast, I'm talking about qubits using less energy.
View on Reddit #36568145

iperson4213@reddit

Quantum computing is still a pretty nascent field, with the largest stable computers on the order of 1000s of qubits, so it’s just not ready for city-sized data center scale.
View on Reddit #36575416

ambient_temp_xeno@reddit

I only have a vague understanding of quantum computers but I don't see how they would be any use for speeding up current AI architecture.
View on Reddit #36580427

iperson4213@reddit

I suppose it could be useful for new AI architectures that utilize scaled up quantum computers to be more efficient, but said architectures are also pretty exploratory since there aren’t any scaled up quantum computers to test scaling laws on them.
View on Reddit #36629665

2smart4u@reddit

I think if you took some time to understand quantum computing you would realize that your comment comes from a fundamental misunderstanding of how it works.
View on Reddit #36617257

iperson4213@reddit

any good articles/resources to learn more about this?
View on Reddit #36629546

5TP1090G_FC@reddit

Oh, this is sooooo, old. Git with the program please
View on Reddit #36620000

Axolotl_Architect@reddit

I feel like if human brains can run on a burrito, then maybe the problem with AI is the programming, not the input power.
View on Reddit #36577045

Capable-Path8689@reddit

Our hardware is different. When 3D stacking becomes a thing for processors, they will use even less energy than our brain. All processors are 2D as of today.
View on Reddit #36593694

Axolotl_Architect@reddit

True! *programming and hardware
View on Reddit #36610749

gigDriversResearch@reddit

I can't keep up with the innovations anymore. This is why. Not a complaint :)
View on Reddit #36605854

LoafyLemon@reddit

So this is where all the used 3090s went...
View on Reddit #36567980

ain92ru@reddit

Hyperscalers don't actually buy used gaming GPUs because of reliability disadvantages which are a big deal for them
View on Reddit #36582210

LoafyLemon@reddit

I know, I was making a joke.
View on Reddit #36594896

Pvt_Twinkietoes@reddit

This isn't news, he did say he was purchasing 100k GPUs to train models earlier this year.
View on Reddit #36592558

RiffMasterB@reddit

Can he just get a head transplant, his face looks so weird
View on Reddit #36592487

Capable-Path8689@reddit

we already knew this for like 2 months.....
View on Reddit #36592022

richard3d7@reddit

Whats the end game for meta? There is no free lunch...
View on Reddit #36588119

EDLLT@reddit

Guys, we are living at the exponential curve. Things will EXPLODE insanely quickly. I'm not joking when I state that immortality might be achieved(Just look up who Bryan Johnson is and what he's doing)
View on Reddit #36584856

Fatvod@reddit

Meta has well over 600,000 nvidia gpu's. This is not surprising.
View on Reddit #36583116

KarnotKarnage@reddit

But can they run far cry in 8k@120fps?
View on Reddit #36582245

RogueStargun@reddit

The engineering team said in a blog post last year that they will have 600,000 by the end of this year. Because of Amdahl's law, that doesn't necessarily mean they can network and effectively utilize all of them at once in a single cluster. In fact, Llama 3.1 405B was pre-trained on a 16,000 H100 GPU cluster.
View on Reddit #36554752

jd_3d@reddit (OP)

Yeah the article that showed the struggles they overcame for their 25,000 h100 GPU clusters was really interesting. Hopefully they release a new article with this new beast of a data center and what they had to do for efficient scaling with 100,000+ GPUs. At that number of gpus there has to be multiple gpus failing each day and I'm curious how they tackle that.
View on Reddit #36555467

ain92ru@reddit

Mind linking that article? I, in turn, could recommend this one by SemiAnalysis from June, even the free part is very interesting: https://www.semianalysis.com/p/100000-h100-clusters-power-network
View on Reddit #36581989

RogueStargun@reddit

According to the llama paper they do some sort of automated restart from checkpoint. 400+ times in just 54 days. Just incredibly inefficient at the moment
View on Reddit #36556741

jd_3d@reddit (OP)

Yeah do you think that would scale with 10 times the number of GPUs? 4,000 restarts?? No idea how long a restart takes but that seems brutal.
View on Reddit #36557618

Previous-Piglet4353@reddit

I don't think restart counts scale linearly with size, but probably logarithmically. You might have 800 restarts, or 1200. A lot of investment goes to keeping that number as low as possible. Nvidia, truth be told, ain't nearly the perfectionist they make themselves out to be. Even their premium, top-tier GPUs have flaws.
View on Reddit #36560109

iperson4213@reddit

restarts due to hardware failures can be approximated by an exponential distribution, where the failure rate scales linearly with the number of hardware units (so cluster mtbf shrinks in proportion)
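A rough sketch of that linear scaling, using the figures quoted in this thread (about 400 restarts in 54 days on 16,000 GPUs). It assumes every restart is an independent hardware failure at a constant per-GPU rate, which is a simplification; the function names are hypothetical.

```python
def expected_restarts(n_gpus: int, days: float, failures_per_gpu_day: float) -> float:
    """Expected restarts if each GPU fails independently at a constant rate."""
    return n_gpus * days * failures_per_gpu_day


def cluster_mtbf_hours(n_gpus: int, failures_per_gpu_day: float) -> float:
    """Mean time between failures for the whole cluster, in hours."""
    return 24.0 / (n_gpus * failures_per_gpu_day)


# Per-GPU failure rate backed out from the thread's numbers (an approximation).
rate = 400 / (16_000 * 54)

for n_gpus in (16_000, 160_000):
    print(f"{n_gpus:>7} GPUs: ~{expected_restarts(n_gpus, 54, rate):,.0f} restarts "
          f"in 54 days, cluster MTBF ~{cluster_mtbf_hours(n_gpus, rate):.2f} h")
```

Under these assumptions, 10x the GPUs means 10x the restarts and a tenth of the time between them, so restore speed matters more and more as the cluster grows.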
View on Reddit #36575111

Previous-Piglet4353@reddit

Good to know!
View on Reddit #36575180

keepthepace@reddit

At this scale, reliability becomes as much of a deal as VRAM. Groq is cooperating with Meta, I suspect this may not be your commoner H100 that ends up in their 1M GPU cluster.
View on Reddit #36570040

KallistiTMP@reddit

In short, kubernetes. Also a fuckload of preflight testing, burn in, and preemptively killing anything that even starts to look like it's thinking about failing. That plus continuous checkpointing and very fast restore mechanisms. That's not even the fun part, the fun part is turning the damn thing on without bottlenecking *literally everything.*
View on Reddit #36569621

Mescallan@reddit

600k is Meta's entire fleet, including Instagram and Facebook recommendations and reels inference. If they wanted to use all of it I'm sure they could get some downtime on their services, but it's looking like they will cross 1,000,000 in 2025 anyway
View on Reddit #36562566

RogueStargun@reddit

I think the majority of that infra will be used for serving, but gradually Meta is designing and fabbing its own inference chips. Not to mention there are companies like Groq and Cerebras that are salivating at the mere opportunity to ship some of their inference chips to a company like Meta. When those inference workloads get offloaded to dedicated hardware, there's gonna be a lot of GPUs sitting around just rarin' to get used for training some sort of ungodly scale AI algorithms. Not to mention the B100 and B200 Blackwell chips haven't even shipped yet.
View on Reddit #36563934

ILikeCutePuppies@reddit

I wonder if Cerebras could even produce enough chips at the moment to satisfy more large customers? They already seem to have their hands full building multiple supercomputers and building out their own cloud service as well.
View on Reddit #36574231

Cane_P@reddit

From the man himself: https://www.instagram.com/reel/C2QARHJR1sZ/?igsh=MWg0YWRyZHIzaXFldQ==
View on Reddit #36557324

ab2377@reddit

i also was thinking while reading that he said this last year before release of llama 3 too
View on Reddit #36555307

bwjxjelsbd@reddit

At what point does it make sense to make their own chip to train AI? Google and Apple are using Tensor chips to train AI instead of Nvidia GPUs, which should save them a whole lot of cost on energy
View on Reddit #36580576

LeastWest9991@reddit

Can’t wait. I really hope open-source prevails
View on Reddit #36579836

rapsoid616@reddit

What gpu's are they using?
View on Reddit #36579262

drwebb@reddit

I was just at Pytorch Con, a lot is improving on the SW side as well to enable scaling past what we've gotten out of standard data and tensor parallel methods
View on Reddit #36555725

Which-Tomato-8646@reddit

Anything specific? 
View on Reddit #36578959

randomrealname@reddit

The age of LLM's while revolutionary, is over. I hope to see next gen models open sourced, imagine having an o1 at home where you can choose the thinking time. Profound.
View on Reddit #36555351

OkDimension@reddit

a good part of o1 is still LLM text generation, it just gets an additional dimension where it can reflect on its own output, analyze and proceed from there
View on Reddit #36567961

randomrealname@reddit

No, it isn't doing next token prediction, it uses graph theory to traverse the possibilities and then outputs the best result from the traversal. An LLM was used as the reward system in an RL training run, though, but what we get is not from an LLM. OAI, or specifically Noam, explains it in the press release for o1 on their site, without going into technical details
View on Reddit #36578870

NunyaBuzor@reddit

Transfusion models.
View on Reddit #36566317

swagonflyyyy@reddit

It hasn't so much ended but rather evolved into other forms of modality besides plain text. LLMs are still gonna be around, but embedded in other complementary systems. And given o1's success, I definitely think there is still more room to grow.
View on Reddit #36555754

randomrealname@reddit

Inference engines (LLM's) are just the first stepping stone to better intelligence. Think about your thought process, or anyone's... we infer, then we learn some ground truth and reason on our original assumptions (inference). This gives us overall ground truth. What future online learning systems need is some sort of ground truth, that is the path to true general intelligence.
View on Reddit #36558466

ortegaalfredo@reddit

>The age of LLM's while revolutionary, is over.

It's the end of the beginning.
View on Reddit #36558181

randomrealname@reddit

Specifically, llm's, or better to say, inference engines alongside reasoning engines will usher in the next era. But I wish Zuckerberg would hook up BIG llama to an RL algorithm and give us a reasoning engine like o1. We can only dream.
View on Reddit #36558304

tazzytazzy@reddit

Newbie here. Would using these newer trained models take the same resources, given that the llm is the same size? For example, would llama3.2 7b and llama4 7b, require about the same resources and work at about the same speed?
View on Reddit #36554896

Fast-Persimmon7078@reddit

Training efficiency changes depending on the model arch.
View on Reddit #36555290

iperson4213@reddit

if you’re using the same code, yes. But across generations, there are algorithmic improvements that approximate very similar math, but faster, allowing retraining of an old model to be faster/use less compute
View on Reddit #36575300

Downtown-Case-1755@reddit

It depends... on a lot of things. First of all, the parameter count (7B) is sometimes rounded. Second, some models use more vram for the context than others, though if you keep the context very small (like 1K) this isn't an issue. Third, some models *quantize* more poorly than others. This is more of a "soft" factor that effectively makes the models a little bigger. It's also possible the architecture will change dramatically (eg be mamba + transformers, bitnet, or something) which could dramatically change the math.
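A back-of-the-envelope sketch of how those factors add up. The layer count, KV head count, and head size below are hypothetical values for a 7B-class model, not the specs of any particular release, and the formulas ignore activations and runtime overhead.

```python
GIB = 2**30


def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone at a given precision/quantization."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_value: int = 2) -> float:
    """KV cache size: keys and values (the factor 2) per layer, head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value / GIB


# Hypothetical 7B-class architecture, for illustration only.
N_PARAMS, N_LAYERS, N_KV_HEADS, HEAD_DIM = 7e9, 32, 8, 128

for bits in (16, 8, 4):
    print(f"weights at {bits:>2}-bit: ~{weight_gib(N_PARAMS, bits):.2f} GiB")

for ctx in (1_024, 32_768):
    kv = kv_cache_gib(N_LAYERS, N_KV_HEADS, HEAD_DIM, ctx)
    print(f"KV cache at {ctx:>6} tokens: ~{kv:.3f} GiB")
```

With these assumed numbers the context cost is negligible at 1K tokens but grows linearly with context length, so two models with the same parameter count can still need noticeably different amounts of VRAM.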
View on Reddit #36555214

jd_3d@reddit (OP)

Yes, if they are the same architecture and the same number of parameters, and if we are just talking dense models, they are going to take the same amount of resources. There's more complexity to the answer, but in general this holds true.
View on Reddit #36555195

utf80@reddit

Need 104567321467 more GPU's. 😅
View on Reddit #36573606

xadiant@reddit

Would they notice cuda:99874 and cuda:93563 missing I wonder...
View on Reddit #36562813

ThenExtension9196@reddit

100k is table stakes.
View on Reddit #36560085

TitusPullo4@reddit

"could"
View on Reddit #36556953

Beautiful_Surround@reddit

He dropped it a while ago: [https://www.perplexity.ai/page/llama-4-will-need-10x-compute-wopfuXfuQGq9zZzodDC0dQ](https://www.perplexity.ai/page/llama-4-will-need-10x-compute-wopfuXfuQGq9zZzodDC0dQ)
View on Reddit #36552053

jd_3d@reddit (OP)

See the interview here: [https://www.youtube.com/watch?v=oX7OduG1YmI](https://www.youtube.com/watch?v=oX7OduG1YmI) I have to assume llama 4 training has started already, which means they must have built something beyond their current [dual 25k H100 datacenters](https://engineering.fb.com/2024/03/12/data-center-engineering/building-metas-genai-infrastructure/).
View on Reddit #36550962