TheaterFire

scaling is dead

Posted by Crazyscientist1024@reddit | LocalLLaMA | View on Reddit | 29 comments

DisjointedHuntsville@reddit

Gary Marcus is a clown and says everything including the sun is dead
View on Reddit #72145473

thrownawaymane@reddit

Hey, if the sun was dead it would take us 8 minutes to notice. Maybe it’s dead right now ¯\_(ツ)_/¯
View on Reddit #72174349

Barafu@reddit

Let's check:

    PS C:\Users\Barafu> ping sun.org
    Pinging sun.org [82.165.239.195] with 32 bytes of data:
    Reply from 82.165.239.195: bytes=32 time=43ms TTL=56
    Reply from 82.165.239.195: bytes=32 time=43ms TTL=56
    Reply from 82.165.239.195: bytes=32 time=44ms TTL=56
View on Reddit #72705624

thrownawaymane@reddit

My `ping` time to sun.org is like 960000ms, tell me/NASA your secrets
View on Reddit #72790497
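
The round-trip figure in the joke above can be sanity-checked with a quick calculation. This is only a minimal sketch, assuming the mean Earth-Sun distance of 1 au, a signal travelling at the vacuum speed of light, and no processing delay at the far end; the function name `sun_ping_ms` is hypothetical and not from the thread.

```python
# Sanity check for the "ping the Sun" joke: round-trip light time over 1 au.
# Assumptions: mean Earth-Sun distance (1 au), vacuum speed of light,
# and an instant reply from the far end.

AU_METERS = 149_597_870_700   # astronomical unit, exact by IAU definition
C_METERS_PER_S = 299_792_458  # speed of light in vacuum, exact by SI definition


def sun_ping_ms(distance_m: float = AU_METERS) -> float:
    """Return the round-trip light travel time in milliseconds."""
    one_way_s = distance_m / C_METERS_PER_S
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    rtt_ms = sun_ping_ms()
    print(f"one-way: {rtt_ms / 2000:.0f} s, round trip: {rtt_ms:.0f} ms")
```

Under these assumptions the one-way time comes out at about 499 s (roughly 8.3 minutes), so the 960000ms quoted above, which is 2 × 8 minutes, is in the right ballpark.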

AppearanceHeavy6724@reddit

Gary Marcus is spot on that LLMs are criminally overhyped and are a dead end. He is also a bit of a grifter and an asshole. Still, him being right does not make LLMs boring and useless.
View on Reddit #72150327

Infamous-Lock-2156@reddit

Grifter how?
View on Reddit #72248299

Fun_Smoke4792@reddit

But the sun is indeed going to die.
View on Reddit #72147069

HeinrichTheWolf_17@reddit

Unless superintelligence stops that.
View on Reddit #72181601

Karyo_Ten@reddit

Not before we turn it into a Dyson Sphere to power Dyson-R1 235Z (zettabytes)
View on Reddit #72162195

LevianMcBirdo@reddit

Well, being dead and going to die are very different things.
View on Reddit #72151993

k_means_clusterfuck@reddit

Where does he say that llms are a dead end?
View on Reddit #72137740

UnlegitApple@reddit

Although he didn't say it, [here's the video](https://www.youtube.com/watch?v=aR20FWCCjAs)
View on Reddit #72229114

shaman-warrior@reddit

Nowhere. In the interview he said something like: scaling alone will not bring better results, we need innovation.
View on Reddit #72138711

k_means_clusterfuck@reddit

I think people are somehow thinking that "scale is dead = llm is dead", which is not necessarily the case
View on Reddit #72139602

AutomataManifold@reddit

"LLMs = scaling" is what OpenAI *wanted* everyone to believe. They had the advantage in scale, were building up rapidly, and at the time it sure looked like just adding more tokens and more parameters was the way to go. Then we ran out of easy internet tokens (and discovered that a lot of them were trash that wasn't helping much), improved a lot of infrastructure (especially inference speed and context length), discovered that smaller models could exceed older models while running faster, realized that most of the really big LLMs were undertrained for their size, invented MoEs, RoPE, etc. And then good RL training really shook things up: it means we can keep scaling on training compute, but not in the way everyone was expecting a year earlier.
View on Reddit #72144792

jesuslop@reddit

What would be an example or two of shocking RL training?
View on Reddit #72147567

AutomataManifold@reddit

DeepSeek R1. There was a massive pivot to everyone using GRPO immediately afterwards.
View on Reddit #72148820

Funny_Working_7490@reddit

Even today, DeepSeek's solution is still way better.
View on Reddit #72205771

misteramy@reddit

He didn't say that. He said scaling alone will make better models, but big leaps will need to come from research.
View on Reddit #72142134

SirRece@reddit

Ilya has said scaling is not all, and he was absolutely correct. He has not said LLMs are a dead end.
View on Reddit #72205877

martinerous@reddit

Andrej Karpathy also had similar sentiments about scaling and also RL. We definitely need better approaches. But scaling will go on just fine until then.
View on Reddit #72140400

Pvt_Twinkietoes@reddit

Yes, but we are already facing practical bottlenecks: power grids not being able to support the needed infrastructure, for one.
View on Reddit #72140898

dogesator@reddit

That’s why you scale power grid infrastructure and scale energy production. Stargate Abilene and xAI Colossus are both already producing their own on-site energy.
View on Reddit #72143818

Danger_Pickle@reddit

The real scaling problems are far more practical. You can't iterate quickly and build new products when you need to build out an entire new data center to improve your models. Shrinking models enough that they can run on commodity-ish hardware lets teams iterate much faster on new architecture designs, which is where the real gains will come from.

We've already seen the risks of model collapse, which only became apparent in 2023 once we had already exhausted the entire internet for training data. I don't expect that scaling to nuclear-powered data centers will give as much improvement as you expect. We don't really have more data to throw at the problem, and adding more parameters to our models isn't scaling like it used to.

Besides, the 80/20 rule applies. If a company can get 80% of the performance for 20% of the training cost, it's probably going to have a better product than its competition, because it can build 5 models for the cost of the competition's one. Plus, if we can shrink the current frontier models by 80%, it starts to become reasonable to run them on hardware normal people can buy. If we scale RAM production so consumer hardware can have >100 GB of RAM (only a 4x increase from 32 GB), then an 80/20 frontier model can run in places without stable internet access and in places where security or reliability are critical.

A local frontier model wouldn't have to deal with unstable latency or data center scaling problems, making it a far more compelling product for many businesses. Edge computing is very powerful, but that only works if models can run on a moderately sized cluster instead of needing an entire server rack.
View on Reddit #72190882

martinerous@reddit

The richest companies might come up with solutions that seem crazy, but might actually work and let them squeeze even more from scaling [https://research.google/blog/exploring-a-space-based-scalable-ai-infrastructure-system-design/](https://research.google/blog/exploring-a-space-based-scalable-ai-infrastructure-system-design/)
View on Reddit #72141069

AdministrativeRub484@reddit

Damn, Karpathy says RL is dead? What is he betting on nowadays?
View on Reddit #72146529

martinerous@reddit

Here's his latest interview: [https://www.dwarkesh.com/p/andrej-karpathy](https://www.dwarkesh.com/p/andrej-karpathy) In short: the approach of shoving insane amounts of data into LLMs is a dead end, and we should instead find a way for LLMs to have reasonable forgetfulness. Of course, easier said than done.
View on Reddit #72150336

praxis22@reddit

Ab fab
View on Reddit #72148722

S4M22@reddit

I think the original version of this meme was Ilya vs. Yann LeCun: [https://x.com/wyqtor/status/1993439559036911989](https://x.com/wyqtor/status/1993439559036911989)
View on Reddit #72139946