scaling is dead

DisjointedHuntsville@reddit

Gary Marcus is a clown and says everything, including the sun, is dead.

thrownawaymane@reddit

Hey, if the sun was dead it would take us 8 minutes to notice. Maybe it’s dead right now ¯\_(ツ)_/¯

Barafu@reddit

Let's check:

    PS C:\Users\Barafu> ping sun.org
    Pinging sun.org [82.165.239.195] with 32 bytes of data:
    Reply from 82.165.239.195: bytes=32 time=43ms TTL=56
    Reply from 82.165.239.195: bytes=32 time=43ms TTL=56
    Reply from 82.165.239.195: bytes=32 time=44ms TTL=56

thrownawaymane@reddit

My `ping` time to sun.org is like 960000ms, tell me/NASA your secrets
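For anyone checking the joke's number: a rough sketch of the round-trip light time to the Sun, assuming a mean distance of exactly 1 AU and the vacuum speed of light. The function name `round_trip_ms` is made up for illustration. It comes out a bit under a million milliseconds, so "like 960000ms" is in the right ballpark.

```python
# Rough sanity check: how long would a "ping" to the actual Sun take?
# Assumption: the Sun sits at exactly 1 AU (the real distance varies over the year).
AU_METERS = 149_597_870_700    # 1 astronomical unit, exact by IAU definition
C_METERS_PER_S = 299_792_458   # speed of light in vacuum, exact by SI definition


def round_trip_ms(distance_m: float) -> float:
    """Round-trip light travel time in milliseconds for a one-way distance in meters."""
    return 2 * distance_m / C_METERS_PER_S * 1000


one_way_s = AU_METERS / C_METERS_PER_S
print(f"one way:    {one_way_s:.0f} s (~{one_way_s / 60:.1f} min)")
print(f"round trip: {round_trip_ms(AU_METERS):.0f} ms")
```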

AppearanceHeavy6724@reddit

Gary Marcus is spot on that LLMs are criminally overhyped and are a dead end. He is also a bit of a grifter and an asshole. Still, him being right does not make LLMs boring and useless.

Infamous-Lock-2156@reddit

Grifter how?

Fun_Smoke4792@reddit

But the sun is indeed going to die.

HeinrichTheWolf_17@reddit

Unless superintelligence stops that.

Karyo_Ten@reddit

Not before we turn it into a Dyson Sphere to power Dyson-R1 235Z (zettabytes)

LevianMcBirdo@reddit

Well, being dead and going to die are very different things.

k_means_clusterfuck@reddit

Where does he say that LLMs are a dead end?

UnlegitApple@reddit

Although he didn't say it, [here's the video](https://www.youtube.com/watch?v=aR20FWCCjAs).

shaman-warrior@reddit

Nowhere. In the interview he said something like: scaling alone will not bring better results, we need innovation.

k_means_clusterfuck@reddit

I think people are somehow thinking that "scaling is dead = LLMs are dead", which is not necessarily the case.

AutomataManifold@reddit

"LLMs = scaling" is what OpenAI *wanted* everyone to believe. They had the advantage in scale, were building up rapidly, and at the time it sure looked like just adding more tokens and more parameters was the way to go. Then we ran out of easy internet tokens (and discovered that a lot of it was trash that wasn't helping much), improved a lot of infrastructure (especially inference speed and context length), discovered that smaller models could exceed older models while running faster, realized that most of the really big LLMs were undertrained for their size, invented MoEs, RoPE, etc. And then good RL training really shook things up: it means we can keep scaling on training compute, but not in the way everyone was expecting a year earlier.

jesuslop@reddit

What would be an example or two of RL training that shook things up?

AutomataManifold@reddit

DeepSeek R1. There was a massive pivot to everyone using GRPO immediately afterwards.

Funny_Working_7490@reddit

Even today, DeepSeek's solution is still way better.

misteramy@reddit

He didn't say that. He said scaling alone will make better models, but big leaps will need to come from research.

SirRece@reddit

Ilya has said scaling is not all, and he was absolutely correct. He has not said LLMs are a dead end.

martinerous@reddit

Andrej Karpathy has also expressed similar sentiments about scaling, and about RL too. We definitely need better approaches. But scaling will go on just fine until then.

Pvt_Twinkietoes@reddit

Yes, but we are already facing practical bottlenecks: power grids not being able to support the needed infrastructure, for one.

dogesator@reddit

That’s why you scale power grid infrastructure and scale energy production. Stargate Abilene and xAI Colossus are both already producing their own on-site energy.

Danger_Pickle@reddit

The real scaling problems are far more practical. You can't iterate quickly and build new products when you need to build out an entire new data center to improve your models. Shrinking models enough that they can run on commodity-ish hardware lets teams iterate much faster on new architecture designs, which is where the real gains will come from. We've already seen the risks of model collapse, which only became apparent in 2023 once we had already exhausted the entire internet for training data. I don't expect that scaling to nuclear-powered data centers will give as much improvement as you expect. We don't really have more data to throw at the problem, and adding more parameters to our models isn't scaling like it used to.

Besides, the 80/20 rule applies. If a team can get 80% of the performance for 20% of the training cost, they're probably going to have a better product than their competition, because they can build 5 models for the cost of the competition's one.

Plus, if we can shrink the current frontier models by 80%, it starts to become reasonable to run them on hardware normal people can buy. If we scale RAM production so consumer hardware can have >100 GB of RAM (128 GB is only a 4x increase from 32 GB), then an 80/20 frontier model can run in places without stable internet access and in places where security or reliability is critical. A local frontier model wouldn't deal with unstable latency or data center scaling problems, making it a far more compelling product for many businesses. Edge computing is very powerful, but it only works if models can run on a moderately sized cluster instead of needing an entire server rack.
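The arithmetic behind the 80/20 point and the RAM figure can be spelled out in a few lines. This is only a sketch: the helper names are hypothetical, and the 128 GB target is an assumption chosen because it is the smallest common size above 100 GB that gives the quoted 4x over 32 GB (exactly 100 GB would be closer to 3.1x).

```python
# Back-of-the-envelope numbers for the 80/20 argument.
# All helper names are illustrative; nothing here models real training costs.

def models_per_budget(relative_cost: float) -> float:
    """How many cheaper models fit in the budget of one full-cost model."""
    return 1.0 / relative_cost


def ram_scale_factor(target_gb: float, baseline_gb: float) -> float:
    """How many times larger the target RAM size is than the baseline."""
    return target_gb / baseline_gb


# A model costing 20% as much lets you train 5 of them for the same spend.
print(models_per_budget(0.20))

# Assumed 128 GB consumer target versus a 32 GB baseline: the "4x" figure.
print(ram_scale_factor(128, 32))

# Exactly 100 GB versus 32 GB is a smaller jump.
print(ram_scale_factor(100, 32))
```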

martinerous@reddit

The richest companies might come up with solutions that seem crazy but might actually work and let them squeeze even more from scaling: [https://research.google/blog/exploring-a-space-based-scalable-ai-infrastructure-system-design/](https://research.google/blog/exploring-a-space-based-scalable-ai-infrastructure-system-design/)

AdministrativeRub484@reddit

Damn, Karpathy says RL is dead? What is he betting on nowadays?

martinerous@reddit

Here's his latest interview: [https://www.dwarkesh.com/p/andrej-karpathy](https://www.dwarkesh.com/p/andrej-karpathy) In short: the approach of shoving insane amounts of data into LLMs is a dead end, and we should instead find a way for LLMs to have reasonable forgetfulness. Of course, easier said than done.

praxis22@reddit

Ab fab

S4M22@reddit

I think the original version of this meme was Ilya vs. Yann LeCun: [https://x.com/wyqtor/status/1993439559036911989](https://x.com/wyqtor/status/1993439559036911989)

29 Comments
