Gary Marcus is spot on that LLMs are criminally overhyped and are a dead end. He is also a bit of a grifter and an asshole. Still, him being right does not make LLMs boring or useless.
"LLMs = scaling" is what OpenAI *wanted* everyone to believe. They had the advantage in scale, were building up rapidly, and at the time it sure looked like just adding more tokens and more parameters was the way to go.
Then we ran out of easy internet tokens (and discovered that a lot of them were trash that wasn't helping much), improved a lot of infrastructure (especially inference speed and context length), discovered that smaller models could exceed older models while running faster, realized that most of the really big LLMs were undertrained for their size, invented MoEs, RoPE, etc. And then good RL training really shook things up: it means we can keep scaling training compute, just not in the way everyone was expecting a year earlier.
Andrej Karpathy has expressed similar sentiments about both scaling and RL. We definitely need better approaches. But scaling will go on just fine until then.
That’s why you scale power grid infrastructure and energy production. Stargate Abilene and xAI Colossus are both already producing their own on-site energy.
The real scaling problems are far more practical. You can't iterate quickly and build new products when you need to build out an entire new data center to improve your models. Shrinking models enough that they can run on commodity-ish hardware allows teams to iterate much faster on new architecture designs, which is where the real gains will come from. We've already seen the risks of model collapse, which only became apparent in 2023 once we had already exhausted the entire internet for training data. I don't expect that scaling to nuclear-powered data centers will give as much improvement as you expect. We don't really have more data to throw at the problem, and adding more parameters to our models isn't scaling like it used to.
Besides, the 80/20 rule applies. If a company can get 80% of the performance for 20% of the training costs, it's probably going to have a better product than its competition, because it can build 5 models for what the competition spends on one. Plus, if we can shrink the current frontier models by 80%, it starts to become reasonable to run them on hardware normal people can buy. If we scale RAM production so consumer hardware can have >100 GB of RAM (only about a 4x increase from 32 GB), then an 80/20 frontier model can run in places without stable internet access and in places where security or reliability are critical. A local frontier model wouldn't have to deal with unstable latency or data center scaling problems, making it a far more compelling product for many businesses. Edge computing is very powerful, but it only works if models can run on a moderately sized cluster instead of needing an entire server rack.
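To put rough numbers on the 80/20 argument, here's a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function names are made up, the 1000B-parameter "frontier" model and 4-bit quantization are hypothetical, and the 1.2x overhead factor for KV cache and runtime is a guess, not a measurement. It only counts memory for the weights, using decimal gigabytes.

```python
def models_per_budget(cost_percent: int) -> int:
    """How many cheaper models fit in the budget of one full-cost model."""
    return 100 // cost_percent


def shrink(params_billion: float, shrink_percent: int) -> float:
    """Parameter count (in billions) after shrinking by shrink_percent."""
    return params_billion * (100 - shrink_percent) / 100


def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in decimal GB."""
    # 1e9 params * (bits / 8) bytes each = params_billion * bits / 8 GB
    return params_billion * bits_per_weight / 8


def fits_in_ram(params_billion: float, bits_per_weight: int,
                ram_gb: float, overhead: float = 1.2) -> bool:
    """True if weights plus an assumed overhead factor fit in ram_gb."""
    return weight_memory_gb(params_billion, bits_per_weight) * overhead <= ram_gb


if __name__ == "__main__":
    print("models per budget at 20% cost:", models_per_budget(20))

    # Hypothetical 1000B-parameter frontier model, shrunk by 80%.
    small = shrink(1000, 80)
    print("shrunk model size (B params):", small)
    print("4-bit weights (GB):", weight_memory_gb(small, 4))
    print("fits in 32 GB:", fits_in_ram(small, 4, 32))
    print("fits in 128 GB:", fits_in_ram(small, 4, 128))
```

Under these assumptions the shrunk model doesn't fit in 32 GB but does fit in 128 GB, which is the point of the 4x RAM argument. Change the parameter count or quantization and the answer moves a lot, so treat it as a sanity check rather than a prediction.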
The richest companies might come up with solutions that seem crazy but might actually work and let them squeeze even more out of scaling: [https://research.google/blog/exploring-a-space-based-scalable-ai-infrastructure-system-design/](https://research.google/blog/exploring-a-space-based-scalable-ai-infrastructure-system-design/)
Here's his latest interview: [https://www.dwarkesh.com/p/andrej-karpathy](https://www.dwarkesh.com/p/andrej-karpathy)
In short: the approach of shoving insane amounts of data into LLMs is a dead end; we should instead find a way for LLMs to have reasonable forgetfulness. Of course, easier said than done.
I think the original version of this meme was Ilya vs. Yann LeCun: [https://x.com/wyqtor/status/1993439559036911989](https://x.com/wyqtor/status/1993439559036911989)
29 Comments
DisjointedHuntsville@reddit
thrownawaymane@reddit
Barafu@reddit
thrownawaymane@reddit
AppearanceHeavy6724@reddit
Infamous-Lock-2156@reddit
Fun_Smoke4792@reddit
HeinrichTheWolf_17@reddit
Karyo_Ten@reddit
LevianMcBirdo@reddit
k_means_clusterfuck@reddit
UnlegitApple@reddit
shaman-warrior@reddit
k_means_clusterfuck@reddit
AutomataManifold@reddit
jesuslop@reddit
AutomataManifold@reddit
Funny_Working_7490@reddit
misteramy@reddit
SirRece@reddit
martinerous@reddit
Pvt_Twinkietoes@reddit
dogesator@reddit
Danger_Pickle@reddit
martinerous@reddit
AdministrativeRub484@reddit
martinerous@reddit
praxis22@reddit
S4M22@reddit