mrtrly@reddit
The kernel panic over static HTML got me, but the real move is accepting your local setup won't match production and building around that constraint instead of against it. Run smaller models in prod, debug with whatever fits in your garage at 3am.
Impossible_Style_136@reddit
The weights visualizer at 0:12 hits home. We’re all still obsessed with dense FP16/BF16 visualization when the future is clearly sparse activation. The "struggle" in the video is mostly a symptom of memory bandwidth bottlenecks—if we moved to ternary MoE, that "loading" bar would be an afterthought because we'd finally be fitting 80B+ params into 24GB VRAM without the usual 4-bit quantization rot.
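The "80B+ params into 24GB VRAM" claim above is just back-of-envelope arithmetic: weight memory is parameters times bits per weight. A minimal sketch (the bit-widths and the 80B figure are taken from the comment; runtime overhead like KV cache and activations is ignored):

```python
# Back-of-envelope VRAM footprint for model weights at various precisions.
# Rough sketch only: real usage adds KV cache, activations, and runtime overhead.

def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """GiB needed just to hold the weights."""
    return n_params * bits_per_weight / 8 / 1024**3

params = 80e9  # the 80B figure from the comment above
for label, bits in [("FP16/BF16", 16), ("4-bit quant", 4), ("ternary (~1.58-bit)", 1.58)]:
    print(f"{label:>20}: {weight_footprint_gib(params, bits):6.1f} GiB")
```

At ~1.58 bits per weight, 80B parameters come out around 15 GiB, which is why ternary would fit in a 24GB card with room to spare, while FP16 needs roughly 150 GiB.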
Heavy-Focus-1964@reddit
mirror?
BAZfp@reddit (OP)
https://youtu.be/G_PaC7QUczs
Heavy-Focus-1964@reddit
Thanks bro. This is hilarious, it's a travesty to take it down.
Standard-Potential-6@reddit
u/BAZfp, upload to your own page please?
🙄
RuiRdA@reddit
Way too accurate.
StupidScaredSquirrel@reddit
Very high quality shitpost. Thank you.
KindnessBiasedBoar@reddit
Game knows.
LocalLLaMA-ModTeam@reddit
Rule 3 - shitpost
ZeitgeistArchive@reddit
This is now the LLM anthem, Voodoo people!
bakaraka@reddit
The voodoo who do what you don't dare do, people!
No_Lingonberry1201@reddit
"How did you manage to cause a kernel panic with a static HTML?"
TopChard1274@reddit
"How many hardware... has a virtual computer?"
he's a real human, a real human bean🎶
bakaraka@reddit
Meanwhile you are furiously searching GitHub for awesome-repos to "hire" a full stack "agentic workforce" in order to "add security" and "stop fucking up" while screaming at Claude code that despite the code having never worked on anything other than your busted old gaming rig because no one can afford computers anymore that it's time to take another one for the team and add those eight new untested MCP servers because: "it's about how haad you can GIT hit, and keep movin' FOWAARD!"
ZiddyBlud@reddit
I'm sorry, I realize I've made a mistake deleting SYSTEM32. I interpreted your request for writing html as making your computer unusable -- end of prompt dumbass good luck
sourceholder@reddit
The hallucinations are just that good.
jeremymeyers@reddit
Ok but wheres the boobs
ZiddyBlud@reddit
(.) (.)
Thrumpwart@reddit
Whats the weights visualizer at 0:12?
bcell4u@reddit
How you somehow managed to capture our experiences with local llm in this video, the world will never know.
popsumbong@reddit
What was that at 11 seconds
Medium_Chemist_4032@reddit
It can work. I was surprised too. I'm currently benchmarking Qwens still:
unsloth/Qwen3.5-122B-A10B-GGUF MXFP4_MOE for agentic tasks with quick prefill (I'm getting... 1.4k)
unsloth/Qwen3.5-397B-A17B-GGUF:Q3_K_M for general software development chat
... and I have been a big skeptic of local models so far (still can't forget how badly llama2 and llama4 burned the trust). Those two models, with all the patches and quants carefully chosen for my hardware, are just spectacular compared to anything I ever imagined from a local LLM.
You can argue that it's because of the hardware (128/96 RAM/VRAM), but if current trends continue (turboquant, improving datasets for coding), we might actually get to a place where it all starts being very feasible. We're practically on the brink of having something that can replace a subscription *for some use cases*.
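For anyone unfamiliar with figures like the "1.4k" above: local benchmarks usually report two separate throughputs, prefill (prompt processing) and decode (generation), each just tokens divided by elapsed time. A tiny sketch with made-up run numbers (the 8192/512 token counts and timings are hypothetical, chosen to land near the figures in the comment):

```python
# How prefill vs. decode throughput is typically reported when benchmarking
# a local model (e.g. from llama.cpp timings). All numbers below are made up.

def throughput(tokens: int, seconds: float) -> float:
    """Tokens per second for one phase of a run."""
    return tokens / seconds

# Hypothetical single run: an 8192-token prompt processed in ~5.8s,
# then 512 tokens generated in ~14.6s.
prefill_tps = throughput(8192, 5.8)   # ~1.4k t/s prefill, as in the comment
decode_tps = throughput(512, 14.6)    # ~35 t/s generation

print(f"prefill: {prefill_tps:.0f} t/s, decode: {decode_tps:.1f} t/s")
```

The two numbers diverge because prefill is compute-bound and batched, while decode is memory-bandwidth-bound, one token at a time; that's why a "1.4k" figure is almost always a prefill number.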
Thrumpwart@reddit
Check out the Apex quants. They are very good.
mrdevlar@reddit
Even if I code something I don't want to leave the machine unattended and without structure. I want a readable codebase. So far most of what I've tried to build has been built quite well using open source models.
miniocz@reddit
100+t/s? Not relatable.
FoxiPanda@reddit
100tok/s at prompt processing maybe, not output lol
alphapussycat@reddit
Use the 0.7b models.
TheQuantumPhysicist@reddit
100% true, but I'm betting that long term, open source models will win. I don't see how any company will have sustainable long term businesses selling compute to the masses.
StupidScaredSquirrel@reddit
You don't see how selling the same thing again and again behind an api can be profitable? Im doing stuff local already but I'm aware that I'm an outlier and the masses will buy lightweight hardware and use cloud. I don't want it but I can't stop it.
TheQuantumPhysicist@reddit
You're missing one important difference: Selling services like email scales, but selling AI power doesn't. To sell email to 1 million users, you scale by 10x. To sell AI to 1 million users, you need to scale by 100000x, if you're lucky. AI compute is conserved because it's proportional to energy. Email and other web services are not the same.
StupidScaredSquirrel@reddit
How are they not the same? There is an upfront cost and then a marginal cost. I get that the marginal cost is lower for email, but also nobody is paying a few cents per email either. All that matters is that they get a small spread per token.
TheQuantumPhysicist@reddit
If you think token generation is marginal, then you have a lot to learn. Proving you wrong is easy: if token generation were as marginal as you're claiming, a user's cost for using AI would not be proportional to their token usage. Because, again, token generation (per model) is almost proportional to energy. That's the opposite of "marginal".
So, for companies like Anthropic to make money, they have to sell their services at 5x-10x the current price. The question is: in a future where models are much better and more efficient, will companies like Anthropic be profitable? In other words: will energy become cheaper faster than open source models become better? I doubt it.
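The disagreement in this subthread can be made concrete with a toy model: if serving cost scales roughly linearly with tokens (GPU-time/energy), then the provider earns a per-token spread rather than the near-zero marginal cost of email, and the compute bill grows with usage. Every number below is hypothetical, purely to illustrate the shape of the argument:

```python
# Toy model of the "marginal cost" dispute: when serving cost is roughly
# proportional to tokens, profit is a per-token spread, and scaling users
# scales the compute bill too. All numbers are hypothetical.

def monthly_margin(tokens: float, price_per_mtok: float, cost_per_mtok: float) -> float:
    """Revenue minus serving cost for one month, in dollars."""
    return tokens / 1e6 * (price_per_mtok - cost_per_mtok)

tokens_served = 2e9   # tokens per month (made up)
price = 15.0          # $ charged per million tokens (made up)
cost = 12.0           # $ of compute/energy per million tokens (made up)

print(f"margin: ${monthly_margin(tokens_served, price, cost):,.0f}/month")
```

Under these assumptions, 10x the users means roughly 10x the serving cost, which is the asymmetry with email the parent comment is pointing at; the rebuttal is that a positive spread per token can still be profitable at scale.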
StupidScaredSquirrel@reddit
Why are you condescending if you seemingly don't know what marginal cost means? Look up the wiki page of it and you'll see your comment makes no sense in relation to what I said.
Mayion@reddit
If you can't see it then that's on you man lol. Local is good and all, but the best engineers will always flock to those who pay more money, and companies will always seek profits. China is doing it for now just to gain market share, not out of the goodness of their hearts.
TheQuantumPhysicist@reddit
You're forgetting that AI companies are not profitable. This isn't sustainable. That's why many are calling it a bubble. My thesis is that what's sustainable will not look like what we're seeing today, and by then (10-20 years), open source models will have improved a lot.
Velocita84@reddit
It's peak...
_VirtualCosmos_@reddit
That's what you get for using a Q2
BAZfp@reddit (OP)
Let me just delete a few more models to make space
Cosack@reddit
Ok, but heads up, we'll only have it done by Q2 next year
minaminotenmangu@reddit
I recognise so much of this. I assume I'm still doing it right then.
BAZfp@reddit (OP)
And I love every second of it
Specter_Origin@reddit
Bro got a working rendered UI in one shot, what kind of hardware setup flex is this?
Monad_Maya@reddit
Even the smaller 9B Qwen can do that now.
Specter_Origin@reddit
🤦‍♂️
Monad_Maya@reddit
I might have missed the joke.
Specter_Origin@reddit
Indeed my fren!
overand@reddit
Literally any song from the Hackers soundtrack would be a great fit here. Except the sexy one.
1000_bucks_a_month@reddit
Oh yeah
lemondrops9@reddit
stop watching me!
rinaldo23@reddit
I can't decide between spending money on a cloud LLM subscription or on air conditioning to cool down the room, since my crappy PC is sweating when running gemma 4
gothlenin@reddit
C'mon, obviously the second option! No reason to think about cloud before even trying some liquid nitrogen...
Mister_Uncredible@reddit
I can't stop watching this.
gothlenin@reddit
I loled. Thanks :)
Toooooool@reddit
Today's my bday and I literally got a Prodigy vinyl as a present, you couldn't have been more accurate
Jeidoz@reddit
What app for interaction with AI is used at 4th second of video? Copilot in VS code?
BAZfp@reddit (OP)
VScode Copilot, I might have been using the OAI API model extension to load from llamacpp
napkinolympics@reddit
Mess with the best, die like the rest
TrainingApartment925@reddit
The downloading is super fast for me. Have tons of storage, but sadly my gpus are shit...
pilkyton@reddit
Duddudududuuu duduuduud duuuu 🎵
The Prodigy - Voodoo People
PureSignalLove@reddit
holy crap I want it sooooo bad