mrtrly@reddit
The kernel panic over static HTML got me, but the real move is accepting your local setup won't match production and building around that constraint instead of against it. Run smaller models in prod, debug with whatever fits in your garage at 3am.
Impossible_Style_136@reddit
The weights visualizer at 0:12 hits home. We’re all still obsessed with dense FP16/BF16 visualization when the future is clearly sparse activation. The "struggle" in the video is mostly a symptom of memory bandwidth bottlenecks—if we moved to ternary MoE, that "loading" bar would be an afterthought because we'd finally be fitting 80B+ params into 24GB VRAM without the usual 4-bit quantization rot.
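The "80B+ params into 24GB VRAM" claim above is just back-of-envelope arithmetic: weight memory is parameters times bits per weight. A minimal sketch (the bit-widths and the 80B figure are taken from the comment; runtime overhead like KV cache and activations is ignored):

```python
# Back-of-envelope VRAM footprint for model weights at various precisions.
# Rough sketch only: real usage adds KV cache, activations, and runtime overhead.

def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """GiB needed just to hold the weights."""
    return n_params * bits_per_weight / 8 / 1024**3

params = 80e9  # the 80B figure from the comment above
for label, bits in [("FP16/BF16", 16), ("4-bit quant", 4), ("ternary (~1.58-bit)", 1.58)]:
    print(f"{label:>20}: {weight_footprint_gib(params, bits):6.1f} GiB")
```

At ~1.58 bits per weight, 80B parameters come out around 15 GiB, which is why ternary would fit in a 24GB card with room to spare, while FP16 needs roughly 150 GiB.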
Heavy-Focus-1964@reddit
mirror?
BAZfp@reddit (OP)
https://youtu.be/G_PaC7QUczs
Heavy-Focus-1964@reddit
Thanks bro. This is hilarious, it's a travesty to take it down.
Standard-Potential-6@reddit
u/BAZfp, upload to your own page please?
🙄
RuiRdA@reddit
Way too accurate.
StupidScaredSquirrel@reddit
Very high quality shitpost. Thank you.
KindnessBiasedBoar@reddit
Game knows.
LocalLLaMA-ModTeam@reddit
Rule 3 - shitpost
ZeitgeistArchive@reddit
This is now the LLM anthem, Voodoo people!
bakaraka@reddit
The voodoo who do what you don't dare do, people!
No_Lingonberry1201@reddit
"How did you manage to cause a kernel panic with a static HTML?"
TopChard1274@reddit
"How many hardware... has a virtual computer?"
he's a real human, a real human bean🎶
bakaraka@reddit
Meanwhile you are furiously searching GitHub for awesome-repos to "hire" a full stack "agentic workforce" in order to "add security" and "stop fucking up" while screaming at Claude code that despite the code having never worked on anything other than your busted old gaming rig because no one can afford computers anymore that it's time to take another one for the team and add those eight new untested MCP servers because: "it's about how haad you can GIT hit, and keep movin' FOWAARD!"
ZiddyBlud@reddit
I'm sorry, I realize I've made a mistake deleting SYSTEM32. I interpreted your request for writing html as making your computer unusable -- end of prompt dumbass good luck
sourceholder@reddit
The hallucinations are just that good.
jeremymeyers@reddit
Ok but wheres the boobs
ZiddyBlud@reddit
(.) (.)
Thrumpwart@reddit
Whats the weights visualizer at 0:12?
bcell4u@reddit
How you somehow managed to capture our experiences with local llm in this video, the world will never know.
popsumbong@reddit
What was that at 11 seconds
Medium_Chemist_4032@reddit
It can work. I was surprised too. I'm currently benchmarking Qwens still:
unsloth/Qwen3.5-122B-A10B-GGUF MXFP4_MOE for agentic tasks with quick prefill (I'm getting... 1.4k)
unsloth/Qwen3.5-397B-A17B-GGUF:Q3_K_M for general software development chat
... and I have been a big skeptic of local models so far (still can't forget how badly llama2 and llama4 burned the trust). Those two models, with all the patches and quants carefully chosen for my hardware, are just spectacular compared to anything I ever imagined from a local LLM.
You can argue that it's because of the hardware (128/96 RAM/VRAM), but if current trends continue (turboquant, improving datasets for coding), we might actually get to a place where it all starts being very feasible. We're practically on the brink of having something that can replace a subscription *for some use cases*.
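For anyone unfamiliar with figures like the "1.4k" above: local benchmarks usually report two separate throughputs, prefill (prompt processing) and decode (generation), each just tokens divided by elapsed time. A tiny sketch with made-up run numbers (the 8192/512 token counts and timings are hypothetical, chosen to land near the figures in the comment):

```python
# How prefill vs. decode throughput is typically reported when benchmarking
# a local model (e.g. from llama.cpp timings). All numbers below are made up.

def throughput(tokens: int, seconds: float) -> float:
    """Tokens per second for one phase of a run."""
    return tokens / seconds

# Hypothetical single run: an 8192-token prompt processed in ~5.8s,
# then 512 tokens generated in ~14.6s.
prefill_tps = throughput(8192, 5.8)   # ~1.4k t/s prefill, as in the comment
decode_tps = throughput(512, 14.6)    # ~35 t/s generation

print(f"prefill: {prefill_tps:.0f} t/s, decode: {decode_tps:.1f} t/s")
```

The two numbers diverge because prefill is compute-bound and batched, while decode is memory-bandwidth-bound, one token at a time; that's why a "1.4k" figure is almost always a prefill number.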
Thrumpwart@reddit
Check out the Apex quants. They are very good.
mrdevlar@reddit
Even if I code something I don't want to leave the machine unattended and without structure. I want a readable codebase. So far most of what I've tried to build has been built quite well using open source models.
miniocz@reddit
100+t/s? Not relatable.
FoxiPanda@reddit
100tok/s at prompt processing maybe, not output lol
alphapussycat@reddit
Use the 0.7b models.
TheQuantumPhysicist@reddit
100% true, but I'm betting that long term, open source models will win. I don't see how any company will have sustainable long term businesses selling compute to the masses.
StupidScaredSquirrel@reddit
You don't see how selling the same thing again and again behind an api can be profitable? Im doing stuff local already but I'm aware that I'm an outlier and the masses will buy lightweight hardware and use cloud. I don't want it but I can't stop it.
TheQuantumPhysicist@reddit
You're missing one important difference: Selling services like email scales, but selling AI power doesn't. To sell email to 1 million users, you scale by 10x. To sell AI to 1 million users, you need to scale by 100000x, if you're lucky. AI compute is conserved because it's proportional to energy. Email and other web services are not the same.
StupidScaredSquirrel@reddit
How are they not the same? There is an upfront cost and then a marginal cost. I get that the marginal cost is lower for email, but also nobody is paying a few cents per email either. All that matters is that they get a small spread per token.
TheQuantumPhysicist@reddit
If you think token generation is marginal, then you have a lot to learn. Proving you wrong is easy: if token generation were as marginal as you're claiming, a user's cost for using AI would not be proportional to their token usage. Because, again, token generation (per model) is almost proportional to energy. That's the opposite of "marginal".
So, for companies like Anthropic to make money, they have to sell their services at 5x-10x the current price. The question is: in a future where models are much better and more efficient, will companies like Anthropic be profitable? In other words: will energy become cheaper faster than open source models become better? I doubt it.
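The disagreement in this subthread can be made concrete with a toy model: if serving cost scales roughly linearly with tokens (GPU-time/energy), then the provider earns a per-token spread rather than the near-zero marginal cost of email, and the compute bill grows with usage. Every number below is hypothetical, purely to illustrate the shape of the argument:

```python
# Toy model of the "marginal cost" dispute: when serving cost is roughly
# proportional to tokens, profit is a per-token spread, and scaling users
# scales the compute bill too. All numbers are hypothetical.

def monthly_margin(tokens: float, price_per_mtok: float, cost_per_mtok: float) -> float:
    """Revenue minus serving cost for one month, in dollars."""
    return tokens / 1e6 * (price_per_mtok - cost_per_mtok)

tokens_served = 2e9   # tokens per month (made up)
price = 15.0          # $ charged per million tokens (made up)
cost = 12.0           # $ of compute/energy per million tokens (made up)

print(f"margin: ${monthly_margin(tokens_served, price, cost):,.0f}/month")
```

Under these assumptions, 10x the users means roughly 10x the serving cost, which is the asymmetry with email the parent comment is pointing at; the rebuttal is that a positive spread per token can still be profitable at scale.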
StupidScaredSquirrel@reddit
Why are you condescending if you seemingly don't know what marginal cost means? Look up the wiki page of it and you'll see your comment makes no sense in relation to what I said.
Mayion@reddit
If you can't see it then that's on you man lol. Local is good and all, but the best engineers will always flock to those who pay more money, and companies will always seek profits. China is doing it for now just to gain market share, not out of the goodness of their hearts.
TheQuantumPhysicist@reddit
You're forgetting that AI companies are not profitable. This isn't sustainable. That's why many are calling it a bubble. My thesis is that what's sustainable will not look like what we're seeing today, and by then (10-20 years), open source models will have improved a lot.
Velocita84@reddit
It's peak...
_VirtualCosmos_@reddit
That's what you get for using a Q2
BAZfp@reddit (OP)
Let me just delete a few more models to make space
Cosack@reddit
Ok, but heads up, we'll only have it done by Q2 next year
minaminotenmangu@reddit
I recognise so much of this. I assume I'm still doing it right then.
BAZfp@reddit (OP)
And I love every second of it
Specter_Origin@reddit
Bro got a working rendered UI in one shot, what kind of hardware setup flex is this?
Monad_Maya@reddit
Even the smaller 9B Qwen can do that now.
Specter_Origin@reddit
🤦‍♂️
Monad_Maya@reddit
I might have missed the joke.
Specter_Origin@reddit
Indeed my fren!
overand@reddit
Literally any song from the Hackers soundtrack would be a great fit here. Except the sexy one.
1000_bucks_a_month@reddit
Oh yeah
lemondrops9@reddit
stop watching me!
rinaldo23@reddit
I can't decide between spending money on a cloud LLM subscription or on air conditioning to cool down the room, since my crappy PC is sweating when running gemma 4
gothlenin@reddit
C'mon, obviously the second option! No reason to think about cloud before even trying some liquid nitrogen...
Mister_Uncredible@reddit
I can't stop watching this.
gothlenin@reddit
I loled. Thanks :)
Toooooool@reddit
Today's my bday and I literally got a Prodigy vinyl as a present, you couldn't have been more accurate
Jeidoz@reddit
What app for interaction with AI is used at 4th second of video? Copilot in VS code?
BAZfp@reddit (OP)
VScode Copilot, I might have been using the OAI API model extension to load from llamacpp
napkinolympics@reddit
Mess with the best, die like the rest
TrainingApartment925@reddit
The downloading is super fast for me. Have tons of storage, but sadly my gpus are shit...
pilkyton@reddit
Duddudududuuu duduuduud duuuu 🎵
The Prodigy - Voodoo People
PureSignalLove@reddit
holy crap I want it sooooo bad