What happens to local LLM if/when LLMs are no longer released for free?

Right, but surely we can agree that it's far from guaranteed that these companies will continue to spend vast amounts of money and effort to give us FREE models to run at home?

I think a lot of people are making the mistake of assuming this will sort of always be a thing, like Linux. Where even if Linus retires or gets abducted by aliens, it will continue.

But that's not true for these models. Linux will continue forever because nobody owns it, and anybody can get the source code and build it. That's not true at all for these models.

[-]

PracticlySpeaking@reddit

Your premise is solid, that these companies are not releasing leading-edge models out of altruism. There are solid business goals behind the releases, and we should enjoy them while we can.

Gemma-4 represents a shift in those business goals to more of a freemium model. Plus it offloads LLM work to local hardware while providers are constrained by compute.

There will always be local — just not frontier-level or smaller versions of the leading edge from top researchers. The current ones will never go back in the genie bottle, and enough expertise is out there to develop them. Though perhaps not at the pace or level we have enjoyed to date.

Linux will continue forever because nobody owns it

...and because there are people willing to support it (without asking users to pay).

[-]

lulzbot@reddit

napster.ai

[-]

Warm-Attempt7773@reddit

I believe there will still be 'open source' LLMs, or something like them, the same way we have Linux, or open-source libraries, or games.

As for the local AI inference scene, it's going to become much more prolific, with people using local inference without effort. Many equipment manuals, how-to instructions, and everyday computer use in general are going to move to a local agent that is secure, robust, and so usable that calling it compelling would overstate how complicated it is. It will just be a way of life for people to have a local agent for shopping, bill payments, communications, etc. Large inference in datacenters will be for more generalized use and industry.

The enthusiasts and open-source engineers will still produce local inference machines capable of the same kinds of work the industrial ones do, but similar to today, with some limitation.

[-]

JohnBooty@reddit (OP)

I certainly hope so as well, but I'm not sure how optimistic to be.

Linux was famously started in Linus' dorm room on a cobbled-together PC. Most if not all open source libraries have similar origins.

LLMs are quite a different beast because training a nontrivial model from scratch takes a lot of hardware and electricity. The number of individuals with the means to undertake such a project is many orders of magnitude smaller than the number of people who have the means to start traditional open source software projects.

[-]

Warm-Attempt7773@reddit

But it's not intensive to train a small model. That ability is going to expand. It won't reach the levels of industrial training, but it will be enough to move from a readme.md file for your app to an interactive experience that people will simply expect. If you start with a 0.8B-parameter model you can have a convincing character in a video game, for example, if you retrain it on your dialog. Your 3-D printer can intelligently work with you on your print. And this is stuff that is ALREADY within reach. It's small, targeted, and useful.

Will your small model solve the next big math problem? HELL NO! Will it answer your question about elephants? No, not unless you trained it on elephants. But it will know the little thing you trained it for.

We'll still use the large providers and happily pay $20 per month for them. However, they'll use energy more wisely and be more of an MOE behind the scenes, have trillions of trillions of parameters, and suck up energy from nuke plants. Small models will be ubiquitous though.

[-]

Imaginary-Unit-3267@reddit

We also have to consider that with enough small models trained on enough tasks, you probably get general-enough intelligence by combining them - particularly using your own general intelligence to do so, heh. Maybe the future of AI looks like the Unix philosophy: tiny models that do one thing and do it well, and are meant to be piped together using plain text and JSON.
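A minimal sketch of what that piping could look like, assuming each "model" is just a function that takes a JSON string and returns a JSON string. The three stages (`classify_intent`, `extract_amount`, `format_reply`) are hypothetical stand-ins written as plain rules; in practice each would wrap a small fine-tuned model.

```python
import json
import re
from functools import reduce
from typing import Callable

Stage = Callable[[str], str]  # every stage: JSON text in, JSON text out


def classify_intent(line: str) -> str:
    """Stand-in for a tiny intent classifier (a hypothetical rule, not a real model)."""
    msg = json.loads(line)
    msg["intent"] = "pay_bill" if "pay" in msg["text"].lower() else "other"
    return json.dumps(msg)


def extract_amount(line: str) -> str:
    """Stand-in for a tiny extraction model: pulls the first dollar amount, if any."""
    msg = json.loads(line)
    match = re.search(r"\$(\d+(?:\.\d+)?)", msg["text"])
    msg["amount"] = float(match.group(1)) if match else None
    return json.dumps(msg)


def format_reply(line: str) -> str:
    """Stand-in for a tiny response model."""
    msg = json.loads(line)
    if msg["intent"] == "pay_bill" and msg["amount"] is not None:
        msg["reply"] = f"Scheduling a payment of ${msg['amount']:.2f}."
    else:
        msg["reply"] = "Sorry, I only handle bill payments."
    return json.dumps(msg)


def pipe(*stages: Stage) -> Stage:
    """Compose stages left to right, like `a | b | c` in a shell."""
    return lambda line: reduce(lambda acc, stage: stage(acc), stages, line)


pipeline = pipe(classify_intent, extract_amount, format_reply)
request = json.dumps({"text": "Please pay $42.50 to the power company"})
result = json.loads(pipeline(request))
print(result["reply"])
```

Because every stage speaks the same plain JSON, any one of them can be swapped for a better model without touching the others, which is the whole appeal of the Unix-style approach.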

[-]

JohnBooty@reddit (OP)

But it's not intensive to train a small model. That ability is going to expand.

I'm currently clueless on how to do this and don't have a sense of exactly how intensive it is -- any pointers to good learning resources?

(I can search, obviously, but I was wondering if anybody has a particular personal recommendation 😅)

We'll still use the large providers and happily pay $20 
per month for them. However, they'll use energy more wisely 
and be more of an MOE behind the scenes, have trillions of
trillions of parameters, and suck up energy from nuke plants. 
Small models will be ubiquitous though.

I definitely think specialized models are the future, in some form or another.

It feels like huge, general-purpose models are going to plateau to some extent. The compute/RAM required to train and serve today's frontier models during this goldrush is disrupting multiple industries. They can't simply keep 2x'ing or 10x'ing the resources required to train and serve these things at scale.

To overcome this we're probably going to need breakthroughs in either energy supply and/or semiconductors. I don't see those breakthroughs on the near horizon, and there's a chance that things could get far worse on both those fronts if the China/Taiwan situation takes a fateful turn.

So I think some sort of "MOE behind the scenes" seems inevitable. Maybe it's not even "MOE" within a single model. E.g., maybe instead of a single Opus 5.0 there are 100 different flavors of full-fat Opus 5.0, and your request gets routed to the best one for your specific query. I don't know how effective it would be, but it certainly seems achievable.

[-]

Warm-Attempt7773@reddit

I don't know why you're downvoted.

"Not intensive" means you can train a small model on 8GB of VRAM on a laptop now. Yes, really small.

[-]

Southern-Chain-6485@reddit

You mean $200 per month, right? At some point the AI developers will have to turn a profit... and they may not be able to.

[-]

Warm-Attempt7773@reddit

The general public will not endure more than $20 (or so) per month. Yes, high-usage business customers will have higher costs, as they do now. When I say $20 I mean me, as a casual user, asking ChatGPT to lead me and my wife in a trivia game (which we do, and it's great fun).

However, there will be local models that will run just fine for specific use cases, as we are just about doing now.

[-]

Southern-Chain-6485@reddit

And what's going to happen when companies try to turn a profit is that you'll use a smaller model, local or even API, for your trivia nights. Companies, in the meantime, will build their own server racks, if they don't have them already, and deploy the last free open-weight models. And maybe they'll pool together to continue to finetune them or even develop them with updated knowledge as the years go by, because it's going to end up more cost-efficient and gives them privacy.

[-]

mycall@reddit

I want to hear Bill Gates say "Nobody needs more than 128GB RAM"

[-]

Awwtifishal@reddit

At that point we can probably pool together resources and people with enough know-how to train new models or to update existing ones. Perhaps we will develop/improve systems for distributed training across many volunteers' machines.

[-]

ObsidianNix@reddit

Like this? https://docs.psyche.network

[-]

PrettyMuchAVegetable@reddit

I'm in, that's awesome. I used to do SETI@home and Folding@home, and I've been waiting for this kind of distributed effort to appear.

[-]

tiffanytrashcan@reddit

AI Horde is similar for inference too (without blockchain signup, it uses an internally tracked "kudos" system) running KoboldCPP for a text gen worker.

[-]

Imaginary-Unit-3267@reddit

Honestly most people on this entire sub should be in the Horde.

realizes I haven't joined yet

Well, shit... now I have to, since I've mentioned it...

[-]

tiffanytrashcan@reddit

Get Kobold from github - Google promotes fake download sites! (that we should probably be reporting..)

[-]

mxforest@reddit

PTSD of me doing folding at home religiously being completely oblivious to BTC existing.

[-]

Consumerbot37427@reddit

Ouch.

I had folding@home going 24/7 back in 2009. Even worse, I had read on slashdot about bitcoin, but made the conscious choice to stick with folding@home for charitable reasons.

As much as I kick myself for that decision, I tell myself that I probably would've sold it when it hit $100 or $1000, so I'd just be kicking myself for a different reason.

[-]

Thrumpwart@reddit

I remember reading about bitcoin in 2010ish with ample hardware at my disposal and thinking “this will never work”.

D’OH!

[-]

jiml78@reddit

Don't feel bad. I got in bitcoin super early. Mined coins on a macbook pro laptop. Had about 200 bitcoins overall. Sold them all when the price hit $6/coin. Thought there was no way they would go higher.

[-]

No_War_8891@reddit

me too buddy

[-]

Subject_Mix_8339@reddit

LOL

[-]

Ambitious_Worth7667@reddit

I started mining at home when it was ~$700...and I was kicking myself when I heard the Pizza delivery story...

[-]

Quartich@reddit

Glad that I share this experience

[-]

ShutUpAndDoTheLift@reddit

Little concerned about the immediate spiral iconography. Is this linked to spiralism?

[-]

fatYogurt@reddit

How is this even possible over the internet? I thought GPU clusters required a super fast network.

[-]

TamSchnow@reddit

No idea how this project does it (probably documented somewhere, need to look it up), but folding@home sends "tiny" jobs to your machine to do.

I think it might be done in a similar way.

[-]

JohnBooty@reddit (OP)

Also, I assume this is perhaps more valuable for training than inference?

[-]

TamSchnow@reddit

This is (from the stuff that I read in their documentation) distributed model training.

[-]

Andozinoz@reddit

Spot on. There's enough open source knowledge, code and appropriately licensed technology. The cat is out of the bag now.

[-]

UnionCounty22@reddit

Until that project does a 180 once they make it

[-]

honato@reddit

As is tradition.

[-]

Substantial_Swan_144@reddit

Tell me how that's going with open source GPUs.

[-]

Awwtifishal@reddit

Hardware and software are worlds apart. One can't just create an ASIC in their basement.

[-]

Gipetto@reddit

Exactly. Where are we going to get compute power when it is all being gobbled up?

[-]

Awwtifishal@reddit

That's why I mention distributed computing.

[-]

neopolitan77@reddit

Do we even need distributed training? People are running finetunes on their local hardware already, is a model update not on a similar complexity scale? I'm more concerned about the training data than the compute, everybody building their own scraper at home won't fly, and open sourcing data sets will run into all kinds of legal hurdles.

[-]

stoppableDissolution@reddit

Pretraining requires an enormous amount of compute. And, yea, data, but we could crowdsource some kind of fully synthetic dataset, I guess.

[-]

neopolitan77@reddit

But does simply updating a model (moving the knowledge cutoff forward) even require full retraining? Can't we treat it like a finetune?

[-]

stoppableDissolution@reddit

Not in any meaningful capacity, at least in pure transformers. Training new data overwrites previous data in quite a random fashion, leading to significant overall degradation, unless you have the original dataset to "remind" it about things.
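To make the "remind it" point concrete, here is a toy sketch of the effect, assuming a two-weight linear model instead of a transformer. The data points, learning rate, and function names are made up for illustration. Training on new data alone drags the shared weights away from the old solution, while mixing the old example back in (often called replay or rehearsal) keeps both fitted.

```python
from typing import List, Tuple

Example = Tuple[Tuple[float, float], float]  # ((x0, x1), target)


def predict(w: List[float], x: Tuple[float, float]) -> float:
    return w[0] * x[0] + w[1] * x[1]


def train(w: List[float], data: List[Example],
          epochs: int = 500, lr: float = 0.1) -> List[float]:
    """Plain per-example gradient descent on squared error; returns new weights."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
    return w


def loss(w: List[float], data: List[Example]) -> float:
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


# Toy "old knowledge" and "new knowledge"; the inputs overlap, so updates interfere.
old_data: List[Example] = [((1.0, 0.2), 1.0)]
new_data: List[Example] = [((0.2, 1.0), -1.0)]

base = train([0.0, 0.0], old_data)             # the original "pretrain"
naive = train(base, new_data)                  # update on new data only
replayed = train(base, new_data + old_data)    # update with old data mixed back in

print("old-data loss after pretrain:", loss(base, old_data))
print("old-data loss, naive update: ", loss(naive, old_data))
print("old-data loss, with replay:  ", loss(replayed, old_data))
```

In this toy the naive update leaves an old-data loss of roughly 0.28, while the replayed run stays near zero. Real models are far messier, but it shows why open weights without the original training data make a clean knowledge update hard.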

[-]

neopolitan77@reddit

Thanks. Sounds like what we'll need are fully open source models, not just open weights.

[-]

finah1995@reddit

Not the original commenter.

Those exist, like the Allen AI Olmo or Allen AI SERA models.

[-]

t_krett@reddit

Has anyone looked into distributed distillation attacks?

[-]

jcdoe@reddit

The future of running local llms will lead to us making our own models. We won’t be tied to big corporations for long.

[-]

Ok_Warning2146@reddit

By then it will probably be cost-effective enough for the open source community to build their own LLM.

[-]

Potential-Fan-6148@reddit

An open source community forms, which should be starting now anyway. Form a foundation and start pooling resources to collectively contribute to a standard model.

[-]

mhb-11@reddit

Community models will still be supported by the community. We're the community.

[-]

cutter89locater@reddit

As long as there is still competition among those big AI corps.
As long as the green/red teams put their acceleration tools on open-weight models.
We're good.

[-]

Puzzleheaded_Base302@reddit

NVIDIA will continue to release open-source models. It is in their financial interest to do it.

[-]

JohnBooty@reddit (OP)

That’s an interesting assertion. It’s definitely in their interest to develop their own models as a hedge against unforeseen events like a major customer defecting to AMD or something. It’s a smart hedge, and probably gives them a great training ground to dogfood their own hardware and software stack.

I’m less convinced that continuing to provide free consumer hardware-friendly models will be in their best interests forever.

They clearly prefer selling to the enterprise, as the margins are insane right now compared to the consumer world. So I don’t think “release consumer friendly LLMs to drive sales of consumer hardware” is a big motivator for them. At least for the time being.

[-]

Irisi11111@reddit

No need to worry about that. LLM vendors will only charge you if you've established a successful, profitable business, and their fees will be based on your profits. There's no incentive for them to cut off a client if you haven't yet reached that stage.

[-]

djparce82@reddit

I wonder if companies eventually will sell premium LLMs for local use like software? I know the current cloud subscription is essentially letting you hire them.

[-]

JohnBooty@reddit (OP)

I could kinda see that. But it’s hard for me to imagine how they’d stop you from just sharing the model.

Other open source companies have found ways to make money even though their source is available by selling support/consulting. I wonder what that might look like in the LLM world.

[-]

El_Danger_Badger@reddit

By the grace of greed, these guys are giving it away right now to get us all hooked.

But there will be a time when that door slams shut. Read about Trump considering requiring approval for all new frontier models.

Stockpile externals filled with models, ahead of the open weight apocalypse. One day this stuff will no longer be free. Make sure your system has a strong ingest pipeline built in to keep it learning. Just because the model is older doesn't mean the system is bad. A good system should be able to hot swap models anyway.

[-]

usa_reddit@reddit

What about when using or downloading non government approved AI models is illegal? I think that is the more realistic future. Use the models we say you can use with the biases and guardrails we approve.

“I’m sorry Dave, I can’t do that.” -HAL 9000

[-]

JohnBooty@reddit (OP)

Yes, absolutely. That possibility is one reason I'm interested in this.

I highly doubt they'll go after individual users. I think they'll tackle things on the supply side.

I also think it's 100% likely they'll force LLM hosts to track user logs and supply them to the government on request. If they're not already doing that.

[-]

usa_reddit@reddit

So soon new models will be bit-torrent and dark-web only. Lovely.

[-]

DonkeyBonked@reddit

There will always be new companies trying to create their own LLMs and share them so they can get the data to improve them. However, even if they didn't and we were stuck with what we have right now, people would still fine-tune and create (q/re/si/)LoRAs for them to keep them updated.

The open-source ecosystem is an invaluable asset, bigger than any one model in it. That ecosystem will always give rise to those who need to use it, and to those who finish using it for themselves and decide they will no longer contribute.

The need for transparent open source models is not going away anytime soon. It's not even about us either, but the genuine corporate need because closed cloud models are a data privacy nightmare and often a data leak class action lawsuit waiting to happen.

[-]

rdkilla@reddit

nearly all open models are already fine tuned and reliant on closed models, especially if they say they aren't but don't show the receipts

[-]

superdariom@reddit

Is this like when Claude claims to be DeepSeek?

[-]

AmoebaDue6638@reddit

The weights are already out there. Even if every lab stopped releasing tomorrow, the community would keep fine-tuning and distilling what exists. RAG tooling solves the stale knowledge problem better than new base models would anyway.
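A rough sketch of the RAG idea, assuming a toy word-overlap retriever in place of the embedding search a real setup would use. The notes, the `widget` CLI, and its flags are invented for illustration. The point is that the fresh fact reaches the frozen model through the prompt, not through its weights.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> Counter:
    """Lowercased bag of words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, doc: str) -> int:
    """Count of word occurrences shared by query and document (toy relevance)."""
    return sum((tokenize(query) & tokenize(doc)).values())


def retrieve(query: str, docs: List[str], k: int = 1) -> List[str]:
    """Return the k documents with the highest overlap score."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]


def build_prompt(query: str, docs: List[str]) -> str:
    """Put the retrieved text in front of the question for an older local model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return (
        "Use only the context below to answer.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


# Made-up documents written after the model's knowledge cutoff.
notes = [
    "Release notes: the widget CLI renamed the --fast flag to --turbo.",
    "Cafeteria menu: soup on Mondays, pasta on Fridays.",
    "Style guide: prefer tabs over spaces in Makefiles.",
]

question = "Which flag replaced --fast in the widget CLI?"
prompt = build_prompt(question, notes)
print(prompt)
```

The resulting prompt would then go to whatever local model is on hand. How good the answers are still depends on how good the retrieved text is.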

[-]

Jayfree138@reddit

Now that the technology is out there, the odds of new models not being released are practically zero. Even if it became illegal there would be pirate models. The technology will only get better, so even if Google and Alibaba stop, others will continue. It's not something anyone needs to worry about.

[-]

notAllBits@reddit

Distillation and "piracy"

[-]

jld1532@reddit

Universities are building out GPUs. I'd argue we're about to get new sources of models rather than fewer. I strongly suspect that the freeware ecosystem built at universities that has slowly eroded for-profit software in fields like statistics, computer science, and geography will have an impact on AI in the very near future.

[-]

condorthe2nd@reddit

I like your optimism, I hope you are right.

[-]

darktotheknight@reddit

https://i.redd.it/m6d2gpnc7y1h1.gif

[-]

MeganDryer@reddit

LLMs aren't that hard to make, and we're increasingly getting into the position where people can homebrew them with consumer parts.

In 5+ years all sub 300b models will be free, because there will be no point in keeping them private unless you have very specialized datasets.

[-]

superdariom@reddit

We'll be vibe-training soon at the rate this field seems to be becoming accessible

[-]

jacek2023@reddit

Mistral Nemo from 2024 still works and people use it. You can use local models forever - with new, better software.

People who hype 1T models here will just hype cloud models, what's the difference, they use them in cloud anyway (or don't use them at all).

[-]

DeepOrangeSky@reddit

Mistral Nemo from 2024 still works and people use it. You can use local models forever - with new, better software.

I don't think we are at a level yet even with GLM5.1 where it will be able to do whatever awesome stuff the models a couple years from now will be able to do, let alone with Nemo, or with Qwen3.6 27b or anything like that, even with the best software and harnesses, either.

These models are going to be like Pong and Pac-Man compared to Elder Scrolls and Counterstrike (or a significantly bigger gap than even that) by a few years from now.

Now, to be fair, I don't think we're ever actually going to really see local models on par with whatever those insanely powerful models are going to be a few years from now, locally, because they are going to be so powerful in the hands of people with bad intentions that the governments are going to clamp down harder on it than anything we've ever seen before. As in, it'll make trying to smuggle a few tons of heroin or fentanyl or coke across a border seem like a joke, compared to how they are going to be about local AI models pretty soon, if they realize a d-bag in his garage can wipe out the entire human race with however strong the local AI is a few years from now. I think they are going to get rid of VPNs, and make it life in prison without the possibility of parole, or death penalty, for anything to do with hosting or downloading local AI, and huge teams of governments monitoring everything (and using the strongest AI on the planet, which will be quite strong by that point) to help them do so, too.

That's where I think this is headed.

Or, who knows, maybe they'll figure out some way of making downloadable models that are somehow "impossible" to decensor/jailbreak, and we get to have those ultra strong AIs without the huge government lockdown on all the local AI somehow, if they can make it not be able to do anything super dangerous. But, I doubt it. They'd be too worried someone would crack it, or figure out some way of using the non-jailbroken version to still wipe everything out somehow, probably.

Anyway, I understand the sentiment, and it is nice that we get to have the current stuff forever, but, I'm not sure it will mean much in the grand scheme of things, given how much stronger AI is probably going to become in the next few years. There will still be a small niche of people who enjoy the "quaint" old models in the way some people still enjoy playing tic tac toe, or Pong. But... yea I think it'll be a whole new world out there.

[-]

Lissanro@reddit

I run Kimi K2.6 Q4_X and other large models locally daily, and using cloud would not be even a viable option for me, neither for work (due to restrictions on sending to a third-party) nor personal purposes (due to lack of privacy).

By the way, buying hardware to run 1T models used to be not that expensive, even just about a year ago: around $1600 for 8-channel 1 TB DDR4 3200 MHz RAM, or about three to four times more for 12-channel DDR5. But thanks to the RAMpocalypse, building a new rig for running large models became way more expensive, to the point that for new rigs it may make more sense to go VRAM-only than high RAM+VRAM.

[-]

iportnov@reddit

Qwen 3.6 is significantly better than 3.5, for the same number of parameters. One would hope that 3.7 will be significantly better than 3.6... Cloud models most certainly will be. About smaller models, all we have is a hope that they will be released.

[-]

cibernox@reddit

Also, China has a particular interest in open-weight LLMs being widely available. Since the US is ahead in the quality dimension, the only reasonable geopolitical move is to choke the US AI effort on the side of price. With enough “not-SOTA-but-good-enough” models they can get many companies to not pay the premium and cut the profitability of big US labs to the point of drying up investors' money before break-even happens.

If LLMs and software become a commodity, China has the upper hand on the manufacturing side. And that one is a lot harder to reverse for any Western country.

[-]

zxyzyxz@reddit

But we already see them not releasing the big models, like the 100B-parameter ones, but only the smaller ones.

[-]

Legitimate-Dog5690@reddit

It's relentless at the moment, Qwen3.5 was Feb, Qwen3.6 was April, new Deepseek Flash, Kimi K2.6. We're not even half way into the year, I can barely even remember what I used to use last year, it's changed so rapidly.

[-]

IrisColt@reddit

I can barely even remember what I used to use last year

Qwen 3 VL, heh

[-]

cibernox@reddit

Well, let’s not lose our shit yet. We’ve been pampered with one open near-SOTA model every 6 weeks for so long that a 3-month hiatus is like 40 years through the desert.

[-]

CreamPitiful4295@reddit

What if the surveillance is inside the model?

[-]

Legitimate-Dog5690@reddit

I'd say the opposite, local models don't have any way to send out data. OpenAI, Anthropic and Google openly admit to farming your data.

[-]

CreamPitiful4295@reddit

Yep. I’ve begun reducing my reliance on API LLMs. Bought a 5090 to keep my 3090 company. They get 80% of the work done. qwen3.6 is very good and corrects Opus4.6 all the time. The offline does all the analysis and planning. Claude is the implementation when it’s existing code. Otherwise I’ll let qwen take a crack at it and Opus can correct. They make a good team.

[-]

cibernox@reddit

Surveillance from which country? Both seem equally plausible to me.

[-]

CreamPitiful4295@reddit

I know the API is easier, but the premise is that APIs are vulnerable. And, I’d say Chinese.

[-]

Legitimate-Dog5690@reddit

This! They have the power to make it so these huge American corporations have blown billions for nothing. Those billions were spent on getting China to build RAM, they get to sell it to consumers next.

[-]

Foreign_Yard_8483@reddit

The ones we know today won't stop being offered for free (my view). But they will lose their intrinsic meaning and value: it will be like selling a file compressor or a graphics library, more a matter of strategy and ecosystem preference than of the model itself.

The truly powerful models will be the ones that are right now documenting the work processes, files, and anonymized data of everyone who uses openclaw, agent copilot 'premium', and Claude anything. They will be a competitive differentiator, because they will offer solutions that are new to you and not yet known to other people. At that point, distillation will no longer be viable, because of blocks and layers of complexity and cost.

There will also be smart-twin models for cities, governments, surveillance... but that's a step further ahead.

[-]

Pleasant-Shallot-707@reddit

NVIDIA will always release open source models and I suspect Google, Microsoft, and Apple will always have local models they release (hopefully with open weights).

I also assume we will be far enough along in the world of AI that there will be a “Linux of AI” if things start locking down. There are too many knowledgeable researchers who want this stuff available and widely used and understood for that not to be the case.

[-]

Ornery_Hall@reddit

Maybe it goes down a similar path to Linux: free to use for some releases, then a license for corporate use.

[-]

ObjectiveActuator8@reddit

Some group comes up with the idea of building a non-profit and releases a tool that allows users to lend their homelab computing to train newer open source models. We make our own.

[-]

alecmuffett@reddit

Be aware that if the history of cryptography is anything to go by there will be a few years if not a couple of decades where various governments attempt to brand all such efforts as illegal due to inability to regulate them

[-]

bennmann@reddit

As long as there are Apache 2.0 datasets and people who believe in free and open datasets, there will be models trained on them.

Even copyright is "only" a lifetime scale issue. Not to mention the US Freedom of Information Act at the government level, should the US national labs get their act together. Your grandchildren should have better data than you. Your grandchildren will have better models than you.

The new first world dream is that our children will have a better life than us, in the form of safe and effective data and privacy and robots.

Also, databases get leaked sometimes.

[-]

nopanolator@reddit

What happens to local free LLM if they are only released at sizes that don't fit on consumer hardware ?

I think that it's a more relevant angle to take the equation, just an opinion.

[-]

JohnBooty@reddit (OP)

Hmmm. Definitely interesting. What do you think would be the motivation for Google, Qwen and the others to go in that direction?

[-]

ObsidianNix@reddit

Make this popular: https://docs.psyche.network

[-]

teleprint-me@reddit

Eww, it requires solana which is tied to ethereum.

[-]

shoeshineboy_99@reddit

A hypothetical question -- with the presence of proprietary software, has the number of open source projects come down??

[-]

freia_pr_fr@reddit

Right now, it doesn't really make sense to attempt building SOTA open-weight models from scratch when the industry is competing on this. But if the industry stops, the academic sector can and will likely continue. It won't be as good as the latest commercial models, probably, but it will keep getting better in my opinion.

[-]

dobkeratops@reddit

heh that's why we're all frantically downloading the very latest and archiving like our freedom depends on it .. i think we'd still get progress with fine tunes and frankensteins ? there might be some progress with federated learning ?

[-]

Squidgical@reddit

We build better orchestration tools to get our gains rather than using newer models. We're potentially at the point where we should already be doing this as a primary means of getting better results instead of more hardware for more params.

[-]

Scared-Tip7914@reddit

To be fair you can use these models for many many years to come as long as you give them access to the open web. The trained intelligence doesn't really get outdated, you just need the correct harness and web search tools.

[-]

DeProgrammer99@reddit

To a certain extent, yeah. Not even Sonnet can get the syntax for my own super-simple query orchestration language consistently right, given decent documentation of it and an example. I tell it "src db to someTable" and it gives me "src db to memory:someTable"; I tell it a query containing a list parameter is run as multiple iterations with a subset of IDs injected each time, and it gives me a "NOT IN (@ids)" clause...

Of course, I can instruct it with even more detail to fix those unexpected misuse cases, but the more knowledge it's missing (e.g., because we're up to C# 25, but it was only trained on C# 13), the more trial and error it'll take to get things working, and the more context it'll need dedicated to the basics. Given that 64k context is enough for significant intelligence loss, the adaptability for broad domains is pretty limited...without fine-tuning, at least.

[-]

CYTR_@reddit

This is the case for agentic coding. But I think that quite a few tasks without pure agentic behaviors can still be automated.

A deterministic workflow that integrates the LLM as a controlled stochastic module, like Windmill, allows us to mitigate many of these risks. By constraining the agent and its output with GBNF, command prohibitions/attributions, recipes/examples for output with dynamic context enrichment... (and who knows what other ideas we might come up with when you think of all the possibilities) you can overcome quite a few things (poor generalization/intelligence of the model and training that is too fragile) while putting in place safeguards for dangerous/slop content. In the case of local models, you can even add LoRA quite easily (with certain targeted adapters depending on the modules if you like sleepless nights).
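A small sketch of the "LLM as a controlled module" idea, with loud assumptions: this does not use GBNF (which constrains tokens during decoding in llama.cpp); it is a weaker after-the-fact check that rejects any output outside an allow-list and retries. The command names, the forbidden substrings, and `fake_model` are all hypothetical.

```python
import json
from typing import Callable, Optional

ALLOWED_COMMANDS = {"list_files", "read_file"}   # what the module may do
FORBIDDEN_SUBSTRINGS = ("..", "/etc")            # what paths may never contain


def validate(raw: str) -> Optional[dict]:
    """Return the parsed action if it obeys the schema and rules, else None."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(action, dict) or set(action) != {"command", "path"}:
        return None
    if action["command"] not in ALLOWED_COMMANDS:
        return None
    if not isinstance(action["path"], str):
        return None
    if any(bad in action["path"] for bad in FORBIDDEN_SUBSTRINGS):
        return None
    return action


def constrained_call(model: Callable[[str], str], prompt: str,
                     max_tries: int = 3) -> Optional[dict]:
    """Ask the model up to max_tries times; only a validated action gets through."""
    for _ in range(max_tries):
        action = validate(model(prompt))
        if action is not None:
            return action
    return None


# Stand-in for a local model: two bad outputs, then a good one.
_canned = iter([
    "rm -rf /",
    '{"command": "delete_file", "path": "a.txt"}',
    '{"command": "read_file", "path": "notes/a.txt"}',
])


def fake_model(prompt: str) -> str:
    return next(_canned)


result = constrained_call(fake_model, "Open my notes")
print(result)
```

The surrounding workflow stays deterministic: whatever the model says, only one of a few pre-approved action shapes can ever reach the rest of the pipeline.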

But it's true that we lose the ease of use of the OpenCode/CC-style agentic and the associated freedom with .md prompt system. It might not be suitable for software development yet (except for maintenance/ticketing? I don't know... i'm not a developer lol). But for some data processing pipelines, this is much better than letting a model call tools on its own.

[-]

ColonelKlanka@reddit

You should add the context7 MCP server to your harness - it's specifically made to make sure the LLM gets the most up-to-date APIs for a framework. Once enabled, you just add a statement to agents.md or claude.md such as 'Always use context7 when looking up framework apis or syntax'.

You can also add it to the prompt itself ('use context7') - I found it a lifesaver.

[-]

JohnBooty@reddit (OP)

access to the open web.

Obviously, to an extent, this already works well today for some kinds of queries.

But I'm way less optimistic than you on this one when looking at the long horizon...

While existing LLMs are famously trained in part on teh interwebs, that training is still curated, giving much higher weight to more authoritative sources, and also including actually-authoritative sources like peer-reviewed research, etc.

So yeah, in 2030 and 2040 we'll still be able to use 2026 models augmented with web search to get up-to-date data... but the more the models drift from reality, the more they'll need to rely on web search... and web search has been absolute slop for a very long time.

[-]

Scared-Tip7914@reddit

I agree with you completely on the long horizon, as well as on web search. As a preventative measure, we as the open source community will need to come together. Someone else already posted https://psyche.network/runs, and what these guys are doing is probably a good direction for the future of local LLMs.

[-]

Outrageous_Bug_669@reddit

Concrete data point on the "fine-tunes will fill the gap" thread: I've been doing Qwen3.5-27B bf16 LoRA fine-tunes on a single Strix Halo mini-PC (Ryzen AI MAX+ 395, 128 GB unified) for the last 6 months on a narrow domain. ~900 training chunks, ~12.5 min/step, multi-day runs are routine. Total hardware cost ~$2400.

Point being: if model supply froze today, the base capability of Qwen3.5-27B / Llama 3 / GPT-OSS 120B + accessible fine-tuning capacity at this hardware tier = community can keep specializing them for narrow domains at a per-team level indefinitely. That's not "all of AI" obviously, but it's a meaningful slice. The thing you can't easily replace with fine-tunes is reasoning depth on novel out-of-distribution tasks — that needs new pretrains, full stop.

u/N1ckFG's point upthread about unified RAM is the under-discussed factor IMO. The shift to APUs with 128 GB+ shared memory is already happening — Strix Halo, Apple Silicon, eventually mainstream desktop boards. That's the hardware curve that puts serious local inference within reach without datacenter prices, and it's mostly independent of whether new SOTA models keep dropping.

[-]

janusr@reddit

You’ve been doing fine-tunes of a model that was released 2 months ago for the last 6 months?

[-]

Outrageous_Bug_669@reddit

The phrasing "the Qwen3.5 family" was confusing — the family is Qwen, and 3.5 is one generation in it. Should've said "the Qwen series." But the timeline itself is accurate.

Hybrid-GDN architecture work specifically (the eager-attention requirement, FLA Triton kernels) has been the second half of those months. The first half was Qwen3-32B which is a standard transformer and much easier on this hardware.

[-]

ambient_temp_xeno@reddit

The knowledge cut-off will not be that big of a problem compared to the models just being outdated in terms of brains.

Just look at it as a glass half full: we could've ended up with just the Llama 1 leaks in another timeline.

[-]

BlackBeardAI@reddit

Qwen 3.6 27b warped time

[-]

lightskinloki@reddit

We will just make our own models at that point.

[-]

Cherubin0@reddit

I am more worried about laws changing. If the government decides that LLMs need a license to train on data, then the big corporations will all cross-license their IP (like Springer with their many papers), and OpenAI etc. will keep going while open-weight models are basically banned.

[-]

ByteDinosaurs@reddit

the knowledge staleness problem is more solvable than the capability staleness problem

good RAG and retrieval tooling gets you current facts forever. a 2026 model that can search the web and query a knowledge base handles "what happened in 2029" fine

what you can't patch with retrieval is reasoning capability. if a 2031 task requires skills that 2026 models just don't have, no amount of context stuffing fixes that

the more likely scenario though is that open weights don't actually dry up. the incentives for labs to release are too varied — research credibility, developer ecosystems, geopolitical flex. china alone has enough motivation to keep releasing regardless of what openai does

the real risk isn't zero new models. it's a quality gap that slowly widens between open and closed until open weights are a generation behind in ways that actually matter for serious use cases
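To make the retrieval point at the top of this comment concrete, here is a toy sketch of the "look up current facts, then hand them to an old model" pattern. It is an illustration under stated assumptions, not how any real RAG stack works: the scoring is crude keyword overlap standing in for BM25 or embeddings, and `tokenize`, `score`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, doc: str) -> int:
    """Count occurrences of distinct query terms in the document."""
    doc_counts = Counter(tokenize(doc))
    return sum(doc_counts[term] for term in set(tokenize(query)))


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return up to k documents with a non-zero score, best first."""
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    return [d for d in ranked[:k] if score(query, d) > 0]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Put the retrieved snippets in front of the question for a frozen model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

The model's weights never change here; only the context does. That is why this patches stale knowledge but does nothing for the capability gap described in the rest of the comment.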

[-]

ProfessionalSpend589@reddit

Why would corporations release for free the means of production to the peasants?

[-]

catplusplusok@reddit

Because most corporations have fields that need to be plowed rather than being in the business of selling plows. So it makes sense to release blueprints for great plows or form a consortium to develop these jointly with others who have similar needs.

[-]

ProfessionalSpend589@reddit

In your analogy the plows can’t plow by themselves. While LLMs can (with enough skill and proper harness - as people say).

[-]

JohnBooty@reddit (OP)

Yeah, exactly. There aren't going to be corporation-sponsored free models forever. We're seemingly only getting the Chinese ones as part of an effort to undercut big American players. It's not because they have our best interests at heart. It's a very temporary situation.

[-]

zerubeus@reddit

Decentralized, non-profit training initiatives are something the local LLM community will eventually need to seriously invest in, instead of placing all its hopes in companies like Alibaba and Google. Large-scale, donation-driven open-source projects have existed successfully for decades — Linux, Blender, and Bitcoin are proof that community-driven ecosystems can build and sustain critical infrastructure at a global scale.

[-]

Due_Duck_8472@reddit

Free models are coming to an end - we'll stop delivering in Q3 where a subscription model will be introduced

[-]

Shoddy-Tutor9563@reddit

The quality of current open models is already above and beyond what one could dream of. Take GLM 5.1 as an example. Even if companies stop releasing new open-weight models, this alone will be enough. We just need capable enough, cheap hardware to run it locally :)

[-]

Double_Ad9821@reddit

We all have to come together to keep the local alive.

[-]

JohnBooty@reddit (OP)

That's where I'm unsure.

How, exactly, can this be done in a grass-roots way?

We need two things. We need the LLMs themselves, and we need the hardware to run them. The bar to entry to produce either of those things is very very very high.

Training a capable LLM takes a lot of time and money. And of course, given the rate of progress of RISC-V, the idea of a grassroots GPU capable of running this stuff seems like an utter pipe dream.

I don't see any of the hardware companies prioritizing consumer GPUs with large amounts of RAM for a long time, maybe ever. The enterprise market is so much more lucrative.

[-]

ea_man@reddit

You said that: enterprise big general models take a lot of resources, local wants small and fine tuned models.

OFC if everybody does inference, we will get cheaper consumer hw that does what the market wants.

[-]

Party-Special-5177@reddit

In all honesty, as long as LLM inference remains heavy, then the ‘hardware to run them’ is the hardware to build the LLMs. This hardware apparently has a very long shelf life (looking at the 3090 users here), and there aren’t really any ways that any actor could put this ‘cat back in the bag’.

Everyone’s excited by improvements in inference speed, but all we need to secure our supply chain is improvement in training token efficiency. Muon was huge for that, so was SwiGLU.

There are other improvements in the works, but we’re already seeing the new Chinchilla-optimal point being around 10 tokens per parameter, down from 20 just a few years back.
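A rough back-of-the-envelope for what halving the tokens-per-parameter ratio would mean, using the common approximation that training compute is about 6 × parameters × tokens FLOPs. The 20 and 10 ratios are the figures quoted above; the 27B model size is only an illustrative assumption borrowed from elsewhere in the thread, and `training_budget` is a hypothetical helper.

```python
def training_budget(params: float, tokens_per_param: float) -> tuple[float, float]:
    """Return (training tokens, approximate training FLOPs).

    Uses the rule of thumb FLOPs ~= 6 * N * D, where N is the parameter
    count and D is the number of training tokens.
    """
    tokens = params * tokens_per_param
    flops = 6.0 * params * tokens
    return tokens, flops


if __name__ == "__main__":
    n_params = 27e9  # illustrative model size, not a claim about any real run
    for ratio in (20, 10):
        tokens, flops = training_budget(n_params, ratio)
        print(f"{ratio} tok/param: {tokens:.3g} tokens, {flops:.3g} FLOPs")
```

Because compute scales linearly with tokens at a fixed model size, going from 20 to 10 tokens per parameter halves both the data requirement and the training compute under this approximation.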

[-]

ea_man@reddit

With hw getting more powerful and ~~cheaper~~, and SOTA models getting more powerful, it's gonna be easier to distill small models for the local folks.

We'll use cloud models to build small local models :)

[-]

umbrosum@reddit

Knowledge is not really a problem with web search. even today, an agent workflow with web searches can provide more up to date knowledge than any closed source model

[-]

_mayuk@reddit

We should build a decentralized shared network under a web3 protocol so we can mint a token while giving compute to the net … for inference and training ourselves ….

How long are we gonna wait ?

[-]

DiscipleofDeceit666@reddit

We train our own. We only need 1 dude stepping up

[-]

El_90@reddit

If reasoning is good, and speed is good, the only thing missing is knowledge.

That can be solved with harness, tools, mcp, etc

I.e.....does the model NEED to continue, or is there a good enough?

[-]

Gullible_Response_54@reddit

I can imagine that academia might train their own models; GWDG in Germany is already hosting existing models. With supercomputers such as Lichtenberg 2 it is theoretically possible to train & release open source models that should be good enough for most use cases 🫣 Also: LLMs still do not have a moat

[-]

BidWestern1056@reddit

I'm building the infra for continuous fine tuning and self improvement such that with current models we would be fine for this scenario https://github.com/NPC-Worldwide/npcpy

We'll also be releasing more models on HF. Local, user-owned models aren't going anywhere anytime soon https://hf.co/npc-worldwide

[-]

Kahvana@reddit

Don't think it will be the case. Google has been consistently releasing new models since their creation of the transformer architecture, Mistral's whole business model depends on making LLMs and finetuning them for customers, and there is deep interest from China for their own political reasons.

That said, entertaining the idea: it's like reading an encyclopedia from the 70s. It's perfectly usable for general and historic concepts, just not for modern computer architectures and such.

You can use openzim-mcp to store a local copy of wikipedia and use that as source of truth in case websearch stops functioning. You can finetune current models with up-to-date knowledge to enhance their capabilities specific to RAG for modern concepts.

Personally, not much will change for me. I enjoy writing in ISO C99 and writing C# with the restrictions of the .NET 2.0 Subset from Unity 5.1, both of which Qwen3.6-27B can do quite nicely. For general tasks and roleplay, Gemma4-31B remains king within its size range. The only thing that might change is my savings going towards bigger VRAM GPUs.

If this would be it, I am happy with what we got and gladly keep using it / buying better equipment for it. Personally I hope we get at least one, maybe two years of flagship open-weight models from most companies.

[-]

robertpro01@reddit

cry at the corner.

[-]

graypasser@reddit

I highly doubt closed ecosystem has proper advantages over open source ecosystem, honestly.

[-]

Ledeste@reddit

The great part of big tech working on LLMs is not the weights themselves, but how they achieve them. And this won't be lost. The open weights are almost a side effect :p

People will still be able to use the released ones (and we saw how much optimization we can get from the same model) and build new ones (but more slowly).

[-]

a_beautiful_rhind@reddit

Oh.. we'll be screwed. There'll be regulatory capture so your distributed efforts won't be allowed use of copyrighted data and subject to "testing". Anything too good isn't passing the test.

Probably closer to a 5-10 year timeline, imo, while everything gets worked out. But this assumes there are no other world events that interrupt progress in general.

[-]

Fabulous_Fact_606@reddit

The cat is out... We have this new tech that was science fiction 5 years ago. There is a cut-off date to the LLM's knowledge. Don't let that deter you, because right now it can do internet search for context, parse arXiv papers, and run mathematical theories and proofs. In the hands of the right people, a local LLM becomes a force multiplier for individual research that used to require an institution behind you. In essence, I'm grateful for what we have now.

[-]

yes2matt@reddit

I think mixture of experts becomes mixture of models. To solve a real-world problem as a human would, or better, there is a multi-stage process with rubrics (e.g. SWOT, business plan, application design, marketing plan, budget, expectations and metrics plan) that can be handled by a 32B model. Then for each of those pieces there is a second pass, maybe with a different model or models, then a model for software programming, a model for media generation, and a model for writing and audio.

All of which we have already pretty well.

[-]

anzzax@reddit

Same as what happened to Java when Oracle decided to put up higher walls.

[-]

kulchacop@reddit

Way too many people are starting to use local models.

Once the industry consolidates and big labs stop releasing open models, I really hope that the tinkerers find a way to circumvent catastrophic forgetting and directly update the weights, or even graft additional weights to add new knowledge.

[-]

PayMe4MyData@reddit

LLMs with hydraulics? I would run in the opposite direction!

[-]

Virtual_Monitor3600@reddit

Constrained llms with hydraulics, is probably okay no?

[-]

EbbNorth7735@reddit

Nvidia will continue to release models

[-]

noctrex@reddit

There are open source models that have been trained on distributed compute, like

https://www.primeintellect.ai/blog/intellect-3

Compute won't vanish. On the contrary, all these huge compute clusters that are being built out today will, at some point, end up on eBay for us to grab for our home labs, like the V100 is now.

At some point in time, we will be able to train on our own distributed open clusters.

[-]

Confusion_Senior@reddit

In the image gen community, SDXL still gets new finetunes to this day. What matters the most is a good architecture. We would probably start to see LoRAs for LLMs more frequently.

[-]

More-Curious816@reddit

Yeah, I think we can even train small models like 8B and 12B if the hardware keeps doubling in performance and the previous gen becomes cheaper to rent in clusters. Also, Google allows independent researchers to train on their servers as long as it's for research purposes and not commercial.
Another thing is that this community can form a non-profit organization that can operate as a business (which can purchase GPUs or rent clusters more easily than individuals) and pool donations to train new models.
New startups will emerge and release their first models to compete and gain a market foothold like the previous companies did, and the cycle will continue.

[-]

Miriel_z@reddit

The LLMs were not created by private sector alone, plus China can always help if it happens with their free models. It is more like cat out of the bag. Retraining can also help. So very much not likely to happen. On top of that, hive mind is stronger than any private company. You can try to buy everyone, but it is an enormous effort when so many tools are available.

[-]

baksalyar@reddit

Moreover, we already have Hugging Face full of custom model merges, mashups, and SOTA distillations.

States, institutes, and non-profits are stepping up: look at AI2 (OLMo) or the UAE (Falcon) funding true open-source FMs from scratch.

Pre-training is getting cheaper every year — what requires Big Tech today will be trained by a decentralized collective tomorrow (we had Psyche mentioned twice here).

So I hope that when we have an acute need, we'll see a booming abundance of new models of different calibers.

[-]

TheRealMasonMac@reddit

I doubt Google will ever stop releasing LLMs. They're fairly friendly to the developer community and it helps them capture the market.

[-]

UnbeliebteMeinung@reddit

There is enough human knowledge in the open source sphere to build up new training systems for models that would be able to compete with the current Qwen3.6.

We would still need to cluster our distributed hardware, and this would also be somewhat possible. Most likely we just rent a Colossus from Elon for the sake of freedom tbh

[-]

stoppableDissolution@reddit

Main issue is data. Curating good data is difficult, and curating good data at the scale of trillions of tokens requires the coordinated effort of a lot of people, plus experience. And, yeah, compute.

[-]

UnbeliebteMeinung@reddit

We will do it like the Chinese AI labs do. Borrow data from OpenAI and Anthropic (they stole our data, so it's OK).

[-]

stoppableDissolution@reddit

Well, yes, but it still requires a lot of trial and error (and great minds) to figure out what kind of data you need. The majority of open datasets on HF are ~~useless garbage~~ something like "hey claude respond to a question", and you can't build a good model just with that.

[-]

UnbeliebteMeinung@reddit

We have enough man power to distribute that work to us real humans tho. I guess that will not be the problem.

[-]

stoppableDissolution@reddit

I sure hope so!

[-]

catplusplusok@reddit

Model training can be easily distributed. As long as a group of users (commercial, education, individual) sees value in a model, they can cooperate to keep it updated.

[-]

synn89@reddit

We'd probably see open source effort shift into 30B and smaller models, fine-tuning existing ones, with a few projects fully training new base ones as architectures change. Within that, I'd also expect new training harnesses for whatever hardware is available in 3-5 years.

[-]

Far_Course2496@reddit

Moore's law is dead-ish, but computer parts will still get cheaper over time. When we all have 1tb of vram, we can resurrect old big models with fresh fine-tunes or architectural updates or distills of future frontier models.

[-]

Maleficent-Ad5999@reddit

1tb of vram 😭😭😭 when will we ever get there

[-]

Zeeplankton@reddit

This actually occurred to me the other day too. I don't think they'll be that useful.. LLMs right now are like brute forcing intelligence in every possible way. Out of date knowledge is super confusing. It's already annoying as fuck when the model doesn't use svelte runes.

I think the only real solution is: breakthrough in training costs(?) or moving to another paradigm other than transformers.

[-]

neph1010@reddit

Where will LLMs be in 3-5+ years? Is the transformer the ultimate architecture? Can there be nothing better?

[-]

UniqueIdentifier00@reddit

I was also thinking about this yesterday while I had some windshield time. I suspect regulation will eventually force the bigger companies to stop giving open access to local LLMs, with Mythos concerns probably being the drop in the bucket that gets the ball rolling downhill towards that regulation in the coming years. If legislation holds local LLMs back it will probably be under the guise of “security”, but in reality it will be lobbyist-pressured monopolization by the holders of AI power. The masses will need to fund the upkeep of the massive amount of compute that will be needed.

Surveillance doesn’t work if everyone is using local ai models.

Anyways, I’m being super cynical and Orwellian, it probably won’t be that bad. As @awwtifishal says, there will probably always be a network of like minded individuals who can pool compute and resources to further local usage in an open source environment.

[-]

JohnBooty@reddit (OP)

Yeah. I think it's absolutely possible things go in that kind of a dark direction.

I actually think the most realistic dark scenario might already be happening: LLM providers funneling data on users' habits to the government, or at least making it available. The same old story that Google and others have been running through for the last 25+ years. It would be shocking if that doesn't happen IMO. I'm actually gonna call that one 100% likely.

That's the type of tack the US has always taken... give the illusion of freedom, while monitoring you all the while. As opposed to the more overtly Orwellian tactics of forbidding you to discuss certain topics entirely, i.e. discussing Tiananmen Square being verboten.

I think that's much less likely in America, but I certainly don't rule it out either.

[-]

FullOf_Bad_Ideas@reddit

As long as closed models will be easy to distill we should be good.

[-]

CreamPitiful4295@reddit

Time to start vibing the next SETI at Home app for llm building

[-]

VoiceApprehensive893@reddit

>10b's might get rare after china stops griefing the american ai industry

[-]

N1ckFG@reddit

Before financialized datacenters messed up the component markets, it looked like we were about to see a broad switch to unified RAM--until recently mainly limited to phones and Macs, but increasingly available on other laptop and desktop platforms. I think the continued interest in local models is going to depend on the availability of cheap unified RAM machines. If that trend turns out to be only delayed by the datacenter shenanigans, then demand for high-quality local inference will follow.

[-]

CreamPitiful4295@reddit

Probably, but until local inference is faster that’s not a great option

[-]

pmttyji@reddit

Come on, don't say that :( I need open models till 2030 at least. Hope we get plenty of evergreen models for consumer GPUs before that.

Anyway we'll get some open models here

[-]

ComplexType568@reddit

Fine-tunes or merges will probably rule the scene then, as they did when Llama was prevalent and competition was sparse.

Although, to be honest, most models feel extremely SOTA for their size. At least for me, having had to scrape by with DSV3 and Claude Sonnet 3.5.

What I think would also take the lead if a situation like this happens is harnesses and pipelines: using the LLM more efficiently, prompting, and self-review (or model-peer review, which has had many testimonies for being "better" than just one megamodel).

[-]

Hood-Boy@reddit

Hopefully better harnesses, tooling, and agent flows will become established.

[-]

Budget-Juggernaut-68@reddit

Then the community will get serious about developing distributed training and data curation.

[-]

Sash17@reddit

I think good retrieval/context systems could keep today’s models useful way longer than people expect. A lot of real-world usage is reasoning over provided info anyway, not raw memorized knowledge. The bigger issue would probably be falling behind in reasoning/planning quality, not just outdated facts.

[-]

Adventurous-Paper566@reddit

Outside of coding, for 80% of common use cases, the local models already available are more than enough.

[-]

Bimbam_tm@reddit

Someone would probably figure out a folding@home style crowdsourced GPU training approach just to put the finger up to big tech, idk

[-]

Technical-Earth-3254@reddit

if we build out *really* good knowledge-retrieval tooling

The better question is: Does this still work the same way in 5 years? I can't tell. This is one of the rare things where I would say to just let it happen and see from there, because right now I don't really see a way to tell.