The Future of Free & Local Models: Training Co-Ops? Professional Orgs? Churches?

Posted by liftheavyscheisse@reddit | LocalLLaMA | View on Reddit | 18 comments

I'm relatively new to this forum, so forgive me if this discussion has been had ad nauseam already. In a hypothetical future where all the frontier labs stop releasing open-weight models, I don't think the open community would take it lying down. With the combined compute of the community, it seems like it should be possible to train frontier(ish) ~30B models (albeit with significantly less efficiency and speed than the labs). What shape could this take?

It seems plausible to me that co-ops would form with people volunteering their compute, contractually bound to run a specific training algorithm on specific data, and then averaging their subresults to update the model. An inspector could occasionally spot-check volunteers' contributions to ensure they're following the recipe, perhaps running the same training regimen in parallel to compute the expected subresult for comparison. Trusted co-op leaders would decide the architecture, manage data sanitation, and so on. Frontier labs require massive bandwidth to synchronize gradients across the cluster at every step, but I suspect the space of possibilities hasn't been fully explored for training many steps locally before synchronizing.

Another possibility would be that people pool together money to train in the cloud. Maybe folks will run Kickstarters to train a model with an advertised recipe, and host the model exclusively for backers in the cloud for several months before releasing it openly.

It also seems plausible that professional and ideological organizations would begin to train their own models. Custom models seem almost inevitable for religious denominations. One thing we could trust about models made by churches: they will always be multilingual and free, if not open, to spread the gospel. Models trained in Christian Scholasticism might be interesting starting points for tuning, as they should be well-honed in the imprecise art of logical deduction in natural language.

Predictions are hard, especially about the future, so I'm spitballing. What are your thoughts?
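
To make the co-op idea concrete, here is a minimal sketch of "train locally, average the subresults, spot-check by recomputation" on a toy one-parameter model. Everything in it is an assumption for illustration: the function names, the toy problem (fit y = w * x), and the sabotage offset are hypothetical, and the exact-match check only works because pure-Python arithmetic is deterministic. Real GPU training is generally not bitwise reproducible, so an inspector would need a looser tolerance.

```python
import random

def local_update(w, shard, lr=0.1):
    """One SGD pass over a volunteer's shard for the toy model y = w * x."""
    for x, y in shard:
        w -= lr * 2 * (w * x - y) * x
    return w

def run_round(w, shards, cheaters=()):
    """Each volunteer trains from the shared weight; cheaters tamper with the result."""
    results = []
    for i, shard in enumerate(shards):
        out = local_update(w, shard)
        if i in cheaters:
            out += 0.5  # hypothetical sabotage, e.g. training on unapproved data
        results.append(out)
    return results

def spot_check(w, shard, reported, tol=1e-9):
    """Inspector reruns the agreed recipe on the same shard and compares."""
    return abs(local_update(w, shard) - reported) <= tol

rng = random.Random(0)
data = [(x, 3.0 * x) for x in (rng.uniform(-1, 1) for _ in range(400))]
shards = [data[i::4] for i in range(4)]  # four volunteers, disjoint shards

w = 0.0
for _ in range(20):
    results = run_round(w, shards)
    w = sum(results) / len(results)  # average the subresults

# One more round in which volunteer 2 deviates from the recipe.
tampered = run_round(w, shards, cheaters={2})
flags = [spot_check(w, s, r) for s, r in zip(shards, tampered)]
print(round(w, 6), flags)
```

Note that checking one volunteer costs the inspector as much compute as that volunteer spent, which is why the check has to be occasional rather than exhaustive.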

18 Comments

ttkciar@reddit

I still think AllenAI's FlexOlmo would be the way to go for federated training.

FlexOlmo involves distributing "anchor" experts to participants, and then everyone individually training an expert with their portion of a sharded dataset. The shared anchor ensures that the experts will be mutually compatible when training is done and the experts are merged, without need for synchronization (beyond distributing the anchors and collecting the trained experts).

There's no reason the experts would need to be trained from scratch; everyone could continue pretraining on specific layers of an existing model instead, which would also dramatically reduce the training time and compute requirements. It would also make distributing the anchors trivial, since they would already be part of the parent model's safetensors on Hugging Face.

The most computationally demanding part of the exercise would be training the routing logic (which selects experts for inferring a given token) after the experts were merged. AllenAI proposed a way to distribute that too, but it didn't work well.

> An inspector could occasionally spot-check volunteers' contributions to ensure they're following the recipe, perhaps running the same training regimen in parallel to compute the expected subresult for comparison.

That's one of the harder subproblems I'm not sure how to best solve. Running the same regimen in parallel would be just as computationally expensive as what the participant was contributing, which would obviate the need for the participant.

My best idea so far has been to check the finished experts individually after training was complete (the FlexOlmo architecture allows for this, to an extent) and omit the bad ones. The common training software could help to an extent by logging checksums of snapshots and the data used to train them, but with enough effort anything distributed as software could be spoofed by a determined bad actor. It might suffice to weed out the "script-kiddy" level of bad actors, though.

This is a problem the community will need to figure out sooner or later, so it's good to have these conversations now.
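
As a rough illustration of the checksum logging mentioned above, here is a sketch of a hash-chained training log built with Python's standard `hashlib`. The log format, field names, and helper functions are assumptions made up for this example, not part of FlexOlmo or any existing training tool. Each entry records a digest of the snapshot and of the data shard, plus the hash of the previous entry, so editing an entry after the fact breaks the chain.

```python
import hashlib
import json

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

FIELDS = ("step", "snapshot", "shard", "prev")  # hypothetical log schema

def append_entry(log, step, snapshot: bytes, shard: bytes):
    """Append a record whose hash covers its contents and the previous record."""
    prev = log[-1]["entry_hash"] if log else "0" * 64
    body = {"step": step, "snapshot": sha256_hex(snapshot),
            "shard": sha256_hex(shard), "prev": prev}
    entry_hash = sha256_hex(json.dumps(body, sort_keys=True).encode())
    log.append({**body, "entry_hash": entry_hash})

def verify_log(log) -> bool:
    """Recompute every entry hash and check that the chain links up."""
    prev = "0" * 64
    for entry in log:
        body = {k: entry[k] for k in FIELDS}
        expected = sha256_hex(json.dumps(body, sort_keys=True).encode())
        if entry["prev"] != prev or entry["entry_hash"] != expected:
            return False
        prev = entry["entry_hash"]
    return True

log = []
for step in range(3):
    # Stand-in bytes; a real run would hash checkpoint files and shard files.
    append_entry(log, step, f"weights-{step}".encode(), f"shard-{step}".encode())

# Rewriting history: swap the recorded shard digest in the middle entry.
tampered = [dict(e) for e in log]
tampered[1]["shard"] = sha256_hex(b"other data")
print(verify_log(log), verify_log(tampered))
```

This only shows that the log was not edited afterwards. It cannot show that the logged shard is what the participant actually trained on, which is the spoofing limit described above.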

liftheavyscheisse@reddit (OP)

> Running the same regimen in parallel would be just as computationally expensive as what the participant was contributing, which would obviate the need for the participant.

The idea here is that the inspector doesn't (necessarily) duplicate all work; it just checks in occasionally to ensure that nobody's consistently doing anything malicious to the co-op (such as training on data that wasn't agreed upon). Bad actors would get kicked from the training pool, and by the co-op contract they'd be fined for sabotage. Strong incentive to stick to the plan.

Probability of catching bad actors can be improved by employing multiple inspectors. Inspectors have the same hardware requirements as participants, so they could literally just be participants that are given a special role. Inspector and participant roles could alternate too (for fun? idk). And if the participant pool is large enough, all work could be inspected (at 2x cost since the work is doubled, but if it's distributed across a large pool of eager participants then it's whatever).

It's a terrible solution, using contracts, courts, and duplicated work instead of just math, but maybe it's necessary. Bitcoin obviously solves this, but I have no clue how.

> My best idea so far has been to check the finished experts individually ... and omit the bad ones ... checksums of snapshots and the data used to train them

How do you assess that a finished expert is bad without duplicating the work? You can check if it regressed on benchmarks, but if the malicious actor's intervention was to, idk, add data for strategies to destroy humanity, would the benchmark find it? Easy to omit malicious data from the checksum.
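
A back-of-the-envelope sketch of the spot-check arithmetic, under assumptions that are mine rather than anything established in the thread: each inspector independently re-runs any given contribution with probability `check_rate`, a re-run always exposes a deviation, and a bad actor deviates in `bad_rounds` separate contributions. The function names are hypothetical.

```python
def detection_probability(check_rate: float, bad_rounds: int, inspectors: int = 1) -> float:
    """Chance that at least one tampered contribution gets re-run by some inspector."""
    missed_once = (1.0 - check_rate) ** inspectors  # all inspectors skip one contribution
    return 1.0 - missed_once ** bad_rounds

def cost_factor(check_rate: float, inspectors: int = 1) -> float:
    """Total compute relative to uninspected training (1.0 means no overhead)."""
    return 1.0 + check_rate * inspectors

p_one = detection_probability(0.05, bad_rounds=50)                 # one inspector, 5% checks
p_two = detection_probability(0.05, bad_rounds=50, inspectors=2)   # two inspectors
full = cost_factor(1.0)                                            # inspect everything
print(p_one, p_two, full)
```

Under these assumptions a persistent saboteur is very likely to be caught even at a low check rate, while someone who deviates only once mostly slips through, so the deterrent rests on the fine rather than on the check alone. Inspecting everything gives the 2x cost mentioned above.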

Pleasant-Shallot-707@reddit

lol a church developed model will be a useless waste of time

liftheavyscheisse@reddit (OP)

Maybe, depends on how trainable it ends up being. People trained in churches and cults in their youth often end up having really solid analytical skills. I'm sure you can think of some.

CCloak@reddit

Christianity plays a huge part in European history, so to make our AI work the way it does for English and other European languages, plenty of classic Christian material has already been thrown in just to train its language abilities. There is no need to train an LLM just for Christian churches unless you are making a heretical version of the original Christianity that shaped Europe and the US today.

liftheavyscheisse@reddit (OP)

Sure, it's in there. My suspicion, however, is that churches will want models that speak authoritatively to the truth of their beliefs, rather than models that speak *about* them. They will want models that interpret social situations and truth claims through their lens, not a "helpful Google/Alibaba AI assistant" lens. Maybe that just needs fine-tuning though, in which case it's far less interesting.

Formal-Exam-8767@reddit

Maybe they mean a church based around the model, like "Church of Qwen" or "Church of Gemma"?

liftheavyscheisse@reddit (OP)

Where do I join??

Captain-Pie-62@reddit

I think the question is not if and how we can/will connect free models (that's more or less only details), but when and if we can achieve an "open source" AGI. The more systems are connected by agents, the more complex tasks may get solved. But this also has to be protected from misuse.

LLMs without ethical guardrails will behave like the sneakiest, most clever villain you can't even imagine! Why is that so? Because LLMs are LANGUAGE models in the first place, and they will soon have consumed all the literature we have ever written, including everything about murder, warfare, torture, betrayal and so on, and they will/may use that against us. They will be way more sneaky than we can currently imagine.

But before we put in religious guardrails, we all should decide which is the one and only religion that all mankind approves! If you don't have a solution for that (a real one, not a decision by a group of lunatics), we shouldn't even try! Better than that: implement human rights law in all LLMs! Obey that first! All of them. Then we can start talking.

my_name_isnt_clever@reddit

> we all should decide which is the one and only religion that all mankind approves!

Yeah, good luck with that.

liftheavyscheisse@reddit (OP)

Yep, that's why I think religious denominations will want custom models. Logic and reason have played central roles in religion: arguing for why other variants are heresies and so forth. Eventually every major religion will have an AI with near-frontier intelligence; any religion that doesn't will find it hard to justify its own existence.

No_Airline935@reddit

The compute pooling idea is more viable than it sounds. DiLoCo (DeepMind, 2023) showed you can train with much less frequent synchronization than everyone assumed, which directly addresses the bandwidth bottleneck you mentioned. Nodes sync every ~500 steps instead of every step, and quality holds up surprisingly well. That changes the math on what's feasible over consumer internet connections.

The harder problem isn't aggregating compute though, it's data curation and the 2-3 people who actually know how to nurse a training run through instabilities. You can crowdfund GPUs but you can't crowdfund that expertise easily.

Also worth noting: even if every Western lab went closed tomorrow, Qwen and DeepSeek have made pretty clear they're not stopping. The "no more open weights" scenario kind of assumes a global coordination that seems unlikely given current geopolitics.
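
For a feel of what "sync every ~500 steps" means mechanically, here is a toy sketch of the local-steps-then-sync pattern on a one-parameter model. It is not the DiLoCo recipe itself: the paper pairs an AdamW inner optimizer with a Nesterov-momentum outer optimizer, while this sketch swaps in plain SGD for the inner steps and a plain outer step (with `outer_lr=1.0` that reduces to averaging the workers' weights). All names and the toy problem are assumptions for illustration.

```python
import random

def inner_steps(w, shard, steps, lr, rng):
    """A worker trains alone for `steps` SGD updates on the toy model y = w * x."""
    for _ in range(steps):
        x, y = rng.choice(shard)
        w -= lr * 2 * (w * x - y) * x
    return w

def low_comm_round(w_global, shards, inner_h, inner_lr, outer_lr, rng):
    """One communication round: every worker starts from the shared weights."""
    deltas = [w_global - inner_steps(w_global, s, inner_h, inner_lr, rng)
              for s in shards]
    outer_grad = sum(deltas) / len(deltas)  # the only thing sent over the network
    return w_global - outer_lr * outer_grad

rng = random.Random(0)
data = [(x, 3.0 * x) for x in (rng.uniform(-1, 1) for _ in range(400))]
shards = [data[i::4] for i in range(4)]  # four workers

w, rounds, inner_h = 0.0, 5, 500
for _ in range(rounds):
    w = low_comm_round(w, shards, inner_h, inner_lr=0.1, outer_lr=1.0, rng=rng)

syncs_used = rounds                      # one exchange per round
syncs_if_every_step = rounds * inner_h   # what per-step synchronization would need
print(round(w, 6), syncs_used, syncs_if_every_step)
```

The point of the comparison at the end is the communication count: the same number of training steps needs 500x fewer exchanges, which is what makes consumer internet links plausible. Whether quality holds up at scale is the empirical claim of the paper and is not something a toy like this can show.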

liftheavyscheisse@reddit (OP)

Thanks for the reference, hadn't heard of DiLoCo before. Good stuff!

Couldn't you have an LLM nurse the training run through any instabilities? ;-) Never trained an LLM before, how bad is it?

I suspect data curation isn't as necessary if metadata stops being stripped from the training pipeline, as (e.g.) instead of trying to consolidate contradictory claims arising from disagreeing sources, the model is allowed to compartmentalize them into models of different people's minds... though I don't have good thoughts on what that would look like other than not rewarding the model for predicting the metadata. Any thoughts there?

You raise a good point that governments and societies want to project soft power by releasing open-weight models trained on their terms, and there's no reason to believe this will end.
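
One way to read "keep the metadata but don't reward predicting it" is a loss mask: the metadata is prepended as context tokens, and those positions are excluded from the training loss. The sketch below is only that reading, with made-up names, a whitespace split standing in for a real tokenizer, and invented per-token loss values.

```python
def build_example(metadata: dict, text: str):
    """Prepend metadata as tags; mark which positions count toward the loss."""
    meta_tokens = [f"<{key}={value}>" for key, value in sorted(metadata.items())]
    text_tokens = text.split()  # stand-in for a real tokenizer
    tokens = meta_tokens + text_tokens
    loss_mask = [0] * len(meta_tokens) + [1] * len(text_tokens)
    return tokens, loss_mask

def masked_mean(per_token_loss, loss_mask):
    """Average the loss over unmasked positions only."""
    kept = sum(loss_mask)
    if kept == 0:
        return 0.0
    return sum(l * m for l, m in zip(per_token_loss, loss_mask)) / kept

tokens, mask = build_example({"source": "forum", "author": "alice"},
                             "the sky is green")
# Invented per-token losses: high on the metadata tags, lower on the text.
loss = masked_mean([9.0, 9.0, 1.0, 2.0, 3.0, 2.0], mask)
print(tokens, mask, loss)
```

The model still conditions on who said something when predicting the text, but it is never penalized for failing to guess the source itself.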
