Staying Warm During AI Winter, Part 1: Introduction
Posted by ttkciar@reddit | LocalLLaMA | View on Reddit | 26 comments
The field of AI has always followed boom/bust cycles.
During "AI Summers", advances come quickly and enthusiasm runs high, but commercial interests hype up AI technologies and overpromise on their future capabilities. When those promises fail to materialize, enthusiasm turns to disillusionment, dismay and rejection, and "AI Winter" sets in.
AI Winters do not mark the end of progress in the field, nor even pauses. All manner of technologies developed during past AI Summers are still with us, subject to constant improvement, and even commercial success, but they are not marketed as "AI". Rather, they are called other things -- compilers, databases, search engines, algebraic solvers, provers, and robotics were all once considered "AI" and had their own Summers, just as LLM technology is having its own.
What happens during AI Winters is that grants and venture capital for investing in AI dries up, most (but not all) academics switch to other fields where they can get grants, and commercial vendors relabel their "AI" products as other things -- "business solutions", "analytics", etc. If the profits from selling those products do not cover the costs of maintaining them, those products get shelved. AI startups which cannot effectively monetize their products are acquired by larger companies, or simply shut their doors.
Today's AI Summer shows every sign of perpetuating this pattern. LLM technology is wonderful and useful, but not so wonderful and useful that commercial interests cannot overpromise on its future, which is exactly what LLM service vendors are doing.
If overpromising causes disillusionment, and disillusionment causes AI Winter, then another AI Winter seems inevitable.
So, what does that mean for all of us in the local LLaMa community?
At first glance it would seem that local LLaMa enthusiasts should be in a pretty good position to ride out another Winter. After all, a model downloaded to one's computer has no expiration date, and all of the software we need to make inference happen runs on our own hardware, right? So why should we care?
Maybe we won't, at least for the first year or two, but eventually we will run into problems:
- The open source software we depend on needs to be maintained, or it will stop working as its dependencies or underlying language evolve to introduce incompatibilities.
- Future hardware might not be supported by today's inference software. For example, for CUDA to work, Nvidia's proprietary driver and runtime libraries are required to translate CUDA's intermediate PTX code into the GPU's actual machine instructions. If future versions of those libraries are incompatible with today's inference software, we will only be able to use our software for as long as we can keep the older drivers and libraries running on our systems (and only with older GPUs). It's certainly possible to do that, but not forever.
- If the GPU-rich stop training new frontier models, our community will have to fend for ourselves. Existing models can be fine-tuned, but will we find ways to create new and better ones?
- The creation of new training datasets frequently depends on the availability of commercial services like ChatGPT or Claude to label, score, or improve the data. If these services become priced out of reach, or disappear entirely, dataset developers will need to find alternatives.
- Even if the community does find a way to create new models and datasets, how will we share them? There is no guarantee that Huggingface will continue to exist after Winter falls -- remember, in AI Winters investment money dries up, so services like HF will have to either find other ways to keep their servers running, or shut them down.
These are all problems which can be solved, but they will be easier to solve, and more satisfactorily, before AI Winter falls -- while we still have HF, while Claude and GPT-4 are still cheap, while our software is still maintained, and while there are still many eyes reading posts in r/LocalLLaMA.
I was too young to remember the first AI Winter, but was active in the field during the second, and it left an impression on me. Because of that, my approach to LLM tech has been strongly influenced by expectations of another AI Winter. My best guess is that we might see the next AI Winter some time between 2026 and 2029, so we have some time to figure things out.
I'd like to start a series of "Staying Warm During AI Winter" conversations, each focusing on a different problem, so we can talk about solutions and keep track of who is doing what.
This post is just an introduction to the theme, so let's talk about it in general before diving into specifics.
ithkuil@reddit
That's excessively speculative, but I am very concerned about AI hardware availability as tariffs and the trade war with China ramp up. Strong possibility of a Taiwan blockade or worse in the next few years.
Jumper775-2@reddit
I do believe we are nearing a stable era
Sabin_Stargem@reddit
The only winter I foresee is hardware progression, not the software. With the recent election, the odds of Taiwan being attacked are much higher. That will disrupt the replacement of consumer hardware.
FrostyContribution35@reddit
Wdym AI winter, we literally got an OSS model with an 89.9 MMLU 3 days ago. Qwen 2.5 gave us a gpt-4o-mini-class model at 32B parameters (some may argue the 14B is even 4o-mini class). These recent releases put us at the heels of the big tech companies. Furthermore, smaller fine-tuners like Nous Research and ArceeAI have made stellar fine-tunes, proving smaller tuners can create models on par with big tech in certain domains. Zuck said Llama will continue to remain open source, and Elon promised to release Grok 2 once Grok 3 was released.
ttkciar@reddit (OP)
Yep, those are all true things.
My position is that we should take the fullest advantage of these progressions for as long as they last, while preparing ourselves to stay warm when Winter falls.
Sad-Replacement-3988@reddit
People have been talking about the next AI winter since 2010, every couple of months there is a new bait post about it.
Guess what? It never happened, deep learning kept progressing, more companies opting in, more funding year after year as people found success with it.
Now we have LLMs that are way more useful than our old ML algorithms. There is no reason to think we are headed for one, the improvements in this decade far surpass last decade, and keep coming
visarga@reddit
You got to consider that:
- the transformer is simple: you can write the equations down on a napkin and understand them with very limited math (high-school level, or first-year college) -- see the sketch after this list
- we can already port it to AMD, CPU, TPU, NPU, cell phones, laptops, and even Raspberry Pi. It's not that complex, just pure code and a blob of data, as demonstrated by llama.cpp
- it's already being integrated into both personal and business activities
- we already have the model code, datasets, and benchmarks ready for use or adaptation (which could be done by AI or with its help)
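To make the "napkin" claim concrete, here is a minimal sketch of scaled dot-product attention in plain NumPy -- the core equation at the heart of the transformer. The full architecture adds multi-head projections, MLPs, and normalization on top, but this is the essential piece; the example data is made up for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores) @ V

# toy example: 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 8)
```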
ttkciar@reddit (OP)
Yes, all of that is true.
My position is simply that while interest and resources remain high, we should be preparing for the lean times ahead.
FrostyContribution35@reddit
That’s a fair point. Fortunately we will be entering the winter with a big bundle of firewood and a cozy cabin to ride it out.
noobgolang@reddit
Really don't see any winter in sight
Thellton@reddit
Sorry for the long post /u/ttkciar
With regards to training: I'm not a professional in the field, nor have I studied for or earned a degree in it, but I have been watching and reading for the past ?two? ?three? years since GPT-3.5 blew up and suddenly became a thing, and honestly, I don't think we're as helpless as it might seem with regard to obtaining frontier models once the AI labs stop putting models out.
The idea of federated learning gets mentioned on here every now and then, and I think it could be done as long as it's done asynchronously. Basically, the idea is made up of four components: a Dataset-Coordinator server, a Micro-Model Hyperparameter Spec, a 'Mini-Trainer' program, and a 'Macro-Merge' program.
The Dataset-Coordinator hosts and distributes portions of a dataset (such as FineWeb, which has 15 trillion tokens). These portions are called datasubsets and are made up of roughly 3.6 million tokens each: 80% from a specific contiguous slice of the dataset, and another 20% taken from two other subsets' contiguous slices. These datasubsets are tagged for content such as "code", "conversational", "general knowledge", "science"... and so on.
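A toy sketch of how such a datasubset might be assembled, assuming the 80/20 split above means one contiguous block plus two smaller borrowings from other slices; the sizes are scaled down here, and the function and variable names are mine rather than part of the proposal:

```python
import random

def make_datasubset(tokens, index, num_slices, subset_len):
    """Build one datasubset: 80% contiguous, 10% each from two other slices."""
    slice_len = len(tokens) // num_slices
    main_len = int(subset_len * 0.8)
    side_len = (subset_len - main_len) // 2
    start = index * slice_len
    subset = list(tokens[start:start + main_len])
    # borrow a little from two other randomly chosen slices
    for other in random.sample([i for i in range(num_slices) if i != index], 2):
        o_start = other * slice_len
        subset.extend(tokens[o_start:o_start + side_len])
    return subset

corpus = list(range(1_000_000))   # stand-in for a tokenised corpus
subset = make_datasubset(corpus, index=3, num_slices=100, subset_len=3_600)
print(len(subset))  # 3600
```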
The Micro-Model Hyperparameter Spec is basically the specification for a 2,000-parameter model, something small enough that even a non-SOTA CPU, such as a five-year-old one, could (I think) train, even if more slowly than a GPU.
The Mini-Trainer would take a link to a Dataset-Coordinator, ping the server, and download an unallocated, untrained-on datasubset. It would then train a Micro-Model on it in accordance with the hyperparameter spec, on CPU or GPU (if a suitable one is available), upload the model to Hugging Face or similar, and ping the Dataset-Coordinator to notify it that training on that datasubset is complete. When the trained model is uploaded, its description would carry every tag its datasubset included, so it can be readily filtered.
The Macro-Merge basically allows the user to create a recipe using tags to define the proportion of X and Y in the model, then download a random selection of models compliant with the tags the user requested and merge them into either a Mixture-of-Experts model using PEER layers, as in Mixture of a Million Experts, or a dense model. This would essentially operate similarly to Arcee's MergeKit. The idea is very heavily predicated upon the merging working as intended, and there would likely need to be some continued training done after the merge, even with a dense model; after all, I am proposing Frankensteining an LLM (or whatever-modality) model.
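A toy sketch of the tag-filtered selection step, assuming a registry of uploaded micro-models. The "merge" here is naive parameter averaging purely for illustration -- the real thing would need MergeKit-style methods or a PEER-style router -- and the registry layout and function names are my own assumptions:

```python
import numpy as np

# pretend registry of community-uploaded micro-models: content tags plus a
# tiny flattened weight vector (~2,000 parameters each, as proposed above)
rng = np.random.default_rng(0)
registry = [
    {"tags": {"code"},            "weights": rng.standard_normal(2000)},
    {"tags": {"science"},         "weights": rng.standard_normal(2000)},
    {"tags": {"code", "science"}, "weights": rng.standard_normal(2000)},
]

def macro_merge(recipe):
    """recipe maps tag -> how many matching micro-models to pull in."""
    selected = []
    for tag, count in recipe.items():
        matching = [m for m in registry if tag in m["tags"]]
        selected.extend(matching[:count])
    # naive average of the selected experts' parameters (illustration only)
    return np.mean([m["weights"] for m in selected], axis=0)

merged = macro_merge({"code": 2, "science": 1})
print(merged.shape)  # (2000,)
```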
The benefits of the idea as I see them are: 1) we could be independent of the corporations/AI labs; 2) we wouldn't be training on Wikipedia for the 90th time (seriously, how many times have those tokens alone been trained on?) and instead could simply update the relevant micro-models as needed; 3) if it works, it'd be highly customisable to a degree we're not capable of with current models; 4) hand curation of datasets becomes possible to a degree, somewhat reducing the black-box nature of the model.
I'm not sure how fast a CPU could train a 2,000-parameter model, nor am I entirely clear on all the specifics of training; all I'm certain of is that I think it'd be worth a try, and that I'm nowhere near competent enough a programmer to actually execute on this idea.
ttkciar@reddit (OP)
You're mostly right about all of that, I think. It's exactly one of the topics to be addressed by a future "Staying Warm" thread. We have options. If we can train small models as community projects, we should be able to merge and retrain them into larger models.
Unfortunately I think the entry level is quite a bit higher than 2K parameters. For existing merge technology to work, models need a minimum number of layers (16-ish, I think), and if the end objective is to stack them into larger models, we would be better served if those layers started out pretty wide.
Fortunately once we had a small model trained, we should be able to perform continued-pretraining as a community with a much lower entry point -- each participant would only need to continue pretraining on a single unfrozen layer, if they could, or train a LoRA if continued-pretraining were beyond their capabilities.
We know continued-pretraining on selected unfrozen layers works, because that's how the Starling team came up with their (quite excellent) model. The organizing of participants would be the hardest part of the whole endeavor, not the technical aspects.
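To show how small each participant's technical share could be, here is a minimal sketch of continued pretraining with everything frozen except one layer, assuming a Llama-style model loaded through the transformers library; the model name and layer index are placeholders, and the actual training loop is omitted:

```python
from transformers import AutoModelForCausalLM

# placeholder name; any Llama-style community base model would do
model = AutoModelForCausalLM.from_pretrained("some-community-base-model")

UNFROZEN_LAYER = 12  # the single transformer block this participant trains

for name, param in model.named_parameters():
    # freeze every parameter except those in the chosen block
    param.requires_grad = name.startswith(f"model.layers.{UNFROZEN_LAYER}.")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
# ...then run an ordinary causal-LM training loop over the assigned data shard
```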
It's worth keeping in mind, too, that as affordable hardware grows more powerful (and especially when large numbers of datacenter GPUs start hitting eBay, and get snatched up by LLM enthusiasts), more people should clear the threshold of entry.
Thellton@reddit
It's good to hear that this brain bug I've had bugging me for a week isn't completely stupid or improbable. :D
Shame about the current state of merging, as the idea (as I conceptualised it) kind of depended on being able to assemble a model from the micro-models in a fashion akin to building blocks. As for the 2,000 parameters, that actually wasn't a number picked out of the air at random: the Mixture of a Million Experts model developed by Xu Owen He (a Google DeepMind researcher) uses experts that small, and selects on a per-layer basis between 64 and 512 of them (depending on the routing mechanism's training), with the potential to select an expert multiple times as a token makes its way through the layers. The more experts selected, the more competent the model.
a model like what I was thinking of could make for some interesting capabilities:
1) We'd have the ability to create arbitrarily large or small models, depending on how many compatible micro-models were in circulation.
2) It'd be possible to have micro-models trained on the model's own timestamped outputs to provide a form of episodic memory, which if I recall correctly is something the Mixture of a Million Experts paper notes as a possibility.
3) It'd even be possible to prebake a personality into a merged model by training one or more micro-models that are essentially examples of a particular personality/character, which the routing mechanism is trained to always include in its expert selection.
4) Furthermore, a single model could actually have multiple routing mechanisms trained for it, catering to fast but less competent inference or slower but more competent inference.
It's a bit of an unusual conceptualisation of a machine learning model, I guess :S
FullstackSensei@reddit
Hot take: Who says there has to be another winter? Just because it happened once or twice in the past doesn't mean it has to happen again. It's not like it's a law of nature. The comparison with compilers, databases, search engines, or robotics is also flawed IMO. Most of those technologies reached very high maturity levels and pushed the limits of what the available hardware could provide.
LLMs are unlocking the kind of change that was brought by the initial invention of computers. Call it another digital revolution. Sure, there's crazy VC money pouring in for anyone willing to present some slides to a VC, very much like the .com bubble in the 90s. I haven't heard anyone calling the 00s the internet winter just because the .com bubble burst.
I seriously doubt there'll be another AI winter, the same way I'm fairly certain there's an AI bubble now waiting to burst. The VC money will dry up, but there'll be no shortage of research funding. There's still an ocean of applications for which LLMs haven't been tuned yet.
ttkciar@reddit (OP)
It sounds like you do expect there to be another AI Winter, except you think it won't touch academia. Otherwise what you describe is pretty much how previous Winters played out -- technologies continued to be developed into maturity and find their way into new applications, but at a lower level of funding and interest.
FullstackSensei@reddit
My definition of a winter is a lack of funding for fundamental research, like what happened in the 90s and 00s. It wasn't easy to do AI research even in academia. Hinton was for years looked at as the delusional professor. Even he thought backpropagation was hopeless. I'm old enough to remember when papers were being published arguing that multilayer networks were pointless, and even proofs that they could be substituted with a single-layer perceptron. THAT was a winter.
Please excuse me if this sounds mean, I really don't mean it that way, but you also seem not to understand how important software compatibility is in the industry, or how tools like CUDA are distributed, or what happens when a new version of a library stops supporting an old version of hardware. You also don't seem to be aware of the open source efforts to generate huge synthetic datasets, or the open source synthetic datasets that are already out there (ex: Cosmopedia, and soon Cosmopedia 2).
LLM training costs are also falling at an exponential rate. Karpathy said it took over $50k to train GPT-2 in 2019, and you can replicate it today for $250. Faster hardware, better training algorithms, and a better understanding of how to prepare data continue to drive this exponential fall in training costs.
When the bubble bursts, Nvidia, AMD, Intel, etc will have no market for their products but us good old consumers. We'll get the High VRAM cards we've been dreaming about within a few months, and we'll be able to train 7-10B models at home, or pool our resources together (distributed training is coming) to train 70-100B parameters.
The genie is out, and there's still a ton it can do for consumers and for businesses (without hype). That's why I don't believe in a winter.
ttkciar@reddit (OP)
Actually, I am developing my own implementation of Evol-Instruct, so I am quite familiar with synthetic datasets, and I have Cosmopedia and some other datasets archived locally. These are topics I intend to bring up in future "Staying Warm" conversations. Can't blame you for making assumptions, though; it's hard not to project negative characteristics onto those with whom we disagree.
_Erilaz@reddit
Because it isn't about the technology; it's society working as a massive pendulum. Winter follows summer, naturally. Hyped people are stimulated to invest their efforts or emotions into something (sometimes productively, oftentimes not), but they HAVE TO pay with exhaustion no matter what. One can't simply sustain that energy state indefinitely; the motivation burns out, hence the stale stage.
race2tb@reddit
Training these giant models is slowing everything down. It would be faster to have hundreds of teams taking different approaches and training smaller models to prove out transferable gains to larger models, before training these giant models for not that much gain.
visarga@reddit
Pretraining is expensive but we only need one or a few base models. Finetuning is easy and can be done on our own computers or on rented VMs for cheap.
Inevitable_Fan8194@reddit
Maybe this time around an AI winter is not that bad a thing, given how people are freaking out about AI (and science/tech in general). It's a good way to let things cool down.
This is a problem called software rot. It's a serious issue with python machine learning programs, which tend to not be executable anymore after just a few months if their maintainers don't stay on top of their dependencies.
That's why (as much as I love Python) I was so happy to see llama.cpp be released, personally. It's not necessarily a given that it will be less subject to software rot, but that tends to be the case for C++ programs (and C programs even more so). Then again, there may be problems with Nvidia drivers and tools, which are very version-sensitive as well (and they even introduce hard dependencies on specific gcc versions 🙄). I'm not sure how exposed llama.cpp is to that. Anyway, I think we should all be very grateful to Georgi Gerganov.
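One small mitigation for the Python-side rot described above (an illustration, not something proposed in the thread) is to snapshot the exact versions of every installed package from inside the environment, so the setup can be rebuilt long after the ecosystem has moved on; the output filename here is arbitrary:

```python
import importlib.metadata

# record every installed distribution as an exact "name==version" pin
pins = sorted(
    f"{dist.metadata['Name']}=={dist.version}"
    for dist in importlib.metadata.distributions()
)

with open("requirements.lock", "w") as f:
    f.write("\n".join(pins) + "\n")

print(f"pinned {len(pins)} packages")
```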
visarga@reddit
This sounds to me like "Oh, our cute little Johnny is so perfect, I wish he didn't grow as fast". Never happened
ttkciar@reddit (OP)
Your thoughts run parallel to my own. The fact that llama.cpp is mostly written in C++, and is self-contained with relatively few external dependencies, and is small enough that I might be able to maintain it myself if need be, all contributed to my decision to make it my go-to inference stack.
I'd like to see its training capabilities come back, too, so it can truly be a do-everything tool, but we will see. The recent chat on https://github.com/ggerganov/llama.cpp/pull/8669 looks promising.
Someone13574@reddit
Personally I don't think software rot will hurt local inference too much. The funding for large companies may dry up, but the community will not disappear. There will still be enough people using the software to keep it up and running, at the very least. Development would probably slow, but it's not going to stop completely.
Vabaluba@reddit
The focus is twofold: big models by big companies with the idea of "how far can we stretch it by simply adding more?", and small, targeted models and how to stitch them all together for the best use. Of course it is more nuanced, and there is a lot more going on in industry. But outside tech, most companies still have no AI, not even the proper data infrastructure to start AI efforts. I might be wrong. Anyone else want to chip in?
Balance-@reddit
Not sure. While investments are insane, there are still steady capability increases between releases.