Does anyone here remember EleutherAI with GPT-NeoX-20B? Or BigScience BLOOM 176B?
Posted by Mr_Moonsilver@reddit | LocalLLaMA | 18 comments
Those were the days... even before Llama and Mistral 7B, or the first DeepSeek-Coder (7B and 33B), or the WizardLM models with their 16k context windows... man, I feel like an OG even though this was only some 3 or 4 years ago. Things have come a long way. What were your favourites?
StellaAthena@reddit
GPT-NeoX-20B will always have a soft spot in my heart :)
MPT was also super impressive at the time, and the best open-source-licensed model for quite a while.
EmbarrassedAsk2887@reddit
WizardLM and the Alpaca datasets, bitsandbytes, QLoRA... amazing times, man
Mr_Moonsilver@reddit (OP)
Oh yeah! And interesting that a number of the early players aren't around anymore. Wonder why that is.
Ok_Mammoth589@reddit
Pour one out for The Bloke
Mr_Moonsilver@reddit (OP)
Who could forget The Bloke!
Ok_Category_5847@reddit
Because they got scaled out. It's too expensive for local finetuners to keep up. Larger models and larger datasets drove GPU costs to prohibitive levels over a few years.
EmbarrassedAsk2887@reddit
i’m here you are here.
a_beautiful_rhind@reddit
NeoX never wanted to run for me. I kept trying to compress it with GPTQ.
Mr_Moonsilver@reddit (OP)
Curious now, what GPU were you using?
a_beautiful_rhind@reddit
Old Pascal 6000, 24GB.
Mr_Moonsilver@reddit (OP)
like a boss
DinoAmino@reddit
DeepSeek-Coder 33B was awesome for a minute. I immediately got a second 3090 in order to run it at Q8.
Several-Tax31@reddit
I remember running DeepSeek-Coder 7B, and it was impressive. That was way before the DeepSeek moment, and I thought those guys were up to something. I wish they'd release a small model like that again.
shockwaverc13@reddit
6.7B 🤷🤷🤷
Mr_Moonsilver@reddit (OP)
I think it was the first local model that one-shotted snake
Myrkkeijanuan@reddit
The best I could do was GPT-Neo-2.7B on KoboldAI. Back then I thought that I wouldn't be able to run a 20B model until the 2030s because you needed 40GB of VRAM to run them.
Altruistic_Heat_9531@reddit
I remember when GPT-3 was the frontier model, telling myself "there is no way in hell I can house that many parameters on my computer", and here I am with Qwen 80B and Nemotron 120B.
Mr_Moonsilver@reddit (OP)
Makes you wonder what we'll be able to do in a year's time