Does anyone here remember EleutherAI with GPT-NeoX-20B? Or BigScience BLOOM 176B?
Posted by Mr_Moonsilver@reddit | LocalLLaMA | 18 comments
Those were the days... even before Llama and Mistral 7B, or the first DeepSeek-Coder (7B and 33B), or the WizardLM models with their 16k context windows... man, I feel like an OG even though this was only some 3 or 4 years ago. Things have come a long way. What were your favourites?
StellaAthena@reddit
GPT-NeoX-20B will always have a soft spot in my heart :)
MPT was also super impressive at the time, and the best open-source-licensed model for quite a while.
EmbarrassedAsk2887@reddit
WizardLM and the Alpaca datasets, bitsandbytes, QLoRA... amazing times, man
Mr_Moonsilver@reddit (OP)
Oh yeah! And interesting that a number of the early players aren't around anymore. Wonder why that is.
Ok_Mammoth589@reddit
Pour one out for The Bloke
Mr_Moonsilver@reddit (OP)
Who could forget The Bloke!
Ok_Category_5847@reddit
Because they got scaled out. It's too expensive for local finetuners to keep up. Larger models and larger datasets drove GPU costs to prohibitive levels over a few years.
EmbarrassedAsk2887@reddit
i’m here you are here.
a_beautiful_rhind@reddit
NeoX never wanted to run for me. I kept trying to compress it with GPTQ.
Mr_Moonsilver@reddit (OP)
Curious now, what GPU were you using?
a_beautiful_rhind@reddit
Old Pascal 6000, 24GB.
Mr_Moonsilver@reddit (OP)
like a boss
DinoAmino@reddit
DeepSeek-Coder 33B was awesome for a minute. I immediately got a second 3090 in order to run it at Q8.
Several-Tax31@reddit
I remember running DeepSeek-Coder 7B, and it was impressive. That was way before the DeepSeek moment, and I thought those guys were up to something. I wish they'd release a small model like that again.
shockwaverc13@reddit
6.7B 🤷🤷🤷
Mr_Moonsilver@reddit (OP)
I think it was the first local model that one-shotted snake
Myrkkeijanuan@reddit
The best I could do was GPT-Neo-2.7B on KoboldAI. Back then I thought that I wouldn't be able to run a 20B model until the 2030s because you needed 40GB of VRAM to run them.
Altruistic_Heat_9531@reddit
I remember when GPT-3 was the frontier model, telling myself "there is no way in hell I can house that many parameters on my computer", and here I am with Qwen 80B and Nemotron 120B.
Mr_Moonsilver@reddit (OP)
Makes you wonder what we'll be able to do in a year's time