[FOUNDING] SupraLabs - real open-source AI models for you!

Posted by LH-Tech_AI@reddit | LocalLLaMA | View on Reddit | 49 comments

Hey r/LocalLLaMA !

We founded SupraLabs, and it's huge!

What we do?

We train, finetune and explore small models with good results to revolutionize small AI models by making them accessible to everyone. ❤️🙂

Are we on Hugging Face?

Of course: https://huggingface.co/SupraLabs

Are there any models yet?

YES THERE ARE MODELS!

E.G.: https://huggingface.co/SupraLabs/Supra-Mini-v4-2M and many more!

What models will come?

We will share more models soon, like:

StorySupra 10M: a 10M story telling SLM running on edge devices
Supra Mini v5 5M: a cutting-edge SLM with really good performance and great results
many more... stay tuned

Where do I get updates?

You can read our blog here: https://huggingface.co/spaces/SupraLabs/Blog
Come check it out!

Can I join or support this?

Yes! Feel free to ask in a community discussion on HF or under this post in the comments if you want to join us!
Plus: you can always support us by dowwloading and liking our models and following us on HF.

See all models here: https://huggingface.co/SupraLabs/models

[-]

KickLassChewGum@reddit

// prompt "Artificial intelligence is " // output "Artificial intelligence is the idea of the theory that the world has a very high-performance technology, which is also more important to society's lives than people who are being able to find their own knowledge and understanding how it can be used for future generations..."

v4 is a base model, it is not fine-tuned for instruction following or chat. The next experiments on our roadmap include fine-tuning on instruction datasets, exploring quantization at this new scale

...please tell me this is some sort of elaborate practical joke

[-]

Dangerous_Try3619@reddit

It's a 2M parameter base model trained from scratch, not an instruction-tuned assistant yet. Do you expected ChatGPT-level alignment? 😭 lol

[-]

KickLassChewGum@reddit

It's the result of a 10-minute tutorial, ostensibly pasted into Claude Code, released as a "revolution of small AI models by making them accessible to anyone".

This already is accessible to everyone. Tell Claude you want to train a 2M model from scratch based on ~10 million tokens from fineweb-edu, launch your training run on a calculator, watch loss go down, and found "non-profit organization" apparently.

[-]

TemperatureMajor5083@reddit

that it's a model with 2 million parameters.

2 days ago, this Lab released MicroSupra-1k, a 1 thousand parameter model (which already outperformed similar models up to 10x larger). Releasing 2M today means they scaled up pretraining 2000x in 2 days. This means they are on track to release a 8T model in a week max if they can keep that pace. Also, SLMs like that can outperform vastly larger models if you just finetune them on one specific task. For example, such an 1k Model trained on just a specific type of poetry could outperform commercial models in that specific niche.

[-]

KickLassChewGum@reddit

that it's a model with 2 million parameters.

2 days ago, this Lab released

lol, "lab"

(which already outperformed similar models up to 10x larger).

Outperformed at what? Parameter count?

[-]

TemperatureMajor5083@reddit

Yes, this 1k Model performed at the same level or above the level of 10k Param Models. Isn't that impressive?

[-]

Foreign_Risk_2031@reddit

This is not a general purpose model

[-]

KickLassChewGum@reddit

Of course not. To be exact, it's a no purpose model.

[-]

Megneous@reddit

Um... you can make a 2M model output coherent English by training it on TinyStories V2. That was like the entire point of the dataset- to prove that sub-10M models were capable of coherent English if trained on very small vocabularies and simplified syntax.

[-]

KickLassChewGum@reddit

You can make a tiny model coherent if you train it on an extremely simple vocabulary and an intricately constructed/curated dataset, yes. But for one, that is not relevant to anything I said; and for two, it's hardly "revolutionary" art.

[-]

Megneous@reddit

No one said it's revolutionary art. You said that models with 2M parameters are incapable of producing anything other than garbage, and there's highly cited research papers and one of the most well-known datasets in machine learning that contradicts you, so I thought I would bring it up.

[-]

KickLassChewGum@reddit

I know reading is hard, but I didn't think the comment was that long. Read it again, very carefully. Use your attention.

[-]

TemperatureMajor5083@reddit

Can't wait for 50M

[-]

Dangerous_Try3619@reddit

The 50M model are in our future roadmap now.

[-]

TemperatureMajor5083@reddit

We also need a 0.1M reasoning model, for running on MCUs, for example. Should fit GPT-2 level intelligence (at least) fully in a rp2040s SRAM.

[-]

elemental-mind@reddit

Behold my VRAM!

[-]

LH-Tech_AI@reddit (OP)

YEAH!

[-]

KaMaFour@reddit

This look absolutely tiny (1k parameter model, uwu), but I guess there are some usecases for them at that size and there are things worth learning from making them. Interested in how well the new models will be able to keep coherence at that size.

[-]

Dangerous_Try3619@reddit

1k parameters is so small that is equal to C. elegans worm (1k neurons)

[-]

Dany0@reddit

Not really. The connection between artificial NN weights and real biological neurons is shaky. You'll find numbers of anything between 20 to 1000 weights equal one biological neuron. Some people day they are fundamentally incomparable. There's a whole philosophical argument about it

But I liked the argument which likened one biological fruit fly neuron to 300 ish NN model params

So C. elegans is a 300k param model, which honestly... kinda makes sense?

[-]

Dany0@reddit

Also keep in mind that in a biological system, neurons aren't the only thing doing the thinking. Memories can be "stored" in dna or dna markers. While an artificial model has different limitations

[-]

lordhiggsboson@reddit

Is there new evidence for memory being stored in DNA? Do you have links to any papers? Last I read up on this was that this is not plausible, curious to read new findings on it

[-]

Dany0@reddit

Of course memory stored is in the DNA. You have a sexual organ, and your father had a sexual organ. Bam, you remember what it's like to have a wee wee

To be serious, it's still a very philosophical question, what the fuck is memory in the first place

The only thing we know for sure is that certain stress marks are inherited epigenetically. Some genes get turned on/off, its way especially visible in populations that had suffered through war and famine. I would say arguing that is a sort of memory is certainly acceptable

[-]

LetsGoBrandon4256@reddit

You have a sexual organ, and your father had a sexual organ. Bam, you remember what it's like to have a wee wee

I literally inherited half of my chromosomes from my mom. Why don't I remember how it feels like to have a cunt between my legs?

What a regarded anology.

[-]

GiveSparklyTwinkly@reddit

🤦‍♀️ Genetic information absolutely does not contain memories. It contains information.

[-]

Dany0@reddit

By your definition memories... are not information? You should probably delete your comment my man I don't see how you can dig yourself out of this one

[-]

GiveSparklyTwinkly@reddit

What? Are all oranges also apples?

[-]

Dany0@reddit

I'm sorry I cannot even pretend to be stupid enough to get on your level

[-]

GiveSparklyTwinkly@reddit

Are you going to argue the point or just attempt to insult me?

[-]

MerePotato@reddit

Parameters are more akin to synapses than neurons so the worm comes out on top

[-]

LH-Tech_AI@reddit (OP)

Yeah!

[-]

gotfan86@reddit

Interesting project, could you show us some example outputs of what the models can do?

[-]

Kodix@reddit

Thankfully they do provide some on the model card.

Prompt: "The main concept of physics is "

Output: "The main concept of physics is `'animi-'hisi', and therefore the universe's own light. In this case, a theory that is not only used to explain what it can be called "the universe" or 'two planets, which are exactly about the earth's gravitational energy, but also in reality, we know how much things do. It will actually mean that the stars from the Earth’s orbit, as the galaxy, would say, they have to get into the planet. The same thing that has been discovered at all, there was nothing more than that of anthropological world than those who were now doing so. And if you don't think, why does this matter? It seems that I am"

[-]

Public-Thanks7567@reddit

gguf ?

[-]

SnooPaintings8639@reddit

Remember to don't go under Q4, it will lose some reasoning capability.

[-]

Kodix@reddit

You sure? I'm not really seeing a difference between UD-Q8_K_XXL and IQ1_XXS for this model, personally.

[-]

lordhiggsboson@reddit

Congrats! Excited to see more research into small model development. Do you have any details to share on the architecture you are using or any learnings that surfaced during the training/reserach? Would love to learn more about the techniques you employed

[-]

Dangerous_Try3619@reddit

Thank you! 🙏

The model is currently a very small experimental transformer (\~2M parameters), focused mainly on testing language learning at tiny scale rather than instruction following.

Right now we're experimenting with:

tokenizer compatibility improvements
training stability
quantization support
scaling behavior on small architectures
conversational/instruction tuning for future versions

One interesting lesson so far is how much coherent semantic structure can emerge even at extremely small parameter counts when the training pipeline is stable.

Still a lot to improve, but the goal is learning and iterating step by step 🚀

[-]

lordhiggsboson@reddit

Interesting! Thanks for sharing

[-]

LH-Tech_AI@reddit (OP)

Thanks ❤️ 🙂

All code is always in the model repos!

[-]

lordhiggsboson@reddit

Awesome! I'll check it out

[-]

LH-Tech_AI@reddit (OP)

YEAH!!

[-]

More-Curious816@reddit

I would love to see detailed technical blogs from your lab in the future after each release. it would be cool as knowledge sharing and also you folk gain a research reputation which will attract investment or acquisitions from big tech.

[-]