[FOUNDING] SupraLabs - real open-source AI models for you!
Posted by LH-Tech_AI@reddit | LocalLLaMA | 49 comments

Hey r/LocalLLaMA !
We founded SupraLabs, and it's huge!
What do we do?
We train, finetune, and explore small models, with the goal of revolutionizing small AI models by making them accessible to everyone. ❤️
Are we on Hugging Face?
Of course: https://huggingface.co/SupraLabs
Are there any models yet?
YES THERE ARE MODELS!
For example: https://huggingface.co/SupraLabs/Supra-Mini-v4-2M, and many more!
What models will come?
We will share more models soon, like:
- StorySupra 10M: a 10M storytelling SLM that runs on edge devices
- Supra Mini v5 5M: a cutting-edge 5M SLM aiming for strong results at that size
- many more... stay tuned
Where do I get updates?
You can read our blog here: https://huggingface.co/spaces/SupraLabs/Blog
Come check it out!
Can I join or support this?
Yes! Feel free to ask in a community discussion on HF or under this post in the comments if you want to join us!
Plus: you can always support us by downloading and liking our models and following us on HF.
See all models here: https://huggingface.co/SupraLabs/models
KickLassChewGum@reddit
...please tell me this is some sort of elaborate practical joke
Dangerous_Try3619@reddit
It's a 2M parameter base model trained from scratch, not an instruction-tuned assistant yet. Did you expect ChatGPT-level alignment? lol
KickLassChewGum@reddit
It's the result of a 10-minute tutorial, presumably pasted into Claude Code, released as a "revolution of small AI models by making them accessible to anyone".
This already is accessible to everyone. Tell Claude you want to train a 2M model from scratch on ~10 million tokens from fineweb-edu, launch your training run on a calculator, watch the loss go down, and apparently found a "non-profit organization".
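For completeness, the whole "tutorial" fits in one screen. A minimal sketch — the streaming slice, layer sizes, and step count below are my own illustrative picks, not whatever SupraLabs actually ran:

```python
# Toy from-scratch pretraining run: a few-million-param GPT-2-style model on a
# slice of fineweb-edu. All sizes here are illustrative assumptions.
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, GPT2Config, GPT2LMHeadModel

tok = AutoTokenizer.from_pretrained("gpt2")
cfg = GPT2Config(vocab_size=tok.vocab_size, n_positions=256,
                 n_embd=64, n_layer=2, n_head=2)
model = GPT2LMHeadModel(cfg)                       # ~3M params, embeddings included
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

stream = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
                      split="train", streaming=True)

model.train()
for step, ex in enumerate(stream.take(2_000)):     # tiny slice of the corpus
    ids = tok(ex["text"], truncation=True, max_length=256,
              return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss             # plain next-token prediction
    loss.backward()
    opt.step(); opt.zero_grad()
    if step % 200 == 0:
        print(step, round(loss.item(), 3))         # watch the loss go down
```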
TemperatureMajor5083@reddit
2 days ago, this lab released MicroSupra-1k, a 1 thousand parameter model (which already outperformed similar models up to 10x larger). Releasing 2M today means they scaled up pretraining 2000x in 2 days, which puts them on track to release an 8T model in a week max if they keep that pace. Also, SLMs like that can outperform vastly larger models if you just finetune them on one specific task. For example, such a 1k model trained on just one specific type of poetry could outperform commercial models in that niche.
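Spelling out that (obviously tongue-in-cheek) extrapolation:

```python
# 1k -> 2M was a 2000x jump in 2 days; keep the same pace and see where it lands.
params = 2e6
for day in (4, 6):
    params *= 2_000
    print(f"day {day}: {params:.0e} params")   # day 4: 4e+09, day 6: 8e+12 ("8T in a week")
```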
KickLassChewGum@reddit
lol, "lab"
Outperformed at what? Parameter count?
TemperatureMajor5083@reddit
Yes, this 1k model performed at or above the level of 10k-parameter models. Isn't that impressive?
Foreign_Risk_2031@reddit
This is not a general purpose model
KickLassChewGum@reddit
Of course not. To be exact, it's a no purpose model.
Megneous@reddit
Um... you can make a 2M model output coherent English by training it on TinyStories V2. That was like the entire point of the dataset: to prove that sub-10M models were capable of coherent English if trained on very small vocabularies and simplified syntax.
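If anyone wants to sanity-check that, the dataset is public and the vocabulary really is tiny. A quick look (the sample size here is arbitrary; the V2, GPT-4-generated files sit in the same repo as V1):

```python
# Why TinyStories works for sub-10M models: the vocabulary is deliberately small and simple.
from collections import Counter
from datasets import load_dataset

sample = load_dataset("roneneldan/TinyStories", split="train", streaming=True).take(5_000)
vocab = Counter(word for ex in sample for word in ex["text"].lower().split())
print(len(vocab))             # on the order of a few thousand distinct whitespace tokens
print(vocab.most_common(10))  # "the", "and", "a", ... simple function words dominate
```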
KickLassChewGum@reddit
You can make a tiny model coherent if you train it on an extremely simple vocabulary and an intricately constructed/curated dataset, yes. But for one, that is not relevant to anything I said; and for two, it's hardly "revolutionary" art.
Megneous@reddit
No one said it's revolutionary art. You said that models with 2M parameters are incapable of producing anything other than garbage, and there's highly cited research papers and one of the most well-known datasets in machine learning that contradicts you, so I thought I would bring it up.
KickLassChewGum@reddit
I know reading is hard, but I didn't think the comment was that long. Read it again, very carefully. Use your attention.
TemperatureMajor5083@reddit
Can't wait for 50M
Dangerous_Try3619@reddit
The 50M model is on our roadmap now.
TemperatureMajor5083@reddit
We also need a 0.1M reasoning model, for running on MCUs, for example. Should fit GPT-2 level intelligence (at least) fully in an RP2040's SRAM.
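Back-of-envelope for the SRAM part, assuming the RP2040's 264 KB and counting weights only:

```python
# Does a 0.1M-parameter model fit in an RP2040's 264 KB of SRAM? Depends on weight precision.
params, sram_kb = 100_000, 264
for bits in (32, 8, 4):
    weights_kb = params * bits / 8 / 1024
    print(f"{bits}-bit: {weights_kb:.0f} KB of {sram_kb} KB")  # fp32 ~391 KB (no), int8 ~98 KB (yes)
```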
elemental-mind@reddit
Behold my VRAM!
LH-Tech_AI@reddit (OP)
YEAH!
KaMaFour@reddit
These look absolutely tiny (a 1k parameter model, uwu), but I guess there are some use cases for them at that size, and there are things worth learning from making them. Interested in how well the new models will keep coherence at that size.
Dangerous_Try3619@reddit
1k parameters is so small it's roughly equal to the C. elegans worm (1k neurons)
Dany0@reddit
Not really. The connection between artificial NN weights and real biological neurons is shaky. You'll find claims of anywhere from 20 to 1,000 weights being equal to one biological neuron. Some people say they are fundamentally incomparable. There's a whole philosophical argument about it.
But I liked the argument that likened one biological fruit fly neuron to ~300 NN model params.
So C. elegans is a 300k param model, which honestly... kinda makes sense?
Dany0@reddit
Also keep in mind that in a biological system, neurons aren't the only thing doing the thinking. Memories can be "stored" in DNA or DNA markers, while an artificial model has different limitations.
lordhiggsboson@reddit
Is there new evidence for memory being stored in DNA? Do you have links to any papers? Last time I read up on this, the conclusion was that it isn't plausible; curious to read new findings on it.
Dany0@reddit
Of course memory is stored in the DNA. You have a sexual organ, and your father had a sexual organ. Bam, you remember what it's like to have a wee wee.
To be serious, it's still a very philosophical question, what the fuck is memory in the first place
The only thing we know for sure is that certain stress marks are inherited epigenetically. Some genes get turned on/off; it's especially visible in populations that have suffered through war and famine. I would say arguing that that is a sort of memory is certainly acceptable.
LetsGoBrandon4256@reddit
I literally inherited half of my chromosomes from my mom. Why don't I remember what it feels like to have a cunt between my legs?
What a regarded analogy.
GiveSparklyTwinkly@reddit
🤦‍♂️ Genetic information absolutely does not contain memories. It contains information.
Dany0@reddit
By your definition, memories... are not information? You should probably delete your comment, my man. I don't see how you can dig yourself out of this one.
GiveSparklyTwinkly@reddit
What? Are all oranges also apples?
Dany0@reddit
I'm sorry I cannot even pretend to be stupid enough to get on your level
GiveSparklyTwinkly@reddit
Are you going to argue the point or just attempt to insult me?
MerePotato@reddit
Parameters are more akin to synapses than neurons, so the worm comes out on top.
LH-Tech_AI@reddit (OP)
Yeah!
gotfan86@reddit
Interesting project, could you show us some example outputs of what the models can do?
Kodix@reddit
Thankfully they do provide some on the model card.
Prompt: "The main concept of physics is "
Output: "The main concept of physics is `'animi-'hisi', and therefore the universe's own light. In this case, a theory that is not only used to explain what it can be called "the universe" or 'two planets, which are exactly about the earth's gravitational energy, but also in reality, we know how much things do. It will actually mean that the stars from the Earthβs orbit, as the galaxy, would say, they have to get into the planet. The same thing that has been discovered at all, there was nothing more than that of anthropological world than those who were now doing so. And if you don't think, why does this matter? It seems that I am"
Public-Thanks7567@reddit
gguf ?
SnooPaintings8639@reddit
Remember not to go under Q4, it will lose some reasoning capability.
Kodix@reddit
You sure? I'm not really seeing a difference between UD-Q8_K_XXL and IQ1_XXS for this model, personally.
lordhiggsboson@reddit
Congrats! Excited to see more research into small model development. Do you have any details to share on the architecture you are using, or any learnings that surfaced during the training/research? Would love to learn more about the techniques you employed.
Dangerous_Try3619@reddit
Thank you!
The model is currently a very small experimental transformer (~2M parameters), focused mainly on testing language learning at tiny scale rather than instruction following.
Right now we're experimenting with:
One interesting lesson so far is how much coherent semantic structure can emerge even at extremely small parameter counts when the training pipeline is stable.
Still a lot to improve, but the goal is learning and iterating step by step
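For a rough feel of what ~2M parameters buys, here's an illustrative parameter budget for a small decoder-only config (these numbers are an example, not the exact Supra-Mini config):

```python
# Rough parameter budget for a tiny decoder-only transformer; at this scale the
# embedding table dominates. All sizes below are illustrative assumptions.
vocab, d_model, n_layer, ctx = 8_192, 128, 4, 256
embeddings = vocab * d_model + ctx * d_model      # token + learned position embeddings
attn = 4 * d_model ** 2                           # Q, K, V, and output projections
mlp = 2 * (4 * d_model) * d_model                 # up- and down-projection, 4x expansion
total = embeddings + n_layer * (attn + mlp)
print(f"~{total / 1e6:.2f}M parameters")          # ~1.87M with these numbers
```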
lordhiggsboson@reddit
Interesting! Thanks for sharing
LH-Tech_AI@reddit (OP)
Thanks ❤️
All code is always in the model repos!
lordhiggsboson@reddit
Awesome! I'll check it out
LH-Tech_AI@reddit (OP)
YEAH!!
More-Curious816@reddit
I would love to see detailed technical blogs from your lab in the future after each release. It would be great for knowledge sharing, and you folks would also gain a research reputation, which could attract investment or acquisitions from big tech.
JazZero@reddit
Suggestion:
Work on some NPU models on AMD ROCm.
Very narrow market right now, but the performance gain is huge.
LH-Tech_AI@reddit (OP)
cool!
FullOf_Bad_Ideas@reddit
Distilling a 4M model into a 0.2M one is a pretty dope idea.
If you haven't looked into it, I'd suggest reading the TIIUAE blog on how they made FalconTiny90M, it's super interesting.
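The core of it is just a soft-target loss on top of the usual next-token objective; something like this standard Hinton-style KD sketch (function and argument names are placeholders, not any particular library's API):

```python
# Knowledge distillation loss: the small "student" matches the larger "teacher"'s
# temperature-smoothed token distribution, plus ordinary cross-entropy on the labels.
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # soft targets: KL between student and teacher distributions at temperature T
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # hard targets: standard next-token cross-entropy against the real tokens
    hard = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)), labels.view(-1))
    return alpha * soft + (1 - alpha) * hard
```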
LH-Tech_AI@reddit (OP)
Thanks ❤️
Silver-Champion-4846@reddit
Good hunting yall!
LH-Tech_AI@reddit (OP)
Thanks