I BUILT MY FIRST MODEL FROM SCRATCH
Posted by volious-ka@reddit | LocalLLaMA | 30 comments
Sup, I'm Crownelius, I made that popular opus distill dataset.
TODAY YOU ARE INTRODUCED TO SHARD, a 40M-parameter malformed LLM.
Right now I'm working on a series of tiny LLMs, with the goal of running a coherent model for IoT tasks. While researching atomic models, I came across a project called Compact AI. Since joining them, I've learned a lot and even made my own model from scratch.
The model is available here: CompactAI-O[HF Organization]
About my model "Shard" (I call it Scamp).
CelvestianNesy@reddit
YOU ARE INSANE!
YOU MUST BE ELIMINATED TO SUPPORT OUR CORPORATE OVERLORDS!
Jk, that's awesome sauce man!
CelvestianNesy@reddit
Some people cannot take a joke; I was just trying to praise someone for their amazing work.
Chance-Device-9033@reddit
Nice, what’s the architecture like? Maybe there’s a write up somewhere but I don’t see it. Anything fancy?
xeeff@reddit
all I know about you are your cringe model names and CAPITAL AI GENERATED DESCRIPTIONS talking about the models like it's crack
FullOf_Bad_Ideas@reddit
It's just hobby open source AI research, I don't get where the hate is coming from.
xeeff@reddit
look at his replies to me lol that's where
FullOf_Bad_Ideas@reddit
I've seen them. I think that some people don't know how to react to hate and they crumble and spiral. That's what I see here and it's just a bit sad.
And regarding AI-generated descriptions: some people who are technical are really bad at marketing, so they let LLMs do their marketing, and they might not know how to judge the outputs well. I don't even see the problem here; his most popular model has a perfectly good, beginner-friendly model card. It oversells the strength of distillation, sure, but it's just a small free community project marketing itself, and I think that's OK.
volious-ka@reddit (OP)
I actually have a pretty impressive 9b model. My cringe model names are your opinion.
xeeff@reddit
I expected you to at least tell me something cool, but nope, you're still known as that guy
volious-ka@reddit (OP)
My LLM can drive a drone. fk off.
xeeff@reddit
this is an even worse image. thank you, I will avoid you like the plague
Turbulent_Pin7635@reddit
You can avoid him, but can you avoid his kamikaze AI powered drones?
amitbahree@reddit
Very nice. Congrats. I had done something similar which was also inspired by this sub.
https://blog.desigeek.com/post/2025/09/building-llm-from-scratch-part1/
ReferenceOwn287@reddit
Thanks for documenting it, saving it for a later read.
ReferenceOwn287@reddit
It's interesting to see a project about building an LLM from scratch. I'm not clear on the practical benefits, but it must have been a good learning experience for sure. What hardware setup did you use, and how many hours did you have to run it?
_raydeStar@reddit
What are some use cases here? Is there anything practical?
You say iot devices. I think that's really cool but... What's it solve?
volious-ka@reddit (OP)
If I changed the training, yes there are HUGE practical uses.
It could be used as a controller for an AI to operate an MRI, or a satellite.
Borkato@reddit
Hey I think what you’re doing is neat! Maybe just make it seem less groundbreaking until it’s ready? :p
No_Hunter_7786@reddit
Nice work building from scratch! 40M for IoT tasks is a smart direction, edge deployment needs models that actually fit on constrained hardware.
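To put some rough numbers on the "fit on constrained hardware" point: weight memory for a 40M-parameter model is easy to estimate from bytes per parameter. These are illustrative figures under standard precision assumptions, not measurements of Shard itself:

```python
# Rough weight-memory estimate for a 40M-parameter model at common
# precisions (illustrative only; ignores activations and KV cache).

def weight_memory_mb(num_params: int, bytes_per_param: float) -> float:
    """Approximate weight storage in megabytes (MiB)."""
    return num_params * bytes_per_param / (1024 ** 2)

PARAMS = 40_000_000  # 40M parameters

for name, bpp in [("fp32", 4.0), ("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{name}: ~{weight_memory_mb(PARAMS, bpp):.0f} MB")
```

Even at fp32 the weights are around 150 MB, and int8/int4 quantization brings that down to tens of megabytes, which is what makes this parameter range plausible for edge devices.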
kyr0x0@reddit
Do you have the training pipeline code on GitHub?
FullOf_Bad_Ideas@reddit
What's the training batch size? I'm trying to understand how many tokens it has seen.
volious-ka@reddit (OP)
240m tokens.
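For context on the question above: total tokens seen is roughly optimizer steps × batch size × sequence length. The settings below are hypothetical (OP didn't state them), chosen only to show arithmetic that lands near the 240M figure:

```python
def tokens_seen(steps: int, batch_size: int, seq_len: int) -> int:
    """Total training tokens = steps * sequences per step * tokens per sequence."""
    return steps * batch_size * seq_len

# Hypothetical settings that land near 240M tokens:
print(tokens_seen(steps=3_600, batch_size=32, seq_len=2048))  # 235929600 (~236M)
```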
FullOf_Bad_Ideas@reddit
Ok, so it took about an hour; you have room to make it more polished.
I've pre-trained 0.1-4B MoEs on 100M-80B tokens, a lot of it locally on a powerful rig, and I've seen gains taper off quickly. Models get kinda coherent fast, but getting to real subject understanding takes about 10,000x+ more compute. My best model still can't quite tell how to take care of a goat or where to find cows.
The Falcon team really dug into the science of pretraining tiny models; it's definitely worth a look if you're after tiny models rather than compute-optimal ones - https://huggingface.co/spaces/tiiuae/tiny-h1-blogpost
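As a sense of scale for the "10,000x+ more compute" point: the widely cited Chinchilla rule of thumb is roughly 20 training tokens per parameter for a compute-optimal run, so a 40M model would "want" on the order of 800M tokens, versus the 240M it saw. A quick sketch (the 20 tok/param ratio is the assumption here):

```python
def chinchilla_optimal_tokens(num_params: int, tokens_per_param: int = 20) -> int:
    """Rule-of-thumb compute-optimal token budget (~20 tokens per parameter)."""
    return num_params * tokens_per_param

print(chinchilla_optimal_tokens(40_000_000))  # 800000000 (~800M tokens)
```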
wasnt_in_the_hot_tub@reddit
Cornelius sounds like a made up name, but that's pretty cool, Cornelius.
Silver-Champion-4846@reddit
Is that a meme reference?
Athabasco@reddit
Cool, but what is it for?
volious-ka@reddit (OP)
It's a project where I'm teaching myself how to build an LLM...
volious-ka@reddit (OP)
Our org has a Discord dedicated to discussing small LLMs and how to make them.
https://discord.gg/XwQ9mZqruY
jkstaples@reddit
Very cool! I’m very interested in joining your discussion on Discord. I’ve just put together a workstation cluster with an M1 Ultra 128GB and 4 other Mac Minis (16-48GB each, M4 and M4 Pro) to handle smaller models, orchestrate agents, and serve as my workstation. I’m currently reviewing a research paper a friend in finance sent me about a foundational financial-markets prediction model built by a Chinese researcher; we’re going to try to apply the process to US equities markets and potentially start building our own models. I will definitely be reviewing your work!! Thanks for sharing!