I BUILT MY FIRST MODEL FROM SCRATCH
Posted by volious-ka@reddit | LocalLLaMA | 30 comments
Sup, I'm Crownelius, I made that popular opus distill dataset.
TODAY YOU ARE INTRODUCED TO SHARD, a 40M-parameter malformed LLM.
Right now I'm working on a series of tiny LLMs, with the goal of running a coherent model for IoT tasks. While researching atomic models, I came across a project called Compact AI. Since joining them, I've learned a lot and even made my own model from scratch.
The model is available here: CompactAI-O[HF Organization]
About my model "Shard" (I call it Scamp).
CelvestianNesy@reddit
YOU ARE INSANE!
YOU MUST BE ELIMINATED TO SUPPORT OUR CORPORATE OVERLORDS!
Jk, that's awesome sauce man!
CelvestianNesy@reddit
Some people cannot take a joke; I was just trying to praise someone for their amazing work.
Chance-Device-9033@reddit
Nice, what’s the architecture like? Maybe there’s a write up somewhere but I don’t see it. Anything fancy?
xeeff@reddit
all I know about you are your cringe model names and CAPITAL AI GENERATED DESCRIPTIONS talking about the models like it's crack
FullOf_Bad_Ideas@reddit
It's just hobby open source AI research, I don't get where the hate is coming from.
xeeff@reddit
look at his replies to me lol that's where
FullOf_Bad_Ideas@reddit
I've seen them. I think that some people don't know how to react to hate and they crumble and spiral. That's what I see here and it's just a bit sad.
And regarding AI-generated descriptions: some people who are technical are really bad at marketing, so they let LLMs do their marketing, and they might not know how to judge the outputs well. I don't even see the problem here; his most popular model has a perfectly good, beginner-friendly model card. It oversells the strength of distillation, sure, but it's just a small free community project marketing itself, and I think that's OK.
volious-ka@reddit (OP)
I actually have a pretty impressive 9b model. My cringe model names are your opinion.
xeeff@reddit
I expected you to at least tell me something cool, but nope, you're still known as that guy
volious-ka@reddit (OP)
My LLM can drive a drone. fk off.
xeeff@reddit
this is an even worse image. thank you, I will avoid you like the plague
Turbulent_Pin7635@reddit
You can avoid him, but can you avoid his kamikaze AI powered drones?
amitbahree@reddit
Very nice. Congrats. I had done something similar which was also inspired by this sub.
https://blog.desigeek.com/post/2025/09/building-llm-from-scratch-part1/
ReferenceOwn287@reddit
Thanks for documenting it, saving it for a later read.
ReferenceOwn287@reddit
It's interesting to see a project about building an LLM from scratch. I'm not clear on the practical benefits, but it must have been a good learning experience for sure. What hardware setup did you use, and how many hours did you have to run it?
_raydeStar@reddit
What are some use cases here? Is there anything practical?
You say iot devices. I think that's really cool but... What's it solve?
volious-ka@reddit (OP)
If I changed the training, yes there are HUGE practical uses.
It could be used as a controller for an AI to operate an MRI, or a satellite.
Borkato@reddit
Hey I think what you’re doing is neat! Maybe just make it seem less groundbreaking until it’s ready? :p
No_Hunter_7786@reddit
Nice work building from scratch! 40M for IoT tasks is a smart direction, edge deployment needs models that actually fit on constrained hardware.
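To put some rough numbers on the "fit on constrained hardware" point: weight memory for a 40M-parameter model is easy to estimate from bytes per parameter. These are illustrative figures under standard precision assumptions, not measurements of Shard itself:

```python
# Rough weight-memory estimate for a 40M-parameter model at common
# precisions (illustrative only; ignores activations and KV cache).

def weight_memory_mb(num_params: int, bytes_per_param: float) -> float:
    """Approximate weight storage in megabytes (MiB)."""
    return num_params * bytes_per_param / (1024 ** 2)

PARAMS = 40_000_000  # 40M parameters

for name, bpp in [("fp32", 4.0), ("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{name}: ~{weight_memory_mb(PARAMS, bpp):.0f} MB")
```

Even at fp32 the weights are around 150 MB, and int8/int4 quantization brings that down to tens of megabytes, which is what makes this parameter range plausible for edge devices.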
kyr0x0@reddit
Do you have the training pipeline code on GitHub?
FullOf_Bad_Ideas@reddit
What's the training batch size? I'm trying to understand how many tokens it has seen.
volious-ka@reddit (OP)
240m tokens.
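For context on the question above: total tokens seen is roughly optimizer steps × batch size × sequence length. The settings below are hypothetical (OP didn't state them), chosen only to show arithmetic that lands near the 240M figure:

```python
def tokens_seen(steps: int, batch_size: int, seq_len: int) -> int:
    """Total training tokens = steps * sequences per step * tokens per sequence."""
    return steps * batch_size * seq_len

# Hypothetical settings that land near 240M tokens:
print(tokens_seen(steps=3_600, batch_size=32, seq_len=2048))  # 235929600 (~236M)
```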
FullOf_Bad_Ideas@reddit
Ok, so it took about an hour; you have room to make it more polished.
I've pre-trained 0.1-4B MoEs on 100M-80B tokens, a lot of it locally on a powerful rig, and I've seen gains taper off quickly. Models get kinda coherent fast, but getting to real subject understanding takes about 10,000x+ more compute. My best model still can't quite tell how to take care of a goat or where to find cows.
The Falcon team really dug into the science of pretraining tiny models; it's definitely worth a look if you're after tiny models rather than compute-optimal ones - https://huggingface.co/spaces/tiiuae/tiny-h1-blogpost
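As a sense of scale for the "10,000x+ more compute" point: the widely cited Chinchilla rule of thumb is roughly 20 training tokens per parameter for a compute-optimal run, so a 40M model would "want" on the order of 800M tokens, versus the 240M it saw. A quick sketch (the 20 tok/param ratio is the assumption here):

```python
def chinchilla_optimal_tokens(num_params: int, tokens_per_param: int = 20) -> int:
    """Rule-of-thumb compute-optimal token budget (~20 tokens per parameter)."""
    return num_params * tokens_per_param

print(chinchilla_optimal_tokens(40_000_000))  # 800000000 (~800M tokens)
```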
wasnt_in_the_hot_tub@reddit
Cornelius sounds like a made up name, but that's pretty cool, Cornelius.
Silver-Champion-4846@reddit
Is that a meme reference?
Athabasco@reddit
Cool, but what is it for?
volious-ka@reddit (OP)
It's a project where I'm teaching myself how to build an LLM...
volious-ka@reddit (OP)
Our org has a Discord dedicated to discussing small LLMs and how to make them.
https://discord.gg/XwQ9mZqruY
jkstaples@reddit
Very cool! I’m very interested in joining your discussion on Discord. I’ve just put together a workstation cluster with an M1 Ultra 128GB and 4 other Mac Minis (16-48GB each, M4 and M4 Pro) to handle smaller models, orchestrate agents, and serve as my workstation. I’m currently reviewing a research paper a friend in finance sent me about a foundational financial-markets prediction model built by a Chinese researcher; we’re going to try to apply the process to US equities markets and potentially start building our own models. I will definitely be reviewing your work!! Thanks for sharing!