I'm training a 140M param LLM from scratch on a consumer AMD GPU — 100k steps in, here's what the loss curve looks like

Posted by CapSensitive5165@reddit | LocalLLaMA

Hey r/LocalLLaMA, first post here.

I've been building a local AI from scratch for the past 4 days — not a fine-tune, not a wrapper, training from zero on my own consumer PC. Here's where I'm at.

The model

- Architecture: LEAPv2.1 (custom recurrent, not a transformer)

- Parameters: 140M

- Vocab: 16,000 tokens

- Context: 512 tokens

- Target RAM: <100MB at inference
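A quick back-of-envelope check on that <100MB target (my arithmetic, not the author's): weight memory is roughly parameter count times bytes per weight, so hitting <100MB with 140M parameters implies something around 4-bit weights at inference.

```python
# Rough weight-memory estimate: params * bytes_per_weight.
# (Ignores activations, which are small for a 512-token context
# on a recurrent model with a fixed-size state.)
PARAMS = 140_000_000

for name, bytes_per_weight in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    mb = PARAMS * bytes_per_weight / 1e6
    print(f"{name}: {mb:.0f} MB")
```

fp32 comes out to 560 MB and int8 to 140 MB, so only ~4-bit weights (70 MB) clear the <100MB bar.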

The hardware

- Single AMD GPU, consumer PC

- Running via DirectML

- ~5,500 tok/s throughput

Training progress

- Dataset: ~1.27B tokens

- Steps: 101,000 / 200,000 (just past halfway)

- Best val loss: 3.2266 ★ (hit at step 98,000)

- ETA: ~163h remaining
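Two sanity checks on these numbers (my arithmetic; the batch size isn't stated in the post, so the 64-sequence batch below is an assumption, chosen because it makes the ETA line up with the throughput):

```python
import math

# Val loss -> perplexity: ppl = e^loss.
best_val_loss = 3.2266
print(f"val perplexity ~ {math.exp(best_val_loss):.1f}")  # ~25.2

# ETA from throughput, assuming batch=64 sequences of 512 tokens
# per step (batch size is my guess, not from the post).
remaining_steps = 200_000 - 101_000
seq_len, batch = 512, 64
tok_per_s = 5_500
eta_h = remaining_steps * batch * seq_len / tok_per_s / 3600
print(f"ETA ~ {eta_h:.0f} h")  # ~164 h, close to the quoted ~163h
```

A val loss of 3.23 is a perplexity of about 25 on a 16k vocab, which is a reasonable place to be at the halfway mark for a 140M model.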

The goal isn't to compete with 70B models. The goal is a brain that lives on your machine, learns from you over time, and works offline forever. No cloud, no subscription, no data leaving your PC.

Happy to answer any questions on the architecture, the DirectML setup on AMD, or why I went with a recurrent design over a transformer.
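For anyone unfamiliar with the recurrent-vs-transformer tradeoff: LEAPv2.1 isn't public, so the sketch below is a generic Elman-style recurrent step (entirely my illustration, not the LEAP update rule). It shows the property that motivates recurrent designs for low-RAM inference: a fixed-size hidden state carried across tokens, so per-token memory stays constant instead of growing like a transformer's KV cache.

```python
import math
import random

# Toy recurrent update, pure Python: h' = tanh(W_xh @ x + W_hh @ h + b).
# Generic RNN for illustration only -- not the LEAPv2.1 architecture.
def rnn_step(x, h, W_xh, W_hh, b):
    d = len(h)
    h_new = []
    for i in range(d):
        s = b[i]
        s += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        s += sum(W_hh[i][j] * h[j] for j in range(d))
        h_new.append(math.tanh(s))
    return h_new

random.seed(0)
d_in, d_h = 8, 16
W_xh = [[random.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_h)]
W_hh = [[random.gauss(0, 0.1) for _ in range(d_h)] for _ in range(d_h)]
b = [0.0] * d_h

h = [0.0] * d_h
for _ in range(1000):  # process 1000 "tokens"...
    x = [random.gauss(0, 1) for _ in range(d_in)]
    h = rnn_step(x, h, W_xh, W_hh, b)

print(len(h))  # ...state is still just d_h floats: O(1) memory per token
```

A transformer processing the same 1000 tokens would be holding a KV cache that grows linearly with sequence length, which is exactly what you don't want for a tiny-RAM inference target.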