I'm training a 140M param LLM from scratch on a consumer AMD GPU — 100k steps in, here's what the loss curve looks like

Posted by CapSensitive5165@reddit | LocalLLaMA

Hey r/LocalLLaMA, first post here.

I've been building a local AI from scratch for the past 4 days — not a fine-tune, not a wrapper, training from zero on my own consumer PC. Here's where I'm at.

The model

- Architecture: LEAPv2.1 (custom recurrent, not a transformer)

- Parameters: 140M

- Vocab: 16,000 tokens

- Context: 512 tokens

- Target RAM: <100MB at inference
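A quick back-of-envelope check on that <100MB target (my arithmetic, not the author's): weight memory is roughly parameter count times bytes per weight, so hitting <100MB with 140M parameters implies something around 4-bit weights at inference.

```python
# Rough weight-memory estimate: params * bytes_per_weight.
# (Ignores activations, which are small for a 512-token context
# on a recurrent model with a fixed-size state.)
PARAMS = 140_000_000

for name, bytes_per_weight in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    mb = PARAMS * bytes_per_weight / 1e6
    print(f"{name}: {mb:.0f} MB")
```

fp32 comes out to 560 MB and int8 to 140 MB, so only ~4-bit weights (70 MB) clear the <100MB bar.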

The hardware

- Single AMD GPU, consumer PC

- Running via DirectML

- ~5,500 tok/s throughput

Training progress

- Dataset: ~1.27B tokens

- Steps: 101,000 / 200,000 (just past halfway)

- Best val loss: 3.2266 ★ (hit at step 98,000)

- ETA: ~163h remaining
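Two sanity checks on these numbers (my arithmetic; the batch size isn't stated in the post, so the 64-sequence batch below is an assumption, chosen because it makes the ETA line up with the throughput):

```python
import math

# Val loss -> perplexity: ppl = e^loss.
best_val_loss = 3.2266
print(f"val perplexity ~ {math.exp(best_val_loss):.1f}")  # ~25.2

# ETA from throughput, assuming batch=64 sequences of 512 tokens
# per step (batch size is my guess, not from the post).
remaining_steps = 200_000 - 101_000
seq_len, batch = 512, 64
tok_per_s = 5_500
eta_h = remaining_steps * batch * seq_len / tok_per_s / 3600
print(f"ETA ~ {eta_h:.0f} h")  # ~164 h, close to the quoted ~163h
```

A val loss of 3.23 is a perplexity of about 25 on a 16k vocab, which is a reasonable place to be at the halfway mark for a 140M model.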

The goal isn't to compete with 70B models. The goal is a brain that lives on your machine, learns from you over time, and works offline forever. No cloud, no subscription, no data leaving your PC.

Happy to answer any questions on the architecture, the DirectML setup on AMD, or why I went with a recurrent design over a transformer.
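For anyone unfamiliar with the recurrent-vs-transformer tradeoff: LEAPv2.1 isn't public, so the sketch below is a generic Elman-style recurrent step (entirely my illustration, not the LEAP update rule). It shows the property that motivates recurrent designs for low-RAM inference: a fixed-size hidden state carried across tokens, so per-token memory stays constant instead of growing like a transformer's KV cache.

```python
import math
import random

# Toy recurrent update, pure Python: h' = tanh(W_xh @ x + W_hh @ h + b).
# Generic RNN for illustration only -- not the LEAPv2.1 architecture.
def rnn_step(x, h, W_xh, W_hh, b):
    d = len(h)
    h_new = []
    for i in range(d):
        s = b[i]
        s += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        s += sum(W_hh[i][j] * h[j] for j in range(d))
        h_new.append(math.tanh(s))
    return h_new

random.seed(0)
d_in, d_h = 8, 16
W_xh = [[random.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_h)]
W_hh = [[random.gauss(0, 0.1) for _ in range(d_h)] for _ in range(d_h)]
b = [0.0] * d_h

h = [0.0] * d_h
for _ in range(1000):  # process 1000 "tokens"...
    x = [random.gauss(0, 1) for _ in range(d_in)]
    h = rnn_step(x, h, W_xh, W_hh, b)

print(len(h))  # ...state is still just d_h floats: O(1) memory per token
```

A transformer processing the same 1000 tokens would be holding a KV cache that grows linearly with sequence length, which is exactly what you don't want for a tiny-RAM inference target.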