I got a real transformer language model running locally on a stock Game Boy Color!
Posted by maddiedreese@reddit | LocalLLaMA | 52 comments
No phone, PC, Wi-Fi, link cable, or cloud inference.
• The cartridge boots a ROM, and the GBC runs the model itself.
• The model is the 260K-parameter stories260K checkpoint from Andrej Karpathy’s llama2.c (trained on the TinyStories dataset), converted to INT8 weights with fixed-point math so it can run without floating point.
• Built with GBDK-2020 as an MBC5 Game Boy ROM.
• The model weights live in bank-switched cartridge ROM. Prompt entry happens on-device with the D-pad/buttons and an on-screen keyboard.
• The prompt is tokenized on the Game Boy, then the ROM runs transformer prefill + autoregressive generation. The KV cache is stored in cartridge SRAM, because the GBC’s work RAM is tiny.
It is extremely slow, and the output is gibberish because the math is heavily quantized/approximated, but the core thing works!
Hardware: stock Game Boy Color + EZ Flash Junior + microSD.
Used Codex for a large portion of the building!
https://github.com/maddiedreese/gbc-transformer
WeatherD00d@reddit
Very creative project, super cool!
idongesit1999@reddit
This is genuinely impressive edge-case engineering: INT8 quantization with fixed-point math on hardware that predates the transformer paper by nearly two decades.
What fascinates me is the inference-at-the-edge pattern this demonstrates.
At Yellow Network we're building settlement infrastructure for AI agents, and the constraint you've solved here (running models on minimal hardware without external dependencies) maps directly to how we think about trust.
Just a cryptographic settlement that doesn't rely on centralized infrastructure.
zippyfan@reddit
How are you guys even running these projects? I thought we needed CUDA, ROCm, or other mature toolchains to run LLMs. You guys are running LLMs on the equivalent of a potato.
I'm curious whether it will be easy to run LLMs on Chinese GPUs once they get here, even with no manufacturer support whatsoever.
algebra_dragon@reddit
If you want LLMs with lots of parameters, decent training and token generation speeds, and coherent results, then yeah, you'll need a suitable GPU, CUDA/ROCm support, etc. You're not going to use a language model on a Game Boy to generate code or write an email.
But for small proofs of concept, building a model doesn't take a whole lot. Andrej Karpathy wrote microgpt in 200 lines of Python without resorting to NumPy, PyTorch, or other libraries. And getting these algorithms to work with very limited resources is a good exercise in understanding what the essentials are. As OP noted, it's slow, and you're not going to get useful results compared to nontrivial models on better hardware. But it's a fun idea all the same, and I'm here for it.
s101c@reddit
You can also do pure CPU inference with llama.cpp, no GPU needed. Some CPU and RAM combos can be faster than expected.
Technical-Earth-3254@reddit
This makes me wanna run a model on my N64. Love the project!
Operation_Neither@reddit
But it has to be a 64 bit quantized model. That’s the law.
FatheredPuma81@reddit
What about 0.64 bit?
addandsubtract@reddit
Don't let your memes be dreams.
KalonLabs@reddit
But can it run doom?
JayPSec@reddit
Beat me to it. I was gonna go "Yes... But can it run crysis?"
RogerRamjet999@reddit
Rumor has it that it can generate doom game-play screens on the fly.
aanzeijar@reddit
If you're already abusing the SRAM, would it be cheating to implement the floating point arithmetic as giant ROM lookups?
MindPsychological140@reddit
KV cache in cartridge SRAM is the move I wouldn't have thought of.
Tokens/sec ballpark? And is the matmul or the bank-switching dominating cycles?
AccomplishedFix3476@reddit
Tried Karpathy's nanoGPT on a Raspberry Pi Pico last year and the INT8 quant kept exploding on me past 200k params, so the GBC surviving 260k is what I'm stuck on tbh. RAM budget for prompt encoding when your memory is counted in KB is where most of these constrained projects die 👀
Imn1che@reddit
How many tokens/s?
simplearms@reddit
We’re really measuring things in seconds per token at this point.
Imn1che@reddit
Oh yeah shit my bad
Hey OP how many minutes per token /s
maddiedreese@reddit (OP)
Didn’t officially measure, but working backwards it looks like around 0.0059 tokens per second, or 1 token every approx. 2 minutes and 49 seconds. Really slow!
WhyYouLetRomneyWin@reddit
Really cool project!
There is a project to get LLMs running on the Commodore 64: https://github.com/ytmytm/llama2.c64 which seems to somewhat work (not gibberish, but very much a toy). I don't know the relative power of the Game Boy vs the Commodore 64.
jwpbe@reddit
The Game Boy Color uses a Sharp SM83 running at up to about 8 MHz; the C64 used a MOS 6510 MPU at about 1 MHz.
NigaTroubles@reddit
Wow just wow
Thats amazing
CockBrother@reddit
This is one of those projects that makes me sad about all of the locked-in platforms that could have seen new software and uses for many years after companies released their 'next big thing' and abandoned the old.
People are still finding ways of writing Atari 2600 games that are 10x better than what shipped when the platform launched. Same goes for other early computers.
1001000010000100100@reddit
My buddy wrote a game for Vectrex called Vecribbon and it’s amazing
I_HAVE_THE_DOCUMENTS@reddit
I've been following Kaze Emulator and his work with N64 and it's amazing and inspiring how far old hardware can be pushed with deep understanding and clever tricks to take full advantage of the system.
It makes me wonder what people in 30 years will be able to do with the hardware of today.
ThisWillPass@reddit
Agi on a 3060
IrisColt@reddit
All for the enjoyment of a select and shrinking circle of die-hards, sigh...
useresuse@reddit
super smash bros melee
Kingchandelear@reddit
And where are those Atari people hanging out?
ShutUpAndDoTheLift@reddit
Yeah. You let me know if you find out. So uh I can avoid that place.
AppealSame4367@reddit
Thank you for trying this.
I dreamed about neural networks running on the hardware we had in the early 2000s. I get that we wouldn't have had the hardware to train anything fast enough, but we would have already had enough for some inference on our computers. I know models were trained back then, but we lacked a lot of speed and software tech that is available now.
Kerem-6030@reddit
dayum
Thistleknot@reddit
I used to do stuff like this just to figure out some technological process
I put Linux on my ps3
But why
Just for the bragging rights?
__JockY__@reddit
Hackers gonna hack. A life without obsession is a tragedy.
jeffzyxx@reddit
Presumably, you’d learn a lot about how transformer based language models work doing something like this. Constraints breed creativity, after all.
That, and nerd bragging rights. (I’m guilty of this too!)
FourSquash@reddit
Kinda confused here. Writing software for a new set of constraints isn't as easy as following a tutorial to install Linux on the PS3.
Also, Linux on the PS3 (at least while Sony allowed it) was in fact very useful. The US Air Force built a whole supercomputer out of them, the Condor Cluster, for radar image processing. The Cell processor was ahead of its time.
SuperWallabies@reddit
1990: What game machine will we have in future!
2026:
Thedudely1@reddit
No fucking way
Thebandroid@reddit
Great.
Now the price of Game Boy Colours is going to skyrocket.
Is there nothing AI won’t take from us?!?
Darth_Proton@reddit
next step would be ai-generated games on it!
brwinfart@reddit
This shit is insane.
I want a GameBoy with AI.
minedroid1@reddit
Wow, nice work! Glad to see that old tech still gets used for cool things like this.
different_tom@reddit
But... Why?
jmprog@reddit
Incredible! I wonder what would need to be done to get it to output readable text
jesusonoro@reddit
The KV cache in cartridge SRAM is clever but I'm curious how you're handling the bandwidth bottleneck there. SRAM access on GBC is already slow and you're doing it every token generation. Did you end up chunking the cache reads or just eating the latency since generation is already glacial from the fixed point ops?
ed0c@reddit
Pointless. Therefore, indispensable.
Inevitable_Emu2722@reddit
That's crazy! Love it
mystery_biscotti@reddit
Okay, this is cool.
Kahvana@reddit
Extremely impressive, well done!
VagabondTruffle@reddit
BASED BASED BASED
I did https://code.heni.lol/heni/gbalm once as a joke aha so happy to see this!!!!!!!!!