Trained a chess LLM locally that beats GPT-5 (technically)
Posted by KingGongzilla@reddit | LocalLLaMA | View on Reddit | 35 comments
Hi everyone,
Over the past week I worked on a project training an LLM from scratch to play chess. The result is a language model that plays chess and generates legal moves almost 100% of the time, completing about 96% of games without any illegal moves. For comparison, GPT-5 produced illegal moves in every game I tested, usually within 6-10 moves.
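A quick back-of-the-envelope check (my own arithmetic, not OP's numbers): if ~96% of games finish with zero illegal moves and we assume a game averages ~80 half-moves, the implied per-move legality rate is extremely high:

```python
# Back-of-the-envelope: what per-move legal-move rate is implied if
# 96% of games (assumed ~80 half-moves each) contain no illegal move?
# If each move is legal independently with probability p, then
# p ** moves_per_game == game_completion_rate, so p is the moves-th root.
game_completion_rate = 0.96
moves_per_game = 80  # assumption; average game length varies

per_move_legality = game_completion_rate ** (1 / moves_per_game)
print(round(per_move_legality, 5))  # ≈ 0.99949
```

So "96% of games" corresponds to roughly a 99.95% per-move legality rate under this simple independence assumption.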
I’ve trained two versions so far:
- https://huggingface.co/daavidhauser/chess-bot-3000-100m
- https://huggingface.co/daavidhauser/chess-bot-3000-250m
The models can occasionally beat Stockfish at ELO levels between 1500-2500, though I’m still running more evaluations and will update the results as I go.
If you want to try training it yourself or build on it, this is the GitHub repo for training: https://github.com/kinggongzilla/chess-bot-3000
VRAM requirements for training locally are ~12GB and ~22GB for the 100m and 250m models respectively, so this can definitely be done on an RTX 3090 or similar.
Full disclosure: the only reason it “beats” GPT-5 is because GPT-5 keeps making illegal moves. Still, it’s been a fun experiment in training a specialized LLM locally, and there are definitely a lot of things one could do to improve the model further: better data curation, etc.
Let me know if you try it out or have any feedback!
oooofukkkk@reddit
Very cool. My dream is an LLM chess coach to explain the ideas behind move recommendations at a deep level.
KingGongzilla@reddit (OP)
same, that would be really cool. that's not quite what this is, though.
If I'm not mistaken, I did see some datasets on HF though that provide explanations for chess positions. Could be interesting to try something there
xatey93152@reddit
Even a child can beat GPT-5 in chess. It's not an apples-to-apples comparison. It's like comparing a car built specifically for sport with a car built specifically for logistics.
KingGongzilla@reddit (OP)
fair, but i think it does show how small specialized models can beat very large general models at some tasks
the_ai_wizard@reddit
known for a long time
Relevant-Yak-9657@reddit
Idk why you were downvoted, when you are correct. Narrower AIs have mostly been better at the specific tasks they were trained on.
egomarker@reddit
He's downvoted because most have no idea Leela Chess Zero has existed for years.
Relevant-Yak-9657@reddit
Damn.
KingGongzilla@reddit (OP)
true, I wasn't claiming to have discovered or done something novel
Ok_Cow1976@reddit
I guess large general models like GPT-5 are trained more on science and some other areas. Small models can never beat large models on science, I think.
pier4r@reddit
info: an ad-hoc transformer model exists; it is called Leela Chess Zero (fixed to 1-node search, hence using only the policy network). It is quite good last time I checked.
One source here
Further you can (a) hook it up as a lichess bot (if you want) here and/or (b) test it against models with a good support and parsing layer here
egomarker@reddit
"Quite good" has actually been the world's #1-#2 chess engine for several years, far surpassing human ability to play chess.
KingGongzilla@reddit (OP)
ah cool thanks for the info about hooking it up to lichess.
Yeah, i guess you can get much better results with self-play and RL compared to a purely supervised setting.
pier4r@reddit
btw, great project. One has to start somewhere and for learning it is great, despite what already exists.
pier4r@reddit
purely supervised was explored too IIRC, I think the chess engine was called DeusEx : https://www.chessprogramming.org/Deus_X
ItilityMSP@reddit
Check out this project: if you incorporate this type of learning memory system, you should in theory get much better results. Try out ACE memory and you'll be on the cutting edge of agentic AI.
https://arxiv.org/abs/2510.04618
iliasreddit@reddit
Cool! Did you train the model from scratch or fine-tune it from some public checkpoint?
KingGongzilla@reddit (OP)
this is from scratch!
StardockEngineer@reddit
Thanks for the cool project. People seem to forget the learning opportunities to be had with these small, cool projects.
Anyone who thinks you thought this was the new DeepMind is out of their mind.
I’ve been toying in this space from time to time myself! It’s just a fun thing to do.
KingGongzilla@reddit (OP)
haha exactly :)
RickyRickC137@reddit
Dude this is so freaking awesome! Ignore the negative comments, for real! You know, there are neural networks to play chess! Like Leela chess. So I think you don't wanna compete with it. But those neural networks can't speak! That's where your work can shine! Especially since it's not making illegal moves. Make a LLM that can analyze the evaluation of Stockfish, and it talk about the plans!
Ardalok@reddit
I have a question: wouldn't it be better to send the entire board to the LLM each time instead of just one move? I think it should get confused less, and there'd be no need to store context.
KaroYadgar@reddit
Isn't that just what happens? LLM context works similarly.
Ardalok@reddit
In theory it shouldn't need to know anyway, but you can store the moves like that too if you want, although the context will fill up faster.
KaroYadgar@reddit
It probably wouldn't, but it would be able to more easily guess what the strategy behind its previous moves was. With the board only, it has to guess the previous strategy from the current state alone.
Ardalok@reddit
If all the boards are saved in context, then why not?
KaroYadgar@reddit
Every board? I was under the assumption that only the current board would be sent with no saved context.
Ardalok@reddit
Well, yes, but I later suggested changing it if necessary.
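The two input formats being debated above can be sketched concretely (a hypothetical illustration; OP's actual input format isn't shown in the thread):

```python
# Hypothetical sketch of the two encodings discussed above.
# Format 1: the full move history in SAN -- the context grows with every
# move, but the model can infer plans from earlier moves.
move_history = "1. e4 e5 2. Nf3 Nc6 3. Bb5"

# Format 2: only the current board as a FEN string -- constant-size input,
# but all information about how the position arose is lost.
# (FEN for the position after 3. Bb5, the Ruy Lopez.)
board_only = "r1bqkbnr/pppp1ppp/2n5/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3"

# The FEN's second field records whose turn it is:
print(board_only.split()[1])  # -> "b" (black to move)
```

Since both strings describe the same position, the trade-off is purely history vs. context size, which is exactly the tension in the comments above.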
Everlier@reddit
I'm not sure why other comments are like that... OP, what you built is seriously cool, you should be proud!
I think it's similar to action models in a sense, but with a much better-defined reward. Another currently under-explored area for SLMs is using a smaller model like this one to steer a larger, more expensive one towards shallower/deeper reasoning and/or a response format that achieves a better completion rate.
egomarker@reddit
https://lczero.org/
Now beat this
Ok-Adhesiveness-4141@reddit
Good work. Now, get it to beat GPT-5 at coding & math. I'm not joking, that's super useful.
Illya___@reddit
Hmm, cool experiment ig. But even tho I hate GPT-5, it's severely underperforming in your tests. You should probably tune the parameters a bit to make the comparison fairer. GPT-5 can actually play legal moves for some time, from what I saw. Tho I saw it playing mainly main-line openings, so perhaps it breaks when the opponent doesn't play into the opening.
JollyJoker3@reddit
Pretty cool experiment! Can you set exact ELOs for Stockfish so you can set something up to measure your model's exact ELO? I assume a single game is pretty fast.
KingGongzilla@reddit (OP)
Yeah exactly you can set the ELO level of stockfish. I have evals running right now and will update once i have some numbers.
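For context on how that's done (a sketch under my own assumptions, not OP's eval code): Stockfish's strength cap is set through the standard UCI options `UCI_LimitStrength` and `UCI_Elo`; the accepted ELO range depends on the build (recent versions accept roughly 1320 and up). Applying them requires the third-party python-chess package and a local stockfish binary, so that part is shown only in a comment:

```python
# Standard UCI options that cap an engine's playing strength.
# The accepted UCI_Elo range depends on the Stockfish build.
def strength_limit_options(elo: int) -> dict:
    return {"UCI_LimitStrength": True, "UCI_Elo": elo}

# With python-chess (third-party) and a local stockfish binary,
# the dict would be applied like this:
#   import chess.engine
#   engine = chess.engine.SimpleEngine.popen_uci("stockfish")
#   engine.configure(strength_limit_options(1500))
print(strength_limit_options(1500))
```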
Moreover, during training I included ELO-level tokens for the individual chess games. This means you should also be able to control the ELO level the model plays at during inference. However, I still need to evaluate how much this affects the model's play in practice!
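OP's exact token format isn't shown, but the ELO-conditioning idea can be sketched like this (token names and bucket size are hypothetical): prefix each training game with bucketed rating tokens, then seed the prompt with the desired tokens at inference time.

```python
# Hypothetical serialization of ELO-conditioned training data
# (OP's actual token format isn't shown; this just illustrates the idea).
def elo_token(elo: int, bucket: int = 100) -> str:
    # Bucket ratings so the vocabulary stays small, e.g. 1543 -> "<elo_1500>"
    return f"<elo_{(elo // bucket) * bucket}>"

def serialize_game(white_elo: int, black_elo: int, moves: list) -> str:
    # Prefixing with rating tokens lets inference "ask" for a playing level
    # by seeding the prompt with the desired tokens.
    return " ".join([elo_token(white_elo), elo_token(black_elo), *moves])

print(serialize_game(1543, 1887, ["e4", "e5", "Nf3"]))
# -> "<elo_1500> <elo_1800> e4 e5 Nf3"
```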