Reverse Engineering o1 Architecture (With a little help from our friend Claude)

Posted by TechnoTherapist@reddit | LocalLLaMA

I fed Claude the information OpenAI has released about the o1 model (system card, blog posts, tweets from Noam Brown and others), along with online discussions about it (Reddit, YouTube videos).

After a bit of back and forth, this is what it came up with as a potential high-level architecture for the model:

The bit about large-scale CoT storage feeding into the RL environment is my own (somewhat cheeky) assumption: I think OpenAI will likely use the CoTs generated in the real world to further RL-optimise the model.
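To make that assumed feedback loop concrete, here is a toy Python sketch. Everything in it is hypothetical: `CoTRecord`, `CoTStore`, the `reward` field, and `rl_update_step` are names made up for illustration, and nothing here reflects OpenAI's actual pipeline. It only shows the shape of the idea: log chains of thought from real usage into a bounded store, sample batches from it, and hand them to an RL update (stubbed out here as a mean-reward calculation).

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class CoTRecord:
    """One logged interaction (hypothetical schema)."""
    prompt: str
    chain_of_thought: str
    answer: str
    reward: float  # assumed score, e.g. from a grader or reward model


class CoTStore:
    """Bounded store of logged CoTs; the oldest records are evicted first."""

    def __init__(self, capacity: int = 10_000):
        self._records = deque(maxlen=capacity)

    def log(self, record: CoTRecord) -> None:
        self._records.append(record)

    def sample_batch(self, batch_size: int, rng: random.Random) -> list:
        # Sample without replacement, capped at the number of stored records.
        k = min(batch_size, len(self._records))
        return rng.sample(list(self._records), k)

    def __len__(self) -> int:
        return len(self._records)


def rl_update_step(batch: list) -> float:
    """Stand-in for a real RL update: just returns the batch's mean reward."""
    if not batch:
        return 0.0
    return sum(r.reward for r in batch) / len(batch)


if __name__ == "__main__":
    store = CoTStore(capacity=3)
    for i, reward in enumerate([0.0, 1.0, 0.5, 1.0]):
        store.log(CoTRecord(f"prompt {i}", f"reasoning {i}", f"answer {i}", reward))
    batch = store.sample_batch(batch_size=2, rng=random.Random(0))
    print(len(store), rl_update_step(batch))
```

In a real system the stub would be replaced by an actual policy update, and the rewards would have to come from somewhere (a verifier, a reward model, user feedback); the sketch takes no position on which.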

Comments / thoughts / glaring mistakes / potential improvements are all welcome!