GPT 5.5 just leaked its chain of thought to me in codex, and it looks like an idea from 5 months ago in this sub.
Posted by Homeschooled316@reddit | LocalLLaMA | View on Reddit | 59 comments
https://www.reddit.com/r/LocalLLaMA/comments/1p0lnlo/make_your_ai_talk_like_a_caveman_and_decrease/
In the middle of a project I'm working on, I got this output:
Implemented the narrower fix in Homm3ImportUnitPreviewModelHook.cs? Need absolute path. Need know cwd absolute. v:... Use markdown. final with path. Need avoid bogus path. Use Homm3ImportUnitPreviewModelHook.cs? Format requires /abs/path. Windows abs maybe v:.... Use angle. Final no too long. Need include uncommitted. Proceed.
jakegh@reddit
Interesting to see behind the curtain; this is why GPT-5.5 uses fewer tokens. It's RL'd to be extremely terse in its CoT, almost "caveman mode", avoiding unnecessary verbiage.
psycoee@reddit
It's a pretty obvious idea, especially since the attention mechanism generally gives a lot more weight to the stuff in the last few thousand tokens. So you want to fit as much in there as possible and caveman mode is the way to do it. Many small models perform much better if you make them use caveman speak.
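As a toy illustration (my own sketch, made-up texts, nothing from any lab), you can count how much further the same content goes with tiktoken:

```python
# Toy sketch: compare token counts for a verbose CoT fragment vs. a
# "caveman" compression of the same content. Both texts are invented
# for illustration only.
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

verbose = ("So the user is asking me to find the absolute path of the file. "
           "I should first figure out the current working directory, and then "
           "I need to make sure I don't produce a bogus path.")
caveman = "Need absolute path. Need know cwd. Avoid bogus path."

print(len(enc.encode(verbose)), "tokens verbose")
print(len(enc.encode(caveman)), "tokens caveman")  # several times fewer
```

Same information, a fraction of the context window.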
jakegh@reddit
Yeah. There are all sorts of alignment issues with RL on CoT such that it’s actually been called “the forbidden technique”. It is EXTREMELY dangerous.
But maybe if you reinforce on CoT length, rather than content, those don’t apply. Still makes me nervous and I’d like to see research on it.
jakegh@reddit
After looking it up a bit, there are papers on making CoT terse in what they describe as a safe way, avoiding the forbidden technique via adaptive reasoning budgets in training or reinforcement, but I didn't find any studies looking at the faithfulness of the resulting terse CoT.
Which makes me kinda nervous.
psycoee@reddit
I mean you have to be able to control the CoT process in any case. Existing models are generally trained to think in a particular language, limit the length of the thought process to something reasonable, follow instructions, etc. Simply scoring shorter thoughts higher is not really training on CoT, and to the extent the model can game it, it is what you want (the same quality output with fewer thinking tokens).
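Something like this hypothetical reward shape (my own sketch, not any lab's actual setup):

```python
# Hypothetical sketch of a length-penalized RL reward: score only answer
# correctness and CoT length, never CoT content, so the model is free to
# compress its reasoning however it likes.
def reward(answer_correct: bool, num_thinking_tokens: int,
           budget: int = 2048, penalty_per_token: float = 5e-4) -> float:
    base = 1.0 if answer_correct else 0.0
    overage = max(0, num_thinking_tokens - budget)  # tokens past the budget
    return base - penalty_per_token * overage
```

The only way the model can "game" this is by producing the same correct answers with fewer thinking tokens, which is exactly the behavior you want.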
jakegh@reddit
Yes that’s how the papers describe it, either a hard or adaptive thinking budget. But none of them evaluated CoT faithfulness after doing so.
TheRealMasonMac@reddit
https://huggingface.co/datasets/microsoft/OpenMementos is an interesting alternative
damhack@reddit
The CoT traces output in the context are not the actual reasoning traces used; they're just what the LLM thinks you want to see. Therefore you cannot draw any conclusions from them.
Demonstrated by Anthropic’s research here:
https://assets.anthropic.com/m/71876fabef0f0ed4/original/reasoning_models_paper.pdf
ddavidovic@reddit
Yeah, you can observe this very clearly when reading the gpt-oss-120b chains of thought. It presumably used a similar training regime.
Toastti@reddit
That's not 5.5 directly. Its chain of thought is passed through another, smaller LLM first. They almost certainly want to save tokens here, so they instruct the smaller LLM to be as concise as possible.
jakegh@reddit
What do you base that on? Why would the big model spend even more tokens to shorten what it passes to the summary model? That makes no sense. Logically, this is more likely to be the actual CoT.
StupidScaredSquirrel@reddit
No, because CoT is an asset that you want to keep secret to prevent distillation. Passing it through a 2B model to summarise and obfuscate the CoT is dirt cheap and keeps your moat intact.
jakegh@reddit
Obviously you want to keep it secret. The OP is saying it leaked.
4whatreason@reddit
Yep, agreed. The summarized CoT does not replace the actual CoT; that is passed back and forth encrypted to keep it secret.
Can't remember exactly why, but I think they sometimes require reasoning not to be modified/dropped in certain situations too.
dataexception@reddit
I think you're referring to speculative decoding. A prompt is filtered through a smaller model for preprocessing.
Clueless_Nooblet@reddit
I don't know, but I believe you can only distill down to something like "Grmny only ctry wo tmplmt Atbn" after the language is stable.
MisticRain69@reddit
I kinda think this is what they do with Claude. Claude has the typical Qwen/DeepSeek "actually I think it's x... WAIT, it might actually be y..." CoT.
ayylmaonade@reddit
You aren't understanding what the user is saying. Yes, CoT is summarized by smaller models, but only externally, and it is not at all uncommon for models to accidentally leak their "true" chain of thought, especially if they mess up closing the think tag. Not to mention that Gemini regularly leaks its actual CoT, even to this day. And what do you know? It's identical to the CoT Gemma 4 has. The caveman-like speak here reminds me of GPT-OSS; it's quite similar. I can totally see this being the actual CoT.
The idea you're proposing literally costs more compute, because you'd have to filter every single token of the model's CoT, summarize it, then feed it back to itself instead of doing a single pass on the final CoT. It'd also degrade the hell out of performance.
holy_macanoli@reddit
There’s a subagent that gets called to create the title of each thread in codex so why not this?
jakegh@reddit
You're saying the main model has its real CoT, which it then passes to a small model to turn into a caveman CoT, just so that caveman CoT can be summarized by another small model. This does not make any sense.
alwaysbeblepping@reddit
Not the same person, but you're making quite a few assumptions here. If you're seeing the CoT and you're not supposed to, that means something went wrong, and there are a number of possibilities: a bug in the client displaying the reasoning field in the wrong place, the model failing to close its think tag so reasoning spills into the visible output, or the "leak" actually being the summarizer's output rather than the raw CoT.
Not an exhaustive list; there are probably other plausible possibilities.
jakegh@reddit
Those are all possible, but seem less likely than the simple explanation.
alwaysbeblepping@reddit
How so? I mean, in a vacuum maybe but this is a system designed to hide the original CoT from the user. If someone is seeing it, there's a reason, right? It's not everyone that's seeing it, just an isolated case with a specific user so it stands to reason that something unusual is happening.
Occam's Razor is a useful reasoning technique but it doesn't mean we should ignore the context that things happen in when we try to come up with an explanation.
jakegh@reddit
I agreed those scenarios were also possible. They just seem less likely, and we don't have the info to evaluate further.
Howdareme9@reddit
Anthropic does the same, Haiku summarizes it
jakegh@reddit
No kidding. The question is whether the real CoT would be cut down by a second small model just for the summary model. And the answer is no.
IrisColt@reddit
It actually works. Gemma 4 handles problematic prompts much better when you really compact them first.
Jolly-Rip5973@reddit
Yes, they hide the chain of thought from you because it's usually completely convoluted. It's amazing anything correct ever happens, given how gimpy the chain of thought actually is when you read it.
cantgetthistowork@reddit
That's how I talk to people who I think are stupid
AdventurousVast6510@reddit
Me too. I talk like this to people who don't speak English very well.
florinandrei@reddit
SAVE MORE TOKENS is the prime directive now for all these companies, because of the huge demand.
Ha_Deal_5079@reddit
The compressed CoT thing is wild. Wonder if they're using a distilled model to condense the reasoning or just aggressively prompting for token efficiency.
ayylmaonade@reddit
I'm pretty sure that's just OpenAI's reasoning. Go try GPT-OSS, for example; it's very caveman-like in its way of deliberating to itself. It's probably how they're able to claim that 5.5 uses significantly fewer tokens.
TheRealMasonMac@reddit
Deepseek is also similar, though not as compressed.
joshualander@reddit
This isn’t new. If you’re using GPT in Hermes Agent and you interrupt a complex task with a simpler one, you’ll get caveman CoT 😊
eworker8888@reddit
Agents are often multiple LLMs: one that excels at summarization, another that excels at code, and so on. It can be the same LLM, different quantizations of the same one, or totally different LLMs.
In our agent (not codex), you can enable extensive logging and look at the entire chain (the entire process).
You can always proxy any agent through tools and capture the entire communication and review it.
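For example, a minimal logging-proxy sketch (my own, assuming an OpenAI-compatible HTTP API and non-streaming JSON responses; real agents usually stream SSE, which needs more work):

```python
# Minimal sketch: point the agent's base URL at http://localhost:8080
# and log every request/response body that actually crosses the wire.
# Note: this only captures what the provider sends; if reasoning is
# summarized server-side, you will only ever see the summary.
import http.server
import urllib.request

UPSTREAM = "https://api.openai.com"  # or any OpenAI-compatible endpoint

class LoggingProxy(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        print("request:", body.decode("utf-8", errors="replace"))
        req = urllib.request.Request(
            UPSTREAM + self.path, data=body,
            headers={"Authorization": self.headers.get("Authorization", ""),
                     "Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
            status = resp.status
        print("response:", payload.decode("utf-8", errors="replace"))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    http.server.HTTPServer(("localhost", 8080), LoggingProxy).serve_forever()
```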
danielv123@reddit
For your last sentence, well no, you can't. OpenAI does the summarization on their server side, and you don't have access to capture the original tokens there.
eworker8888@reddit
in E-Worker everything is on the client side
TheThoccnessMonster@reddit
We’re not talking about your project dude
the_renaissance_jack@reddit
But they were dude.
brahh85@reddit
That's like speculative prefill https://www.reddit.com/r/LocalLLaMA/comments/1t0vp3w/pflash_10x_prefill_speedup_over_llamacpp_at_128k/ except speculative prefill is fancier.
RevolutionaryLime758@reddit
No it’s not dumbass
cpldcpu@reddit
The CoT of gpt-oss also looks like that. It seems that OpenAI has always trained for brief language in the CoT.
See also examples here: https://nousresearch.com/measuring-thinking-efficiency-in-reasoning-models-the-missing-benchmark
No_Hunter_7786@reddit
Interesting catch. Chain of thought leaking through is always funny to see. Caveman prompting actually making it into production GPT is kind of wild if true
ResidentPositive4122@reddit
That has nothing to do with caveman prompting. OpenAI has been training their models for succinct CoT for a while now. Even gpt-oss shows this. Compare its traces with any of the open models and you'll see these patterns.
oss: do math. try this. no. ok now try this. might work. need python.
qwen: so the user is asking us to consider doing some math on this project that reminds me of a time where i was learning math doing the theorem discovery with my uncle on a shiny hilltop. I will try to... no, wait what if... but is this correct? I need to... Wait, now I see it, it wasn't a sunny hilltop it was rainy and I was cold. Wait, no, I was hot.
yopla@reddit
Careful about hilltops, goblins nearby.
wren6991@reddit
the answer is obviously this. generate output. proceed. ✅ but wait, now I'm realising that...
Durian881@reddit
Seems like instructions for subagents, which are provided with specific context to work on.
jakegh@reddit
That's possible too, yeah.
HenkPoley@reddit
Yes, you are not the first to find this. Apparently GPT 5.4 already did this as well.
holy_macanoli@reddit
There’s also at least one issue filed in the codex repo.
HenkPoley@reddit
Uhm, yes? 😅
robberviet@reddit
TIL I already talk to LLMs in caveman language.
OriginalTerran@reddit
I've seen Gemini 3 Pro do the same in the Antigravity agent as well. Its reasoning leaked when responding to me.
eworker8888@reddit
Also, it may just be a defect in the tools: when the LLM sends reasoning back, it flags it as reasoning, and if the app has an issue it may show it in the wrong place.
cleversmoke@reddit
Interesting! Should I start writing my prompts like that? Maybe that caveman speak to AI meme was true after all!
Miriel_z@reddit
I am not surprised. Reddit is a great free training base. I need to spend less time debugging with these tools so my efforts won't pop up somewhere else. Or should I deliberately write dumb code?
marcoc2@reddit
I think it has nothing to do with training data