Raw Chain-of-Thought from Gemini 3 Pro. It hallucinates, corrects itself, and eventually crashes.
Posted by Numerous-Campaign844@reddit | LocalLLaMA | View on Reddit | 9 comments
We know Gemini Pro has the 'Thinking' block, which shows a summary of its reasoning process, but I somehow glitched it into outputting the raw internal monologue instead of the summary. It looks very similar to DeepSeek R1's.
It happened when I was testing Gemini 3 Pro on AI Studio with some heavily obfuscated JS. After it missed a hidden URL, I corrected it and asked why it failed. That's when it broke.
Instead of the usual 'Thinking' summary, it spat out its entire raw internal monologue, and the reasoning felt bizarrely human.
My Theory:
I think I finally understand why Gemini summarizes the "Thinking" block instead of showing it raw. It's not just for a cleaner UI. I think they hide it because if the model gets "stuck" or enters a recursive loop, it looks absolutely unhinged. There might be a failsafe mechanism designed to reset or sanitize the thought process when it enters a repetitive state like this, but I somehow bypassed it.
Honestly, the fact that it admitted "I will accept the L" in its internal monologue is the most human thing I've seen from an AI.
davikrehalt@reddit
I often see what look like raw thinking traces past 100k context length. Sometimes, when it's given Python access, it will write a Python file with no code at all, just comments full of what looks like a thinking trace. I think the model leaks it quite easily.
TheRealGentlefox@reddit
Whether or not it is a true CoT leak, Gemini 3 seems to do it a good amount.
Clear_Anything1232@reddit
What you saw is not a raw thinking trace.
It's just more hallucination.
Numerous-Campaign844@reddit (OP)
Technically all tokens are hallucinations until they aren't, but this 'flavor' of hallucination (self-critique) is what we usually call reasoning. Either way, without seeing the weights/backend, we can't be 100% sure.
Clear_Anything1232@reddit
In this case we can be 100% sure, because Google and all major providers replace the thinking traces with placeholder tokens to avoid leakage.
At inference time the placeholders are swapped back for the original traces to keep the KV cache valid.
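Rough sketch of the kind of substitution scheme I mean; every name, tag format, and the in-memory store here is invented for illustration, real providers presumably use signed or encrypted opaque blobs:

```python
# Hypothetical illustration of the placeholder scheme described above.
# The <thought> tags, TRACE_STORE, and function names are all assumptions.

import re
import uuid

TRACE_STORE = {}  # placeholder_id -> raw thinking trace (server-side only)

def redact_trace(model_output: str) -> str:
    """Replace each raw thinking span with an opaque placeholder
    before the response ever leaves the server."""
    def _stash(match: re.Match) -> str:
        placeholder_id = uuid.uuid4().hex
        TRACE_STORE[placeholder_id] = match.group(1)
        return f"<thought ref={placeholder_id}/>"
    return re.sub(r"<thought>(.*?)</thought>", _stash, model_output, flags=re.S)

def restore_trace(conversation_text: str) -> str:
    """Swap the placeholders back for the original tokens before the next
    forward pass, so the prefix matches what the model actually generated
    and the KV cache stays valid."""
    def _unstash(match: re.Match) -> str:
        raw = TRACE_STORE.get(match.group(1), "")
        return f"<thought>{raw}</thought>"
    return re.sub(r"<thought ref=([0-9a-f]+)/>", _unstash, conversation_text)
```

The point is just that the client only ever sees the placeholder, while the server can still reconstruct the exact token sequence for caching.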
Numerous-Campaign844@reddit (OP)
True, but that assumes the model is actually following the formatting rules.
If it crashes or gets stuck in a loop, it likely mangled the start/end tags that mark the thinking span. If those tags are missing or broken, the system doesn't know it's supposed to hide that text, so it just lets it through as a normal answer. That's basically what I think happened here.
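Toy example of how a tag-based filter could "fail open"; the tag name and the filter itself are assumptions on my part, not Google's actual pipeline:

```python
# Toy demo: a naive CoT filter only strips *well-formed* thinking spans.
# The <thought>...</thought> convention is an assumption for illustration.

import re

def strip_thinking(model_output: str) -> str:
    # Anything with a missing or mangled closing tag falls through untouched.
    return re.sub(r"<thought>.*?</thought>", "", model_output, flags=re.S)

normal = "<thought>check the eval string for the URL</thought>The URL is hidden in the eval call."
broken = "<thought>check the eval string... wait, no... wait, no... The URL is hidden in the eval call."

print(strip_thinking(normal))  # clean answer, CoT removed
print(strip_thinking(broken))  # raw monologue leaks straight through
```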
BobbyL2k@reddit
They hid the CoT so people can't train on it, similar to OpenAI and Anthropic.
And the strange summary you're seeing could be the summarization model hallucinating. I imagine the summarization model must be extremely small to keep costs down and latency low.
Frank_JWilson@reddit
Nice find, but there's no guarantee this is actually representative of typical CoT for most responses; it might just be unhinged because the thinking was broken or skipped somehow.
kellencs@reddit
They hid them because DeepSeek and GLM were trained on them.