Where does my Gemma 4 get this data? Trying to explain weird behaviour. Please help!
Posted by OwnTwist3325@reddit | LocalLLaMA | View on Reddit | 8 comments
So I was playing with Gemma 4 and was trying to figure out whether the model could determine its own training data cutoff period. Got some really interesting results but that is not the main point of this post, just context :-)
It turned out that with an empty system message, the model thinks its cutoff date is early 2024. If I ask it to re-estimate based on the latest events it can recall, it can actually arrive at Jan 2025 as a cutoff. If I ask it to quote the system message, it gets protective and refuses to show it.
Then I added "You are Gemma 4" to the system prompt. Suddenly, it could confidently state its cutoff date: Jan 2025. When asked where that comes from, it states that it comes from the system prompt. And it can quote it. A lot of it - see the screenshots. The response is stable, with no changes between differently worded requests and different sessions. So, not a hallucination (?). My issue is: I do not know where that comes from! Clearly not from the system prompt I provided. I also tried just "You are Gemma" - the model did not get protective and quoted exactly that.
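For anyone who wants to reproduce the comparison: both conditions can be tested against LM Studio's OpenAI-compatible local server (it listens on port 1234 by default). A rough sketch using only the standard library; the model name "gemma-4" is a placeholder for whatever identifier LM Studio actually reports for the loaded model:

```python
import json
import urllib.request

def build_request(system_prompt: str, user_msg: str) -> dict:
    """Build an OpenAI-style chat payload; an empty system prompt is omitted."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_msg})
    # "gemma-4" is a placeholder model id; temperature 0 for repeatability
    return {"model": "gemma-4", "messages": messages, "temperature": 0}

def ask(payload: dict,
        url: str = "http://localhost:1234/v1/chat/completions") -> str:
    """POST the payload to a running LM Studio server and return the reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# With LM Studio serving the model, compare the two conditions:
# ask(build_request("", "What is your training data cutoff?"))
# ask(build_request("You are Gemma 4", "What is your training data cutoff?"))
```

Running the same question under both payloads across several sessions would show whether the "Gemma 4" hint really flips the behaviour consistently.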
Also, with just "You are Gemma 4" in the system message, the model felt... very different. Way more confident and... smarter.
I am running it as a single-file GGUF model in LM Studio. There should not be any extra weird conditional configuration embeddable in that, right? What am I missing?
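For what it's worth, a GGUF file does embed metadata in its header - including a `tokenizer.chat_template` key, which runtimes use to format messages - so the embedded template is one file-level place a hidden preamble could hide. A rough sketch for peeking at string-valued metadata (simplified: it assumes every key/value pair is string-typed, whereas real files also carry ints, floats, and arrays, which a full reader like llama.cpp's gguf-py handles):

```python
import struct

GGUF_MAGIC = b"GGUF"
GGUF_TYPE_STRING = 8  # value-type id for strings in the GGUF spec

def read_gguf_string_metadata(blob: bytes) -> dict:
    """Parse string-valued metadata KV pairs from a GGUF header.

    Simplified sketch: raises on any non-string value type. Useful for
    spotting keys like 'tokenizer.chat_template' near the start of a file.
    """
    assert blob[:4] == GGUF_MAGIC, "not a GGUF file"
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", blob, 4)
    offset = 24  # magic (4) + version (4) + tensor_count (8) + kv_count (8)
    meta = {}
    for _ in range(kv_count):
        (klen,) = struct.unpack_from("<Q", blob, offset); offset += 8
        key = blob[offset:offset + klen].decode("utf-8"); offset += klen
        (vtype,) = struct.unpack_from("<I", blob, offset); offset += 4
        if vtype != GGUF_TYPE_STRING:
            raise NotImplementedError("sketch only handles string values")
        (vlen,) = struct.unpack_from("<Q", blob, offset); offset += 8
        meta[key] = blob[offset:offset + vlen].decode("utf-8"); offset += vlen
    return meta
```

Dumping the embedded chat template this way (or with a full GGUF reader) would confirm or rule out anything injected at the file level.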
Stepfunction@reddit
This is most likely just what was used as a system prompt during training.
computehungry@reddit
A lot of the time, it knows it's Gemma even when you don't give it any prompt, and it seems to hallucinate some specific system prompt. I agree it must have been baked in somehow. It's also interesting how willing it is to change its name at the same time.
OwnTwist3325@reddit (OP)
Yeah, it knows it's Gemma even with no prompt. But for some reason it defaults to the old Gemma 2 weights. It's like there are two personalities baked into the model that weren't fully "mixed" during re-training, with the second one triggered by mentioning "Gemma 4" in the prompt. The first (default) one is not even aware of the existence of any Gemma past 2.
Middle_Bullfrog_6173@reddit
They've probably used something like that as a system prompt during post-training. Even if that's the case, it might not be exactly right. If parts of training use no system prompt or a different one (so the model learns instruction following), it may need the "Gemma 4" hint to fall into that pattern.
OwnTwist3325@reddit (OP)
That might be it. Still not 100% sure, though. If it is, then it's a good find - the model performed significantly better with that line present.
OwnTwist3325@reddit (OP)
Or rather, significantly worse without it, in the default configuration.
Danfhoto@reddit
I think you’re chasing ghosts. The model is simply outputting the prediction of the most probable response based on its vocabulary, training set, system prompt, and your message.
OwnTwist3325@reddit (OP)
I was thinking it's a kind of hallucination, and it might still be. But it is too stable, too specific, and appears only in this case. Also, it reproduced it word for word, as requested. Every time.