Gemma 4 constantly repeating the same token

Posted by leorgain@reddit | LocalLLaMA

I've been updating to the llama.cpp nightlies as they've come out, but for the life of me I can't get Gemma 4 31B to stop repeating the same tokens after a couple of messages. It starts out fine, but after the third or fourth reply it just repeats the last two or three tokens it output. So far I've tried deactivating all samplers and then entering Google's recommended settings (I even tried turning on min-p, but that didn't help either), re-downloading quants (bartowski's Q6_K_L), and activating XTC, DRY, or both at the same time.
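
To pin down exactly when the looping starts (for example, to compare builds or sampler settings), the symptom described above can be checked programmatically. This is only a minimal sketch, assuming the model output is available as a list of token strings. The function name `trailing_repeat` and its parameters `max_ngram` and `min_repeats` are hypothetical and not part of llama.cpp.

```python
def trailing_repeat(tokens, max_ngram=3, min_repeats=3):
    """Return (ngram, count) if `tokens` ends with an n-gram of length
    1..max_ngram repeated back-to-back at least `min_repeats` times,
    otherwise return None. All names here are illustrative only."""
    for n in range(1, max_ngram + 1):
        # Not enough tokens to hold `min_repeats` copies of an n-gram.
        if len(tokens) < n * min_repeats:
            continue
        tail = tokens[-n:]
        count = 1
        pos = len(tokens) - 2 * n
        # Walk backwards one n-gram at a time while it keeps matching.
        while pos >= 0 and tokens[pos:pos + n] == tail:
            count += 1
            pos -= n
        if count >= min_repeats:
            return tail, count
    return None


# Example: the reply ends by looping on the last two tokens.
reply = ["It", "works", "fine", "and", "so", "and", "so", "and", "so"]
result = trailing_repeat(reply)
print(result)
```

Running this on each reply in a conversation would show which message first trips the check, which could be useful detail for a bug report.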

Does anyone have any ideas as to what's going on?

Side note: I've noticed models like step 3.5 and Gemma 4 having weird issues with the word "of", either merging it with the previous word or hyphenating it. That one is less annoying, but if anyone has ideas on that too I'd appreciate it.