What's the deal with Qwen3.5's and Gemma 4's reasoning traces?

Posted by mags0ft@reddit | LocalLLaMA

Hey there,

I noticed something odd when trying out the latest and greatest local reasoning models recently. First, I just noticed it for Qwen3.5, but Gemma 4 seems to do it too:

The reasoning traces do that weird thing of starting with "Here is a detailed reasoning process for the problem: ..." or similar. Also, they seem to have begun to suddenly include Markdown formatting, and all the SOTA models apparently now like to write their reasoning as lists with bullet points?

What I don't get is why they are doing that. How does generating a few dozen boilerplate tokens improve performance in any way? I am no hater of reasoning, and I don't think it's just "the model yapping around with no performance gain", but come on, I don't think it's necessary to spend time and electricity computing tokens for "Here is a reasoning process: ..." and hundreds of "**" tokens that aren't even going to get rendered.

It almost seems like they messed something up with synthetic data generation: did they prompt their teacher models to "generate a reasoning process" for each sample and "forget" to strip the preamble and Markdown formatting from the training data? I'd find that hilarious, but I genuinely can't think of any other reason this might have happened. You could literally pre-fill the preamble in the reasoning?!
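For what it's worth, the cleanup step I'm imagining they skipped is pretty trivial. Here's a rough sketch (purely hypothetical, just to illustrate what "stripping the preamble and Markdown" from a synthetic trace would look like; the preamble wording and helper name are made up):

```python
import re

# Hypothetical boilerplate preamble a teacher model might emit.
PREAMBLE = re.compile(r"^\s*Here is a detailed reasoning process.*?:\s*",
                      re.IGNORECASE)

def strip_boilerplate(trace: str) -> str:
    """Drop the canned preamble and common Markdown tokens from a trace."""
    trace = PREAMBLE.sub("", trace, count=1)                       # preamble
    trace = trace.replace("**", "")                                # bold markers
    trace = re.sub(r"^#+\s*", "", trace, flags=re.MULTILINE)       # headings
    trace = re.sub(r"^\s*[-*]\s+", "", trace, flags=re.MULTILINE)  # bullets
    return trace.strip()

raw = ("Here is a detailed reasoning process for the problem:\n"
       "- **Step 1:** parse the input\n"
       "- **Step 2:** check edge cases")
print(strip_boilerplate(raw))
# Step 1: parse the input
# Step 2: check edge cases
```

A filter like this would take minutes to add to a data pipeline, which is why the formatting surviving into the released models feels deliberate rather than an oversight.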

It may just be my personal preference, but I prefer densely packed, coherent reasoning text and models that don't spend time computing formatting tokens for an internal monologue that I am only rarely going to look at.

Any thoughts on this? Maybe there's a good reason for it, because many labs seem to be adopting this behavior.

Best regards :)