Note the new recommended sampling parameters for Qwen3.6 27B
Posted by Thrumpwart@reddit | LocalLLaMA | 23 comments
Taken from their Hugging Face page:
We recommend using the following sampling parameters for generation:
Thinking mode for general tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
Thinking mode for precise coding tasks (e.g. WebDev): temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
Instruct (or non-thinking) mode: temperature=0.7, top_p=0.80, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
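The three recommended modes can be sketched as request payloads. A hypothetical example, assuming an OpenAI-compatible endpoint (as served by llama.cpp's server or vLLM); `min_p` and `repetition_penalty` are common extension fields, so check your backend's docs for the exact names it accepts:

```python
# Recommended sampling settings from the thread, expressed as
# hypothetical payload fragments for an OpenAI-compatible API.

THINKING_GENERAL = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.0,
}

# Precise coding tasks only lower the temperature.
THINKING_CODING = {**THINKING_GENERAL, "temperature": 0.6}

INSTRUCT = {
    "temperature": 0.7,
    "top_p": 0.80,
    "top_k": 20,
    "min_p": 0.0,
    "presence_penalty": 1.5,
    "repetition_penalty": 1.0,
}
```

Note that only the instruct mode keeps a nonzero presence penalty; both thinking modes set it to 0.0.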
These are different from 3.5 so I thought I would draw your attention to them.
GregoryfromtheHood@reddit
Very glad they're recommending 0.0 presence penalty now for thinking. The old 1.5 and even 1.1 were giving me so many issues.
Caffdy@reddit
what is the difference though? what does presence penalty do in the first place?
GregoryfromtheHood@reddit
It's supposed to punish repetition, so it should help with looping. But I guess because the model wants to repeat some tokens, when it can't, it goes into a loop instead. That's my guess anyway.
david-deeeds@reddit
isn't "repetition penalty" the one that punishes repetition?
Shadowfita@reddit
They both do. Two different parameters with a similar outcome that target different parts of the model, is my understanding.
HiddenoO@reddit
Unless something has recently changed, they're the same idea, except that presence penalty is boolean ("Has the token already appeared? If yes, apply penalty x."), whereas repetition penalty is numerical ("Count how often the token has already appeared and apply penalty x for each occurrence.").
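The distinction described above can be sketched in a toy logit-adjustment function. For context: in the OpenAI API the count-scaled variant is called `frequency_penalty`, while HF-style `repetition_penalty` is actually applied multiplicatively to logits; this sketch only illustrates the flat-vs-per-occurrence contrast the comment draws:

```python
from collections import Counter

def apply_penalties(logits, generated, presence_penalty=0.0, frequency_penalty=0.0):
    """Toy illustration: presence penalty subtracts a flat amount from any
    token that has appeared at least once; the count-based variant
    subtracts once per occurrence."""
    counts = Counter(generated)
    out = dict(logits)
    for tok, n in counts.items():
        if tok in out:
            out[tok] -= presence_penalty       # flat: fires once per token
            out[tok] -= frequency_penalty * n  # scales with occurrence count
    return out

logits = {"the": 2.0, "cat": 1.5, "sat": 1.0}
history = ["the", "the", "cat"]

print(apply_penalties(logits, history, presence_penalty=0.5))
# "the" and "cat" each drop by 0.5, regardless of how often they appeared
print(apply_penalties(logits, history, frequency_penalty=0.5))
# "the" drops by 1.0 (two occurrences), "cat" by 0.5 (one occurrence)
```

This also hints at why an aggressive penalty can cause loops: once the exact token the model needs is heavily penalized, it may emit a near-miss and retry.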
Shadowfita@reddit
Ahhh! That would make sense. Thanks for that distinction.
david-deeeds@reddit
Thanks!
LeonidasTMT@reddit
For the MoE model I had to turn up the presence penalty because otherwise it would go into loops: either the same line repeating except for a final word, or a larger loop of repeating logic.
Shadowfita@reddit
Yes, me too. I was finding that if I gave an agent a task and provided it with an ID, for example, it would seemingly exhaust its "limit" of repetition on the actual value I wanted it to include in the final output, and would change it slightly, making it incorrect.
kroggens@reddit
why not temperature=0.0 for coding?
DefNattyBoii@reddit
so you can reroll the dice on a shit diff, vibecoding goes brr
Evening_Ad6637@reddit
I think the recommended params are not very good. I've experimented and found these params work better:
HiddenoO@reddit
Some of these frankly make little sense. E.g., the presence penalty becomes fairly pointless if the repeat penalty is ten times as high, since the latter also applies on the first occurrence.
How did you obtain these?
jwpbe@reddit
the combination of top k and a min p of 20% is insane
llitz@reddit
Interesting boost. I have been tweaking slightly as well and went in a similar direction, but I am still bothered by the behavior. I guess I will eventually arrive at your parameters.
LinkSea8324@reddit
You might want to add the preserve_thinking param
Dany0@reddit
yes u/evening_ad6637 please retest with thinking preserved
kaisurniwurer@reddit
There are likely sampling issues in llama.cpp.
Change the temperature to an extreme value and the output stays the same. It's likely not a "Qwen" or "new models" problem, since I saw the same result with Mistral Small.
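As a sanity check on the claim above, here is a minimal sketch of what temperature *should* do to the next-token distribution: logits are divided by T before the softmax, so an extreme temperature should visibly flatten or sharpen the probabilities. If a backend's outputs are identical across extreme temperatures, the parameter is likely not being applied:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/T before softmax. Higher T flattens the
    distribution; T -> 0 approaches greedy argmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.1)   # near-deterministic
high = softmax_with_temperature(logits, 2.0)  # much flatter

# If temperature is actually applied, these should differ markedly:
print(low[0], high[0])
```

With T=0.1 the top token takes essentially all the probability mass, while at T=2.0 it gets roughly half, so samples across many generations should look very different between the two settings.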
Safe-Thanks-4242@reddit
Same as what Unsloth already shared, I think 🤔
LinkSea8324@reddit
Agentic coding counts as "precise coding tasks", right?
FinBenton@reddit
That is exactly the same for coding as the old model.
Ok-Measurement-1575@reddit
They look identical to me? Unless you mean the repeat stuff?
I deleted that and noticed no ill effects tbh.