qwen 3.6 27B looping problem
Posted by jacek2023@reddit | LocalLLaMA | 12 comments
Whenever I write here that I use gemma 31B I get answers that qwen 27B is better. I switched my pipeline from gemma 31B Q5 to qwen 27B Q8 and generally I manage to code, document and run tests, but somewhere after exceeding 100k context qwen keeps getting into loops. Do you have any solution for this?



I tried to break it and tell it to start over, try again, etc., but it keeps looping
my current command is:
CUDA_VISIBLE_DEVICES=0,1,2 llama-server -c 200000 -m /mnt/models2/Qwen/3.6/Qwen3.6-27B-UD-Q8_K_XL.gguf --host 0.0.0.0 --jinja -fa on --keep 4096 -b 8192 --spec-type ngram-mod --parallel 1 --ctx-checkpoints 24 --checkpoint-every-n-tokens 8192 --cache-ram 65536
computehungry@reddit
I went back to 3.5. I also accept that 65k, give or take, is the effective max context, and manage my use around that limitation.
jacek2023@reddit (OP)
on the model page I see: Context Length: 262,144 natively and extensible up to 1,010,000 tokens.
computehungry@reddit
Yeah, I'm saying the model gets too dumb at 65k, so I just treat that as the max and make the workload smaller for each run. I run at Q4 though; it might be better at Q8.
jacek2023@reddit (OP)
I am observing various issues after 100k
Pablo_the_brave@reddit
You haven't set any sampler settings and you're using the default jinja template from the model. Those are two red flags. Focus on those.
jacek2023@reddit (OP)
This is with the settings from the upvoted comment
MrShrek69@reddit
I always find 56-65k is basically the max before most of them start failing tool calls or getting lost in the context. That's why context management is important for programming. It's still a shit ton of space to work with. Just use more sessions and have the agent write to md files so it can pick up where it left off.
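A minimal sketch of that hand-off pattern, assuming a HANDOFF.md file name (purely illustrative): have the agent append a short summary at the end of each session, then feed it back in at the start of the next one:

# agent (or you) appends a summary when the session ends
echo "## Session 3: parser refactor done, tests green; next: update docs" >> HANDOFF.md
# seed the next session by pasting this back into context
cat HANDOFF.md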
WetSound@reddit
How are you reaching such long contexts? I /new for every new task and have no problems, even when that still gets me over 100k.
jacek2023@reddit (OP)
what are your tasks? I have lots of docs and code
fahrenhe1t@reddit
Try:
--repeat-penalty 1.1 or --presence-penalty 0.5
Test with either/or, not both at the same time. I added --repeat-penalty 1.1 to my config and it helped significantly.
LetsGoBrandon4256@reddit
Double check your sampler settings
https://huggingface.co/Qwen/Qwen3.6-27B
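Putting the two suggestions above together, a sketch of what the OP's command could look like with explicit sampler settings added. The temp/top-p/top-k/min-p values below are placeholders, not necessarily what the Qwen3.6 model card recommends, so verify them on the page above, and use --repeat-penalty or --presence-penalty, not both:

# sampler values below are illustrative; check the model card for the recommended ones
CUDA_VISIBLE_DEVICES=0,1,2 llama-server -c 200000 \
  -m /mnt/models2/Qwen/3.6/Qwen3.6-27B-UD-Q8_K_XL.gguf \
  --host 0.0.0.0 --jinja -fa on --keep 4096 -b 8192 \
  --spec-type ngram-mod --parallel 1 --ctx-checkpoints 24 \
  --checkpoint-every-n-tokens 8192 --cache-ram 65536 \
  --temp 0.7 --top-p 0.95 --top-k 20 --min-p 0 \
  --repeat-penalty 1.1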
mister2d@reddit
Have you tried with preserve thinking on?
chat-template-kwargs = {"preserve_thinking": true}
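If you launch through the llama-server command line rather than a config file, the same idea would look roughly like this (assuming your build supports the --chat-template-kwargs flag and that the Qwen3.6 chat template actually reads a preserve_thinking key; treat both as things to verify):

# pass the template kwarg as JSON; flag availability depends on your llama.cpp build
llama-server -m /mnt/models2/Qwen/3.6/Qwen3.6-27B-UD-Q8_K_XL.gguf --jinja \
  --chat-template-kwargs '{"preserve_thinking": true}'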