Qwen3.6 27B poor experience
Posted by pppreddit@reddit | LocalLLaMA
Seeing how people praise it, I tried giving it an implementation plan that Sonnet generated, but Qwen keeps breaking files and going in circles:
> Thinking…
> The file got corrupted from multiple overlapping edits. Let me just rewrite the whole file cleanly.
> ⏺ The file got corrupted from multiple overlapping edits. Let me rewrite it cleanly.
Has anyone else experienced this? The task was a simple Swift class refactoring in a single file. Qwen invents Python scripts to replace text instead of using Claude's built-in tools, breaks stuff, duplicates code on retries, and goes in circles. To me this seems pretty much unusable. Maybe I need a different harness, as I use it in Claude Code via omlx.
LetsGoBrandon4256@reddit
Make sure you get the sampler params right.
IrisColt@reddit
I am following the recommended parameters, but sadly, with a Q4_K_M quant, Q4_0 KV cache, and 256k context, a presence_penalty <= 0.75 is simply not enough... I am experiencing gigantic loops, abrupt endings in the middle of the thought process, you name it. Qwen3.5 27B with the same quant and KV cache, and presence_penalty = 1.5, wasn't so troublesome.
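For anyone wondering where these knobs actually go: here is a minimal sketch of passing them through an OpenAI-compatible endpoint (llama-server, vLLM, etc.). The base URL, model name, and every value here are illustrative assumptions, not recommendations from the model card:

```python
from openai import OpenAI

# Assumed local endpoint and model name; adjust to whatever your server exposes.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

response = client.chat.completions.create(
    model="qwen3.6-27b",
    messages=[{"role": "user", "content": "Refactor this Swift class: ..."}],
    temperature=0.6,          # the coding value audioen mentions further down
    top_p=0.95,               # placeholder; check the model card
    presence_penalty=1.5,     # the value that worked for IrisColt on Qwen3.5
    extra_body={"top_k": 20}, # non-standard knobs go through extra_body
)
print(response.choices[0].message.content)
```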
pppreddit@reddit (OP)
tbh, I didn't configure any parameters; I'm using Qwen3.6-27B-bf16 via omlx
Ok_Helicopter_2294@reddit
At the very least, please try referring to the well-written Unsloth documentation 🙂
Many dedicated contributors have energized open source, even going so far as to write guides for people like you. Yet some don't even try to read them, nor make the minimal effort to ask another AI for clarification.
Finanzamt_Endgegner@reddit
Well, there's your issue.
tomByrer@reddit
> one file
Seems the 27B has issues with large files; can you split it up?
https://youtu.be/N5eEqJVTfVI
audioen@reddit
My guess is that you have quantized either the model or its KV cache to hell, or have bad sampling parameters. I have reasonable experience with Qwen/Qwen3.6-27B-FP8, which I run via vLLM. Even though the model is clearly capable of useful work (if somewhat slowly in my case), there is no doubt in my mind that it has already been degraded: the real model is BF16, and FP8, even as an official release, must be a severe approximation.
The vLLM recipe I used even quantized the KV cache to fp8, but I had to turn that off immediately: it was obvious from the reasoning traces that the model was seriously confused about who had said what and when, which told me the attention wasn't working properly anymore.
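For reference, a minimal sketch of the setup above via vLLM's offline API, with the KV cache left unquantized ("auto" keeps it in the model's native dtype). The model name mirrors the comment; max_model_len and the sampling values are assumptions:

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.6-27B-FP8",
    kv_cache_dtype="auto",   # the original recipe set "fp8" here; turn that off
    max_model_len=131072,    # assumption: long-context agentic use
)

params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=4096)
outputs = llm.generate(["Summarize who said what in this transcript: ..."], params)
print(outputs[0].outputs[0].text)
```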
nunodonato@reddit
Hi there!
I was using Qwen3.5-27B-FP8 before and have now moved to 3.6, also FP8, but I seem to be getting more tool-call failures. Do you think using the BF16 version makes a significant difference? It's already a slow model, so I was trying to get FP8 to work. No KV quants.
audioen@reddit
I found a chart in this post https://www.reddit.com/r/LocalLLaMA/comments/1ssyukx/qwen3627b_klds_ints_and_nvfps/ which suggests that the FP8 is actually reasonably good.
pppreddit@reddit (OP)
Here's my setup: M4 Max 128GB, omlx, Qwen3.6-27B-bf16 from Hugging Face, Claude Code. I didn't configure any parameters, so it's as-is out of the box. I did install opencode now and it seems to perform much better, but I need to test more before a final verdict. My guess is that Claude Code's system prompt might be slowing things down.
audioen@reddit
I don't know; it's really worth trying to nail down every detail, like what the KV cache type is during inference and what the sampling parameters are. They recommend a temperature of 0.6 for coding, but the model might default to 1.0, or to some other sampling parameters altogether.
I find that the 27B handles 100k-token contexts and longer very well, without getting overly confused or anything. But I agree with you that a short system prompt is better than a long one. opencode's is not the shortest possible, but it's only around 10k, and I think the tools and feedback available to the model are decent. So it's definitely not a poor choice of harness.
One thing I didn't like about opencode is that even when it isn't doing anything, it seemed to consume about a third of a CPU core, at least on Linux. Maybe it's the terminal animation or something; hopefully they fix it.
whiteamphora@reddit
How do you expect help when you don't even provide the setup you use, the command, anything?
Zyj@reddit
OP clearly lacks empathy. The ability to take the perspective of the people they're asking might help, but with a complete lack of details, we can't do much. I see it a lot. I wonder how these people manage.
DinoAmino@reddit
They don't manage. They flail and then come here for hand-holding because (in this case) they couldn't be bothered to learn for themselves. Reading the model card and learning about the model and its recommended sampling parameters is the first thing anyone should do.
Also, I think many people who claim to code here aren't doing it professionally either. Real ones include relevant info in their posts without being asked. At work, I send back bug report tickets that don't include steps to reproduce; I'm not going to bother even looking at one until that's provided and I can see it for myself.
tmvr@reddit
It works fine in OpenCode; no issues editing or fixing files etc. Served with llama-server, using the recommended settings from Unsloth:
> We recommend using the following set of sampling parameters for generation.
> Please note that support for sampling parameters varies across inference frameworks.
From here:
https://huggingface.co/unsloth/Qwen3.6-27B-GGUF
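The quote above elides the actual values, so take them from the linked card. As a rough illustration only, launching llama-server with explicit sampling flags could look like this; the GGUF path and all values except the 0.6 coding temperature mentioned elsewhere in the thread are placeholders:

```python
import subprocess

# Placeholder path and values; substitute the model card's recommendations.
subprocess.run([
    "llama-server",
    "-m", "Qwen3.6-27B-Q4_K_M.gguf",
    "--ctx-size", "32768",
    "--temp", "0.6",
    "--top-p", "0.95",
    "--top-k", "20",
    "--min-p", "0.0",
])
```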
pppreddit@reddit (OP)
thanks, I'll try that, though I'm using Qwen3.6-27B-bf16 with omlx
tmvr@reddit
The settings are independent of that.
Clear-Ad-9312@reddit
A lot of people don't realize it, but Claude Code as a harness has dropped from top tier to mid-tier garbage. Its UI is amazing, but the harness around the LLM is bloated and confusing, and it produces worse-performing outputs. Even the Claude Opus model performs better in a simpler harness like Pi.
DeepWiseau@reddit
Does Qwen know it has access to tools? Did you tell it everything it is and is not able to do, and specifically how and when to act?
If not, why are you surprised? Local models do not know your setup. Take the time to customize a harness and work from first principles; you will get much better results. Set up a memory system in PostgreSQL (see the sketch below) and get smarter context management. Make sure your KV cache is not quantized below 8-bit; in fact, try not to compress it at all. You will need at least a 128k context window, so with an Apple setup you could run this with a full FP16 KV cache. With a 5090 you could drop down to 8-bit and a 90k context window and probably get by. Anything less and you will really need to work on your context management and harness prompt.
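A minimal sketch of that PostgreSQL memory idea, assuming psycopg2 and an invented table schema (nothing here comes from an existing harness); a real setup would likely add embeddings (e.g. pgvector) for semantic retrieval:

```python
import psycopg2

# Invented schema: one table of notes the harness can write and read back.
conn = psycopg2.connect("dbname=agent_memory user=agent")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS memory (
            id SERIAL PRIMARY KEY,
            topic TEXT NOT NULL,
            note TEXT NOT NULL,
            created_at TIMESTAMPTZ DEFAULT now()
        )
    """)
    # Record a fact the agent should remember across turns.
    cur.execute(
        "INSERT INTO memory (topic, note) VALUES (%s, %s)",
        ("refactor", "UserSession.swift: auth logic moved into AuthManager"),
    )
    # Pull the most recent notes on a topic back into the prompt context.
    cur.execute(
        "SELECT note FROM memory WHERE topic = %s ORDER BY created_at DESC LIMIT 5",
        ("refactor",),
    )
    context_notes = [row[0] for row in cur.fetchall()]
```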
chimph@reddit
The harness and its setup are important. I'm having a great time with this model in Opencode. Works fantastically.
spaceman_@reddit
What harness or software are you using?
In my experience with opencode, it's been smooth sailing. Maybe it doesn't work well in Claude Code?
WetSound@reddit
I'm having a great experience using Pi
Alarming-Ad8154@reddit
No, I haven’t experienced any of that… I’d love to help you figure out why you have, but you’re not making it very easy, as you don’t provide any of the settings (which model, which software to run the model, which settings like temperature, which agent…).