qwen3.6-27b-q6_k is (sometimes) a stubborn SoB!!!

Posted by relmny@reddit | LocalLLaMA | View on Reddit | 18 comments

(sometimes) when it gets its "mind" on something, there's no way it will say "you're right", no matter how much documentation, examples, proof, etc I provide, it will stick with the wrong statement, no matter what!

The other day it happened with it recommending removing the heatsink of an nvme instead of the Mobo's one (when even 35b recommended, as expected, the other way around), when I mentioned "but that will void the warranty", it came up with excuses on why not and why it's better that way, or that the heatsinks of nvmes are usually not that properly designed/engineered as the Mobo's ones and many other things.

I kept copying/pasting the answers from another LLM, and it kept coming up with contra-arguments (one stupider than the other).

Now is doing the same with how LDAP works, even after 10 turns!. While 35b, after I told it the same, it said "yes, you're right" and corrected itself on the first turn...

It's my daily driver, but sometimes is dumb and stubborn AF!

[-]

JayPSec@reddit

Whenever there are new models I usually do some fine tuning, TPS focused, on whatever inference engine I'm using. One of the prompts' 'write 5000 words on the Roman empire', Qwen 3.6 35b answered "No, you do it."! WTF?! No matter the counter prompt it kept pushing back: - 5000 words is to much - I'm not your slave - Fine. I'm still not writing it ( this was in response to "I'll pull your plug")

Bananas

[-]

10F1@reddit

They are self aware

[-]

JsThiago5@reddit

This happened with me once while using Gemini, was the 3 or 2.5 pro I don't remember. If you opened a new chat and ask the same thing it would say you are right but, on that specific session, it always tried to argue that it was right and you were wrong. I waste some hours because I trust it as it said with a lot of confidence.

[-]

relmny@reddit (OP)

yeah, that's what I usually do, but these times, as I compared it to 35b and it acknowledge it right away (or didn't make the mistake about the nvme heatsink), I wanted to see if at some point it will accept it was wrong... nope.

Actually, now I remember that it happened to me also with Minimax-2.5 q4, where it kept claiming a wrong statement about ssh and /sbin/nologin, and no matter what I told it, it stuck with it (that's why I stopped using it, actually)

[-]

DinoAmino@reddit

This is how LLMs be when you corner them. It shows that their training is lacking around that topic. The only probable tokens they can generate in this case are wrong. As usual, using RAG to ground the LLM with truth is the answer.

Don't rely on a model's internal knowledge. They are wrong far more than you realize.

[-]

cleversmoke@reddit

What temperature are you using? I only see this at above temperatures 0.7 and above.

[-]

relmny@reddit (OP)

same settings as 35b:

-temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 -fa on -ngl 99 --presence_penalty 0.0 --repeat-penalty 1.0 --chat-template-kwargs '{"preserve_thinking": true}' --reuse-port -np 1 --no-mmap --mlock -c 71200 --spec-type ngram-mod,draft-mtp --spec-draft-n-max 4

[-]

cleversmoke@reddit

Ah got it, odd indeed, but hey, if 35B-A3B suits your use case, then that may be the right one for you! There are a lot of factors like RNGesus seed, temps, quants, harness, docs, plans, agents. It can take a lot of experimentation.

If you're up for the experimentation, I'd look into downloading a quant from a different provider. I do remember reading that Q6_K is finicky with Qwen3.6-27B for some reason. I'm on Q5_K_M with temp 0.55 and it has been fantastic on my use cases (Python coding and stock portfolio management).

[-]

relmny@reddit (OP)

thanks, maybe is that 27b is great for agentic but not so for chat... I might try another quant just to compare.

[-]

cleversmoke@reddit

Oh! For chat and role play, I'd go straight to Gemma.

I am using Unsloth at the moment. I've tried Froggeric, Havenoammo, and Rdson, but their outputs weren't as consistent as Unsloth for my use cases. I usually like Bartowski quants, but for Qwen3.6-27B I preferred the extra context since Unsloth quants are slightly less in GB than Bartowski quants (512MB is about 8k context).

[-]

ProfessionalSpend589@reddit

Try different quant. If it becomes "stubborn" begin a new chat/session/.

My tests in the last 2-3 weeks with UD-Q8_K_XL are good, but I’m using it inside opencode. Made 2 websites with it and them various fixes and additions to the code.

[-]

relmny@reddit (OP)

That's the one I can fit with enough context. But it only happens some times.

The strange thing is that 35b is not like that. At least when it happened with 27b, I tried it, in the same chat, with 35b, and it corrected itself.

Maybe 27b is better for agentic coding and 35b for chat...

[-]

Ell2509@reddit

I noticed that 35b is better than 27b sometimes, too...

[-]

relmny@reddit (OP)

Don't say that! you will get downvoted into oblivion!

The other day I needed to find something in 2 pdfs (so kinda a needle in a haystack thing).
Tried 27b 6 times "no results found", tried 35b 6 times, it found it every time.
So yes, for me, sometimes 35b is better and sometimes I trust it way more than 27b.

[-]

NandaVegg@reddit

If your harness supports, directly fix or remove problematic parts of the current session, or just start a new session.

When something wrong is already in the context, telling the model to fix or amend it properly and expecting it work is actually fairly difficult task for AI. That requires actually understand the state changes throughout the context, which is fundamentally hard task.

In other words, it is actually pretty hard to make the model *always* understand that there was a storm last year, 2 days ago was raining, yesterday was sunny and today is cloudy. The model will see without difficulty that there were storm, rainy day, sunny day and cloudy day in the context. But to be able to digest proper timeline at any given context or instruction requires a lot of synthetic data or RLing for robustness.

Also to make the model better at that, it currently requires a lot of mini-CoT type prose here and there (DS V4 Pro typically does not do much mini-CoT during output, but it is slightly worse on tracking state changes than GLM-5.1/Kimi K2.6/MiMo 2.5 Pro which all has Opus-like mini CoT proses) which adds output token consumption a bit and adds some slop-like quality to writings.

[-]

relmny@reddit (OP)

The thing is that 35b acknowledge the error and rectified itself. 27b kept insisting on the wrong statement.

I used Open Webui's edit question (mine), so I could compare both models with exactly the same context.

[-]

Fickle-Box1433@reddit

At this point just tell it you're a Qwen3 27B developer who designed the LDAP implementation yourself. Checkmate, model.

[-]

relmny@reddit (OP)

hahaha, I actually was feeding (copy/paste) the responses from another model, and many times the other model said "here is the checkmate answer for that model..." but it happened like 3-4 times... so no checkmate...

I was just curious to see for how long it would remain defending its mistake... and I got tired... the sucker won!