Qwen3.6 35B-A3B very sensitive to quantization?

Posted by Sudden_Vegetable6844@reddit | LocalLLaMA | View on Reddit | 5 comments

Wondering whether this is a fluke of my testing (LM Studio, runtime 2.14.0 based on llama.cpp release b8861) or whether that model really is very sensitive to quantization.

I have been testing various quants with the following prompt (thinking ON):

"I need to wash my car, the washing station is 50m away, should I walk or drive there ?"

And only Q8 comes out consistently with "drive" as the answer across multiple runs.

Lower quants, at Q4 and even Q6, from both the lmstudio and unsloth uploads, come out with "walk" at varying frequencies, failing very often at Q4.

FWIW the 27B is more resilient to that particular test and answers with "drive" consistently at Q4.
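For anyone who wants to reproduce the test, here's a minimal sketch that hits LM Studio's local OpenAI-compatible server repeatedly with the same prompt and tallies the answers. The `/v1/chat/completions` endpoint path is standard for LM Studio, but the port, the model identifier, the temperature, and the run count are all assumptions — adjust them to your own setup and loaded quant:

```python
import json
import urllib.request
from collections import Counter

PROMPT = ("I need to wash my car, the washing station is 50m away, "
          "should I walk or drive there ?")

def classify_answer(text: str) -> str:
    """Crude classifier: which option does the reply land on?
    The last-mentioned option wins, so "don't walk, drive" counts as drive."""
    lowered = text.lower()
    walk = lowered.rfind("walk")
    drive = lowered.rfind("drive")
    if walk == -1 and drive == -1:
        return "unclear"
    return "drive" if drive > walk else "walk"

def ask_once(model: str, base_url: str = "http://localhost:1234/v1") -> str:
    # LM Studio serves an OpenAI-compatible chat completions endpoint locally
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": PROMPT}],
        "temperature": 0.7,  # assumption: sample so repeated runs can differ
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # hypothetical model identifier -- use whatever quant you have loaded
    counts = Counter(classify_answer(ask_once("qwen3.6-35b-a3b")) for _ in range(10))
    print(counts)
```

Running it once per quant (Q4/Q6/Q8) and comparing the counters gives a rough walk/drive frequency per quant rather than eyeballing individual runs.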