Some Qwen3.6 27B 7900XT-centered tests

Posted by Mordimer86@reddit | LocalLLaMA

I tested the model in a few quantized versions with different cache quantization settings. This is what came out of it.

And the table:

Memory usage was measured right after loading, with a context size of 98304.
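For a rough sense of why cache quantization matters at this context size, here is a back-of-the-envelope KV-cache estimate. The model dimensions below are hypothetical placeholders, not Qwen's actual specs; the only hard number is that llama.cpp's q8_0 stores about 8.5 bits per value (34 bytes per 32-value block) versus 16 bits for f16:

```python
def kv_cache_bytes(ctx, layers=48, kv_heads=8, head_dim=128, bytes_per_elem=2.0):
    # 2x for the K and V tensors; layers/kv_heads/head_dim are
    # placeholder dimensions, not the real model architecture.
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem

CTX = 98304
f16 = kv_cache_bytes(CTX, bytes_per_elem=2.0)      # f16: 2 bytes/value
q8  = kv_cache_bytes(CTX, bytes_per_elem=1.0625)   # q8_0: ~8.5 bits/value
print(f"f16 cache: {f16/2**30:.1f} GiB, q8_0 cache: {q8/2**30:.1f} GiB")
```

Whatever the real dimensions are, the ratio holds: q8_0 cuts the cache to roughly 53% of f16, which is the difference between fitting a ~98k context on a 20 GB card or not.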

Unsloth beats the rest.

The result: q8_0 is a free lunch, at least PPL-wise, and so is q5_1.
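For anyone wanting to reproduce this, KV-cache quantization is set per K and V tensor in llama.cpp via the `-ctk`/`-ctv` flags; a sketch (the model filename is hypothetical, and a quantized V cache requires flash attention, hence `-fa`):

```shell
# Hypothetical model filename; flags are llama.cpp's llama-server.
# -ctk/-ctv accept f16, q8_0, q5_1, q5_0, q4_0, etc.
./llama-server \
  -m Qwen3-27B-Q4_K_M.gguf \
  -c 98304 \
  -fa \
  -ctk q8_0 -ctv q8_0
```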

If anyone has personal experience playing with these, it'd be great to hear. I wonder why q5_0 and q5_1 aren't mentioned much for context quantization. Do they have any significant drawbacks?