Some Qwen3.6 27B 7900XT-centered tests
Posted by Mordimer86@reddit | LocalLLaMA | View on Reddit | 3 comments
I tested the model in a few variants with different KV cache quantization. Here is what came out of it.
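For anyone wanting to reproduce a run like this: in llama.cpp, the K and V cache types are selected with the `--cache-type-k` and `--cache-type-v` flags. A minimal sketch (the model filename is a placeholder, not the exact GGUF the OP used):

```shell
# llama.cpp server with q8_0 KV cache and a 98304-token context.
# Swap q8_0 for q5_1, q5_0, q4_0, or f16 to compare cache quant types.
llama-server \
  -m ./model.gguf \
  -c 98304 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
```

Note that quantized V caches require flash attention support on the backend; falling back to f16 for V is a common workaround when it is unavailable.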


And the table:
Memory usage is measured right after loading with a 98304-token context size.
Unsloth beats the rest.
The result: q8_0 is a free lunch, at least PPL-wise, and so is q5_1.
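The memory differences between cache types can be estimated from the GGML block sizes (each quant type stores blocks of 32 elements plus per-block scale metadata). A rough sketch below; the model dimensions are illustrative placeholders, not Qwen's actual config:

```python
# Rough KV cache size estimator per GGML cache quant type.
# Bytes per 32-element block come from the GGML quant formats;
# the layer/head dimensions below are hypothetical, for illustration only.

BYTES_PER_BLOCK_32 = {
    "f16": 64,    # 32 x fp16
    "q8_0": 34,   # 32 x int8 + fp16 scale
    "q5_1": 24,   # 32 x 5 bits + fp16 scale + fp16 min
    "q5_0": 22,   # 32 x 5 bits + fp16 scale
    "q4_0": 18,   # 32 x 4 bits + fp16 scale
}

def kv_cache_bytes(ctx, n_layers, n_kv_heads, head_dim, cache_type):
    """Total bytes for K + V caches: two tensors of
    ctx * n_layers * n_kv_heads * head_dim elements each."""
    elems = 2 * ctx * n_layers * n_kv_heads * head_dim
    return elems // 32 * BYTES_PER_BLOCK_32[cache_type]

# Hypothetical dimensions, just to show the relative savings:
ctx, layers, kv_heads, hdim = 98304, 48, 8, 128
for t in BYTES_PER_BLOCK_32:
    gib = kv_cache_bytes(ctx, layers, kv_heads, hdim, t) / 2**30
    print(f"{t}: {gib:.2f} GiB")
```

Whatever the real dimensions, the ratios hold: q8_0 is about 53% of f16, q5_1 about 38%, q4_0 about 28%.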
If anyone has personal experience playing with these, I'd love to hear it. I wonder why q5_0 and q5_1 aren't mentioned much for context quantization. Do they have any significant drawbacks?
mr_Owner@reddit
What are the speeds though?
Nyghtbynger@reddit
Am I having a rendering problem? My screen is SDR, but q5_0 and q4_0 look the same colour to me.
What does q8_0/q4_0 mean?
Mordimer86@reddit (OP)
In the first chart they sadly are the same colour. Blame LibreOffice.