Difference between Qwen 3.6 27b quants for vLLM
Posted by Blues520@reddit | LocalLLaMA | 5 comments
Hi guys, I am trying to understand the difference between these quants for running on dual 3090s.
First there is the official FP8: https://huggingface.co/Qwen/Qwen3.6-27B-FP8
Then I see this 6-bit AWQ: https://huggingface.co/QuantTrio/Qwen3.6-27B-AWQ-6Bit
And I see cyankiwi also has a quant up: https://huggingface.co/cyankiwi/Qwen3.6-27B-AWQ-BF16-INT4
They are all similar sizes, so I'm unsure which to pick. What is BF16-INT4, and will it run faster on Ampere but be less accurate than FP8?
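For orientation, here is a rough back-of-envelope sketch (a quick illustration, not measured checkpoint sizes; it assumes ~27B parameters and ignores KV cache, activations, and quantization metadata) of how much VRAM the weights alone take at each bit width:

```python
# Approximate weight storage: params * bits_per_weight / 8 bytes.
# All numbers are illustrative assumptions, not measured file sizes.

def weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB for a model with
    `params_b` billion parameters at `bits_per_weight` bits each."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

PARAMS_B = 27  # assumed ~27B parameters

for label, bits in [("FP8", 8), ("AWQ 6-bit", 6), ("AWQ INT4", 4)]:
    print(f"{label:10s} ~{weight_gb(PARAMS_B, bits):5.1f} GB of weights")
```

All three comfortably fit in the 48 GB of a dual-3090 setup; real checkpoints run somewhat larger because of quantization scales/zero-points and any layers kept at higher precision.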
DeltaSqueezer@reddit
Go for the cyankiwi one: he keeps the linear layers in BF16, which makes a huge difference in output quality.
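To make the "BF16 plus INT4" naming concrete, here is a hedged sketch of the size impact of keeping a fraction of the weights in BF16 while quantizing the rest to INT4. The 10% BF16 share below is a made-up assumption for illustration; the actual layer split in that checkpoint is not stated in this thread.

```python
# Mixed-precision weight-size estimate:
# a fraction of weights in BF16 (2 bytes each), the rest in INT4 (0.5 bytes).
# The bf16_frac value used below is an illustrative assumption.

def mixed_gb(params_b: float, bf16_frac: float) -> float:
    """Approximate weight storage in GB when `bf16_frac` of the
    parameters stay in BF16 and the remainder are INT4."""
    params = params_b * 1e9
    total_bytes = params * bf16_frac * 2.0 + params * (1.0 - bf16_frac) * 0.5
    return total_bytes / 1e9

print(f"27B, 10% BF16 + 90% INT4: ~{mixed_gb(27, 0.10):.2f} GB")
print(f"27B, pure INT4:           ~{mixed_gb(27, 0.00):.2f} GB")
```

The point of the comparison: keeping a minority of sensitive layers in BF16 costs only a few extra GB over pure INT4, which is why such mixed checkpoints still land well within dual-3090 VRAM.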
Glittering-Call8746@reddit
Perplexity vs the INT4?
Blues520@reddit (OP)
I was wondering why that model has both BF16 and INT4 in the name, but I think I understand now. Thanks!
Tormeister@reddit
Relevant: thread
pulse77@reddit
General rule: more bits (= bigger file size) is better.
For general tasks the difference between 6-bit and 8-bit is very small, but for precise coding it matters.