FP16 on Qwen 3.6 27B

Posted by Forward_Jackfruit813@reddit | LocalLLaMA

Has anyone seen a notable difference between Q8 and FP16, for both the weights and the cache? I know the jump to Q8 is significant. I would test it myself, but FP16 on my setup is painfully slow.

Also, a side question: is ~14 TPS around what I should be expecting on a Strix Halo running 3.6 27B at Q8 during coding tasks? I have my MTP max draft set to 3, and it seems to be slightly better than 2, which runs at ~11.
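
For a rough sanity check on those numbers, here is a back-of-envelope sketch in Python. It assumes token generation is limited by memory bandwidth (every weight is read once per decode step), that the model is dense with 27e9 parameters, that Q8 means llama.cpp-style Q8_0 at 8.5 bits per weight, and that the machine has roughly 256 GB/s of theoretical memory bandwidth. The function names, the bandwidth figure, and the 0.6 draft acceptance rate are all illustrative assumptions, not measurements, and the estimate ignores KV cache reads and compute.

```python
# Back-of-envelope decode speed estimate. All constants are assumptions
# for illustration; swap in your own measured values.

N_PARAMS = 27e9            # assumed dense parameter count
BITS_FP16 = 16.0           # FP16 weights
BITS_Q8_0 = 8.5            # Q8_0: 32 int8 weights + one fp16 scale per block
BANDWIDTH_GB_S = 256.0     # assumed theoretical memory bandwidth


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in bytes."""
    return n_params * bits_per_weight / 8


def decode_tps_upper_bound(n_params: float, bits_per_weight: float,
                           bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if each step streams all weights once."""
    return bandwidth_gb_s * 1e9 / weight_bytes(n_params, bits_per_weight)


def expected_tokens_per_step(accept_rate: float, max_draft: int) -> float:
    """Expected tokens per verification step for draft-based decoding,
    assuming each drafted token is accepted independently with the
    same probability: 1 + a + a^2 + ... + a^k."""
    return sum(accept_rate ** i for i in range(max_draft + 1))


if __name__ == "__main__":
    for name, bits in (("FP16", BITS_FP16), ("Q8_0", BITS_Q8_0)):
        size_gb = weight_bytes(N_PARAMS, bits) / 1e9
        tps = decode_tps_upper_bound(N_PARAMS, bits, BANDWIDTH_GB_S)
        print(f"{name}: ~{size_gb:.1f} GB weights, <= {tps:.1f} TPS without drafting")

    # Hypothetical acceptance rate; the real one depends on the task.
    for k in (2, 3):
        gain = expected_tokens_per_step(0.6, k)
        print(f"max draft {k}: ~{gain:.2f} tokens per step at 60% acceptance")
```

Under these assumptions FP16 reads about twice as many bytes per token as Q8, so it should run at roughly half the speed before drafting is factored in. Real throughput will be lower than the bound because achievable bandwidth is below the theoretical figure, and the drafting gain is reduced by the cost of producing the draft tokens.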

Another side note, in case you haven't run into it: 27B is way better when context is below 100k. In my use, it appears to finish specifically when context is above 100k, which is what was causing my issues initially.