How the heck is Qwen3-Coder so fast? Nearly 10x other models.

Posted by CSEliot@reddit | LocalLLaMA | View on Reddit | 26 comments

My Strix Halo w/ 64gb VRAM, (other half on RAM) runs Qwen3-Coder at 30t/s roughly. And that's the Unsloth Q8_K_XL 36GB quant.
Other's of SIMILAR SIZE AND QUANT perform at maybe 4-10 tok/s.

How is this possible?! Seed-OSS-36B (Unsloth) gives me 4 t/s (although, it does produce more accurate results given a system prompt.)

You can see results from benchmarks here:
https://kyuz0.github.io/amd-strix-halo-toolboxes/

I'm speaking from personal experience, but this benchmark tool is here to support.