Bad model quality: qwen3.6-27b with hipfire on Strix Halo
Posted by sterby92@reddit | LocalLLaMA | View on Reddit | 5 comments
Hi, I'm running the default qwen3.6-27b with dflash on the latest hipfire on Strix Halo (ROCm 7.2). It works and gives decently fast performance, I guess, but the output quality is really subpar. It barely manages to make a tool call in openwebui, and it even swaps today's date (which is in the system prompt) for a different date. I'm not sure if I'm doing something wrong, or if this is expected and we just wait for better support and better quants?
run 1/5 pp 102 tok/s | TTFT 196 ms | decode 34.9 tok/s (128 tok)
run 2/5 pp 102 tok/s | TTFT 196 ms | decode 34.9 tok/s (128 tok)
run 3/5 pp 103 tok/s | TTFT 194 ms | decode 34.7 tok/s (128 tok)
run 4/5 pp 103 tok/s | TTFT 195 ms | decode 34.7 tok/s (128 tok)
run 5/5 pp 102 tok/s | TTFT 196 ms | decode 34.9 tok/s (128 tok)
Prefill            mean    min    max  stdev       ms
─────────────────────────────────────────────────────
pp128             165.2  164.9  165.4    0.2    775.0
pp512             270.9  270.5  271.2    0.2   1890.3

                   mean    min    max  stdev
────────────────────────────────────────────
Prefill tok/s     102.3  101.8  102.9    0.4   (user prompt, 20 tok)
TTFT ms           195.5  194.4  196.4    0.7
Decode tok/s       34.8   34.7   34.9    0.1
Wall tok/s         33.1   33.0   33.1    0.0

Decode ms/tok: 28.72
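As a sanity check, the summary numbers are internally consistent: the decode ms/tok line is just the reciprocal of the mean decode throughput, which can be reproduced from the five run lines above:

```python
# Decode tok/s read off the five benchmark runs above
runs = [34.9, 34.9, 34.7, 34.7, 34.9]

mean_tps = sum(runs) / len(runs)   # 34.82 tok/s (table rounds to 34.8)
ms_per_tok = 1000 / mean_tps       # reciprocal: ~28.72 ms/tok

print(round(mean_tps, 2), round(ms_per_tok, 2))  # → 34.82 28.72
```

So the harness's arithmetic checks out; the quality problem is separate from the throughput numbers.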
Due_Net_3342@reddit
Just buy a 24GB GPU; it's not worth trying dense models on Strix Halo.
UnbeliebteMeinung@reddit
Don't ever buy a 24GB GPU. It's not worth it. You can't run anything except some Q4 quants... Start with 32GB or just don't... 24GB is a pain in the ass.
Due_Net_3342@reddit
I disagree. I'm running a Q5 quant of a 27B with 180k context and turbo4 KV cache without any issues (via OCuLink, no spillover), plus a big MoE model on the Strix itself.
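For rough intuition on whether that setup fits in memory, here is a back-of-envelope sketch. Every architecture number below (layer count, GQA head layout, head dim, KV bit-width) is an assumption for illustration, not the model's actual config:

```python
# Back-of-envelope VRAM estimate for a Q5 27B with a long, quantized KV cache.
# All architecture numbers are ASSUMED for illustration; check the real
# model config before trusting the totals.
params = 27e9
bpw_q5 = 5.5                       # ~bits/weight for a Q5-style quant incl. scales

weights_gb = params * bpw_q5 / 8 / 1e9            # ≈ 18.6 GB

# KV cache: 2 (K and V) × layers × kv_heads × head_dim × bytes/elem × context
layers, kv_heads, head_dim = 48, 8, 128           # assumed GQA layout
ctx = 180_000
kv_bytes_per_elem = 0.5                           # ~4-bit quantized KV cache
kv_gb = 2 * layers * kv_heads * head_dim * kv_bytes_per_elem * ctx / 1e9

print(round(weights_gb, 1), round(kv_gb, 1))      # weights ≈ 18.6, KV ≈ 8.8
```

Under these assumptions the total lands around 27 GB, which is why a config like this is plausible on a 32GB card but a squeeze on 24GB.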
dsanft@reddit
Testing kernels without also robustly testing correctness on real model data is... bold, to say the least. Hipfire is still pretty new and experimental.
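A minimal version of the correctness check being suggested, as a hypothetical sketch (the group-wise symmetric 4-bit scheme here is illustrative, not hipfire's actual format): quantize the weights, dequantize, and compare the matmul output against an fp32 reference.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_q4(w, group=32):
    # Symmetric per-group 4-bit quantization (illustrative scheme only)
    g = w.reshape(-1, group)
    scale = np.abs(g).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0
    q = np.clip(np.round(g / scale), -8, 7)
    return q, scale

def dequantize(q, scale, shape):
    return (q * scale).reshape(shape).astype(np.float32)

x = rng.standard_normal((4, 256)).astype(np.float32)
w = rng.standard_normal((256, 256)).astype(np.float32)

ref = x @ w                                   # fp32 reference path
q, s = quantize_q4(w)
approx = x @ dequantize(q, s, w.shape)        # "kernel" path via quantized weights

# Relative error vs the reference; a broken kernel shows up as a large value here
rel_err = np.abs(ref - approx).max() / np.abs(ref).max()
print(rel_err)
```

The same pattern scales up to comparing a custom kernel's logits against a trusted backend on real model weights, which is the check that catches exactly the kind of quality regression the OP is seeing.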
UnbeliebteMeinung@reddit
This stuff is hugely experimental...