Bad model quality: qwen3.6-27b with hipfire on Strix Halo
Posted by sterby92@reddit | LocalLLaMA | View on Reddit | 5 comments
Hi, I'm running the default qwen3.6-27b with dflash on the latest hipfire on Strix Halo (ROCm 7.2). It works and gives decently fast performance, I guess, but the output quality is really subpar. It barely manages to make a tool call in openwebui, and it even swaps today's date (which is in the system prompt) for a different date. I'm not sure if I'm doing something wrong, or if this is expected and we just wait for better support and better quants?
run 1/5 pp 102 tok/s | TTFT 196 ms | decode 34.9 tok/s (128 tok)
run 2/5 pp 102 tok/s | TTFT 196 ms | decode 34.9 tok/s (128 tok)
run 3/5 pp 103 tok/s | TTFT 194 ms | decode 34.7 tok/s (128 tok)
run 4/5 pp 103 tok/s | TTFT 195 ms | decode 34.7 tok/s (128 tok)
run 5/5 pp 102 tok/s | TTFT 196 ms | decode 34.9 tok/s (128 tok)
Prefill            mean    min    max  stdev       ms
─────────────────────────────────────────────────────
pp128             165.2  164.9  165.4    0.2    775.0
pp512             270.9  270.5  271.2    0.2   1890.3

                   mean    min    max  stdev
────────────────────────────────────────────
Prefill tok/s     102.3  101.8  102.9    0.4   (user prompt, 20 tok)
TTFT ms           195.5  194.4  196.4    0.7
Decode tok/s       34.8   34.7   34.9    0.1
Wall tok/s         33.1   33.0   33.1    0.0

Decode ms/tok: 28.72
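As a sanity check, the summary numbers are internally consistent: the decode ms/tok line is just the reciprocal of the mean decode throughput, which can be reproduced from the five run lines above:

```python
# Decode tok/s read off the five benchmark runs above
runs = [34.9, 34.9, 34.7, 34.7, 34.9]

mean_tps = sum(runs) / len(runs)   # 34.82 tok/s (table rounds to 34.8)
ms_per_tok = 1000 / mean_tps       # reciprocal: ~28.72 ms/tok

print(round(mean_tps, 2), round(ms_per_tok, 2))  # → 34.82 28.72
```

So the harness's arithmetic checks out; the quality problem is separate from the throughput numbers.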
Due_Net_3342@reddit
Just buy a 24GB GPU; it's not worth trying dense models on Strix Halo.
UnbeliebteMeinung@reddit
Don't ever buy a 24GB GPU. It's not worth it. You can't run anything except some Q4 quants... Start with 32GB or just don't... 24GB is a pain in the ass.
Due_Net_3342@reddit
I disagree. I'm running a Q5 quant of a 27B with 180k context and turbo4 KV cache without any issues (via OCuLink, no spillover), plus a big MoE model on the Strix itself.
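For rough intuition on whether that setup fits in memory, here is a back-of-envelope sketch. Every architecture number below (layer count, GQA head layout, head dim, KV bit-width) is an assumption for illustration, not the model's actual config:

```python
# Back-of-envelope VRAM estimate for a Q5 27B with a long, quantized KV cache.
# All architecture numbers are ASSUMED for illustration; check the real
# model config before trusting the totals.
params = 27e9
bpw_q5 = 5.5                       # ~bits/weight for a Q5-style quant incl. scales

weights_gb = params * bpw_q5 / 8 / 1e9            # ≈ 18.6 GB

# KV cache: 2 (K and V) × layers × kv_heads × head_dim × bytes/elem × context
layers, kv_heads, head_dim = 48, 8, 128           # assumed GQA layout
ctx = 180_000
kv_bytes_per_elem = 0.5                           # ~4-bit quantized KV cache
kv_gb = 2 * layers * kv_heads * head_dim * kv_bytes_per_elem * ctx / 1e9

print(round(weights_gb, 1), round(kv_gb, 1))      # weights ≈ 18.6, KV ≈ 8.8
```

Under these assumptions the total lands around 27 GB, which is why a config like this is plausible on a 32GB card but a squeeze on 24GB.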
dsanft@reddit
Testing kernels without also robustly testing correctness on real model data is... bold, to say the least. Hipfire is still pretty new and experimental.
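A minimal version of the correctness check being suggested, as a hypothetical sketch (the group-wise symmetric 4-bit scheme here is illustrative, not hipfire's actual format): quantize the weights, dequantize, and compare the matmul output against an fp32 reference.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_q4(w, group=32):
    # Symmetric per-group 4-bit quantization (illustrative scheme only)
    g = w.reshape(-1, group)
    scale = np.abs(g).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0
    q = np.clip(np.round(g / scale), -8, 7)
    return q, scale

def dequantize(q, scale, shape):
    return (q * scale).reshape(shape).astype(np.float32)

x = rng.standard_normal((4, 256)).astype(np.float32)
w = rng.standard_normal((256, 256)).astype(np.float32)

ref = x @ w                                   # fp32 reference path
q, s = quantize_q4(w)
approx = x @ dequantize(q, s, w.shape)        # "kernel" path via quantized weights

# Relative error vs the reference; a broken kernel shows up as a large value here
rel_err = np.abs(ref - approx).max() / np.abs(ref).max()
print(rel_err)
```

The same pattern scales up to comparing a custom kernel's logits against a trusted backend on real model weights, which is the check that catches exactly the kind of quality regression the OP is seeing.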
UnbeliebteMeinung@reddit
This stuff is hugely experimental...