Qwen 3.6 27B llama.cpp | Multi-GPU pp t/s help

Posted by SemaMod@reddit | LocalLLaMA

The new dense model is great, but I'm trying to figure out how to increase prompt processing (PP) and token generation speed. I'm running Q8 quants across three 7900 XTX GPUs and consistently getting only 18-20 t/s generation and ~650 t/s prompt processing, which feels low. What are other people getting in multi-GPU setups, and how can I optimize performance?
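For anyone wanting to compare numbers, a hedged sketch of a `llama-bench` run exercising the llama.cpp knobs that most often move multi-GPU throughput. The model filename is a placeholder, and whether row split or a larger batch actually helps on a 3x 7900 XTX ROCm setup is an assumption to test, not a guarantee:

```shell
# Sketch only: flags that commonly affect multi-GPU speed in llama.cpp.
# Model path below is a placeholder; point it at your actual Q8 GGUF.
./llama-bench -m ./qwen-q8_0.gguf \
  -ngl 99 \          # offload all layers to GPU
  -sm row \          # split tensors by rows across GPUs (default is layer split)
  -fa 1 \            # enable flash attention
  -ub 2048 \         # larger physical batch, mainly affects prompt processing
  -p 2048 -n 128     # benchmark 2048-token prompt, 128-token generation
```

Running the same command with `-sm layer` vs. `-sm row` makes it easy to see which split mode wins on your interconnect, since row split trades more inter-GPU traffic for better per-token parallelism.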