MTP is nice and all, but what about PP speeds?

Posted by milpster@reddit | LocalLLaMA

I don't know about the rest of you, but with my setup, as soon as I enable MTP, PP performance and GPU usage drop significantly for some reason. For me it's less a memory issue than a drop in performance.

My setup: 2x Radeon VII 16 GB on ROCm and 1x RTX 3080 8 GB Max-Q on Vulkan, running Qwen 3.6 27B with the KV cache at Q8. The Radeon VIIs are on 4x PCIe risers, so maybe it's a bus contention issue?

That said, I also tried going full Vulkan, but that makes it worse by a long shot.

Could anyone here explain why that is the case?