Is there a way to speed up prompt processing with some layers on CPU with Qwen3-Coder-Next or similar MoEs?

Posted by Borkato@reddit | LocalLLaMA

I feel like I've tried every combination of n-cpu-moe and similar settings. I was running Qwen3-Coder-Next-MXFP4\_MOE.gguf. Generation ran at 32 T/s, but prompt processing was ridiculously slow. Is that just how it is, or am I missing something? I have 30 GB VRAM and 43 GB RAM.