Is there a way to speed up prompt processing with some layers on CPU with qwen-3-coder-next or similar MoEs?
Posted by Borkato@reddit | LocalLLaMA | 46 comments
I feel like I've tried every combination of n-cpu-moe and similar settings. I was running Qwen3-Coder-Next-MXFP4_MOE.gguf. It generated at 32 T/s, but prompt processing was ridiculously slow. Is that just how it is, or am I missing something?
I have 30GB VRAM and 43GB RAM.