How much faster is Gemma 4 26B-A4B than 31B during inference?

Posted by alex20_202020@reddit | LocalLLaMA | 16 comments

I want to download one of them, and I usually do inference on CPU since my GPU is old, so I'm concerned with speed.

One claim I found on the web (I previously posted a link to it, but that post was removed):

> Multiple users are reporting that Gemma 4's MoE model (26B-A4B) runs significantly slower than Qwen 3.5's equivalent.

I guess that could have been due to early versions of the backend engines. Now, with the newest llama.cpp, what is the inference speed of 26B-A4B vs 31B?
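For anyone who wants to test this themselves, here is a minimal sketch using llama.cpp's bundled llama-bench tool (the GGUF filenames below are placeholders for whatever quants you actually download). On paper the MoE should be faster on CPU: token generation is roughly memory-bandwidth bound, and 26B-A4B only reads ~4B active parameters per token versus 31B for a dense model, so you'd naively expect several times higher tokens/sec.

```
# Compare CPU throughput of the two models with llama-bench.
# Filenames are placeholders -- substitute your actual GGUF files.

# MoE model (26B total, ~4B active per token)
./llama-bench -m gemma4-26b-a4b-Q4_K_M.gguf -p 512 -n 128 -t 8

# Dense model for comparison
./llama-bench -m qwen3.5-31b-Q4_K_M.gguf -p 512 -n 128 -t 8
```

llama-bench reports prompt processing (pp) and token generation (tg) throughput separately in tokens/sec; tg is the number that matters for the slow-generation reports above. `-t` sets the thread count, so match it to your physical cores.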