Dense Model Shoot-Off: Gemma 4 31B vs Qwen3.6/5 27B... Result: Slower is Faster.

Posted by MiaBchDave@reddit | LocalLLaMA | View on Reddit | 40 comments

Not affiliated with Kaitchup, but a fan of their testing. I was looking forward to this article, and it did not disappoint. There's lots of free info in the link; the juicy part is behind a paywall. I'll respect that, but the short of it is:

It shows that the Qwen models are more benchmaxxed, and Gemma 4 31B is far more efficient with token use. So even though Gemma's per-token inference is a little slower because of its size, you're basically getting things done much faster overall. This matches my own experience, so now I'm really looking forward to DFlash in Gemma, MTP, and any other optimizations arriving soon.
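To see why a slower model can finish sooner, here's a minimal back-of-the-envelope sketch. All the numbers below are hypothetical placeholders for illustration, not figures from the article: the point is just that wall-clock time is tokens generated divided by decode speed, so a terser model can win even at a lower tokens/sec rate.

```python
def wall_time(tokens: int, tok_per_sec: float) -> float:
    """Seconds to finish a response: tokens generated / decode speed."""
    return tokens / tok_per_sec

# Hypothetical example: the bigger model decodes more slowly
# but answers with far fewer tokens.
terse_model = wall_time(tokens=600, tok_per_sec=30.0)     # 20.0 s
verbose_model = wall_time(tokens=1500, tok_per_sec=45.0)  # ~33.3 s

print(f"terse: {terse_model:.1f}s, verbose: {verbose_model:.1f}s")
```

With these made-up numbers, the model that is 50% slower per token still finishes in roughly 60% of the time, because it emits far fewer tokens to reach the answer.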