Just how powerful is Google’s Gemma 4? And what can we use it for?
Posted by Double-Confusion-511@reddit | LocalLLaMA | View on Reddit | 15 comments
Zealousideal-Yard328@reddit
I benchmarked Gemma 4 E4B specifically on enterprise tasks — structured JSON output, compliance, and reasoning. Thinking mode makes a noticeable difference. Results and methodology here: https://aiexplorer-blog.vercel.app/post/gemma-4-e4b-enterprise-benchmark
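For a rough idea of what a structured-output check can look like (a minimal sketch, not the linked post's actual harness; the endpoint, model name, and JSON keys below are placeholders):

```bash
# Minimal sketch, NOT the benchmark's real harness: request JSON from a
# local OpenAI-compatible server, then validate the shape with jq.
# Endpoint and model name are placeholders.
REPLY=$(curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-4-e4b-it",
       "messages": [{"role": "user", "content":
         "Return only JSON with keys invoice_id and total for: Invoice #42, $19.99"}]}' \
  | jq -r '.choices[0].message.content')
# jq -e exits non-zero if the reply is not JSON or a key is missing.
echo "$REPLY" | jq -e 'has("invoice_id") and has("total")' > /dev/null \
  && echo "PASS: well-formed" || echo "FAIL: schema check"
```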
Stepfunction@reddit
IT'S OVER 9000!!!
jugalator@reddit
Very good for its size at creative writing and, above all, language support! I've never seen language support this good from a 31B model, even for lesser-spoken languages.
Equal-Ad9264@reddit
Can it do good image analysis? Specifically: if I show it an image of furniture, can it tell me the type of furniture, style, material, and color?
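For reference, the kind of request I mean would be something like this (a hedged sketch assuming a local OpenAI-compatible server such as vLLM; the URL and model name are placeholders, not confirmed details from this thread):

```bash
# Hypothetical sketch: send an image to a local OpenAI-compatible
# endpoint and ask for the furniture attributes as JSON.
# Server URL and model name are assumptions, not from the thread.
IMG_B64=$(base64 -w0 chair.jpg)   # GNU base64; on macOS use: base64 -i chair.jpg
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "model": "gemma-4-31b-it",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text",
       "text": "Identify this furniture. Reply as JSON with keys: type, style, material, color."},
      {"type": "image_url",
       "image_url": {"url": "data:image/jpeg;base64,${IMG_B64}"}}
    ]
  }]
}
EOF
```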
AvocadoArray@reddit
I've been running it through some personal benchmarks and comparing it to Qwen 3.5 27b / 122b.
GrungeWerX@reddit
What about "not just this, it's that" slop?
AvocadoArray@reddit
It’s not just an improvement, it’s an evolution in how AI talks to humans.
Jk. It does still have some of that at times, but it’s definitely toned down compared to everything else I’ve run.
po_stulate@reddit
Which Gemma 4 model do you use? I've tried the 31b (as I thought it would be the most capable one), but it feels like a huge step down from qwen3.5-122b-a10b for me.
AvocadoArray@reddit
Sorry, I left that out: I've been running the 31b dense model.
I started testing with UD-Q8_K_XL but noticed some weird token accuracy issues. Sure enough, GitHub issues started popping up in the llama.cpp repo with a slew of confirmed bugs. Not sure if they're fixed yet, but I'd hold off on judging the model if you've only tested with llama.cpp so far.
The rest of my testing has been in vLLM using the full official BF16 weights, since no FP8 weights were available yet. I'll download an FP8 quant tonight and test with that as well.
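For anyone wanting to reproduce, a launch along those lines might look like this (a minimal sketch; the model ID and flags are assumptions, not confirmed settings from this thread):

```bash
# Hedged sketch of serving the BF16 weights with vLLM's
# OpenAI-compatible server; the model ID is a placeholder.
vllm serve google/gemma-4-31b-it \
  --dtype bfloat16 \
  --max-model-len 8192 \
  --port 8000
```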
po_stulate@reddit
Thanks. I was using UD-Q8_K_XL too, and yes, only llama.cpp for me. If it's really that much better on vLLM, I think I'll wait and test it again.
AvocadoArray@reddit
Keep an eye here: https://huggingface.co/unsloth/gemma-4-31B-it-GGUF/discussions/3
I'll update my post there once the fixes are in place and confirmed working.
po_stulate@reddit
Haha, I had similar issues with it. The model claimed there were some typos in my script and that it fixed them, but there were no typos and it didn't actually change anything:

/dev/urandom → /dev/urandom
magick → magick (assuming ImageMagick 7)

When I asked it to parallelize the script, it also didn't realize it needed to make the cache file path different for each thread/iteration, or they'd overwrite each other. Qwen3.5-122b didn't have this issue either; I wonder if this could also be a llama.cpp issue.
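The fix the model missed would look something like this (the original script isn't posted, so the work inside the function is a placeholder; the point is one cache path per iteration):

```bash
#!/usr/bin/env bash
# Minimal sketch: give each parallel job its own cache file instead
# of a shared path, so concurrent writes don't clobber each other.
process_one() {
  local i="$1"
  local cache
  cache=$(mktemp "/tmp/job_cache.${i}.XXXXXX")  # unique path per iteration
  head -c 1M /dev/urandom > "$cache"            # placeholder for the real work
  # ... use "$cache" here, e.g. with magick (ImageMagick 7) ...
  rm -f -- "$cache"
}
export -f process_one
# Run 8 iterations, at most 4 in parallel.
seq 1 8 | xargs -P 4 -I{} bash -c 'process_one "$1"' _ {}
```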
AvocadoArray@reddit
Yes, those are the exact problems I was having. I suspect it was also leading to other brain-damaged responses, but this one was the most obvious in my testing.
That specific issue isn't present in vLLM, but it seems they're also fighting some tool-calling bugs in the tool parser.
Either way, take all results right now with a grain of salt. I'm sure these bugs will get ironed out by the end of next week.
NotumRobotics@reddit
Asked her to build a complete inventory management system with QR scanning/generation. ~15 minutes with sub-agents, 100% local. So far so good; far fewer iterations than other models we've tested.
Signal_Ad657@reddit
It’s so hot right now.