Is Gemma 4 good or bad at real-world tasks?
Posted by Double-Confusion-511@reddit | LocalLLaMA | 14 comments
Based on real-world usage by the community, roughly which version of which model is Gemma 4 comparable to? It would be great if you could also mention the hardware requirements for running it (like VRAM or GPU needs)
New_Zucchini_3843@reddit
I'm running the 31B Dense model on a dual-GPU setup with a 3090 Ti and a 3090, using Q6_K_L (28.3 GB).
I'm limiting the power consumption, and it's running at about 20 t/s to 22 t/s. It's not blazing fast, but it's running smoothly.
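As a sanity check on why a quant this size wants two cards: a GGUF file is roughly parameters times bits-per-weight divided by eight. A minimal sketch (the ~6.56 bpw figure is the commonly cited average for plain Q6_K; the "_L" variant keeps some tensors at higher precision, and KV cache comes on top, which is where the reported 28.3 GB comes from):

```python
# Back-of-envelope size of a quantized GGUF: params * bits-per-weight / 8.
def gguf_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate file size in decimal GB (ignores KV cache and runtime overhead)."""
    return params_billion * bits_per_weight / 8

# 31B at ~6.56 bpw (plain Q6_K average) lands around 25 GB,
# so it spills past a single 24 GB card even before context.
print(round(gguf_size_gb(31, 6.56), 1))  # → 25.4
```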
The 26B A4B model is also fast and recommended, but according to benchmarks published by Google it performs poorly compared to the Dense model in terms of "depth of thought" and long tasks, so I'm using the Dense model.
I like it so much that I’ve deleted almost all of my other local models.
jsandhol@reddit
To me Gemma4:26b feels like the first actually usable ~30B model. And I get good results on my Acemagic F3A with 128GB unified RAM (64 dedicated to GPU, Arch + Ollama). Both accuracy and speed (speed could always be better, but this one is not frustratingly slow). Looks like my Qwen3 models are gonna get the `ollama rm` treatment very soon... :)
FigZestyclose7787@reddit
i7, 32GB, 1080 Ti here, running Gemma4 26B MoE. It is fantastic in real-world usage. I use it to read, fetch, and sort emails and calendars, download and summarize YouTube transcripts, do deep web research, and produce light HTML presentations (to be fair, the skill for this was made with Opus, but for running it day-to-day, Gemma4 is fantastic). Even the E4B version is enough for most of these things; it just suffers a bit on more demanding writing tasks. It can one-shot simple but beautiful webpages too, plus light coding and research ralph loops (I've done tens of these...). In my latest testing it's as reliable as its Qwen 3.5 counterparts, or more so. Good luck.
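On the transcript-summarization point: the usual trick with a ~30B local model is to chunk the transcript so each piece fits the context window before summarizing. A minimal sketch (the helper name and sizes are my own, not the commenter's actual setup):

```python
# Split a long transcript on sentence-ish boundaries, keeping each chunk
# under max_chars, so each chunk can be summarized by the model separately.
def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    chunks, current = [], ""
    for sentence in text.replace("\n", " ").split(". "):
        piece = sentence if sentence.endswith(".") else sentence + ". "
        if len(current) + len(piece) > max_chars and current:
            chunks.append(current.strip())
            current = ""
        current += piece
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Each chunk would then go through the model with a "summarize this" prompt, and the partial summaries get merged in a final pass.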
Double-Confusion-511@reddit (OP)
Qwen 3.5? Wow, I can't believe it. Deep web research? How do you do that with Gemma?
FigZestyclose7787@reddit
I had to create my own harness and turn it into a skill that I call in different ways for different levels of searches. Too many steps to describe it all here, but I'm using SearXNG on Docker. Then I created a skill that has some Python code in it (in case SearXNG is not running for any reason). The skill/Python can call agent-browser, patchright, and several other tools for PDF ingesting, multistep/planned research, etc. It is a bit of a troglodyte, unrefined, but it fits my needs for a few tens of searches a day, for free. As a bonus tip, if you have access to bigger models, go back and forth and create subskill routines for each major site you want to scrape (I was able to get it working 100% with difficult ones such as Zillow, Amazon, even X, after a couple of weeks). With the skill and tools, even the E4B runs searches just fine, but I trust the 26B much more. Good luck.
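For the SearXNG piece of a harness like this, the query side can be as small as a URL builder plus a JSON parser. A hedged sketch, assuming a local Docker instance on port 8080 with the JSON format enabled in `settings.yml` (the endpoint, port, and engine names are my guesses, not the commenter's actual skill):

```python
import json
import urllib.parse
import urllib.request

SEARXNG = "http://localhost:8080/search"  # assumed Docker-published port

def search_url(query: str, engines: str = "duckduckgo,brave") -> str:
    """Build a SearXNG query URL asking for a JSON response."""
    params = urllib.parse.urlencode(
        {"q": query, "format": "json", "engines": engines}
    )
    return f"{SEARXNG}?{params}"

def top_results(raw: str, n: int = 5) -> list[dict]:
    """Pull title/url pairs out of a SearXNG JSON response body."""
    data = json.loads(raw)
    return [
        {"title": r["title"], "url": r["url"]}
        for r in data.get("results", [])[:n]
    ]

# Actual fetch (needs the container running):
#   body = urllib.request.urlopen(search_url("gemma 4 vram")).read()
#   for hit in top_results(body):
#       print(hit["title"], hit["url"])
```

The local model only ever sees the cleaned title/url list, which keeps prompts small and makes the fallback path (SearXNG down, switch to a browser tool) easy to wire in.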
Double-Confusion-511@reddit (OP)
High-level information, thanks!
Alex_L1nk@reddit
A model isn't good or bad. It either fits your requirements or it doesn't.
Double-Confusion-511@reddit (OP)
That’s some deep philosophy right there. Maybe giving your usage tips would be better.
BigYoSpeck@reddit
I just need to know, should I drive or walk to the car wash?!?!
InstaMatic80@reddit
THIS
stddealer@reddit
The only issue I have with Gemma4 is its overconfidence. It's refreshing compared to all the other sycophantic models, but it will almost always trust its "instinct" over the information given by the user or even tool calls, even when its instinct is wrong. But when it does in fact actually know the answer, it's fantastic.
Double-Confusion-511@reddit (OP)
Yes, sometimes tool calling feels like a game of probability. I've seen it make wrong function calls as well.
cbeater@reddit
Been using it for text consumption with Claude review, and it's good so far. I don't use it for tool calls; I invoke it when I need it and give it the tool results as needed. I turn off thinking as well, for speed.
Double-Confusion-511@reddit (OP)
Effective usage.