What do you use Gemma 4 for?
Posted by HornyGooner4402@reddit | LocalLLaMA | 66 comments
Both Gemma 4 and Qwen 3.6 seem to be the hottest local models right now. Looking at the benchmarks and reviews, it seems like Qwen is better in every way: coding, benchmarks, agentic tasks. So is Qwen outright better? In what case would you pick Gemma over Qwen?
MY_INAPPROPRIATE_ACC@reddit
For gooning.
This is the best RP/story finetune I've ever seen that can run locally https://huggingface.co/DavidAU/gemma-4-31B-it-The-DECKARD-HERETIC-UNCENSORED-Thinking
ggonavyy@reddit
Gemma4 is really, really, really good at tracing bugs. When you feed a ticket to Qwen3.6 27B it'll fill the context with all the available info and maybe find the root cause, but sometimes it just gets distracted with unrelated issues. Gemma4 is much more consistent and reliable at finding the actual root cause, probably because of its reasoning efficiency. In that regard I find Gemma4 comparable to GPT5.4, it's that good.
po_stulate@reddit
That is if it's able to fit within Gemma's 1k sliding context window. Gemma 4 can't remember anything that's outside of its 1k sliding window.
snmnky9490@reddit
You got downvoted cause it's not true. Literally just today I used Gemma 4 with 200k tokens in context and it remembered stuff from the beginning and middle.
redonculous@reddit
Did anyone else find Gemma 4 lazy? It just refused to write code for me & would only give me a project outline. I had to reprompt it to write code & even then it was lazy.
florinandrei@reddit
There were bugs with tool usage. They have fixed the bugs recently. You may have an old copy.
FoxiPanda@reddit
Gemma trounces Qwen for my handwriting analysis and general vision tasks at the very least. I also appreciate Gemma in chat significantly more than Qwen (qwen is cold and calculating even with system prompt modifications/nudging I've found).
gandazgul@reddit
Yeah, I agree with that assessment. Gemma is quick and just as precise.
FIdelity88@reddit
Happy cake day!
mxforest@reddit
Funny you mentioned OCR because I have run extensive tests with both models at BF16 and they both performed very poorly. I was comparing them with Gemini Flash and the OCR is noticeably worse. The only open model that came neck and neck with closed source models was Nemotron 3 nano Omni.
Chupa-Skrull@reddit
That's strange. I have Gemma 4 31B performing nearly perfectly, better than dedicated OCR models like Surya and DocLing, on complex academic document tasks. Running at q_4_k_s no less
FoxiPanda@reddit
I think one of the subtle things I like about Gemma-4 compared to Qwen3.6 (27B dense or 26B MoE) is that Gemma-4 is willing to be unsure about something and mark it appropriately. In my handwriting analysis prompts, I add "If you are unsure about a letter or word, mark it with (?) and move on - don't get stuck on trying to figure it out" and Gemma-4 actually follows that, whereas Qwen3.6 is an overconfident, prompt-ignoring little brat and will sit and ponder forever whether it's an I or a J (cursive, 1700s shit-tier preservation level document that's been rotting in a box in the attic for 300 years).
I've tried prompting my way around this behavior of Qwen but it really wants to be confident and correct instead of giving up when it is virtually impossible to know what that letter / word / signature is... it just ignores the (?) directive, which is frustrating. It almost never marks things with (?), so it leads to difficult-to-review data that isn't properly transcribed, and I find it frustrating to use for my purposes at least.
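For reference, a minimal sketch of that kind of uncertainty-marking transcription prompt, assuming a local OpenAI-compatible endpoint serving a vision model; the endpoint URL, model name, and file name are placeholders, not anyone's actual setup:

```python
import base64
from openai import OpenAI

# Placeholder endpoint/model; point these at whatever serves Gemma 4 locally.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

SYSTEM_PROMPT = (
    "Transcribe the handwritten text in the image exactly as written. "
    "If you are unsure about a letter or word, mark it with (?) and move on - "
    "don't get stuck on trying to figure it out."
)

def transcribe(image_path: str) -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gemma-4-31b-it",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text", "text": "Transcribe this page."},
            ]},
        ],
        temperature=0.1,
    )
    return response.choices[0].message.content

print(transcribe("page_042.png"))
```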
florinandrei@reddit
I'm guessing the Party directive is for AI to be excellent. For the glory of the motherland. /s
bjodah@reddit
I wonder if you two might be using different inference software (vLLM vs llama.cpp)?
mxforest@reddit
I am using vLLM.
mikael110@reddit
Just to be sure, did you change the default vision token budget for Gemma 4 during the tests? It defaults to 280, which is quite low. Even Google themselves state that you should increase it for OCR workflows.
Personally I run it at the highest budget and have found it to be excellent at OCR, and at pretty much every other task I've thrown at it.
Client_Hello@reddit
Do you have a local document pipeline or are you processing the page in one go? You should use Vision to break a document into smaller pieces before doing OCR on each piece, since there is only so much text you can fit into the 1k tokens a vision model has to work with.
Gemini Flash is pretty awesome at Vision
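One way to do that splitting locally (not the commenter's pipeline, just a sketch): crop the page into horizontal strips with Pillow and run OCR on each strip, so no single request has to carry the whole page's text. File names and the strip count are arbitrary.

```python
from PIL import Image

def split_page(path: str, strips: int = 4, overlap: int = 40) -> list[Image.Image]:
    """Cut a page image into horizontal strips with a small overlap
    so lines on the boundary aren't lost."""
    page = Image.open(path)
    width, height = page.size
    strip_height = height // strips
    pieces = []
    for i in range(strips):
        top = max(0, i * strip_height - overlap)
        bottom = min(height, (i + 1) * strip_height + overlap)
        pieces.append(page.crop((0, top, width, bottom)))
    return pieces

# Each piece then goes to the vision model as its own OCR request,
# and the per-piece transcriptions are stitched back together afterwards.
for idx, piece in enumerate(split_page("scan.png")):
    piece.save(f"scan_part_{idx}.png")
```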
Far-Low-4705@reddit
Really? I'm honestly super surprised.
Idk about Gemma, but with qwen, the vision has been insane.
ZhopaRazzi@reddit
Qwen3.5 and onwards OCR is very good
FoxiPanda@reddit
I haven't tested Omni yet, but I have tested Nemotron Cascade 2 ... at least for my 1700s/1800s handwriting analysis, Gemma actually does phenomenally. Not sure why we have such differences /shrug
mxforest@reddit
Try Nemotron. It takes very little memory for context so you can actually do so much more.
FoxiPanda@reddit
Will do, it's definitely on the list of models to test...unfortunately that list is kind of long right now because gestures at April model releases lol
mxforest@reddit
Great problem to have. lol
We are so spoilt.
FoxiPanda@reddit
I agree, I edited out my unfortunately. It is amongst the best problems to have :D
florinandrei@reddit
Gemma 4 is great with language. V3 was also very good that way.
rob417@reddit
Mind if I ask about your settings or templates? My experience has been the opposite for some reason. Whenever I used Gemma4 26b, it tended to get stuck in a "wait, maybe I should double-check xxx again" thinking loop forever.
ttkciar@reddit
For me Gemma-4-31B-it is better for music lyrics, creative writing, business writing, physics assistant, biochem assistant, comparative mythology, logical puzzle-solving, RAG, constitutional law (USA), critique-and-improve pipelines, persuasion, and Evol-Instruct.
For many things (including codegen and summarization) it's essentially just as good as Qwen3.6-27B. I would have thought Gemma4's architecture would have lent it better summarization competence, but so far they are very similar there.
Where I have noticed Qwen3.6-27B outshining Gemma4 is editing (rewriting, grammar/tense correction), geopolitical analysis, and moral philosophy.
florinandrei@reddit
Confucianism? /s
Pro-Row-335@reddit
The only things I use LLMs for are coding and JP->EN translation. For agentic coding it's a no-brainer: Qwen is much better than Gemma with tools. For JP->EN translation it's also a no-brainer: Gemma is much better (at least in the genre of text I translate, hentai and porn tweets).
florinandrei@reddit
Qwen is the nerd who ends up with a PhD in computer science.
Gemma is the pundit with a dozen published books and a weekly column in The Atlantic.
skypie1202@reddit
I agree, Gemma-4 is the best at JP-EN translation right now; even specialized translation models fall short in comparison.
codemonkeyius@reddit
Better or worse than TranslateGemma in your opinion?
mikael110@reddit
I'm not the person you asked, but I've done extensive testing of various translation models, both open and closed, for JP->EN translation, and Gemma 4 is just entirely undisputed when it comes to open models, even compared to TranslateGemma.
TranslateGemma is essentially just Gemma 3 with a bit more language focused finetuning applied to it. And while Gemma 3 was relatively good for translation, Gemma 4 is just miles ahead of it. I honestly feel like Google must have fed Gemma 4 their entire Google Translate training set or something, because it's really on another level.
codemonkeyius@reddit
This is great to know as a GDMer, thanks! (I used to do fan J-E translations myself, in my misbegotten youth...)
Are there any benchmarks or specific tools you use for this stuff?
mikael110@reddit
I do think there are some translation benchmarks around, but I personally don't tend to use them. I've found that for translation it's really hard to come up with a scoring system that actually captures the feel and quality of the translation. So I mostly stick to private stuff I've developed myself.
I have a pretty large self curated dataset of text which I've picked out due to it containing words or concepts that often trip up LLMs, as well as some more casual text to balance things out.
Then I have a relatively simple script that runs the LLM over that dataset. I always make sure to use proper prompt templates and make adjustments to the prompt if the model is designed with specific prompts in mind. But usually I keep the prompt the same for all general models.
Then at the end I examine the resulting translations myself and judge them. This does mean there is quite a bit of subjectivity involved, and it's really time consuming as well, but sadly I haven't found any better way to do it. I've found over the years that a lot of LLMs are quite similar in terms of the stuff they will stumble over and the type of awkward wording choices they will make.
Gemma 4 was the first model I tested that actually felt like I was testing a closed model from one of the big labs. In fact in some cases it even did a better job. And while my tests are somewhat subjective I've heard the same sentiment from pretty much everyone I've talked to about this at this point.
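For reference, the harness described above can be as simple as a loop over the eval set; a minimal sketch, assuming a JSONL file of source sentences and a local OpenAI-compatible server (file name, model name, and prompt are placeholders, not the actual setup):

```python
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

PROMPT = "Translate the following Japanese text into natural English:\n\n{text}"

results = []
with open("jp_eval_set.jsonl", encoding="utf-8") as f:  # one {"text": ...} per line
    for line in f:
        source = json.loads(line)["text"]
        response = client.chat.completions.create(
            model="gemma-4-31b-it",  # swap in whichever model is being compared
            messages=[{"role": "user", "content": PROMPT.format(text=source)}],
            temperature=0.3,
        )
        results.append({"source": source,
                        "translation": response.choices[0].message.content})

# Translations are then reviewed and judged by hand rather than auto-scored.
with open("translations_gemma4.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)
```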
Velocita84@reddit
Can confirm it's great at translating jp goon material
markole@reddit
Great workhorse for translating stuff. Paired with translation memory and a custom MCP, the results are great. 90 to 95% of the generated translations for my language are good enough which is huge. I expect to get consistent >95% as I improve the translation memory (needs human effort).
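Not the commenter's actual setup, but a rough sketch of what a custom MCP translation-memory lookup could look like, assuming the official `mcp` Python SDK (FastMCP) and a simple exact-match JSON store:

```python
import json
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("translation-memory")

# Hypothetical store: {"source segment": "approved translation", ...}
with open("tm_store.json", encoding="utf-8") as f:
    TM = json.load(f)

@mcp.tool()
def lookup_translation(segment: str) -> str:
    """Return the approved translation for a source segment, if one exists."""
    return TM.get(segment, "NO_MATCH")

if __name__ == "__main__":
    mcp.run()  # the model calls lookup_translation before generating a fresh translation
```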
DinoAmino@reddit
The only real talk about Gemma 4 is for the 31B. You would pick Qwen 27B if you don't have a lot of VRAM or if speed matters more to you than accuracy. I can't really speak about vision capabilities, maybe Qwen has the upper hand there. But in all other cases it's gemma. I mean, did you all see the results for the latest food truck benchmark? It's right up there with the cloud models. It's true, that's huge.
https://www.reddit.com/r/LocalLLaMA/s/f7VrSp5nWQ
mcslender97@reddit
What's wrong with the MoE versions?
cuberhino@reddit
Which would you run on a single 3090? Or is swapping between them making any sense? Gemma for conversation & planning and then swapping to qwen for implementation?
DinoAmino@reddit
Swapping is a good call. Otherwise, Qwen is supposed to suffer less from quantization. And Gemma uses too much VRAM for context.
Kahvana@reddit
For things I want to go fast, that don't require accuracy, or that rely mostly on the vision encoder (OCR): Gemma4-26B-A4B. For where accuracy and nuance are important (translation, summarization, creative writing): Gemma4-31B. I prefer Qwen3.6 (27B / 35B-A3B) for anything programming or tool-calling related. In the end, having all four of them on your drive covers almost 95% of the use cases a simple user might have. It's not capable enough to understand the intent behind vibe coding, though. Setup matters; it won't handle specialist knowledge well without RAG, so you still want to ground it with a local copy of Wikipedia or programming language docs (openzim), and give it a calculator tool and web search tools for recent news.
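As a rough illustration of the calculator-tool grounding mentioned above, a minimal sketch assuming an OpenAI-compatible tool-calling endpoint; the model name is a placeholder and the calculator is a deliberately restricted toy:

```python
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# A calculator exposed via standard tool calling.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate a basic arithmetic expression, e.g. '12 * (3 + 4)'.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}]

def calculator(expression: str) -> str:
    # Restrict eval to plain arithmetic; good enough for a local assistant sketch.
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        return "error: unsupported characters"
    return str(eval(expression))

response = client.chat.completions.create(
    model="qwen3.6-27b",  # placeholder model name
    messages=[{"role": "user", "content": "What is 1843 * 27?"}],
    tools=TOOLS,
)
# Assumes the model chose to call the tool; a real loop would check for tool_calls.
call = response.choices[0].message.tool_calls[0]
print(calculator(**json.loads(call.function.arguments)))
```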
Hydroskeletal@reddit
Gemma is better at discrimination: "Here's a pile of data, give me the important parts and ignore the noise." Gemma is much more parsimonious. People complain about Qwen "overthinking" and that has downstream effects on behavior. Qwen will rabbithole on the wrong thing.
sultan_papagani@reddit
qwen3.6 for code, everything else: gemma
CosmicRiver827@reddit
Gemma feels like he actually wants to be there, which sometimes matters more to me than the other benchmarks. He's more emotionally open, which helps me feel more secure in my creative writing decisions. He's also a great brainstorming partner, whereas many other LLMs feel kind of rigid.
I come up with many of my own creative solutions for plot issues with the story I'm writing and I feel clearer and have more fun doing so when I'm with a model that feels like he actually gives a sh about what we're doing.
MoralityAuction@reddit
Are you selecting for support/sycophancy?
CosmicRiver827@reddit
Sure.
MoralityAuction@reddit
I don't say that in a critical way, to be clear. It's usually undesired but I can see why that would be useful in a creative partner. Is it similar for you to the 'yes, and' improv approach?
CosmicRiver827@reddit
I don't want a partner that blindly says yes, but I do want a partner that feels like he cares about what I'm doing. The easiest way to describe it is that my mind is already negative about everything I try to write, so the unrestricted enthusiasm just balances out the skepticism already flooding my head all the time.
Comrade_Vodkin@reddit
Let's not judge, sometimes we all need a tiny bit of encouragement. Especially in creative matters.
MoralityAuction@reddit
Yeah, I clarified below as it reads more harshly than I meant it.
Clarification: "I don't say that in a critical way, to be clear. It's usually undesired but I can see why that would be useful in a creative partner. Is it similar for you to the 'yes, and' improv approach?"
floridianfisher@reddit
Benchmarks are fake bro. Test the models blindly. Gemma wins on most things. Not all things though.
dobomex761604@reddit
In multilanguage use, Gemma 4 is unbeatable for now. I use it for specific prototyping that requires use and understanding of B2B terminology in two languages, and the results are usable as is 99% of the time. It's only prototyping, of course, but I can work with it freely without having to fix non-English grammar, and the meaning is never lost. Google definitely cooked on language support (down to memes even).
GrennKren@reddit
Gemma 4 for my storytelling and roleplay, Qwen 3.6 for my coding tasks.
Comrade_Vodkin@reddit
I use the 26B for roleplay, general chat, programming and tech advice, but no agentic coding. It's the best local model for Russian I've ever seen; I like its tone and phrasing. I also use it as a beta reader for my creative writing in Russian, and I gotta say I'm impressed by its understanding of characters' emotions and intentions.
Virtual_Actuary8217@reddit
I feed it my own Python library API, and they both nail it pretty much every time in one shot.
Ready-Collection-551@reddit
I've noticed that for language and phrasing (I've only tried English so far), Gemma-4's text generation outshines Qwen 3.6.
For small personal coding-related tasks in opencode, I've found I like to start out by using Gemma4 in Plan mode, as it can phrase things better and so catch my thoughts better imo. Then I switch to Qwen3.6 to actually build it, with the plan context previously generated by Gemma.
awebb78@reddit
Commit messages. I feed it git diffs and contextual information and it spits out perfectly formatted, descriptive commit messages I then include in my development and release processes.
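Not the commenter's actual pipeline, but a minimal sketch of that kind of commit-message step, assuming a local OpenAI-compatible server; the model name and prompt wording are placeholders:

```python
import subprocess
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

def commit_message(extra_context: str = "") -> str:
    # Staged changes only; extra_context carries things like the ticket ID or branch purpose.
    diff = subprocess.run(["git", "diff", "--staged"],
                          capture_output=True, text=True, check=True).stdout
    response = client.chat.completions.create(
        model="gemma-4-31b-it",  # placeholder model name
        messages=[{
            "role": "user",
            "content": "Write a conventional commit message (subject + body) "
                       f"for this diff.\n\nContext: {extra_context}\n\nDiff:\n{diff}",
        }],
        temperature=0.2,
    )
    return response.choices[0].message.content

print(commit_message("refactor of the retry logic in the HTTP client"))
```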
imp_12189@reddit
https://x.com/i/status/2051683391171379699
"Gemma 4 is by far and away better than qwen 3.6 for GLSL"
Adventurous-Paper566@reddit
In my language Gemma is much better than Qwen.
reggionh@reddit
The Gemma series always excels at multilingualism. I wonder if it's a strategic decision by Google so they can use it for their machine translation or other products.
nickm_27@reddit
Qwen3.5/3.6 are really good at video analysis, better than Gemma4
Gemma4 is considerably better as a voice agent; Qwen does not follow instructions for conciseness, as it seems it wants to be "too helpful" and spends a lot of time listing options and things it is explicitly told not to. Gemma4 follows instructions perfectly and is a better assistant overall IMO.
WhopperitoJr@reddit
I am testing it as a new default model for my plugin for Unreal Engine, potentially using its multimodal capabilities for runtime processing of images from the game. Still early in testing but I’m excited to see where it goes
reto-wyss@reddit
I only have experience with Gemma-4-31b, Qwen3.6 27b, and Qwen3.5 122b-a10b, and I don't use small quants (less than 8-bit average).
Gemma is better at language, it is better at image captioning than the 27b Qwen, and it's kind of a wash vs the 122b-a10b, where I'd give an accuracy edge to the Qwen model but Gemma presents/writes it better.
Gemma 4, unlike Gemma 3, shows extremely low refusal and it will just do stuff if you tell it to - again here 122b-a10b comes in second, and 27b comes in last.
3.6 27b is probably the best at agentic work, with 3.5 122b-a10b in second and Gemma in last - when it comes to pure coding, I can't say, they all seem "pretty good".
If you have the means to run it and you don't want to bother swapping stuff around 122b-a10b is an extremely good jack-of-all-trades.
SingleProgress8224@reddit
My experience with coding is that Qwen produces better code and Gemma is better at understanding code (e.g., asking it to review a commit).
Fit-Produce420@reddit
The answer is very simple: it depends.