Complete beginner to this topic. I just heard/saw that the new Gemma 4 is pretty good and small. So a few questions...
Posted by Popular_Tomorrow_204@reddit | LocalLLaMA | 18 comments
Since a few of you have probably already tried it out or are running local models: is Gemma 4 worth it?
- Is it worth running compared to other smaller models, and what would the direct competition for Gemma 4 be?
- What would be the best use case for it?
- What hardware is the minimum, and what's recommended?
MaxKruse96@reddit
My hardware: rtx 4070 12gb + 64gb ddr5 6000.
For RP, Gemma 4 26b and 31b (31b being more literal in my experience) are my go-to. The 31b runs at 4 t/s for me (which is fine for RP), the 26b at 30 t/s.
For other niches (coding, general agent usage with RAG), I'd use other models, depending on your hardware specifically. No obvious recommendations, though my personal page on this may help with some options: https://maxkruse.github.io/vitepress-llm-recommends/ (not updated for Gemma 4 yet)
last_llm_standing@reddit
Good blog, but it seems like it hasn't been updated? You're recommending Gemma 3 for some tasks that can be done by Gemma 4 quants.
MaxKruse96@reddit
I don't instantly jump on the bandwagon of updating everything. I use the models, then update the page at a later date. It's usage-based, not "should work" based.
last_llm_standing@reddit
You should seriously consider updating it; a lot of the model recs are outdated.
MaxKruse96@reddit
Please do give some other ideas for models to add; I boiled it down to what I'd take as the usable lower limits.
SHOR-LM@reddit
Gemma is a fantastic model. It is actually a groundbreaking model for its size. Some people still like Qwen3.5 27B... but the only times I've ever seen that, it's strictly coding, and even then some people prefer Gemma over it. You just don't get a decent coding agent with Gemma 4, though.
It mops the floor conversationally against any other model in its weight class if you use it for something like RP or chat... It recognizes images, video, and audio... I saw it outscoring Claude 4.5 Sonnet in places. So yeah, it is fucking insane. Google messed it up for a few hours the other day, but I've heard they've since fixed some issues. Gemma 4 is a technical leap in small-model AI tech... and frighteningly so.
Herr_Drosselmeyer@reddit
Gemma 4-31B is hands down the best model at its size and can be run on consumer hardware (albeit pretty high-end). I wouldn't really want to run it with any less than 24GB of VRAM. It can easily be your daily driver for most tasks.
The MoE variant, which I haven't tried yet, will probably run ok on a card with 16GB if you offload to system RAM. People report that it's only a little worse than the 31B dense model.
Popular_Tomorrow_204@reddit (OP)
I have a 9070XT (16GB VRAM) and 128GB DDR5 RAM
TheTerrasque@reddit
The MoE models should run decently at least. Qwen3.5-35b-a3b and gemma4-26b-a4b would be my recommendations to test out.
Do note that the Gemma 4 models are very new and people are still finding issues with various runtimes running them, so things might improve over time.
Krowken@reddit
Then the 31b model is going to be excruciatingly slow. The 26b MoE model should work fine on your hardware though.
DrMissingNo@reddit
In my experience the 26b MoE and the 31b dense models are good, though I've heard mixed opinions about them. I think it's fair to say the closest equivalent is Qwen3.5 35b (I've used that one a lot) or 27b.
Both Gemma 4 and Qwen3.5 manage to use my MCPs flawlessly (though, again, I've heard people complain about Gemma's ability to use tools). I've got MCPs for web search, memory, filesystem access (read and write), sequential thinking, and time.
I run those on my desktop (AMD 9950x3D, 64gb ddr5 ram, rtx 5090). They fit rather well on my specs.
Not sure if this helps. You should experiment with LM Studio (it's beginner friendly, has a nice and intuitive interface plus a lot of options); it will tell you which models can fit on your setup.
Welcome to the party and have fun discovering AI 😉
Popular_Tomorrow_204@reddit (OP)
Ty, I wanted to get away from a few stupid subscriptions, have full control, and just test things a bit, so I'm looking for a good local option.
I have an R7 7700X, a 9070XT (16GB VRAM) and 128GB DDR5 RAM. Is that like, "okay" to run a few of the newer models?
dionisioalcaraz@reddit
https://www.reddit.com/r/LocalLLaMA/comments/1sgvt01/16_gb_vram_users_what_model_do_we_like_best_now/
DrMissingNo@reddit
The problem isn't new vs. old models, it's model size. You can run any model, new or old, that fits in your VRAM.
Nuance: some models have partial GPU-offloading possibilities (I believe, though I might be wrong on this, that MoE models are better suited for it, because the GPU only loads the active experts while the rest sits in your "normal" RAM).
I couldn't tell you for sure which models your GPU can handle; that's why I'd recommend LM Studio, you'll get a clear view of what your system can and can't handle.
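If you ever move from LM Studio to running llama.cpp directly, the MoE-offload idea above looks roughly like this. This is a sketch, not a definitive recipe: the flag names assume a recent llama.cpp build, and the GGUF filename is a placeholder for whatever quant you download.

```shell
# Sketch: put all layers on the GPU, but keep the MoE expert weights in system RAM.
# Assumes a recent llama.cpp build; the model path is a hypothetical placeholder.
./llama-server \
  -m ./gemma4-26b-a4b-Q4_K_M.gguf \  # placeholder quant filename
  -ngl 99 \                          # offload all layers to the GPU...
  --n-cpu-moe 99 \                   # ...but keep expert tensors on the CPU/RAM
  -c 8192                            # modest context to limit KV-cache VRAM use
```

The idea is that the dense attention weights (the part touched on every token) stay fast on the GPU, while the large-but-sparsely-used expert weights live in system RAM.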
Pristine-Woodpecker@reddit
You basically want to look at Qwen3.5, which is more mature than Gemma, uses less memory, and is better at most tasks.
teachersecret@reddit
Is Gemma worth it...
Worth what? What are you trying to do with it? They're solid models, and even the tiny ones punch above their weight. At this point, I'd say Gemma 4 26b/31b are basically as good as GPT 4.1, a model that was state-of-the-art about a year ago. So we're not far off from what the best models in the world can do, and that's pretty amazing for something you can run on a decent home rig at speed.
Code? The big API models in their CLIs are going to walk all over Gemma 4. Nobody wants to go back to coding with GPT 4.1 or Gemini Flash. You can do it if you want, but you should stick to the 31b if you're going to try, and it's still a silly thing to do.
RP/chat? Sure. They're great models, censorship is light, and they do a good job of holding a conversation/story at least through most short-mid-long chats.
Long-form writing? They're going to struggle a bit at higher context. Better for shorter-form writing and editing.
Image processing? Maybe worth it. It's pretty fast. You can do some significant work across the board.
Agentic work/home assistant? Sure. It makes a decent Jarvis if you're talking the bigger 26b/31b, but make sure you're using it right (use the interleaved Jinja template and the most updated llama.cpp). Again, don't expect miracles, but it's a solid model.
Running a food truck? Maybe.
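For the agentic setup mentioned above, getting the chat template right on llama.cpp looks something like this. A sketch only: the template filename is a hypothetical placeholder (you'd grab the actual interleaved template from the model's repo), and the flags assume a recent llama.cpp build.

```shell
# Sketch: force llama.cpp to render the model's Jinja chat template,
# overriding with a template file downloaded from the model repo.
# "interleaved.jinja" and the model path are hypothetical placeholders.
./llama-server \
  -m ./gemma4-31b-Q4_K_M.gguf \
  --jinja \                                # use Jinja chat-template rendering
  --chat-template-file ./interleaved.jinja # override the embedded template
```

A mismatched chat template is one of the most common reasons a new model "feels broken" for tool calling, so it's worth checking this before judging the model itself.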
DeepOrangeSky@reddit
For writing/general chat, Gemma4 31b seems even stronger than Qwen3.5 27b to me so far.
Unfortunately it has this runaway memory-ballooning issue where memory usage goes totally crazy and eats all the RAM on your computer if your interaction gets even mildly long.
They talk about it in this thread for example.
The solution is apparently to use: --cache-ram 0 --ctx-checkpoints 1
But I don't know where to enter that, or what to do with it, if I'm using LM Studio. So far I still have the issue, and the only way I've found to stop it from using up all my RAM is to eject and re-load the model after every single reply (which is obviously super annoying).
If anyone here knows how to apply the fix in LM Studio (as opposed to in llama.cpp, which I think that fix is for) and can explain how, it would be appreciated. It's a great model, but basically unusable for me for now because of that.
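For anyone running llama.cpp directly rather than LM Studio, the workaround flags quoted above go on the server command line, something like the sketch below. The flag names are taken straight from the comment above; the model path is a hypothetical placeholder, and whether LM Studio exposes an equivalent setting is exactly the open question here.

```shell
# Sketch: launch llama-server with the memory-ballooning workaround from the thread.
# Flags are as quoted above; model filename is a placeholder.
./llama-server \
  -m ./gemma4-31b-Q4_K_M.gguf \
  --cache-ram 0 \        # disable the RAM-side prompt cache
  --ctx-checkpoints 1    # keep only one context checkpoint in memory
```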
ApexDigitalHQ@reddit
I like the tone it writes in, but I still tend to hand my more difficult tasks over to Qwen, at least when I'm working locally. In my pipelines, I've relegated Gemma 4 to refining content to be readable/enjoyable for humans. I'm still experimenting and my opinion may change over time. I do find it great for transcribing audio, though!