so…. Qwen3.5 or Gemma 4?
Posted by MLExpert000@reddit | LocalLLaMA | View on Reddit | 112 comments
Is there a winner yet?
soyalemujica@reddit
Tried Qwen 3.5 35B A3B vs Gemma 4 A4B and Qwen won by a BIG margin. (Coding test).
LexEntityOfExistence@reddit
Dude that's a terrible comparison.
One has 35B total parameters while Gemma has 8B total, with 4B active
Specter_Origin@reddit
The answer depends on your use case, and not to mention both of them are pretty unstable. Both have issues with the MLX and llama.cpp implementations, so you can't judge yet. For local inference, Gemma-4 has been far superior for me, as it is much more efficient with thinking tokens and I like the way it answers. But as I mentioned, that depends on personal taste and use case...
Rich_Artist_8327@reddit
vLLM has worked with Gemma 4 from day 0. Why are people still messing with llama.cpp?
Specter_Origin@reddit
I thought vLLM's GGUF support is experimental, and especially on Apple silicon it's not very stable. I don't have any experience with it; that's just what I gathered from reading...
Rich_Artist_8327@reddit
Nobody should use GGUF; use FP8 instead.
sisyphus-cycle@reddit
I must be doing something wrong, because Gemma 4 almost always produces 2-3x more reasoning tokens than Qwen (MoE for both, F16 KV cache) in my tests. I'll publish some of my local tests after rebuilding llama.cpp later today. I just test on LeetCode hards (they should know those easily). Gemma consistently hits between 2-5k reasoning tokens; Qwen hovers around 400-1000.
I have noticed Gemma follows system prompts better.
no_witty_username@reddit
I just want to point out an interesting finding that might be of use with Qwen 3.5. I found that enabling thinking with a small reasoning token budget (about 100 tokens) significantly increased the performance of the Qwen models while keeping latencies low. I even tried a budget of 1 token and intelligence was still high, though reasoning started leaking into the content... I suspect RLHF basically conditioned the model so that if reasoning is on (regardless of token output), output quality goes up. I know it sounds silly, but try it out yourself and compare results.
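A minimal sketch of enforcing that kind of budget client-side, assuming your runtime has no built-in reasoning-budget flag (the tag names and the post-processing approach here are illustrative, not a real llama.cpp API): once the budget is spent, the think block is force-closed and the rest of the reasoning is dropped.

```python
def cap_reasoning(tokens, budget=100, open_tag="<think>", close_tag="</think>"):
    """Truncate the reasoning block of a token stream to `budget` tokens."""
    out, in_think, used, truncated = [], False, 0, False
    for tok in tokens:
        if tok == open_tag:
            in_think, used, truncated = True, 0, False
            out.append(tok)
        elif tok == close_tag:
            if not truncated:          # already force-closed if we truncated
                out.append(tok)
            in_think = False
        elif in_think:
            if used < budget:
                out.append(tok)
                used += 1
            elif not truncated:
                out.append(close_tag)  # force-close at the budget
                truncated = True
            # reasoning tokens past the budget are dropped
        else:
            out.append(tok)
    return out
```

In a real streaming setup you would cancel generation or inject the close tag into the context rather than post-process, but the budgeting logic is the same.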
Significant_Fig_7581@reddit
I think llama.cpp fixed this today
TheTerrasque@reddit
Even with latest fixes gemma4 messes up some tool calls for me. It gets the syntax messed up.
Apart from that it does better as an assistant for me. Less thinking, more effective tool calls when they work, and more concise and direct answers.
I suspect it will take over for me as local assistant when all the bugs are ironed out
Specter_Origin@reddit
I saw that, was wondering if it's already in a release or just merged?
eugene20@reddit
b8656 had the fixes.
If using LM Studio that corresponds to llama.cpp v2.11.0 which will download when you select the beta branches.
grumd@reddit
After the Gemma release I just switched to pulling the latest master branch and compiling from that (instead of latest tag)
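For anyone wanting to do the same, a sketch of that workflow (commands fragment; the CUDA flag is optional and depends on your hardware):

```shell
# build llama.cpp from the current master instead of the latest tagged release
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git pull origin master           # refresh an existing checkout
cmake -B build -DGGML_CUDA=ON    # drop the flag for a CPU-only build
cmake --build build --config Release -j
```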
Specter_Origin@reddit
Smart!
I also just checked, we do have a release 'b8664' today with fixes included.
Weak-Shelter-1698@reddit
Gemma 4 for me.
Exciting_Garden2535@reddit
Better to wait a week or a few until the GGUFs, llama.cpp, LM Studio, etc. have been cleared of all the bugs related to Gemma 4.
It took almost a month for gpt-oss to shine; right at the start, it was not usable.
It took a few weeks for Qwen 3.5 to get rid of the loops.
Rich_Artist_8327@reddit
Those are for kids. Why not use vLLM? It works flawlessly.
catlilface69@reddit
vLLM and “runs flawlessly” are incompatible. vLLM still can't reliably run newer models without patches. It is indeed an awesome inference tool, especially when working with multiple GPUs and concurrent requests, but IMO it struggles to keep up with model releases.
Rich_Artist_8327@reddit
In production you rarely switch models; evaluating a new model requires testing from zero.
-dysangel-@reddit
Qwen 3.5 27B is beating out Gemma 4 31B in my side by side coding tests.
Haven't tried the native audio models yet, that's a pretty great feature.
Far-Low-4705@reddit
also beating it out in general agentic use cases like web search/research in openwebui for me.
Gemma will do one web search and give results (even though I asked for deep research), while Qwen will do 10 web searches and examine 8 individual full web pages before returning the results (much more accurately at that).
I think Gemma is still better at non-technical writing, like human-sounding emails, but Qwen is better at doing actual "work".
Rich_Artist_8327@reddit
we are talking about gemma 4 here
Far-Low-4705@reddit
We are comparing Gemma 4 to qwen 3.5 here
cralonsov@reddit
Can you explain how you are doing it and what exactly you are using? I would like to create an agent that performs web searches to get some leads; would it be possible to use it for that?
Far-Low-4705@reddit
Yeah, you can use OpenWebUI for that; it has built-in web search tools which work pretty well.
The only thing is that they have a really strange prompt injection issue where they inject things into your prompt, which causes full prompt reprocessing. It's annoying, but you can fix it by changing the prompt causing the issue to an almost-empty string.
EbbNorth7735@reddit
Deep research should really be performed in an agentic loop
Far-Low-4705@reddit
it is; Gemma just stopped early and didn't go deep
Woof9000@reddit
Qwen does come with marginally better technical skill set.
But Gemma excels in other areas, i.e. language skills: better, more natural human interactions, and languages and translations. I can speak only a few foreign languages, but for those few that I do know, Gemma can translate back and forth at maybe 95-98% accuracy, which is significantly better than Qwen. A polyglot AI assistant can be quite handy.
DinoZavr@reddit
my observation as well. Still, Gemma 4 is very, very new; too early for verdicts, as there are so many tests to run.
stormy1one@reddit
Pretty much sums up my experience using anything Google Gemini-related for code. Fine for small code snippets but a horrible experience working on a larger code base.
newcolour@reddit
Was Gemma advertised as a coder? I think of it as more of a conversational LLM.
dryadofelysium@reddit
These are literally some of the first points mentioned in the initial official Gemma 4 announcement blog post:
Agentic workflows: Native support for function-calling, structured JSON output, and native system instructions enables you to build autonomous agents that can interact with different tools and APIs and execute workflows reliably.
Code generation: Gemma 4 supports high-quality offline code generation, turning your workstation into a local-first AI code assistant.
unjustifiably_angry@reddit
I think they did include various coding benchmarks in their "byte for byte the best AI evarrr" post.
durden111111@reddit
Coding: Qwen. Roleplay: Gemma.
sexy_silver_grandpa@reddit
"roleplay"?
What the fuck is wrong with you people. Have some shame, you're embarrassing yourselves.
Kalitis2@reddit
Roleplay doesn't mean erotic roleplay by default, bud.
IrisColt@reddit
Heh! You owe them a thank you, they blazed the trail for the tech you're using right now.
SlaveZelda@reddit
okay /u/sexy_silver_grandpa
sexy_silver_grandpa@reddit
Ya, physical women find me sexy because I'm not just obsessing with AI lol
albinose@reddit
Isn't it censored to hell?
Lorian0x7@reddit
not with thinking disabled.
FluoroquinolonesKill@reddit
Gemma’s prose is better, but Qwen seems more chatty, engaged, and friendly.
Koalateka@reddit
I agree, this is my conclusion as well.
chibop1@reddit
IMHO: Gemma4 for assistants and Qwen3.5 for agents.
r1str3tto@reddit
Eh, I don’t know. Gemma4 seems to be safetymaxxed to a ridiculous degree. And I’m NOT talking about NSFW - I’m talking about completely harmless queries like “estimate my body fat percentage in this pic”.
devilish-lavanya@reddit
Why did the jury go outside? It has work to do inside.
Rich_Artist_8327@reddit
jury had one job...
No_Mango7658@reddit
Gemma4 e4b is surprisingly useful
jedsk@reddit
Gemma4 didn’t work in OpenCode for me. Q3.5 worked great.
colorblind_wolverine@reddit
Can you explain the difference between the two, 'Assistant' vs 'Agent'? What are the important distinctions?
thelebaron@reddit
Being able to complete requests without giving up prematurely (which Gemma appears to fail at for me, using E4B)
rinaldo23@reddit
Coding agents, for instance, require much more structured output for running commands, whereas you probably won't mind if your vacation schedule has misplaced commas
chibop1@reddit
Assistants simply answer questions by responding in words. Agents also perform actions like editing files or fetching things, which require good tool calling ability.
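A toy sketch of that distinction (all names and the JSON tool-call format here are hypothetical): an agent loop executes the tool calls a model emits and feeds results back until the model answers in plain words, which is the assistant part.

```python
import json

# A toy tool the "agent" can call
def read_file(path):
    return {"path": path, "content": "hello"}

TOOLS = {"read_file": read_file}

def run_agent(model, prompt, max_steps=5):
    """Minimal agent loop: run the model, execute any tool call it emits,
    append the result to the history, and repeat until it answers in words."""
    history = [prompt]
    for _ in range(max_steps):
        reply = model(history)
        try:
            call = json.loads(reply)           # tool calls arrive as JSON
        except json.JSONDecodeError:
            return reply                       # plain words: the final answer
        result = TOOLS[call["tool"]](**call["args"])
        history.append(json.dumps(result))     # act, then continue the loop
    return "gave up"

# A scripted stand-in for an LLM: first a tool call, then a final answer
def fake_model(history):
    if len(history) == 1:
        return json.dumps({"tool": "read_file", "args": {"path": "notes.txt"}})
    return "The file says: hello"
```

This is why tool-calling reliability matters so much more for agents: one malformed JSON reply breaks the whole loop, while an assistant's answer just reads a bit worse.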
Sensitive_Buy_6580@reddit
I guess my way of differentiating them is that an Assistant works with users (front desk) and an Agent works with infrastructure and code (engineer).
idiotiesystemique@reddit
That's one fat ass model just for assistants. Doesn't fit consumer grade cards
chibop1@reddit
They all fit with 128k context on my Mac with 64GB. it's definitely a consumer device. :)
Swaggy_Shrimp@reddit
I mean I haven't yet encountered a small local model that is actually a good general purpose chatbot because they have very little world knowledge. Even the best small models I have tried will confidently spit out utter nonsense when you ask it stuff. And no, websearch usually doesn't stop it from inserting randomly hallucinated facts into the answers (it just does a little less of it).
I think small models are great for rewriting text, summarizing them, translating them, small logic problems - etc. Anything that doesn't require the model to actually know anything.
But for my general purpose chatbot queries I need very factual answers - so the fatter the model the better.
idiotiesystemique@reddit
Gpt OSS 20b was just fine as an assistant
Swaggy_Shrimp@reddit
If you don't mind half-truths and false dates, numbers, and facts sprinkled into your assistant's answers, I guess.
Try it yourself: pick a topic you know a lot about, dig in a little, and really quiz your small model. It doesn't take much pushing or digging to make it hallucinate.
Swimming_Gain_4989@reddit
This is where I land. Qwen is the better model if it has to interact with code, otherwise use gemma.
Spara-Extreme@reddit
Yes - the open source community is winning hard right now.
True_Requirement_891@reddit
No glm-5.1, glm-5-turbo. glm-5v-turbo, minimax-m2.7, mimo-v2-pro, qwen3.6 yet... for some reason it seems like all the chinese companies have joined together to either delay or not release their latest models at all... I feel like the next kimi model will also remain closed for a long time...
Rich_Artist_8327@reddit
They are all state-owned; it's the same company behind each model
Spara-Extreme@reddit
Dude they just released a bunch of stuff like a month ago, come on
Lorian0x7@reddit
Qwen 3.5 for agentic and coding, and Gemma 4 for emails, RP, and writing.
Gemma 4 is honestly crazy good for RP and very flexible. With thinking disabled it's the best RP model.
albinose@reddit
How's censorship? I remember Gemma 3 was quite bad at that
Lorian0x7@reddit
You won't believe it. With thinking disabled it's truly something
indigos661@reddit
General text assistant: Gemma4; better CoT structure and gemini-style answer
Multi-modality (image): Qwen3.5; Gemma 4 is only useful for general description, as its vision tower uses far fewer vision tokens
Tool: if you use llama.cpp, gemma4 is still broken
Coding: actually I'm waiting for Qwen3.6
Lesser-than@reddit
gemma models always come with that gemma personality , qwen models just always want to get in the dirt and go to work.
Jayfree138@reddit
It's honestly so close it's going to come down to prompt engineering, parameter settings and personal preference.
A lot of people are saying Gemma for roleplay but there's a whole catalog of uncensored roleplay tuned models of all sizes so i have no idea why people are using a small gemma agent for roleplaying if that's their thing. Check the UGI leaderboard for that.
Monkey_1505@reddit
I can't speak to the actual use thereof, but in the benchmarks it looks like the MoE and the largest dense model are at least close enough to merit an A/B test depending on one's use case, while the smaller models are thoroughly worse across the board.
People do prefer the larger Gemmas in Arena though, and by a lot, so presumably they are nicer to talk to in some manner. Maybe less reasoning, better prose, or such?
My AI computer is on the fritz, so haven't played.
Chupa-Skrull@reddit
It's a much better writer in English by a significant margin, at least
SmashShock@reddit
For me Qwen is working significantly better for tool use with novel tools (things unlike what you'd expect in OpenCode or Claude Code).
But Gemma is pretty fun to talk to, reminds me of the early model whimsy.
nickm_27@reddit
The duplicated tool calling is a bug that was just fixed
SmashShock@reddit
Oh sweet thanks! could you link to that?
nickm_27@reddit
https://github.com/ggml-org/llama.cpp/commit/b8635075ffe27b135c49afb9a8b5c434bd42c502
Frosty_Chest8025@reddit
Gemma4 for all. Others could just do something else.
evilbarron2@reddit
Why does the internet always funnel everything into these dick-measuring contests? How can one model be the “best” for every situation for everyone. Not to mention how trivial it is to try different models in your specific situation and figure it out yourself.
I honestly don’t get it.
Iory1998@reddit
Qwen 3.5 models, especially the 27B, are very good at long context and summarization. It's the first model family I can feed a 50K conversation and ask to compress it, and it successfully does so, respecting User/Assistant turns and keeping the main ideas intact. No other model family has managed that, including the Gemma-4 models.
Gemma-4-31B seems to me a bit smarter, pragmatic, and has better token management.
qwen_next_gguf_when@reddit
Gemma always wins for writing especially in the zombie apocalypse theme. No contest.
audioen@reddit
I kicked the tires today and put it to some coding work with the 26B-A4B. The model loaded fast, inferred at >50 tokens per second, and ran directly with my default speculative decoding setup, which uses no draft LLM and just generates long sequences of tokens from the existing context as predictions. That worked, and at times the model ran at 100 tokens per second when it was just echoing code files without edits, so it was pleasantly fast.
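That draft-model-free setup sounds like prompt-lookup decoding: propose draft tokens by matching the tail of the context against an earlier occurrence and copying whatever followed it, which is why verbatim echoes of code files accept so many drafts. A minimal sketch of the idea (illustrative only, not llama.cpp's actual implementation):

```python
def lookup_draft(context, ngram=3, max_draft=8):
    """Propose draft tokens by finding an earlier occurrence of the last
    `ngram` tokens in the context and copying the tokens that followed it."""
    if len(context) <= ngram:
        return []
    tail = context[-ngram:]
    # search backwards through the context, excluding the tail itself
    for i in range(len(context) - ngram - 1, -1, -1):
        if context[i:i + ngram] == tail:
            start = i + ngram
            return context[start:start + max_draft]
    return []
```

The target model then verifies the drafted tokens in one batch, so repetitive output (like re-emitting an unchanged file) decodes several tokens per forward pass.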
Then I looked into what it was actually doing in Kilo Code. I had told it to make some HTML template edits, and I had the files already open in the editor, which should have told the model the paths to the files I wanted to edit -- this always works with Qwen 3.5 -- but for some reason it just didn't pick up the hint. It started looking for the files, discovered some compiled TypeScript artifacts, which it then read in chunks because they are large, found all sorts of crap inside, and then seemed to loop in reasoning and got stuck.
I guess the poor bastard just confused itself reading all that minified JavaScript.
I think the non-MoE model might be fine, and I can't rule out inference problems since these are early days. So far the experience is a step down, especially as Gemma-4 did not come in some suitable 120B-A8B-type size, which could have been competitive against Qwen 3.5's similarly sized offering -- to date the most practical model I can run on a Ryzen AI Max. Gemma 4 may well feel like going back 6 months and again having to babysit these models in agentic tasks, which is not something I'm hoping to ever have to do again.
MikeNiceAtl@reddit
Qwen (9B) beat Gemma 4 (E4B) in every benchmark I've (made Claude) thrown at them. I'm disappointed.
superdariom@reddit
Fixes for llama.cpp are happening in real time, so things may not be fair, but so far Gemma is failing to complete the complex challenge which Qwen can succeed at (24GB VRAM); it's just giving up and claiming it succeeded when it hasn't. I'm not sure things are working right though, as llama.cpp seems to have plenty of bugs relating to templates and not showing the chain of thought. I was really hoping for something to beat the intelligence I've seen with Qwen. Gemma is also slower.
KSubedi@reddit
Qwen is like a person who is decently intelligent but has practiced and learned a lot from others. Gemma is like a person who's more intelligent but may not have as much real-world experience.
Extraaltodeus@reddit
4B and 9B actually work for me.
Smallest Gemma 4 sometimes refuses to do a simple web search if not asked politely enough.
maveduck@reddit
For me Gemma is the winner because its multilingual capabilities are better. That's important for me, as English is not my first language.
DrNavigat@reddit
It got much worse in this scenario. It seems worse than Gemma 3.
Adventurous-Paper566@reddit
Gemma 4 is better in french than Gemma 3.
Mission_Bear7823@reddit
Qwen for coding, Gemma for chat and similar stuff. Easy. Not sure about other uses.
lionellee77@reddit
I don't think there is a clear winner at this moment. Let's re-evaluate when Qwen 3.6 is released.
Septerium@reddit
Why not use both?
joleph@reddit
Or Nemotron 3 Super NVFP4?
nickm_27@reddit
For agentic tasks like Home Management and chat with tools Gemma4 is way more reliable in my experience. Qwen3.5 failed to follow instructions effectively and sometimes narrated tool calls instead of actually calling them.
Gemma4 26B-A4B has really impressed me.
Lucis_unbra@reddit
If you want GLSL and maybe other languages, Gemma. Gemma also seems to have a much lower hallucination rate, so it won't make things up as often.
Gemma appears to be more certain on science topics than Qwen.
I've seen Qwen change course mid-code, using comments to reason, and then not get it right anyway. Gemma seems to actually use the reasoning block to contain all that, and it doesn't require as much of it.
Personality? Both are ok, Gemma seems to be a bit more levelheaded? It seems to understand my intent better than Qwen, at least so far. But it's early. They're close enough overall that one will have to try both and decide based on own observations.
gpt872323@reddit
Qwen 3.5 this time.
kidflashonnikes@reddit
Qwen 3.5 is the overall winner; where Gemma 4 really wins is the small models. Google cooked, but the Qwen attention architecture is really good, like really good
cibernox@reddit
I need to test how the small ones do in tool calling/RAG which is my primary use case
Jxxy40@reddit
I personally use Gemma for any daily tasks, Qwen just for coding. I'm considering fully migrating to Gemma next week.
Hot-Employ-3399@reddit
Qwen feels better for coding and tool calling (at least the MoE; I haven't tried the dense Gemma model).
For some reason, instead of passing an array of strings, Gemma sometimes passes a mangled string like
`"["Task 1: say "hello world"", "Task 2: say "bye, world""]"`, which can't be decoded normally since nothing is escaped. Sometimes it works fine (`["."]`). Qwen handles it well.
segmond@reddit
Yes, the users are the winners. Pick whichever one works for you and that you like; they are both great models. I posted a comment on here a while back saying that at this point these models are so good that folks would be better served spending their time using them than arguing about which one is better.
JacketHistorical2321@reddit
Figure out what works best for you and that's the winner. This sub is becoming a huge benchmark circle-jerk where discussions are more centered on the new and shiny and less on practical use or innovation
jzn21@reddit
For my workflow (data separation and Dutch text correction) Gemma 4 31b is much better than Qwen 3.5 27b.
No_Conversation9561@reddit
In my usage with Hermes agent, Gemma4 MoE > Qwen3.5 MoE.
sleepingsysadmin@reddit
My personal benchmarking confirms the 77% LiveCodeBench for the 26B, which places it around gpt-oss-20b (high) in strength. Good but very meh, and Term Bench Hard places the 26B below Qwen 3.5 4B, which means the 26B is worthless; let's just forget it exists. A4B is rather poor. I was expecting a big intelligence boost for that tradeoff, but man, we didn't get that.
So with the independent benchmarking:
31B vs 27B.
Now there's a big debate. Google's numbers suggested the model is below the 27B, but indie benchmarks place it slightly ahead in some places.
Term Bench Hard, one of the most important benchmarks to me:
Minimax: 39%
31B: 36%
27B: 33%
Tau Telecom:
Minimax: 85%
31B: 60%
27B: 94% WOWZERS
Long Context:
Minimax 66%
31B: 18%
27B: 20%
Obviously running Minimax at home isn't all that plausible. However, a single 5090 can run either of these. It seems to me that you probably have to keep context length on these models below 128,000 even if you have the VRAM available; they'll get dumb over that.
Otherwise, very similar capability. So probably going to come down to personality.
VoiceApprehensive893@reddit
qwen for coding/math/tool usage
gemma for knowledge,rp and writing
FinBenton@reddit
For prose, gemma is the clear winner hands down, for coding and other stuff, I think qwen will be the winner.
LirGames@reddit
Still Qwen3.5 27B for me in coding tasks. I've been trying to run Gemma4 with Roo Code but keeps on getting stuck even with the latest llama.cpp and updated gguf from unsloth. Chat works though. I will try again in a few days.
Prestigious-Use5483@reddit
Qwen3.5 27B on my PC, Gemma 4 E2B on my phone
gpalmorejr@reddit
The benchmarks seem to suggest that Gemma 4 really didn't give us anything more than Qwen 3.5. Also, Gemma 4 wouldn't even load in LM Studio with llama.cpp, so there's that. Not sure about others, but with only a few niche weirdnesses when using Qwen3.5-9B and smaller (and they are still really good), Qwen 3.5 has been flawless for me for everything from simple conversations to college E&M physics problems to refactoring an ancient git repo to update it and play with it. And that is with me running it on ancient, underpowered hardware. So my vote is still Qwen 3.5 for now, but since Alibaba has had a sudden change of approach, we'll see.
Bulky-Priority6824@reddit
There's plenty of information already out and speaking of things being out - I currently have 0 spoons left.
Makers7886@reddit
Yes: us