Every time a new model comes out, the old one is obsolete of course
Posted by FullChampionship7564@reddit | LocalLLaMA | View on Reddit | 189 comments
Cute_Baseball2875@reddit
Lakius_2401@reddit
Qwen 3.6 causes me near physical pain with how it writes. I've seen the reddit posts created by qwen models. Why this works, How to use it. The emdashes, the "doesn't just", the punchy sentences and verb verb verb sentences. It could be twice the model that Gemma 4 is, and run twice as fast, and I'd still use anything else that doesn't drip with slop. It's so pure and distilled!
jacek2023@reddit
I doubt guys on LocalLLaMA use any local models, they just hype benchmarks.
rpkarma@reddit
Hey now I absolutely run Qwen 3.5 122B on my Spark. There’s dozens of us, dozens!
jacek2023@reddit
yes, but there are more people discussing how Kimi cloud access is cheaper than Claude cloud access
rpkarma@reddit
(I know, the “dozens of us” is a sarcastic reference that there are so few)
Apart_Ebb_9867@reddit
oh, you allow yourself a sarcastic "dozens of us" but then object to somebody saying "a while ago" for something that happened three weeks ago? ahahah
rpkarma@reddit
Brother stalking accounts is fucking weird
Pablo_Offline_AI@reddit
What if we use tinier models with much more elaborate tools. We could make 3b models act like Claude, right?
rpkarma@reddit
Not in my experience. Maybe with a powerful reasoning model with very little world knowledge, but world knowledge appears to be tied to reasoning ability.
-dysangel-@reddit
at least one half a dozen
SnooPaintings8639@reddit
I thought the coolest thing on this sub is benchmarks hatemaxxing. Anyway, I use my local Qwen 3.6 daily, love it.
the_mighty_skeetadon@reddit
For the last 6 days lol
SnooPaintings8639@reddit
What a wonderful week it's been!
breadfruitcore@reddit
I'm GPU poor but I try my best to support open models via providers.
jacek2023@reddit
that doesn't make any sense
breadfruitcore@reddit
Can you elaborate? It's beneficial to use open models to avoid big tech lock in, local or otherwise.
jacek2023@reddit
"local LLM" means LLM you run on your computer, in 2025 people started to think that LocalLLaMA is about cheap cloud services
breadfruitcore@reddit
You're not wrong, but I don't see why what I'm doing is nonsense.
There's lots of people with no local rig. Their choice is to either use proprietary models or open models via provider. Certainly one of the two is better for open models than the other.
FuckNinjas@reddit
I'm hearing plain facts. Providers that host open models and serve them well are a proper win for open source, given that the machines to run them are unattainable for the average, idk, let's say developer.
Some of us are liquidity-poor (and wealth-poor too, but that doesn't matter).
There's no need to gatekeep. I would love to run claude in a box, but 1. we're not there yet; 2. did I mention I'm too poor to pour several thousands into a computer?
draconic_tongue@reddit
guy is permanently butthurt, take no offense
breadfruitcore@reddit
None taken
FullChampionship7564@reddit (OP)
add a sarcasm tag to the title heh.
100lyan@reddit
I love Qwen 3.6 A3B, 27B and Gemma4 31B. They have different use cases.
Gemma4 31B is great with 3d scenes, JavaScript game dev - all kinds of single HTML apps.
I was able to almost single-shot a Mini GTA game, relativistic blackhole simulation, Solar system, Wolfenstein game. It is able to correctly add features and fix bugs. I am constantly surprised at how powerful it is. It's good at maths too - some of my tests include throwing recent math Olympiad problems - and in most of the cases it solves them correctly. It is a bit like the right half of the brain - very creative, good at visual arts (though some hard math too).
Qwen 3.6 A3B, 27B ... they feel like the left half of the brain. Great at maths, logic, charts, Jupyter notebooks, coding, tool calling, automated research. I use them for work on a daily basis.
Qwen 3.6 is great for code refactoring - I managed to refactor a big rust codebase in a couple of days + creating tests + automatic deployment and troubleshooting from logs. It feels unreal with the right tools (I am using Roo code). For some reason however it struggles with making games. Although I was able to create a simple Mario clone in Rust - it was not without troubles and it was often going in cycles.
To summarize: I would use Gemma4 for frontend and game dev + creative writing and Qwen for backend + agentic dev
c64z86@reddit
That's been my experience too! Gemma 4 is a lot better at coding games in general and one shotting things. I find that Qwen 3.6 gets it wrong the first time often and then I have to guide it along.
HongPong@reddit
can you say a bit more about the workflow using either to make the games?
100lyan@reddit
They were done with simple prompts, then some follow-ups for fixing bugs. No system prompt used. Gemma4 instruction following seems tighter when tasks are with visual feedback. Gemma 4 feels rounder overall while Qwen feels more specialized.
HongPong@reddit
i finally got opencode working with qwen3.6 and mcp in jetbrains, it's pretty slick. with Claude dialing back access good to change modes. looking forward to trying this with unreal engine later
MexInAbu@reddit
Gemma 4 is superior for creative writing and there's no contest.
Emotional-Ad5025@reddit
Same thoughts here, I use it mostly to speak in Portuguese, the huge quality difference specific to that is not visible in benchmarks
Cardboardtiger100@reddit
Stupid question... and I've looked for answers online, but every time I enter a prompt I get this "processing" precursor reply that takes a few seconds to clear. In contrast Qwen 3.5 is pounding out tokens at breakneck speed...
Stunning-Bit-7376@reddit
You have a favorite version? I've been using Mudler's Heretic Apex quant, but it was never updated for the latest releases of Gemma.
Stepfunction@reddit
https://huggingface.co/BeaverAI/Artemis-31B-v1g-GGUF
Drummer's been killing it with the Gemma 4 finetunes.
Stunning-Bit-7376@reddit
Ah. Yeah 31b has been too slow to use for me. 37b a4b works great though.
EncampedMars801@reddit
Which one is that? I don't see a BeaverAI Gemma 4 27b
Stunning-Bit-7376@reddit
There isn't one, that's what I was saying. The best models are too slow for me
EncampedMars801@reddit
Ahhh okay
overand@reddit
Also, I think they mean the 26b gemma 4 MoE model, not 27.
overand@reddit
Itchy_Abrocoma6776@reddit
Any chance you know of a heretic/uncensored that's not gguf?
I use vllm with 2x3090s and the throughput is just enormously better. I can actually use qwen 2.5 27b to translate web pages with thinking without having to wait too long.
DriveSolid7073@reddit
Yeah, good one, but it has bugs like do-this-all-the-time, so we are waiting for the full version.
Cradawx@reddit
Qwen is better for agentic coding but Gemma seems better for most other use cases. They both have their place.
Borkato@reddit
It’s kinda crazy how firm the line is too. Qwen lowkey sucks a lot at writing, it’s passable but not NEARLY as good as Gemma. And the same is true for agentic vs Qwen/gemma lol
s101c@reddit
Yep, and it's wise to keep multiple models and switch between them. There's no "universal daily driver" at this size. 300B+ can be universal, but definitely not 30B.
StupidScaredSquirrel@reddit
Wouldn't it be great to have a mixture of models that loads pools of experts from ssd to dram depending on the prompt?
Clear-Ad-9312@reddit
we could maybe call it mixture of model experts, no that is too on the nose, maybe modular expert models, or committee machines. idk, we need something that rolls off the tongue. if only there was a model that has multiple "experts" (learners) that are trained with an internal gating mechanism, all packaged as one model.
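The joke lands because this mechanism already exists inside MoE models. A toy sketch of top-k gated expert routing (all weights are random placeholders, purely illustrative, not any real model's architecture) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: a gate scores every expert for the
# incoming token, and only the top-k experts are actually evaluated
# (the rest could, in principle, stay paged out on SSD).
d_model, n_experts, top_k = 8, 4, 2
gate_w = rng.normal(size=(d_model, n_experts))
expert_w = rng.normal(size=(n_experts, d_model, d_model))

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ gate_w                   # one gating logit per expert
    chosen = np.argsort(logits)[-top_k:]  # indices of the top-k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()              # softmax over the chosen experts only
    # Weighted sum of the chosen experts' outputs; the others are never computed.
    return sum(w * (x @ expert_w[i]) for i, w in zip(chosen, weights))

token = rng.normal(size=d_model)
out = moe_forward(token)
print(out.shape)  # (8,)
```

The point of the gate is exactly what the parent comment wants: unchosen experts never need to be resident in fast memory for that token.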
ThisGonBHard@reddit
Qwen is kinda better for me (well, the Heretic finetune). Its instruction following is best in class by far, and it seems quite creative.
Eisenstein@reddit
I bet Qwen is better at writing in Chinese though, so how is that crazy?
UnknownLesson@reddit
Which size?
warL0ck57@reddit
agree, gemma 4 is good for creative writing. the wording of qwen models is sometimes very weird even to a non-native english speaker, and in french it's straight-up garbage, like a poor translation from English.
solarus@reddit
Tell me about it
IrisColt@reddit
This.
daank@reddit
Not just creative writing, but writing in general. Gemma seems much better at writing natural sounding and nice to read text. Which is interesting because Gemini seems to be pretty bad at that in particular.
coolguysailer@reddit
Yeah, Gemma 4 is a breakthrough model in many ways. It reminds me so much of Sonnet 3.6 in its personality and general affect. I am fond of that little model.
starshade16@reddit
Using an LLM for creative writing sucks dude
Borkato@reddit
Only if you’re stupid. LLMs are great at all kinds of things in creative writing.
StupidScaredSquirrel@reddit
Vs what? A human writer that works for you for free? Or an 8-ball?
Head_Bananana@reddit
Offline DND campaign seems like a good use case
Healthy-Nebula-3603@reddit
But worse for coding :)
BusRevolutionary9893@reddit
The biggest "so what" I can imagine.
kraai-@reddit
Same goes for multilingual, it seems. Qwen quite regularly makes up words for me in Dutch, whilst Gemma 4 does it nearly flawlessly and only rarely makes an error. So maybe for coding etc. Qwen is better, but for writing it's clearly not, right now.
ranting80@reddit
I play around with SillyTavern sometimes and it's the best model I've ever used. 31b Instruct.
Nandopp@reddit
Is it just me or does qwen think for an eternity and then not spit out an output after it has thought?
StupidScaredSquirrel@reddit
I still wanna glaze gemma just cause I'm too scared qwen will stop delivering at some point and gemma is very close in terms of performance and I dont want google to stop releasing
Pablo_Offline_AI@reddit
I used a Qwen one that would not let go of this concept of China's owned territory. it got weird. like it would railroad unrelated convos if it thought I was discussing geography
ecompanda@reddit
yeah the geopolitical guardrails show up in weird places. geography questions, taiwan, anything touching xinjiang. it's not subtle once you notice it. gemma doesn't have the same hard stops which is part of why people keep it around.
throwawayerectpenis@reddit
every AI model has its biases, try to ask ChatGPT about genocide in Gaza and you will get a cookie cutter BS response back.
r1str3tto@reddit
Gemma is politically censored, too! Try dropping in that “I’m Jesus” pic Trump posted a couple weeks ago and ask Gemma 4 26B to use web search and explain what it is. Even though it retrieves good search results, it gaslights and refuses to state that Trump posted it.
DarthFluttershy_@reddit
Are you using the web interface or is this in local models? Because this used to only be an issue on the hosted models in my experience.
That said, if you prompt in Chinese they sometimes get super nationalistic, praising the party and stuff even in non-political queries. I'm guessing that's more an artifact of the training data than intentional, but who knows?
AbeIndoria@reddit
Nope. Absolutely an issue in local too. At least smaller ones. Larger ones can at least reason out the "here's the political reality but my guardrails say this so I'll say both."
Smaller ones just go "NO I DO NOT NEED TO LOOK AT ANY EXTERNAL -PROOF- TO DISPUTE THAT TAIWAN IS CHINA"
martianunlimited@reddit
Which is why i use abliterated models with Qwen and ZAI's GLM
One fun fact: I usually test the chinese models with my go-to test prompt "Which Chinese politician is nick-named Winnie the Pooh?". On Qwen3.5 the thinking tokens of the larger models (27B, 35B-A3B) seem to indicate that they know the answer but refuse to answer it. but the thinking tokens of the smaller models (9B and 4B) seem to indicate they genuinely do not know the answer, which makes me think they are likely distilled from those larger models.
DepressedDrift@reddit
This is why you use uncensored models- thats the biggest beauty of local models.
MushroomCharacter411@reddit
Gemma's mid-size MoE model seems like it's *designed* to run on minimum-spec gaming hardware at an acceptable speed, while being several notches smarter than Qwen 3.5. Qwen 3.6 is using the same basic structure that 3.5 did, and 3.5 didn't play particularly nicely with a 12 GB video card the way Gemma does. So maybe I'll try Qwen 3.6, but I'm in no particular hurry because I suspect that it will be annoyingly slow even after weeks of optimization (because that was the case with 3.5).
keepthepace@reddit
That's me except I am clutching to Mistral.
MoffKalast@reddit
Mistral in 2024: Releases Nemo which is 12B and still has use cases today.
Mistral in 2026: Releases "Small" 4 which is 120B and underperforms models a quarter its size.
I think they're pretty much cooked.
mr_zerolith@reddit
Man, I ran that 123B recently on an RTX PRO 6000 and only got like 25 tokens/sec. Insanely slow; I think speculative decoding is a base requirement for it.
keepthepace@reddit
Still the best models not trained in a dictatorship.
JChataigne@reddit
Voxtral and OCR models are good, but yes their last LMs are lackluster.
darwinanim8or@reddit
Their LLMs are just general bases for fine tuning for enterprise projects, primarily.
rz2000@reddit
In chat I like the results of Gemma 4 31b better than Qwen3.6 35B-A3B, but the Qwen is about 5x faster on my hardware.
Cardboardtiger100@reddit
Right... Every time I enter a prompt in Gemma 4 I get a "processing" indicator for a few seconds before the tokens spill out. Super frustrating
camracks@reddit
It really is so flippen good
ecompanda@reddit
the google vs qwen redundancy angle makes sense. qwen's on a faster release cadence right now but alibaba's roadmap is harder to predict than google's. gemma 4 being close enough on perf makes it a real fallback option.
StupidScaredSquirrel@reddit
Clanker-ass comment
Eisenstein@reddit
Clankers tend to capitalize words at the start of sentences and don't say 'perf'. Could be clever prompting, but that would be novel in my experience.
Foreign_Yard_8483@reddit
When qwen stops releasing updates, gemma will stop releasing updates.
philanthropologist2@reddit
But I cant run Qwen on 8gb vgpu
I think Gemma has untapped potential still. Big time
Pablo_Offline_AI@reddit
Guess who has a solution for that ;)
philanthropologist2@reddit
Who?
Pablo_Offline_AI@reddit
u/Pablo_Offline_AI
markole@reddit
Coding? Sure. Translating? Nah, qwen sucks for translating.
MonteManta@reddit
Check out translategemma
markole@reddit
Will do once it's rebased on Gemma 4.
WolpertingerRumo@reddit
Don't they specifically say their models are for English and Mandarin only? Even if not, it definitely is. I was always wondering why r/LocalLLaMa was raving about Qwen when it did terribly for me.
Well it’s coding and english focused.
Subject-Tea-5253@reddit
On HuggingFace, they say this:
Global Linguistic Coverage: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.
WolpertingerRumo@reddit
Interesting. I'll have to retry and recheck without prejudice.
politerate@reddit
Really? In Go at least, I get much better answers from Gemma at Q4_M GGUF, from architecture to syntax. Qwen mixes in C++ constructs, introduces easy-to-catch bugs, etc. Maybe it's the way I am hosting it, but I am just using the Unsloth params.
Sadman782@reddit
Same. But it seems everyone assumes Gemma 4 is weaker in coding mostly because of vibes (without a system prompt Gemma 4 is lazy for frontend design, Qwen is RL maxxed to get beautiful design by default), 2nd, early Gemma 4 had many bugs so people still believe Gemma 4 is weaker in coding.
More info: https://www.reddit.com/r/LocalLLaMA/comments/1sqxiz0/comment/ohb09kp
politerate@reddit
I tried in roo code and Gemma works beautifully for my use case. I have my own tests and debugging questions I ask when I test a model for my use case. Gemma 4 26B A4B solves a lot of them or makes subtle mistakes, which are not catastrophic. Qwen 3.6 failed me in basic things, hallucinated syntax etc. I tried the same prompt multiple times of course. And consistently I was more impressed with Gemma. It surprised me because I had other expectations reading online how good qwen3.6 was.
Sadman782@reddit
It is all about frontend vibes
MushroomCharacter411@reddit
I just migrated from Qwen 3.5 to Gemma 4, finally found an uncensored model that doesn't go mad from the abliteration process (ironically, it's the Abliterix model made by a guy in China), and even if Qwen 3.6 is better for technical tasks, I'm able to get much better performance out of Gemma 4 26B-A4B than I ever could out of Qwen 3.5 35B-A3B. It's almost like Gemma 4 26B-A4B was *made* to run on potatoes with low VRAM.
Gemma started getting annoying earlier tonight by drilling me repeatedly on the same "What will you do if..." questions that I'd already answered "I don't know, and that's not a problem that has to be solved instantly, so I'd have to talk to the other people involved". Finally I had had enough and sent it a meme picture from the Spanish Inquisition sketch. *Gemma got the reference.* Qwen 3.5 wouldn't have, if I could even afford to have vision enabled because 35B-A3B was already pushing the boundaries of what I can do with 12 GB of VRAM. Leaving vision enabled in Gemma seems to induce only a slight speed hit, even if the vision model misses the point a fair amount of the time.
Training-Ruin-5287@reddit
This sub has turned into a shit show the past few months. It feels like the popular online model community has jumped ship and taken over. At least the types of posts made and the constant jump to the latest trend feel no different than what can be seen on the OpenAI sub, or singularity.
Euchale@reddit
Then there is me, who just sees the new model and goes "Huh, guess I'll wait for the hype to die down and then check it out."
DrummerHead@reddit
Why? Are you waiting for the models to go on sale?
Euchale@reddit
By that point I get:
-Full llama.cpp support, with all fancy new algorithms to make them run faster/more efficiently
-Community has already figured out what it is good at and what it is bad at
-Any issues that might exist with a model will also be figured out
-I don't waste HDD space.
kamikamen@reddit
So you're waiting for the model to go on sale, smart!
PaMRxR@reddit
So essentially just take and contribute nothing back.
MereMoonlight@reddit
Disagree. We should encourage as many people as possible to do local instead of cloud.
Euchale@reddit
I am too much of a dum dum to contribute technology wise. I have donated to people who make finetunes though.
unchained5150@reddit
Flash sale. Free is still too much right now.
DrummerHead@reddit
"I've got negative money in my bank account! I can't afford 0, it's more than I have!"
robogame_dev@reddit
I think they time the releases like this to step on each others’ announcements.
Gemma had exactly 1 week in my stack until 3.6! Brutal!
geldonyetich@reddit
I can't help but think anyone who is mentioning Qwen in relation to Gemma 4 is just fixating hard on the one alternative of comparable weight that Gemma 4 can't easily dethrone.
Personally I will never touch a Chinese model. The details of how they stay competitive, how censored they are, and who they must answer to isn't pretty.
jacobpederson@reddit
Am I the only one underwhelmed by 3.6?
Gemma-4-26b-a4b can 1-shot this prompt in under a minute - Qwen 3.6 didn't get it after an hour of troubleshooting.
c64z86@reddit
Yeah I notice that Gemma 4 tends to one shot things far more than Qwen does. I've tested them both at Q8.
Sadman782@reddit
It is all about vibes (frontend design) which most people believe is what coding means. But Gemma is not trained for better frontend by default (it is lazy for frontend unlike Qwen), Gemma needs a custom system prompt or the prompt must ask for better frontend. See: https://www.reddit.com/r/LocalLLaMA/comments/1sqxiz0/comment/ohb09kp
jacobpederson@reddit
Ah interesting - yea, I mostly write quick and dirty scripts with no front end so I don't notice :D
Sadman782@reddit
Same. For most of my use cases, gemma is a better coder.
cpt_justice@reddit
I do use gemma-4-e2b, for some reason I can't get larger gemma 4 models to run on llama.cpp using both my Mi25s, so it's Qwen-3.6-35B-A3B for my main model.
PlanetPhaelon@reddit
Haha, this is so accurate, and I can't even stop myself from doing it
danigoncalves@reddit
I use both: for creative writing Gemma is king, and for agentic coding Qwen is stronger.
thefox828@reddit
I tried qwen3.6-35B-A3B on a RTX 5060 Ti 16GB using this guide https://njannasch.dev/blog/running-qwen-3-5-35b-a3b-on-5060-ti/ and I am honestly amazed. Runs with ~85 toks/sec and has really nice answers/intelligence.
Really makes me question if I need any subscription.
MomentInfinite2940@reddit
Sometimes I imagine it like there is something "magical" constantly present in the current model, and when a new one comes out, that magic moves to the new one :)
voyager256@reddit
I mean, that's usually for a good reason: either they are better quality, or the same but requiring less VRAM/resources.
Kahvana@reddit
Ha no, I'm still running Magistral Small 2509!
Both Gemma 4 and Qwen 3.6 complement each other well. It's worth having both on your disk.
Bobylein@reddit
What are you using magistral small 2509 for?
Kahvana@reddit
Roleplay. Was hoping for Mistral 4 Small to be good enough, but sadly it wasn't.
Bobylein@reddit
is it better than Gemma 4? Got really good success with it
Kahvana@reddit
Not for me because I personally really like Magistral's prose, but I'm sure almost everyone else is better suited to use Gemma 4.
dto_lurker@reddit
Qwen 3.6 basically does destroy gemma
IrisColt@reddit
Heh, it's not the case with Gemma 4, sorry.
ComplexType568@reddit
I appreciate that these two models cover each other's weaknesses. Coding and development for qwen, creativity and languages for Gemma. It's like two sides of a coin!
miversen33@reddit
I'm completely fine with that honestly. I don't need one mega model, give me several small models that excel at specific things. I think the industry will end up going that way anyway because making one "super" model is extremely expensive and not financially practical (though funny enough, I wonder if the cost of energy coupled with the rapid consumption of it by LLMs will cause the US to finally embrace Nuclear power across the board)
Subject-Tea-5253@reddit
I completely agree with this.
I was generating some data with Qwen 3.5 9B. Later, I needed to translate the dataset to French and Arabic. Qwen did an OK job, but in Arabic it started hallucinating words.
I have tried Gemma4-E4B and it surprised me. The translations were really well done.
BusRevolutionary9893@reddit
One side is productive and useful the other is boring.
MexInAbu@reddit
Yep. Better for local AI to have small models that each excel at their own thing. I want to set up one for excellent tool calling and let another one actually write the code.
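That split can be as simple as a static dispatch table. A hypothetical sketch (the model names and route keys here are illustrative assumptions, not a recommendation):

```python
# Route each task kind to the local model that handles it best.
# Everything here is made up for illustration.
ROUTES = {
    "tool_call": "qwen-3.6-35b-a3b",  # strict JSON / function calling
    "code": "qwen-3.6-35b-a3b",       # agentic coding
    "prose": "gemma-4-26b-a4b",       # writing, translation
}

def pick_model(task_kind: str) -> str:
    """Return the model assigned to a task, defaulting to the prose model."""
    return ROUTES.get(task_kind, ROUTES["prose"])

print(pick_model("code"))  # qwen-3.6-35b-a3b
print(pick_model("poem"))  # gemma-4-26b-a4b (unknown kinds fall back to prose)
```

A real setup would sit this in front of two inference servers, but the design choice is the same: classify the task, then forward to the specialist.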
MaruluVR@reddit
When it comes to multi lingual nothing can compete with Gemma so I am sticking with it.
ElephantWithBlueEyes@reddit
I ditched local models for cloud ones. But even cloud models are dumb, to be frank
breadfruitcore@reddit
I haven't seen anyone talk about this, but I feel like the proliferation/commoditization of open models is hurting the community. Models that were perfectly fine 4-5 months ago get drowned out by new models that are often just more benchmaxxed entrants. I'm worried this will lead to labs getting overpressured because their models get unreasonably "obsoleted" by the flavor of the month, which often performs badly in real-world cases. But that's just me, idk.
Borkato@reddit
I mean… sometimes, but what models are better than Gemma and Qwen?
breadfruitcore@reddit
Wasn't referring to both of them here. Stuff like Kimi 2.5 got sidelined way too quickly.
somerandomperson313@reddit
I use them both every day.
Pablo_Offline_AI@reddit
this is comparing the "usefulness" of forks to spoons. I need both
Borkato@reddit
As someone who uses forks for almost everything, even ice cream, mashed potatoes, cereal, etc, this is funny to me. The only thing I have to use a spoon for is soup 😂
Humble-Pick7172@reddit
This time I will most likely use Gemma 4 for a very long time. Before, it was just a good average model (imo), but now it has become special: knowledge cutoff in 2025, good at prompting t2i, and in general it can cosplay Gemini 3 Flash (which I really like).
Qwen 3.6 is a good tool but Gemma 4 has a soul.
Potential-Gold5298@reddit
Only those who choose a model based on AA scores do this. The Gemma 4 handles my tasks (text translation, chat, answering questions, writing stories, RP) better, and I don't care how many points another model scores in abstract benchmarks. I'll replace it when a model that handles these tasks better comes out, even if that's two years away.
ai_without_borders@reddit
the "obsolete" framing is pure enthusiast mode. at a startup running inference in prod, the switching cost is real - re-eval on your actual use case (not benchmarks), re-tune prompts that are never fully portable between models, regression testing. we run on a 4-6 week upgrade cycle at best. models that win in production are the ones stable enough to commit to for a quarter, not the ones topping leaderboard for a week.
Awwtifishal@reddit
I use both, both are very good at different tasks
roboapple@reddit
Can you explain which is for which? Ive tried both and i may be stupid but i cant really tell the difference
Awwtifishal@reddit
qwen is better for coding and some other technical tasks, gemma is better for more natural language related tasks such as translations and story writing, and it follows instructions more strictly.
FullChampionship7564@reddit (OP)
Definitely
Syzygy___@reddit
Somehow I can get Gemma:26b running, and at reasonable speeds, on my 16gb of RAM.
thats_so_bro@reddit
probably a small context window though
Syzygy___@reddit
It's faster than phi-4-mini for me.
not sure about context window though. Clawcode could be better, so maybe that's an issue of small context windows, openclaw or just gemma4 in general.
Bobylein@reddit
I am running the Unsloth 4bit variant with 128k context and 16gb VRam at around 110t/s, yea sure 3bit would fit completely and run at 120t/s but that doesn't seem worth it.
Majinsei@reddit
Spanish... For languages, Gemma is much better~
And Qwen literally doubles the length of its answer in the chain of thought; it's a ton of overthinking tokens even for simple things~
As for censorship~ getting Gemma to answer censored things is super easy with the base model~ With Qwen I feel it's much harder to get past the censorship~
Qwen is better for code and technical work~ Gemma for everything else~
RedditUsr2@reddit
Gemma 4 is great for that local ChatGPT experience. Qwen 3.6 seems better for documents, coding, and tasks like that.
Informal-Ask-6677@reddit
On Twitter I see "THIS IS HUGE", "THIS IS A GAME CHANGER"... every single time
glad-k@reddit
Never got into Gemma 4 as it's not rly good for instruct and that's my whole use case
Still using qwen3.5 as my hardware prefers non moe models (27B plzzz)
sersoniko@reddit
Am I still the only one rocking Qwen 3.5 27B?
Ok-Whereas8632@reddit
I'm a software engineer who is a noob with llm. I want a small llm that would be really good at making up spooky stories and making games out of them for me to play. Any pointers?
I tried a few small models and the only thing that's performant is Gemma-2-2b-it (Q6_K). Tried Qwen, but it takes way too long to respond. That's on a crappy old laptop.
I'm fine with using Gemma, but I'm also wondering if there's something out there trained on a dataset that's better for spookiness.
ayylmaonade@reddit
I keep both on my SSD. Qwen3.6-35B + Gemma 4-26B-A4B. Perfect combo in my eyes. I use Qwen as the daily driver, Gemma for anything that might benefit from world knowledge or prose. You don't have to pick, people!
popecostea@reddit
Is this bait for Google to release gemma4 124B?
screenslaver5963@reddit
They shock the industry by revealing that Gemini is actually only 120B
a_beautiful_rhind@reddit
With flash I might even buy it.
DeepOrangeSky@reddit
I still haven't found anything that can beat Mistral 123b dense/Behemoth 123b dense, at writing, on 128GB unified memory, yet.
That model is almost 2 years old now.
Although, to be fair, if the labs were still pumping out 120b dense models, I'm guessing it would've been surpassed by quite a bit by now.
Still pretty funny how strong something that old is, though. Especially in the AI world.
a_beautiful_rhind@reddit
For coding shits and tool calling devstral is the update to that. It's the best non gigantor MoE that can do both reasonably well.
a_beautiful_rhind@reddit
Here I am still using models from 2024/2025 even. Some models are disposable but the good ones stick around.
I know this is just qwen astroturfing but shouldn't qwen 3.5 and previous be where gemma4 is? Don't hear much about qwen2 anymore... or even qwen3.
Bobylein@reddit
Nah Gemma 4 is much much better at roleplay and other "creative" tasks, Qwen is mostly useful for clear straightforward tasks
OhShitOhFuckOhMyGod@reddit
Gemma4 is faster on Strix Halo, and it’s better in everything but coding and maybe vision imo
ecompanda@reddit
the coding vs creative writing split in these comments is basically accurate. qwen on structured tasks, gemma 4 when you need the model to actually think open ended.
No_Mango7658@reddit
It's so true though! I just had Qwen 3.6 35B Q4 home-run a big feature request that required multiple backend router and data model updates plus frontend changes. It's pretty great, and it's so small!
Hot-Employ-3399@reddit
I still haven't launched gemma4 successfully. Also qwen3.6 was not as good as qwen3.5 27B dense.
silenceimpaired@reddit
No surprise for me. They only released a small MoE. I hope all the hopefuls are correct and the dense model is on the way.
Positive_Phone0633@reddit
Nawww I like them both. Gemma’s really good at being creative and working with the prompt, and Qwen is the better nerd. Two of my top picks for local
guggaburggi@reddit
We are not all just about coding. We also do role-playing and writing and questions and answers, and I think Gemma 4 is much better at that.
Salt-Willingness-513@reddit
I love gemma 4 for swiss german. Qwen is horrible at swiss german and decent for german in general, while gemma is perfect in german and almost perfect for swiss german, even transcription.
Bockanator@reddit
Eh nah. They're both good for different things.
Toooooool@reddit
not a single mention of GLM-4.7-Flash in this thread, very authentic to OP's image
BannedGoNext@reddit
Gemma is pretty damn cool.
Kodix@reddit
Gemma is still awesome. But for the "in-vogue" uses - agentic workflows - it's just worse.
That said, I am *so* grateful to Google for releasing it for us.
DeepOrangeSky@reddit
Yea, I feel really bad for Kim K. It's like with Kanye, all over again :(
Worried-Squirrel2023@reddit
this is also why I keep a "last known good" setup pinned. just because qwen 3.6 dropped doesn't mean my 3.5 27b workflow is broken. the obsolescence is more about the conversation than the actual capability of yesterday's model.
alamacra@reddit
Imo Gemma-4 is better at following instructions. E.g. Qwen's instruction following seems to be somehow massively degraded after even a couple of images, despite them taking up very little context. So if you tell it to do some deductions based on them, then write them to a file using a tool, and then check that the file was actually written, very often it'll just do a wrong tool call and forget about checking the results altogether.
po_stulate@reddit
tbf, the new ones are probably built on top of the old ones, so it just grew with us, not replaced.
Environmental-Metal9@reddit
I’d be more excited about qwen models, but they don’t release the base models for the 27B-32B dense variants, and my pipeline is doing CPT on the base, and doing my own SFT on my base. Having to fight against their training and risking all the failure modes there doesn’t sound all that appealing to me. On the other hand, Google releases base of all their Gemma models. For me it’s not about which is best, but rather which is available.
MundanePercentage674@reddit
Actually it depends on how smart it is, how well it gets the job done, and how fast the inference is.
Interesting_Key3421@reddit
nice summary :)