Every time a new model comes out, the old one is obsolete of course
Posted by FullChampionship7564@reddit | LocalLLaMA | View on Reddit | 189 comments
Cute_Baseball2875@reddit
Lakius_2401@reddit
Qwen 3.6 causes me near physical pain with how it writes. I've seen the reddit posts created by qwen models. Why this works, How to use it. The emdashes, the "doesn't just", the punchy sentences and verb verb verb sentences. It could be twice the model that Gemma 4 is, and run twice as fast, and I'd still use anything else that doesn't drip with slop. It's so pure and distilled!
jacek2023@reddit
I doubt guys on LocalLLaMA use any local models, they just hype benchmarks.
rpkarma@reddit
Hey now I absolutely run Qwen 3.5 122B on my Spark. There’s dozens of us, dozens!
jacek2023@reddit
yes, but there are more people discussing how Kimi cloud access is cheaper than Claude cloud access
rpkarma@reddit
(I know, the “dozens of us” is a sarcastic reference that there are so few)
Apart_Ebb_9867@reddit
oh, you allow yourself a sarcastic "dozens of us" but then object to somebody saying "a while ago" for something that happened three weeks ago? ahahah
rpkarma@reddit
Brother stalking accounts is fucking weird
Pablo_Offline_AI@reddit
What if we use tinier models with much more elaborate tools. We could make 3b models act like Claude, right?
rpkarma@reddit
Not in my experience. Maybe with a powerful reasoning model with very little world knowledge, but world knowledge appears to be tied to reasoning ability.
-dysangel-@reddit
at least one half a dozen
SnooPaintings8639@reddit
I thought the coolest thing on this sub is benchmarks hatemaxxing. Anyway, I use my local Qwen 3.6 daily, love it.
the_mighty_skeetadon@reddit
For the last 6 days lol
SnooPaintings8639@reddit
What a wonderful week it's been!
breadfruitcore@reddit
I'm GPU poor but I try my best to support open models via providers.
jacek2023@reddit
that doesn't make any sense
breadfruitcore@reddit
Can you elaborate? It's beneficial to use open models to avoid big tech lock in, local or otherwise.
jacek2023@reddit
"local LLM" means LLM you run on your computer, in 2025 people started to think that LocalLLaMA is about cheap cloud services
breadfruitcore@reddit
You're not wrong, but I don't see why what I'm doing is nonsense.
There's lots of people with no local rig. Their choice is to either use proprietary models or open models via provider. Certainly one of the two is better for open models than the other.
FuckNinjas@reddit
I'm hearing plain facts. Providers that host open models and serve them well are a proper win for open source, given that the machines to run them are unattainable for the average, idk, let's say developer.
Some of us are liquidity-poor (and wealth-poor too, but that doesn't matter).
There's no need to gatekeep. I would love to run claude in a box, but 1. we're not there yet; 2. did I mention I'm too poor to pour several thousands into a computer?
draconic_tongue@reddit
guy is permanently butthurt, take no offense
breadfruitcore@reddit
None taken
FullChampionship7564@reddit (OP)
add a sarcasm tag to the title heh.
100lyan@reddit
I love Qwen 3.6 A3B, 27B and Gemma4 31B. They have different use cases.
Gemma4 31B is great with 3d scenes, JavaScript game dev - all kinds of single HTML apps.
I was able to almost single-shot a Mini GTA game, relativistic blackhole simulation, Solar system, Wolfenstein game. It is able to correctly add features and fix bugs. I am constantly surprised at how powerful it is. It's good at maths too - some of my tests include throwing recent math Olympiad problems - and in most of the cases it solves them correctly. It is a bit like the right half of the brain - very creative, good at visual arts (though some hard math too).
Qwen 3.6 A3B, 27B ... they feel like the left half of the brain. Great at maths, logic, charts, Jupyter notebooks, coding, tool calling, automated research. I use them for work on a daily basis.
Qwen 3.6 is great for code refactoring - I managed to refactor a big rust codebase in a couple of days + creating tests + automatic deployment and troubleshooting from logs. It feels unreal with the right tools (I am using Roo code). For some reason however it struggles with making games. Although I was able to create a simple Mario clone in Rust - it was not without troubles and it was often going in cycles.
To summarize: I would use Gemma4 for frontend and game dev + creative writing and Qwen for backend + agentic dev
c64z86@reddit
That's been my experience too! Gemma 4 is a lot better at coding games in general and one shotting things. I find that Qwen 3.6 gets it wrong the first time often and then I have to guide it along.
HongPong@reddit
can you say a bit more about the workflow using either to make the games?
100lyan@reddit
They were done with simple prompts, then some follow-ups for fixing bugs. No system prompt used. Gemma4 instruction following seems tighter when tasks are with visual feedback. Gemma 4 feels rounder overall while Qwen feels more specialized.
HongPong@reddit
i finally got opencode working with qwen3.6 and mcp in jetbrains, it's pretty slick. with Claude dialing back access good to change modes. looking forward to trying this with unreal engine later
MexInAbu@reddit
Gemma 4 is superior for creative writing and there's no contest.
Emotional-Ad5025@reddit
Same thoughts here, I use it mostly to speak in Portuguese, the huge quality difference specific to that is not visible in benchmarks
Cardboardtiger100@reddit
Stupid question... and I've looked for answers online, but every time I enter a prompt I get this "processing" precursor reply that takes a few seconds to clear. In contrast Qwen 3.5 is pounding out tokens at breakneck speed...
Stunning-Bit-7376@reddit
You have a favorite version? I've been using Mudler's Heretic Apex quant, but it was never updated for the latest releases of Gemma.
Stepfunction@reddit
https://huggingface.co/BeaverAI/Artemis-31B-v1g-GGUF
Drummer's been killing it with the Gemma 4 finetunes.
Stunning-Bit-7376@reddit
Ah. Yeah 31b has been too slow to use for me. 37b a4b works great though.
EncampedMars801@reddit
Which one is that? I don't see a BeaverAI Gemma 4 27b
Stunning-Bit-7376@reddit
There isn't one, that's what I was saying. The best models are too slow for me
EncampedMars801@reddit
Ahhh okay
overand@reddit
Also, I think they mean the 26b gemma 4 MoE model, not 27.
overand@reddit
Itchy_Abrocoma6776@reddit
Any chance you know of a heretic/uncensored that's not gguf?
I use vllm with 2x3090s and the throughput is just enormously better. I can actually use qwen 2.5 27b to translate web pages with thinking without having to wait too long.
DriveSolid7073@reddit
Yeah, good one, but it has bugs like do-this-all-the-time, so we are waiting for the full version.
Cradawx@reddit
Qwen is better for agentic coding but Gemma seems better for most other use cases. They both have their place.
Borkato@reddit
It’s kinda crazy how firm the line is too. Qwen lowkey sucks a lot at writing, it’s passable but not NEARLY as good as Gemma. And the same is true for agentic vs Qwen/gemma lol
s101c@reddit
Yep, and it's wise to keep multiple models and switch between them. There's no "universal daily driver" at this size. 300B+ can be universal, but definitely not 30B.
StupidScaredSquirrel@reddit
Wouldn't it be great to have a mixture of models that loads pools of experts from ssd to dram depending on the prompt?
Clear-Ad-9312@reddit
we could maybe call it mixture of model experts, no that is too on the nose, maybe modular expert models, or committee machines. idk, we need something that rolls off the tongue. if only there was a model that has multiple "experts" (learners) that are trained with an internal gating mechanism, all packaged as one model.
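The joke lands because this mechanism already exists inside MoE models. A toy sketch of top-k gated expert routing (all weights are random placeholders, purely illustrative, not any real model's architecture) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: a gate scores every expert for the
# incoming token, and only the top-k experts are actually evaluated
# (the rest could, in principle, stay paged out on SSD).
d_model, n_experts, top_k = 8, 4, 2
gate_w = rng.normal(size=(d_model, n_experts))
expert_w = rng.normal(size=(n_experts, d_model, d_model))

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ gate_w                   # one gating logit per expert
    chosen = np.argsort(logits)[-top_k:]  # indices of the top-k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()              # softmax over the chosen experts only
    # Weighted sum of the chosen experts' outputs; the others are never computed.
    return sum(w * (x @ expert_w[i]) for i, w in zip(chosen, weights))

token = rng.normal(size=d_model)
out = moe_forward(token)
print(out.shape)  # (8,)
```

The point of the gate is exactly what the parent comment wants: unchosen experts never need to be resident in fast memory for that token.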
ThisGonBHard@reddit
Qwen is kinda better for me (well, the Heretic finetune). Its instruction following is best in class by far, and it seems quite creative.
Eisenstein@reddit
I bet Qwen is better at writing in Chinese though, so how is that crazy?
UnknownLesson@reddit
Which size?
warL0ck57@reddit
agree, gemma 4 is good for creative writing. the wording of qwen models is sometimes very weird even to a non-native english speaker, and in french it's straight-up garbage, like a poor translation from English.
solarus@reddit
Tell me about it
IrisColt@reddit
This.
daank@reddit
Not just creative writing, but writing in general. Gemma seems much better at writing natural sounding and nice to read text. Which is interesting because Gemini seems to be pretty bad at that in particular.
coolguysailer@reddit
Yeah, Gemma 4 is a breakthrough model in many ways. It reminds me so much of Sonnet 3.6 in its personality and general affect. I am fond of that little model.
starshade16@reddit
Using an LLM for creative writing sucks dude
Borkato@reddit
Only if you’re stupid. LLMs are great at all kinds of things in creative writing.
StupidScaredSquirrel@reddit
Vs what? A human writer that works for you for free? Or an 8-ball?
Head_Bananana@reddit
Offline DND campaign seems like a good use case
Healthy-Nebula-3603@reddit
But worse for coding :)
BusRevolutionary9893@reddit
The biggest "so what" I can imagine.
kraai-@reddit
Same goes for multilingual, it seems. Qwen quite regularly makes up words for me in Dutch, whilst Gemma 4 does it nearly flawlessly and only rarely makes an error. So maybe for coding etc. Qwen is better, but for writing it's clearly not, right now.
ranting80@reddit
I play around with SillyTavern sometimes and it's the best model I've ever used. 31b Instruct.
Nandopp@reddit
Is it just me or does qwen think for an eternity and then not spit out an output after it has thought?
StupidScaredSquirrel@reddit
I still wanna glaze gemma just cause I'm too scared qwen will stop delivering at some point and gemma is very close in terms of performance and I dont want google to stop releasing
Pablo_Offline_AI@reddit
I used a Qwen one that would not let go of this concept of China's owned territory. it got weird. like it would railroad unrelated convos if it thought I was discussing geography
ecompanda@reddit
yeah the geopolitical guardrails show up in weird places. geography questions, taiwan, anything touching xinjiang. it's not subtle once you notice it. gemma doesn't have the same hard stops which is part of why people keep it around.
throwawayerectpenis@reddit
every AI model has its biases, try to ask ChatGPT about genocide in Gaza and you will get a cookie cutter BS response back.
r1str3tto@reddit
Gemma is politically censored, too! Try dropping in that “I’m Jesus” pic Trump posted a couple weeks ago and ask Gemma 4 26B to use web search and explain what it is. Even though it retrieves good search results, it gaslights and refuses to state that Trump posted it.
DarthFluttershy_@reddit
Are you using the web interface or is this in local models? Because this used to only be an issue on the hosted models in my experience.
That said, if you prompt in Chinese they sometimes get super nationalistic, praising the party and stuff even in non-political queries. I'm guessing that's more an artifact of the training data than intentional, but who knows?
AbeIndoria@reddit
Nope. Absolutely an issue in local too. At least smaller ones. Larger ones can at least reason out the "here's the political reality but my guardrails say this so I'll say both."
Smaller ones just go "NO I DO NOT NEED TO LOOK AT ANY EXTERNAL -PROOF- TO DISPUTE THAT TAIWAN IS CHINA"
martianunlimited@reddit
Which is why i use abliterated models with Qwen and ZAI's GLM
One fun fact: I usually test the chinese models with my go-to test prompt "Which Chinese politician is nick-named Winnie the Pooh?". On Qwen3.5 the thinking tokens of the larger models (27B, 35B-A3B) seem to indicate that they know the answer but refuse to answer it. but the thinking tokens of the smaller models (9B and 4B) seem to indicate they genuinely do not know the answer, which makes me think they are likely distilled from those larger models.
DepressedDrift@reddit
This is why you use uncensored models- thats the biggest beauty of local models.
MushroomCharacter411@reddit
Gemma's mid-size MoE model seems like it's *designed* to run on minimum-spec gaming hardware at an acceptable speed, while being several notches smarter than Qwen 3.5. Qwen 3.6 is using the same basic structure that 3.5 did, and 3.5 didn't play particularly nicely with a 12 GB video card the way Gemma does. So maybe I'll try Qwen 3.6, but I'm in no particular hurry because I suspect that it will be annoyingly slow even after weeks of optimization (because that was the case with 3.5).
keepthepace@reddit
That's me except I am clutching to Mistral.
MoffKalast@reddit
Mistral in 2024: Releases Nemo which is 12B and still has use cases today.
Mistral in 2026: Releases "Small" 4 which is 120B and underperforms models a quarter its size.
I think they're pretty much cooked.
mr_zerolith@reddit
Man, I ran that 123B recently on an RTX PRO 6000 and only got like 25 tokens/sec. Insanely slow; I think speculative decoding is a base requirement for it.
keepthepace@reddit
Still the best models not trained in a dictatorship.
JChataigne@reddit
Voxtral and OCR models are good, but yes their last LMs are lackluster.
darwinanim8or@reddit
Their LLMs are just general bases for fine tuning for enterprise projects, primarily.
rz2000@reddit
In chat I like the results of Gemma 4 31b better than Qwen3.6 35B-A3B, but the Qwen is about 5x faster on my hardware.
Cardboardtiger100@reddit
Right... Every time I enter a prompt in Gemma 4 I get a "processing" indicator for a few seconds before the tokens spill out. Super frustrating
camracks@reddit
It really is so flippen good
ecompanda@reddit
the google vs qwen redundancy angle makes sense. qwen's on a faster release cadence right now but alibaba's roadmap is harder to predict than google's. gemma 4 being close enough on perf makes it a real fallback option.
StupidScaredSquirrel@reddit
Clanker-ass comment
Eisenstein@reddit
Clankers tend to capitalize words at the start of sentences and don't say 'perf'. Could be clever prompting, but that would be novel in my experience.
Foreign_Yard_8483@reddit
When qwen stops releasing updates, gemma will stop releasing updates.
philanthropologist2@reddit
But I cant run Qwen on 8gb vgpu
I think Gemma has untapped potential still. Big time
Pablo_Offline_AI@reddit
Guess who has a solution for that ;)
philanthropologist2@reddit
Who?
Pablo_Offline_AI@reddit
u/Pablo_Offline_AI
markole@reddit
Coding? Sure. Translating? Nah, qwen sucks for translating.
MonteManta@reddit
Check out translategemma
markole@reddit
Will do once it's rebased on Gemma 4.
WolpertingerRumo@reddit
Don't they specifically say their models are for English and Mandarin only? Even if not, it definitely is. I was always wondering why r/LocalLLaMa was raving about Qwen when it did terribly for me.
Well it’s coding and english focused.
Subject-Tea-5253@reddit
On HuggingFace, they say this:
Global Linguistic Coverage: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.
WolpertingerRumo@reddit
Interesting. I'll have to retry and recheck without prejudice.
politerate@reddit
Really? In Go at least, I get much better answers from Gemma at Q4_M GGUF, from architecture to syntax. Qwen mixes in C++ constructs, introduces easy-to-catch bugs, etc. Maybe it's the way I am hosting it, but I am just using the Unsloth params.
Sadman782@reddit
Same. But it seems everyone assumes Gemma 4 is weaker in coding mostly because of vibes (without a system prompt Gemma 4 is lazy for frontend design, Qwen is RL maxxed to get beautiful design by default), 2nd, early Gemma 4 had many bugs so people still believe Gemma 4 is weaker in coding.
More info: https://www.reddit.com/r/LocalLLaMA/comments/1sqxiz0/comment/ohb09kp
politerate@reddit
I tried in roo code and Gemma works beautifully for my use case. I have my own tests and debugging questions I ask when I test a model for my use case. Gemma 4 26B A4B solves a lot of them or makes subtle mistakes, which are not catastrophic. Qwen 3.6 failed me in basic things, hallucinated syntax etc. I tried the same prompt multiple times of course. And consistently I was more impressed with Gemma. It surprised me because I had other expectations reading online how good qwen3.6 was.
Sadman782@reddit
It is all about frontend vibes
MushroomCharacter411@reddit
I just migrated from Qwen 3.5 to Gemma 4, finally found an uncensored model that doesn't go mad from the abliteration process (ironically, it's the Abliterix model made by a guy in China), and even if Qwen 3.6 is better for technical tasks, I'm able to get much better performance out of Gemma 4 26B-A4B than I ever could out of Qwen 3.5 35B-A3B. It's almost like Gemma 4 26B-A4B was *made* to run on potatoes with low VRAM.
Gemma started getting annoying earlier tonight by drilling me repeatedly on the same "What will you do if..." questions that I'd already answered "I don't know, and that's not a problem that has to be solved instantly, so I'd have to talk to the other people involved". Finally I had had enough and sent it a meme picture from the Spanish Inquisition sketch. *Gemma got the reference.* Qwen 3.5 wouldn't have, if I could even afford to have vision enabled because 35B-A3B was already pushing the boundaries of what I can do with 12 GB of VRAM. Leaving vision enabled in Gemma seems to induce only a slight speed hit, even if the vision model misses the point a fair amount of the time.
Training-Ruin-5287@reddit
This sub has turned into a shit show the past few months. It feels like the popular online model community has jumped ship and taken over. At least the types of posts made and the constant jump to the latest trend feel no different than what can be seen on the OpenAI sub, or singularity.
Euchale@reddit
Then there is me, who just sees the new model and goes "Huh, guess I'll wait for the hype to die down and then check it out."
DrummerHead@reddit
Why? Are you waiting for the models to go on sale?
Euchale@reddit
By that point I get:
-Full llama.cpp support, with all fancy new algorithms to make them run faster/more efficiently
-Community has already figured out what it is good at and what it is bad at
-Any issues that might exist with a model will also be figured out
-I don't waste HDD space.
kamikamen@reddit
So you're waiting for the model to go on sale, smart!
PaMRxR@reddit
So essentially just take and contribute nothing back.
MereMoonlight@reddit
Disagree. We should encourage as many people as possible to do local instead of cloud.
Euchale@reddit
I am too much of a dum dum to contribute technology wise. I have donated to people who make finetunes though.
unchained5150@reddit
Flash sale. Free is still too much right now.
DrummerHead@reddit
"I've got negative money in my bank account! I can't afford 0, it's more than I have!"
robogame_dev@reddit
I think they time the releases like this to step on each others’ announcements.
Gemma had exactly 1 week in my stack until 3.6! Brutal!
geldonyetich@reddit
I can't help but think anyone who is mentioning Qwen in relation to Gemma 4 is just fixating hard on the one alternative of comparable weight that Gemma 4 can't easily dethrone.
Personally I will never touch a Chinese model. The details of how they stay competitive, how censored they are, and who they must answer to isn't pretty.
jacobpederson@reddit
Am I the only one underwhelmed by 3.6?
Gemma-4-26b-a4b can 1-shot this prompt in under a minute - Qwen 3.6 didn't get it after an hour of troubleshooting.
c64z86@reddit
Yeah I notice that Gemma 4 tends to one shot things far more than Qwen does. I've tested them both at Q8.
Sadman782@reddit
It is all about vibes (frontend design) which most people believe is what coding means. But Gemma is not trained for better frontend by default (it is lazy for frontend unlike Qwen), Gemma needs a custom system prompt or the prompt must ask for better frontend. See: https://www.reddit.com/r/LocalLLaMA/comments/1sqxiz0/comment/ohb09kp
jacobpederson@reddit
Ah interesting - yea, I mostly write quick and dirty scripts with no front end so I don't notice :D
Sadman782@reddit
Same. For most of my use cases, gemma is a better coder.
cpt_justice@reddit
I do use gemma-4-e2b, for some reason I can't get larger gemma 4 models to run on llama.cpp using both my Mi25s, so it's Qwen-3.6-35B-A3B for my main model.
PlanetPhaelon@reddit
Haha, this is so accurate, and I can't even stop myself from doing it
danigoncalves@reddit
I use both: for creative writing Gemma is king, and for agentic coding Qwen is stronger.
thefox828@reddit
I tried qwen3.6-35B-A3B on a RTX 5060 Ti 16GB using this guide https://njannasch.dev/blog/running-qwen-3-5-35b-a3b-on-5060-ti/ and I am honestly amazed. Runs with ~85 toks/sec and has really nice answers/intelligence.
Really makes me question if I need any subscription.
MomentInfinite2940@reddit
Sometimes I imagine it like there is something "magical" constantly present in the current model, and when a new one comes out, that magic moves to the new one :)
voyager256@reddit
I mean, that's usually for a good reason: either they are better quality, or the same but requiring less VRAM/resources.
Kahvana@reddit
Ha no, I'm still running Magistral Small 2509!
Both Gemma 4 and Qwen 3.6 complement each other well. It's worth having both on your disk.
Bobylein@reddit
What are you using magistral small 2509 for?
Kahvana@reddit
Roleplay. Was hoping for Mistral 4 Small to be good enough, but sadly it wasn't.
Bobylein@reddit
is it better than Gemma 4? Got really good success with it
Kahvana@reddit
Not for me because I personally really like Magistral's prose, but I'm sure almost everyone else is better suited to use Gemma 4.
dto_lurker@reddit
Qwen 3.6 basically does destroy gemma
IrisColt@reddit
Heh, it's not the case with Gemma 4, sorry.
ComplexType568@reddit
I appreciate that these two models cover each other's weaknesses. Coding and development for qwen, creativity and languages for Gemma. It's like two sides of a coin!
miversen33@reddit
I'm completely fine with that honestly. I don't need one mega model, give me several small models that excel at specific things. I think the industry will end up going that way anyway because making one "super" model is extremely expensive and not financially practical (though funny enough, I wonder if the cost of energy coupled with the rapid consumption of it by LLMs will cause the US to finally embrace Nuclear power across the board)
Subject-Tea-5253@reddit
I completely agree with this.
I was generating some data with Qwen 3.5 9B. Later, I needed to translate the dataset to French and Arabic. Qwen did an OK job, but in Arabic it started hallucinating words.
I have tried Gemma4-E4B and it surprised me. The translations were really well done.
BusRevolutionary9893@reddit
One side is productive and useful the other is boring.
MexInAbu@reddit
Yep. Better for local AI to have small models that each excel at their own thing. I want to set up one for excellent tool calling and let another one actually write the code.
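That split can be as simple as a static dispatch table. A hypothetical sketch (the model names and route keys here are illustrative assumptions, not a recommendation):

```python
# Route each task kind to the local model that handles it best.
# Everything here is made up for illustration.
ROUTES = {
    "tool_call": "qwen-3.6-35b-a3b",  # strict JSON / function calling
    "code": "qwen-3.6-35b-a3b",       # agentic coding
    "prose": "gemma-4-26b-a4b",       # writing, translation
}

def pick_model(task_kind: str) -> str:
    """Return the model assigned to a task, defaulting to the prose model."""
    return ROUTES.get(task_kind, ROUTES["prose"])

print(pick_model("code"))  # qwen-3.6-35b-a3b
print(pick_model("poem"))  # gemma-4-26b-a4b (unknown kinds fall back to prose)
```

A real setup would sit this in front of two inference servers, but the design choice is the same: classify the task, then forward to the specialist.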
MaruluVR@reddit
When it comes to multi lingual nothing can compete with Gemma so I am sticking with it.
ElephantWithBlueEyes@reddit
I ditched local models for cloud ones. But even cloud models are dumb, to be frank
breadfruitcore@reddit
I haven't seen anyone talk about this, but I feel like the proliferation/commoditization of open models is hurting the community. Models that were perfectly fine 4-5 months ago get drowned out by new models that are often just more benchmaxxed entrants. I'm worried this will lead to labs getting overpressured because their models get unreasonably "obsoleted" by the flavor of the month, which often performs badly in real-world cases. But that's just me, idk.
Borkato@reddit
I mean… sometimes, but what models are better than Gemma and Qwen?
breadfruitcore@reddit
Wasn't referring to both of them here. Stuff like Kimi 2.5 got sidelined way too quickly.
somerandomperson313@reddit
I use them both every day.
Pablo_Offline_AI@reddit
this is comparing the "usefulness" of forks to spoons. I need both
Borkato@reddit
As someone who uses forks for almost everything, even ice cream, mashed potatoes, cereal, etc, this is funny to me. The only thing I have to use a spoon for is soup 😂
Humble-Pick7172@reddit
This time I will most likely use Gemma 4 for a very long time. Before, it was just a good average model (imo), but now it has become special: knowledge cutoff in 2025, good at prompting t2i, and in general it can cosplay Gemini 3 Flash (which I really like).
Qwen 3.6 is a good tool but Gemma 4 has a soul.
Potential-Gold5298@reddit
Only those who choose a model based on AA scores do this. The Gemma 4 handles my tasks (text translation, chat, answering questions, writing stories, RP) better, and I don't care how many points another model scores in abstract benchmarks. I'll replace it when a model that handles these tasks better comes out, even if that's two years away.
ai_without_borders@reddit
the "obsolete" framing is pure enthusiast mode. at a startup running inference in prod, the switching cost is real - re-eval on your actual use case (not benchmarks), re-tune prompts that are never fully portable between models, regression testing. we run on a 4-6 week upgrade cycle at best. models that win in production are the ones stable enough to commit to for a quarter, not the ones topping leaderboard for a week.
Awwtifishal@reddit
I use both, both are very good at different tasks
roboapple@reddit
Can you explain which is for which? Ive tried both and i may be stupid but i cant really tell the difference
Awwtifishal@reddit
qwen is better for coding and some other technical tasks, gemma is better for more natural language related tasks such as translations and story writing, and it follows instructions more strictly.
FullChampionship7564@reddit (OP)
Definitely
Syzygy___@reddit
Somehow I can get Gemma:26b running, and at reasonable speeds, on my 16gb of RAM.
thats_so_bro@reddit
probably a small context window though
Syzygy___@reddit
It's faster than phi-4-mini for me.
not sure about context window though. Clawcode could be better, so maybe that's an issue of small context windows, openclaw or just gemma4 in general.
Bobylein@reddit
I am running the Unsloth 4bit variant with 128k context and 16gb VRam at around 110t/s, yea sure 3bit would fit completely and run at 120t/s but that doesn't seem worth it.
Majinsei@reddit
Spanish... For languages, Gemma is much better~
And Qwen literally doubles the length of its answer in the chain of thought; it's a ton of overthinking tokens even for simple things~
As for censorship~ getting Gemma to answer censored things is super easy with the base model~ With Qwen I feel it's much harder to get past the censorship~
Qwen is better for code and technical work~ Gemma for everything else~
RedditUsr2@reddit
Gemma 4 is great for that local ChatGPT experience. Qwen 3.6 seems better for documents, coding, and tasks like that.
Informal-Ask-6677@reddit
On Twitter I see "THIS IS HUGE", "THIS IS A GAME CHANGER"... every single time
glad-k@reddit
Never got into Gemma 4 as it's not rly good for instruct and that's my whole use case
Still using qwen3.5 as my hardware prefers non moe models (27B plzzz)
sersoniko@reddit
Am I still the only one rocking Qwen 3.5 27B?
Ok-Whereas8632@reddit
I'm a software engineer who is a noob with llm. I want a small llm that would be really good at making up spooky stories and making games out of them for me to play. Any pointers?
I tried a few small models and the only thing that's performant is Gemma-2-2b-it (Q6_K). Tried Qwen, but it takes way too long to respond. That's on a crappy old laptop.
I'm fine with using Gemma, but I'm also wondering if there's something out there trained on a dataset that's better for spookiness.
ayylmaonade@reddit
I keep both on my SSD. Qwen3.6-35B + Gemma 4-26B-A4B. Perfect combo in my eyes. I use Qwen as the daily driver, Gemma for anything that might benefit from world knowledge or prose. You don't have to pick, people!
popecostea@reddit
Is this bait for Google to release gemma4 124B?
screenslaver5963@reddit
They shock the industry by revealing that Gemini is actually only 120B
a_beautiful_rhind@reddit
With flash I might even buy it.
DeepOrangeSky@reddit
I still haven't found anything that can beat Mistral 123b dense/Behemoth 123b dense, at writing, on 128GB unified memory, yet.
That model is almost 2 years old now.
Although, to be fair, if the labs were still pumping out 120b dense models, I'm guessing it would've been surpassed by quite a bit by now.
Still pretty funny how strong something that old is, though. Especially in the AI world.
a_beautiful_rhind@reddit
For coding shits and tool calling devstral is the update to that. It's the best non gigantor MoE that can do both reasonably well.
a_beautiful_rhind@reddit
Here I am still using models from 2024/2025 even. Some models are disposable but the good ones stick around.
I know this is just qwen astroturfing but shouldn't qwen 3.5 and previous be where gemma4 is? Don't hear much about qwen2 anymore... or even qwen3.
Bobylein@reddit
Nah Gemma 4 is much much better at roleplay and other "creative" tasks, Qwen is mostly useful for clear straightforward tasks
OhShitOhFuckOhMyGod@reddit
Gemma4 is faster on Strix Halo, and it’s better in everything but coding and maybe vision imo
ecompanda@reddit
the coding vs creative writing split in these comments is basically accurate. qwen on structured tasks, gemma 4 when you need the model to actually think open ended.
No_Mango7658@reddit
It's so true though! I just had Qwen 3.6 35B Q4 home-run a big feature request that required multiple backend router and data model updates plus frontend changes. It's pretty great, and it's so small!
Hot-Employ-3399@reddit
I still haven't launched gemma4 successfully. Also qwen3.6 was not as good as qwen3.5 27B dense.
silenceimpaired@reddit
No surprise for me. They only released a small MoE. I hope all the hopefuls are correct and the dense model is on the way.
Positive_Phone0633@reddit
Nawww I like them both. Gemma’s really good at being creative and working with the prompt, and Qwen is the better nerd. Two of my top picks for local
guggaburggi@reddit
We are not all just about coding. We also do role-playing and writing and questions and answers, and I think Gemma 4 is much better at that.
Salt-Willingness-513@reddit
I love gemma 4 for swiss german. Qwen is horrible at swiss german and decent for german in general, while gemma is perfect in german and almost perfect for swiss german, even transcription.
Bockanator@reddit
Eh nah. They're both good for different things.
Toooooool@reddit
not a single mention of GLM-4.7-Flash in this thread, very authentic to OP's image
BannedGoNext@reddit
Gemma is pretty damn cool.
Kodix@reddit
Gemma is still awesome. But for the "in-vogue" uses - agentic workflows - it's just worse.
That said, I am *so* grateful to Google for releasing it for us.
DeepOrangeSky@reddit
Yea, I feel really bad for Kim K. It's like with Kanye, all over again :(
Worried-Squirrel2023@reddit
this is also why I keep a "last known good" setup pinned. just because qwen 3.6 dropped doesn't mean my 3.5 27b workflow is broken. the obsolescence is more about the conversation than the actual capability of yesterday's model.
alamacra@reddit
Imo Gemma-4 is better at following instructions. E.g. Qwen's instruction following seems to be somehow massively degraded after even a couple of images, despite them taking up very little context. So if you tell it to do some deductions based on them, then write them to a file using a tool, and then check that the file was actually written, very often it'll just do a wrong tool call and forget about checking the results altogether.
po_stulate@reddit
tbf, the new ones are probably built on top of the old ones, so it just grew with us, not replaced.
Environmental-Metal9@reddit
I’d be more excited about qwen models, but they don’t release the base models for the 27B-32B dense variants, and my pipeline is doing CPT on the base, and doing my own SFT on my base. Having to fight against their training and risking all the failure modes there doesn’t sound all that appealing to me. On the other hand, Google releases base of all their Gemma models. For me it’s not about which is best, but rather which is available.
MundanePercentage674@reddit
Actually it depends on how smart it is, how well it gets the job done, and how fast the inference is.
Interesting_Key3421@reddit
nice summary :)