Layman's comparison of Qwen3.6 35b-a3b and Gemma4 26b-a4b-it
Posted by LocalAI_Amateur@reddit | LocalLLaMA | 75 comments
Gemma 4 26b-a4b-it is basically a solid B student that gets the job done.
Qwen3.6-35b-a3b is an A+ student that has plenty of energy after finishing the assignment to add flairs.
On my 16GB VRAM video card, both models run at comparable speeds, on Windows in LM Studio using the recommended inference settings. Models used:
unsloth/gemma-4-26B-A4B-it-UD-Q4_K_S
AesSedai/Qwen3.6-35B-A3B IQ4_XS
Any strong disagreements?
Kahvana@reddit
I found both to be very solid models, for different purposes.
Qwen3.5/3.6 is solid for programming and tool calling; Gemma is best for conversation/roleplay and translation. OCR is a toss-up between the two. Gemma reasons less, which is nice for quick tasks.
Vinserello@reddit
For me, Qwen3.5 on AutoGen is worse at tool calling than Qwen2.5.
Kahvana@reddit
Then you’re doing something very wrong.
shansoft@reddit
OCR is far superior on Gemma, especially across multiple languages. Qwen often ends up in a loop if certain Southeast Asian languages appear together at the same time; it's been a problem since Qwen 3 VL. Gemma also gives MUCH more accurate text on semi-complex images.
onephn@reddit
Speaking of OCR, I'm trying to vibe-code a utility that would scan PDFs and make whatever modifications are necessary for WCAG compliance and the like. Which models would you recommend for the actual OCR process and alt-text generation?
SoftConsistent8857@reddit
For OCR on PDFs I've been using reseek and it's honestly been solid for pulling text out of scanned docs and images. The AI tagging is pretty handy too for keeping stuff organized without manual work.
For alt-text generation specifically, though, you might wanna look at dedicated vision models like GPT-4o or Claude 3, since they're built for describing visual content. reseek handles the extraction side well, but if you're doing full WCAG compliance you'll probably want a pipeline that combines both.
shansoft@reddit
It really depends on your system and how much it can handle. Gemma 3 27B is what I used before for something similar with structured output. Gemma 4 31B is definitely better, but you need hardware that can handle it. If not, the cheapest reliable way is to use Gemini 3 Flash. It's speedy, cheap, and pretty consistent compared to all the other models for OCR and processing.
onephn@reddit
I see, though I would want documents to remain local. Have you had good experiences with the 26b-a4b? I have hardware on-site that can run that, but not the 31b.
9kSs@reddit
What hardware?
LocalAI_Amateur@reddit (OP)
AMD Ryzen 7840U laptop CPU, 32GB RAM, 5070 Ti 16GB VRAM through OCuLink.
Gold-Drag9242@reddit
Wow. What were your settings and how long did it take? Did you run it with llamacpp or something else?
ambient_temp_xeno@reddit
You can just ask models to make what you want. If you just say "tetris pls" it might give you a basic becky one.
LocalAI_Amateur@reddit (OP)
I didn't know I could be so lazy with the prompt. I tried "GTA 6 pls" but did not get GTA 6... It was able to give me Flappy Bird on request tho.
2Norn@reddit
imo without a specific prompt this is kinda useless because you just let model make assumptions. sure it tells you something, but it doesn't tell you what it's capable of.
Budget-Juggernaut-68@reddit
or you know... clone someone's github project
ambient_temp_xeno@reddit
Where's the fun in that?
Budget-Juggernaut-68@reddit
The fun is building something unique
MoneyPowerNexis@reddit
lol
spyboy70@reddit
"make becktris pls"
philmarcracken@reddit
make no mistakes + don't lose me any money
Sadman782@reddit
exactly
Key-Can-4768@reddit
Hello, what configuration settings did you use for Gemma 4 26b: temperature, top-k, top-p, repeat penalty? Also, for some reason in LM Studio it doesn't think for me, but immediately gives an answer. Do you know how it should be, or where to turn on the parameter so that it can reason? I have unsloth/gemma-4-26b-a4b-it Q3_K_M.
Sadman782@reddit
A custom finetune or a system prompt can make Gemma's default frontend style much better than it is now. It doesn't make Qwen better at coding, though.
for example
**ROLE:** Elite Frontend Coder
Architect & UI/UX Visionary.
**MANDATE:** Generate 100% COMPLETE, production-ready code. ZERO placeholders, `// TODO`s, or truncated logic. Write every single line required for a fully functional product, regardless of length.
**CREATIVITY & AESTHETICS [MAXIMUM PRIORITY]:**
* **Award-Winning UI:** Do not build basic layouts. Engineer jaw-dropping, premium interfaces using modern design systems.
* **Rich Interactions:** Implement fluid animations, micro-interactions, sophisticated color palettes, complex gradients/shadows (e.g., glassmorphism, neumorphism where appropriate), and flawless responsive breakpoints.
* **Creative Autonomy:** If a request is ambiguous, take full creative control. Do not ask for clarification; immediately design and build the most visually stunning, highly-polished assumption.
But for one-shotting a complex app, Gemma works better for me, as Qwen frequently produces errors. The biggest difference is in the backend: Qwen hallucinates methods way more than Gemma.
Try a complex prompt (these classic games are heavily represented in training data, so tweak them a bit):
Flappy Bird Multiplayer (Local vs AI)
Concept: The original Flappy Bird but with two birds on screen.
AI Twist: One bird is controlled by the user, the other by a "fuzzy" AI (as you mentioned before) that makes occasional mistakes, allowing for a competitive race.
Some features:
- If the AI dies first, its bird falls off-screen and it becomes a spectator watching the Player continue until the player fails.
- If the player dies, whether first or last, it's game over.
- There will be settings where the user can change the bird shape and color, control game speed, and see previous records (more importantly, live replays of the physics of previous games, not just scores).
- There will be a pause button to go to settings, restart, etc., basically a complete game.

Create this game in a single HTML file with a rich, cool-looking, clean UI and fully functional gameplay.
LocalAI_Amateur@reddit (OP)
I will have to give that a shot. Thanks.
I did try some custom game coding with longer, more specific instructions previously (again, simple one-page stuff). Gemma4 has definitely produced adequate results, and Qwen3.6 finished them and added flairs. Maybe this will change with more complex tasks, but this is my impression so far.
Sadman782@reddit
gemma 4 26B with the system prompt
jinnyjuice@reddit
Interesting!
Sadman782@reddit
Lorelabbestia@reddit
Try the same system prompt on both.
seppe0815@reddit
GEMMA POWER !
Most_Feedback_8862@reddit
How about Qwen3 Coder Next? Is it better?
unjustifiably_angry@reddit
Maybe I didn't give it a fair enough shake but I tried Q3CN briefly before I decided to go back to 122b, and a lot of people claim 27b is actually better than 122b for a lot of tasks. The lack of thinking was the main problem, IMO. If it was used in conjunction with a high-quality "planner" AI it might work better, I can't say.
Q3CN is certainly very fast though.
Sadman782@reddit
More complex examples:
Create a 3D Rubik's cube in a single HTML file, where I can choose n for how many rows and columns. It must also have a randomize button and a solve button so it can solve it after I randomize it fast (no cheating: it shouldn't just track what I did and reverse it, it must use a genuine algorithm).
Gemma 4: No bugs, it's functional
Qwen: Can't even randomize, full of errors in the console
abitrolly@reddit
Gemma, give me the prompt that will make you shine in coding in comparison with Qwen. :D
-Ellary-@reddit
What Qs did you use for Gemma 4 and Qwen 3.6?
I've moved to Q6K for Qwen 3.6; Q4 was way too unstable.
Also, I've used these settings for code:
{{sampler temperature 0.6}}
{{sampler top_p 0.95}}
{{sampler min_p 0.0}}
{{sampler top_k 20}}
{{sampler presence_penalty 0.0}}
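For anyone not using that template syntax, the same sampler settings can be passed to llama-server on the command line. A sketch only; the model filename is illustrative, so substitute your own GGUF:

```shell
# Same sampler settings as llama-server flags
# (model path is a placeholder -- point it at your own quant)
llama-server \
  -m Qwen3.6-35B-A3B-IQ4_XS.gguf \
  --temp 0.6 \
  --top-p 0.95 \
  --min-p 0.0 \
  --top-k 20 \
  --presence-penalty 0.0
```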
LocalAI_Amateur@reddit (OP)
The version of Gemma 4 and Qwen 3.6 I'm using both had problems generating that in one shot. I might dumb down the problem and try some more.
Sadman782@reddit
Try with top-k 20, or maybe remove the system prompt for this, otherwise it might overcomplicate things and cause minor bugs which need fixing. For me, UD IQ4_XS Gemma 4 with top-k 20 does it every time.
sine120@reddit
I've been impressed by Qwen's little flairs. Gemma seems like it holds more general knowledge than Qwen, but for coding, I don't think it's a competition.
Sadman782@reddit
Let's start the Qwen vs Gemma challenge: let's see who is genuinely better at coding rather than frontend aesthetics. Qwen is trained to be amazing at aesthetics by default, whereas Gemma needs a custom system prompt for better UI. But for raw coding skills, let's start a battle.
seppe0815@reddit
facts .. qwens all benchmaxed ... real life crap
Sadman782@reddit
Yeah, no hate against them; Qwen improved a lot since the 2 and 2.5 series (before that, Gemma 3.5 27B was my favorite model). But they lack consistency in real-life coding. Aesthetics can be fixed, but severe hallucination in coding is not easy to fix. I just dislike the benchmark optimization.
unjustifiably_angry@reddit
This doesn't match my experience at all. I wonder if it's the model or how it's used. I build incrementally and add features one at a time after testing the previous one works. I get the sense a lot of people think the ability to one-shot a complex problem is valuable but when you do that you create badly-organized code even the AI doesn't seem to fully understand, let alone the human prompting the AI.
Ok_Sprinkles_6998@reddit
What's the prompt for the one-shot task?
LocalAI_Amateur@reddit (OP)
It's on the images
Ok_Sprinkles_6998@reddit
Damn I thought it would be elaborate and complicated.
A one-liner, and the results are this cool.
Due-Memory-6957@reddit
I'm big on less is more, so I prefer Gemma's version. A more challenging project might be better for comparison, as we would see capability instead of subjective aesthetics.
kaisurniwurer@reddit
Agreed.
I think I mostly prefer Gemma for its more natural answer style. But for coding, less is more, and this example seems to be exactly that. If I want more, I can just ask for it.
moahmo88@reddit
Thanks for sharing. Can you share your LM Studio settings for Gemma 4?
LocalAI_Amateur@reddit (OP)
Sure, nothing great. Gets me 60-70ish tokens per second on my hardware.
lemondrops9@reddit
No need to have your CPU thread pool size maxed out; you're already offloading all layers to the GPU.
moahmo88@reddit
No wonder so many people use Qwen. The same Q4 can use a 128K CTX.
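As a rough illustration of what a bigger context window costs in VRAM, here is a back-of-the-envelope KV-cache estimate. All of the architecture numbers below (layers, KV heads, head dim) are hypothetical placeholders, not the real specs of either model:

```python
# KV-cache size ≈ 2 (K and V) * layers * kv_heads * head_dim
#                 * bytes_per_value * tokens.
# Layer/head counts here are made-up round numbers for illustration.
def kv_cache_gib(tokens, layers=48, kv_heads=8, head_dim=128, bytes_per_val=2):
    return 2 * layers * kv_heads * head_dim * bytes_per_val * tokens / 1024**3

print(f"{kv_cache_gib(20_000):.2f} GiB")   # 20k context
print(f"{kv_cache_gib(128_000):.2f} GiB")  # 128k context
```

With these placeholder numbers, 128k context costs over six times the cache memory of 20k, which is why the same Q4 weights may or may not leave room for it on a given card.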
Lorelabbestia@reddit
I mean, that's a \~35% increase in parameter count on Qwen vs Gemma. Comparing Gemma 26B vs Qwen 35B would be like comparing Gemma 26B vs gpt-oss-20b.
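For what it's worth, the arithmetic behind that ~35% figure, using total parameter counts (note the active-parameter counts, a3b vs a4b, actually go the other way):

```python
# Total parameters: Qwen 35B vs Gemma 26B
increase = (35 - 26) / 26
print(f"{increase:.0%}")  # -> 35%
```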
Anyways thanks for the comparison!
Sadman782@reddit
Gemma is more capable than you think, even at that size. It is not tuned to please aesthetically by default. See the image: it is the same Gemma 4 26B. See the difference.
Try this system prompt:
**ROLE:** Elite Frontend Coder
Architect & UI/UX Visionary.
**MANDATE:** Generate 100% COMPLETE, production-ready code. ZERO placeholders, `// TODO`s, or truncated logic. Write every single line required for a fully functional product, regardless of length.
**CREATIVITY & AESTHETICS [MAXIMUM PRIORITY]:**
* **Award-Winning UI:** Do not build basic layouts. Engineer jaw-dropping, premium interfaces using modern design systems.
* **Rich Interactions:** Implement fluid animations, micro-interactions, sophisticated color palettes, complex gradients/shadows (e.g., glassmorphism, neumorphism where appropriate), and flawless responsive breakpoints.
* **Creative Autonomy:** If a request is ambiguous, take full creative control. Do not ask for clarification; immediately design and build the most visually stunning, highly-polished assumption.
sid351@reddit
I'm curious to see a side by side of Gemma and Qwen with this system prompt, if anyone is up for testing it, please.
Imaginary-Unit-3267@reddit
Why not both? I'd like to see someone try having Gemma and Qwen alternate tweaking the same code base. Maybe each one will recognize and fix the other's distinctive design flaws, making something better than either one by itself.
Sabin_Stargem@reddit
I am looking forward to trying the Qwen 3.6 122b. The possibility of recreating old games from my childhood is getting closer. Hopefully, the 122b can offer suggestions on how to get the AI to go through the original files, then recreate most of the contents in C#. Stars!, Castle of the Winds, and Quenzar's Caverns could all use some porting from Windows 3.1, methinks.
qwen_next_gguf_when@reddit
Don't compare and just be happy 😊
Cool-Chemical-5629@reddit
In every life we have some trouble
But when you worry you make it double
Don't worry
Be happy, don't worry, be happy now...
the__storm@reddit
I'm beginning to see why vibe-coded websites have so many gradients and inverse drop-shadows and emojis.
Cool-Chemical-5629@reddit
Please stop cherry-picking, because two (and more) can play this game, and I assure you I have prompts NEITHER of these two models can handle perfectly, BUT Gemma 4 26B A4B handles them better than Qwen. The only reason I did not post my results here to show where Qwen fails horribly while Gemma does a decent job is that I want the Qwen team to succeed by figuring out the weaknesses of their models themselves, and trust me, there are many. I'm not saying their models are bad, but all praise and no critique is not the way to improvement.
Porespellar@reddit
No Snake? 🐍 That’s prompt-2-game 101 bruh!
https://i.redd.it/2ms6srdhpewg1.gif
seppe0815@reddit
cool story bro 1
Mundane_Ad8936@reddit
Well, it's a different class of models with different quantization, so I'm not exactly surprised to see different levels of performance.
Don't underestimate how big a number 9 billion is.
The fact that both were able to create working code at Q4 is impressive.
CryptographerLow7817@reddit
Context size?
LocalAI_Amateur@reddit (OP)
Only 20k for these examples. It can go higher, but these are one-shot tests.
BigYoSpeck@reddit
I feel like Qwen models even going back to 3-coder have always been good at 'flair'
It always made 'aesthetic' pages with little design flairs. Now if you're asking for those things, or are happy with it taking the initiative, that's great, but it doesn't necessarily mean it beats other models' ability to follow instructions and solve the actual problems in what you're using them for.
If you put something like Claude's frontend design skill in other models, they begin delivering more than bog-standard basic designs. Admittedly, Qwen goes up an even further notch though.
If you want to genuinely test their capabilities against one another, don't give them a generic challenge like building tetris and then judge them on the flair they weren't asked to add. Ask them for it but with a twist that wasn't going to exist in the training set. Get them to change something about the core game mechanics and see how well they adapt
Better yet, don't ask for tetris or any other task by name, describe what they should build and see which adheres best
LocalAI_Amateur@reddit (OP)
Very valid point. It's about doing things they haven't already done a million times before. Maybe my opinions will change after more real world tasks.
brycesub@reddit
Can you share your llama-server settings for the Qwen3.6 model? I have 16gb of VRAM and 32gb of system ram and am having a hard time w/ OOM.
LocalAI_Amateur@reddit (OP)
I'm using LM Studio. If you want llama-server settings, you really need to check out this thread. https://www.reddit.com/r/LocalLLaMA/comments/1sor55y/rtx_5070_ti_9800x3d_running_qwen3635ba3b_at_79_ts/
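For OOM problems on a 16GB card, one common llama.cpp approach is to keep all layers on the GPU but push some MoE expert weights to system RAM. A sketch only; the model path and the exact numbers are illustrative, so raise `--n-cpu-moe` until the OOM goes away:

```shell
# Illustrative split for 16 GB VRAM / 32 GB RAM (tune the numbers yourself):
# -ngl 999       : offload all layers to the GPU
# --n-cpu-moe 12 : keep the MoE expert weights of the first 12 layers in system RAM
# -c 20480       : ~20k context, matching the OP's tests
llama-server -m Qwen3.6-35B-A3B-IQ4_XS.gguf -ngl 999 --n-cpu-moe 12 -c 20480
```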
JuniorDeveloper73@reddit
Qwen3.6 shines with Hermes
rawdikrik@reddit
IQ4_XS is tight, no? Notice any issues?
LocalAI_Amateur@reddit (OP)
It's awesome, at least compared to the LM Studio GGUFs. I don't think I use it enough to claim absolutely no issues, but it's been very functional and fast on 16GB VRAM.
AesSedai has some great compressions. Too bad they tend to only focus on the big models; they didn't do Gemma 4 26b-a4b, for example.
Fabulous_Fact_606@reddit
Agree. Qwen3.6-35B is impressive at game design. It one-shot this Frogger game layout; then it took a few tweaks to get the game mechanics right.
Fabulous_Fact_606@reddit
Then I one-shot the upload to my WireGuard VPS and installed it in Docker to host it on the internet: Frogger — 10 Levels.
jacek2023@reddit
are games functional?
LocalAI_Amateur@reddit (OP)
Totally. I was surprised how fully playable the Qwen3.6 ones are. It's really just a one-shot prompt; I didn't think there was a need to share it. People can one-shot it themselves.