Can Gemma4-26B-A4B replace Gemma3-27B as general assistant + RP?
Posted by simracerman@reddit | LocalLLaMA | 16 comments
So far, Gemma3-27B and its finetunes have been the best general assistants, and great for RP due to their depth of personality.
The 26B is overshadowed by the 31B in the number of reviews. Anyone testing the 26B as a general-purpose assistant, web search agent, and occasional RP?
ea_nasir_official_@reddit
Absolutely! It's much smarter and much faster IME. It's more than twice as fast on my AMD APU (8840HS).
simracerman@reddit (OP)
I’m gonna run it on a 5070 Ti with the inactive MoE experts offloaded to CPU, so no issues there. I used to run everything on an AMD 890M iGPU. It’s about 20% faster than your 780M.
IORelay@reddit
How do you offload inactive experts to normal ram?
simracerman@reddit (OP)
Llama.cpp does it automatically nowadays.
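For anyone wondering what that looks like in practice, a minimal sketch of the usual llama.cpp invocations, assuming a recent build; the model filename is illustrative, and exact flag availability varies by version:

```shell
# Keep all MoE expert tensors in system RAM while the rest runs on GPU
# (recent llama.cpp builds):
llama-server -m gemma4-26b-a4b-Q4_K_M.gguf --cpu-moe -ngl 99

# Or offload only the expert tensors of the first N layers to CPU,
# keeping as many full layers on the GPU as VRAM allows:
llama-server -m gemma4-26b-a4b-Q4_K_M.gguf --n-cpu-moe 20 -ngl 99

# Older builds achieved the same effect with a tensor-override regex:
llama-server -m gemma4-26b-a4b-Q4_K_M.gguf -ot ".ffn_.*_exps.=CPU" -ngl 99
```

The idea in all three cases is the same: the large, sparsely activated expert FFN weights live in system RAM, while the dense attention/shared layers stay in VRAM, which is what makes low-active-parameter MoE models usable on small GPUs.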
Kahvana@reddit
From my own quick testing with vanilla (unsloth's quants):
- General assistant: Works fine, happy to confirm it kept dense internal knowledge
- Roleplay: Matter of taste. 31B performs better than 27B since it's dense; 26B-A4B feels more capable as long as its reasoning is on.
- Web search: it handles general searches well, but once world news or politics is involved it struggles a bit (the current world is just too implausible for a model with a January 2025 cutoff)
Still have to test it with my heretic quants; I suspect it will perform better on web searches, being less restricted by its internal policy and questioning the contents less.
Overall I found Qwen3.5-35B-A3B the stronger model for general assistance / websearch, Gemma4-26B-A4B better for roleplay.
simracerman@reddit (OP)
Very helpful!! Thank you! I’ll wait on Sillytavern. OpenWebUI is the main interface, and that’s where I was gonna start.
Interesting that you said reasoning enhanced the 26B responses. The consensus from the community before was that thinking models usually came across as too strict. I’ll put it through its paces tomorrow.
RandumbRedditor1000@reddit
Wouldn't it be better to just use Gemma 4 31B at a slightly lower quant?
simracerman@reddit (OP)
I could at Q3 quants, but the speed of 26B is enticing. I don’t really need a philosopher, just something that comes close to or is slightly better than Gemma3-27B.
RandumbRedditor1000@reddit
I think Gemma 4 26b still beats Gemma 3 27b
Adventurous-Paper566@reddit
I think the 26B can easily replace Gemma 3 27B for all use cases.
brixon@reddit
I think their charts say the Gemma 3 27b is similar to the Gemma 4 4b.
Stuff that can run on real people hardware just keeps getting better all the time.
simracerman@reddit (OP)
Good to know! I’ll try that tonight.
Lorian0x7@reddit
Yes, I tried RP with the 31B at Q3... it's amazing, the best I've ever tried.
svachalek@reddit
I’m having some trouble with reliability on 4 but assuming we get that ironed out, I think A4B is the replacement if you want something faster, and 31B is where to go if you want smarter.
simracerman@reddit (OP)
Nice to know! I’ll eventually try 31B at Q3 quants, but I'm starting with 26B to see if it's a good replacement first.
lemondrops9@reddit
Tried a bit of RP, really fast and seems good. Not sure yet if it compares to the GLM Steam 106B that I'm used to.