Gemma 4 26B-A4B on Apple M1 Max is very fast
Posted by Beamsters@reddit | LocalLLaMA | 12 comments
Gemma 4 26B-A4B quantized at Q5_K_S, running on an Apple M1 Max with 32GB
Using LM Studio with the Unsloth Q5_K_S quant at 65536 context, it uses around 22GB of memory (Metal llama.cpp runtime 2.11.0)
On average: ~50 tok/s
Gemma 4 31B (Q4_K_S), on the other hand, is quite slow: ~10-11 tok/s on average
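For anyone sanity-checking these numbers: decode on Apple Silicon is roughly memory-bandwidth-bound, so tok/s scales with the bytes of weights read per token, and a MoE model only reads its active parameters. The figures below (M1 Max bandwidth, effective bits per weight) are my own assumptions, not from this thread; treat it as a rough upper-bound sketch, not a benchmark.

```python
# Back-of-envelope check on the numbers in the thread.
# ASSUMPTIONS (mine, not from the post): ~400 GB/s M1 Max memory
# bandwidth, ~5.5 effective bits/weight for Q5_K_S and ~4.5 for
# Q4_K_S, and decode being bound by reading the active weights
# once per generated token.

GB = 1e9

def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Size of the quantized weights in GB."""
    return params_b * 1e9 * bits_per_weight / 8 / GB

def tok_per_s_upper_bound(active_params_b: float,
                          bits_per_weight: float,
                          bandwidth_gb_s: float) -> float:
    """Crude upper bound: bandwidth / bytes of active weights per token."""
    return bandwidth_gb_s / weights_gb(active_params_b, bits_per_weight)

# 26B-A4B MoE at Q5_K_S: all 26B params sit in memory,
# but only ~4B are read per token.
print(f"26B-A4B weights: ~{weights_gb(26, 5.5):.1f} GB")
print(f"26B-A4B bound:   ~{tok_per_s_upper_bound(4, 5.5, 400):.0f} tok/s")

# Dense 31B at Q4_K_S: every parameter is read for every token.
print(f"31B bound:       ~{tok_per_s_upper_bound(31, 4.5, 400):.0f} tok/s")
```

This gives ~17.9 GB of weights (KV cache at 65536 context plausibly accounts for much of the rest of the reported ~22GB) and bounds of roughly 145 vs 23 tok/s. Observed throughput sits well under both bounds, as expected (KV-cache reads, compute overhead, sub-peak bandwidth), but the MoE-vs-dense ratio is in the same ballpark as the reported 50 vs 10-11 tok/s gap.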
Nonomomomo2@reddit
What are you doing with it?
TheOnlyOne011001@reddit
Anything I can do with ChatGPT; it's benchmarked close to GPT-4o
Nonomomomo2@reddit
I call BS
TheOnlyOne011001@reddit
why?!
Nonomomomo2@reddit
50 tok/s on an M1 with 32GB ram doing "anything that ChatGPT can do"? Yeah fucking right.
Insipid_Menestrel@reddit
Read this post if you don't believe it; current local models surpass GPT-4 at this point
Nonomomomo2@reddit
Yeah I run both on a 128GB M5 and I don’t care what the benchmarks say, this is bullshit in reality.
TheOnlyOne011001@reddit
It's 33 tok/s on an M1 with 64GB, and yes, it can do what I usually do with ChatGPT (brainstorming)
Beamsters@reddit (OP)
Like asking general questions and finding recommendations for random stuff.
TheOnlyOne011001@reddit
Yes, it is awesome
I have an M1 Max with 64GB
Historical-Curve-235@reddit
Hello, where did you get the Q5_K_S? I see 3 options but not Q5_K_S. I am planning to buy an M1 Max with 64GB, so I will be happy to use it
eclipsegum@reddit
I don’t even bother with models unless I can run them on MLX. Night and day