Gemma 4 26B-A4B on Apple M1 Max is very fast
Posted by Beamsters@reddit | LocalLLaMA | 12 comments
Gemma 4 26B-A4B quantized at Q5_K_S, running on an Apple M1 Max with 32GB
Using LM Studio with the Unsloth Q5_K_S quant at 65536 context, it uses around 22GB of memory (Metal llama.cpp runtime 2.11.0)
On average: ~50 tok/s
Gemma 4 31B (Q4_K_S), on the other hand, is quite slow: ~10-11 tok/s on average
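For anyone sanity-checking these numbers: decode on Apple Silicon is roughly memory-bandwidth-bound, so tok/s scales with the bytes of weights read per token, and a MoE model only reads its active parameters. The figures below (M1 Max bandwidth, effective bits per weight) are my own assumptions, not from this thread; treat it as a rough upper-bound sketch, not a benchmark.

```python
# Back-of-envelope check on the numbers in the thread.
# ASSUMPTIONS (mine, not from the post): ~400 GB/s M1 Max memory
# bandwidth, ~5.5 effective bits/weight for Q5_K_S and ~4.5 for
# Q4_K_S, and decode being bound by reading the active weights
# once per generated token.

GB = 1e9

def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Size of the quantized weights in GB."""
    return params_b * 1e9 * bits_per_weight / 8 / GB

def tok_per_s_upper_bound(active_params_b: float,
                          bits_per_weight: float,
                          bandwidth_gb_s: float) -> float:
    """Crude upper bound: bandwidth / bytes of active weights per token."""
    return bandwidth_gb_s / weights_gb(active_params_b, bits_per_weight)

# 26B-A4B MoE at Q5_K_S: all 26B params sit in memory,
# but only ~4B are read per token.
print(f"26B-A4B weights: ~{weights_gb(26, 5.5):.1f} GB")
print(f"26B-A4B bound:   ~{tok_per_s_upper_bound(4, 5.5, 400):.0f} tok/s")

# Dense 31B at Q4_K_S: every parameter is read for every token.
print(f"31B bound:       ~{tok_per_s_upper_bound(31, 4.5, 400):.0f} tok/s")
```

This gives ~17.9 GB of weights (KV cache at 65536 context plausibly accounts for much of the rest of the reported ~22GB) and bounds of roughly 145 vs 23 tok/s. Observed throughput sits well under both bounds, as expected (KV-cache reads, compute overhead, sub-peak bandwidth), but the MoE-vs-dense ratio is in the same ballpark as the reported 50 vs 10-11 tok/s gap.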
Nonomomomo2@reddit
What are you doing with it?
TheOnlyOne011001@reddit
Anything I can do with ChatGPT; it's benchmarked close to GPT-4o
Nonomomomo2@reddit
I call BS
TheOnlyOne011001@reddit
why?!
Nonomomomo2@reddit
50 tok/s on an M1 with 32GB ram doing "anything that ChatGPT can do"? Yeah fucking right.
Insipid_Menestrel@reddit
Read this post if you don't believe it; current local models surpass GPT-4 at this point
Nonomomomo2@reddit
Yeah I run both on a 128GB M5 and I don’t care what the benchmarks say, this is bullshit in reality.
TheOnlyOne011001@reddit
It's 33 tok/s on an M1 with 64GB, and yes, it can do what I usually do with ChatGPT (brainstorming)
Beamsters@reddit (OP)
Like asking general questions and finding recommendations for random stuff.
TheOnlyOne011001@reddit
Yes, it is awesome
I have an M1 Max with 64GB
Historical-Curve-235@reddit
Hello, where did you get the Q5_K_S? I see 3 options but not Q5_K_S. I am planning to buy an M1 Max with 64GB, so I will be happy to use it
eclipsegum@reddit
I don’t even bother with models unless I can run them on MLX. Night and day