Gemma 4 26B-A4B on Apple M1 Max is very fast

Posted by Beamsters@reddit | LocalLLaMA

Gemma 4 26B-A4B quantized at Q5_K_S, running on an Apple M1 Max with 32GB.

Using LM Studio with the Unsloth Q5_K_S quant and a context of 65536, it uses around 22GB of memory (Metal llama 2.11.0).

On average it generates 50.x tok/s.

On the other hand, Gemma 4 31B (Q4_K_S) is quite slow, averaging 10-11 tok/s.
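
To put the two numbers side by side, here is a rough back-of-the-envelope sketch. It assumes 50.0 tok/s for the 26B-A4B (the "50.x" above, rounded down) and 10.5 tok/s for the 31B (the midpoint of 10-11). The 1000-token reply length and the helper name `seconds_for` are made up for illustration, not something measured.

```python
# Rough comparison of the two generation speeds reported in the post.
# Both constants are assumptions derived from the post's figures:
# "50.x" is rounded down to 50.0, and "10-11" is taken at its midpoint.
MOE_TOK_S = 50.0     # Gemma 4 26B-A4B, Q5_K_S
DENSE_TOK_S = 10.5   # Gemma 4 31B, Q4_K_S


def seconds_for(tokens: int, tok_per_s: float) -> float:
    """Time to generate `tokens` tokens at a steady `tok_per_s` rate."""
    return tokens / tok_per_s


# How many times faster the 26B-A4B generates than the 31B.
speedup = MOE_TOK_S / DENSE_TOK_S

# Hypothetical reply length, chosen only to make the gap concrete.
REPLY_TOKENS = 1000

print(f"speedup: {speedup:.1f}x")
print(f"26B-A4B: {seconds_for(REPLY_TOKENS, MOE_TOK_S):.0f} s per {REPLY_TOKENS} tokens")
print(f"31B:     {seconds_for(REPLY_TOKENS, DENSE_TOK_S):.0f} s per {REPLY_TOKENS} tokens")
```

This only covers generation speed. Prompt processing time is not included, since the post does not report it.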