Gemma4:31b-coding-mtp-bf16 - slow on MacBook M5 128gb

Posted by chimph@reddit | LocalLLaMA

Very quick initial test of Gemma 4's new MTP model via Ollama (llama.cpp doesn't support it yet).

https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4/

Running it in Open WebUI to watch the tokens/s output, I get 10-12 tok/s.
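If you'd rather not eyeball the Open WebUI counter, Ollama's `/api/generate` response includes `eval_count` and `eval_duration` (nanoseconds), so you can compute tok/s yourself. A minimal sketch, with the sample numbers below made up to roughly match the ~11 tok/s I'm seeing:

```python
# Sketch: derive tokens/s from Ollama's /api/generate response fields.
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Ollama reports eval_duration in nanoseconds."""
    return eval_count / (eval_duration_ns / 1e9)

# Hypothetical response values: 220 tokens generated over 20 seconds.
sample = {"eval_count": 220, "eval_duration": 20_000_000_000}
print(round(tokens_per_second(sample["eval_count"], sample["eval_duration"]), 1))  # 11.0
```

`ollama run <model> --verbose` also prints an eval rate at the end of each response if you're testing from the CLI.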

Will have to wait for benchmarks to see if this is worth running instead of Qwen3.6 27b or Qwen3 Coder Next for tasks that don't need babysitting.