Gemma 4 31B draft model benchmarks

Posted by tecneeq@reddit | LocalLLaMA

https://docs.google.com/spreadsheets/d/1NzZC4JShGluwH2fdjlMbZ2ke99AcTctUnM7rG12_cYE/edit?usp=sharing

The benchmarks were run in an LXC container on Proxmox, on a Bosgame M5 (Strix Halo, 128 GB). The software was llama.cpp on ROCm 7.2.

The best compromise between speed and precision, I think, is unsloth/gemma-4-31B-it-GGUF:UD-Q8_K_XL with unsloth/gemma-4-E2B-it-GGUF:UD-Q3_K_XL as the draft model.
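If you want to try the same pairing yourself, here is a minimal sketch that builds a llama-server command line for it. It assumes a reasonably recent llama.cpp build where llama-server accepts -hf/-hfd (pull the main/draft model from Hugging Face using the "user/repo:quant" form used above) and -ngl/-ngld (GPU layer offload for main/draft). Those flags can differ between builds, so check `llama-server --help` on yours. The helper name build_server_cmd and the layer count of 99 are illustrative and are not taken from the benchmark runs.

```python
import shlex

# Model references exactly as named in the post, in llama.cpp's
# "<user>/<repo>:<quant>" Hugging Face shorthand.
TARGET_MODEL = "unsloth/gemma-4-31B-it-GGUF:UD-Q8_K_XL"
DRAFT_MODEL = "unsloth/gemma-4-E2B-it-GGUF:UD-Q3_K_XL"


def build_server_cmd(target: str, draft: str, gpu_layers: int = 99) -> list[str]:
    """Hypothetical helper: return an argv list for llama-server with a draft model.

    Assumes the build supports -hf/-hfd (target/draft from Hugging Face)
    and -ngl/-ngld (GPU offload for target/draft). Verify with --help.
    """
    return [
        "llama-server",
        "-hf", target,              # main (target) model that produces the final output
        "-hfd", draft,              # small draft model that proposes tokens
        "-ngl", str(gpu_layers),    # offload target layers to the GPU
        "-ngld", str(gpu_layers),   # offload draft layers to the GPU
    ]


if __name__ == "__main__":
    cmd = build_server_cmd(TARGET_MODEL, DRAFT_MODEL)
    # Print a copy-pasteable shell command; pass `cmd` to
    # subprocess.run() instead to actually start the server.
    print(shlex.join(cmd))
```

Building an argv list instead of a single string avoids shell-quoting problems if you later launch it with subprocess.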