Best settings for gemma-4 on a 3090?

Posted by Deadhookersandblow@reddit | LocalLLaMA

3090 (24G) + 32G DDR4

Currently running

--mmproj mmproj-BF16.gguf \
--chat-template-kwargs '{"enable_thinking":true}' \
--flash-attn on \
--cache-type-k q4_0 \
--cache-type-v q4_0 \
-np 1 \
-c 160000 \
--jinja

with 26B-A4B-it-UD-Q5_K_XL. I'm generally quite happy with it, but it occasionally dies with an OOM (usually when I'm doing something quite convoluted, figuring out a workflow, etc.)
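
For a rough sense of how much of the 24G the context itself can claim, here is a minimal back-of-the-envelope sketch of KV cache size at -c 160000. The layer count, KV head count, and head dimension below are hypothetical placeholders, not the real values for this model (read those from the GGUF metadata). The formula also assumes every layer caches the full context, so a model that uses sliding-window attention on some layers would need less.

```python
# Bits per cached element for common cache types.
# q4_0 packs 32 values into 18 bytes, q8_0 packs 32 values into 34 bytes.
BITS_PER_ELEM = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}


def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, cache_type):
    """Approximate combined size of the K and V caches in bytes.

    Assumes K and V use the same cache type and that every layer
    stores the full context (no sliding-window layers).
    """
    elems_per_token = 2 * n_layers * n_kv_heads * head_dim  # K plus V
    return n_ctx * elems_per_token * BITS_PER_ELEM[cache_type] / 8


if __name__ == "__main__":
    # HYPOTHETICAL architecture numbers, for illustration only.
    n_layers, n_kv_heads, head_dim = 32, 8, 128
    for cache_type in ("f16", "q8_0", "q4_0"):
        size = kv_cache_bytes(160000, n_layers, n_kv_heads, head_dim, cache_type)
        print(f"{cache_type}: {size / 2**30:.2f} GiB")
```

The cache size scales linearly with the context length, so under these assumptions halving -c halves the estimate. Whatever the cache takes comes on top of the model weights, the mmproj, and compute buffers.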

I get around 90-95 tok/s. What can I improve? I'm completely OK with trading speed for performance (by about half, so let's say 40 tok/s would be OK).

Thanks