How far can you push an RTX3090 in terms of tokens per second for Gemma4 E2B?

Posted by last_llm_standing@reddit | LocalLLaMA

I'm trying to maximize throughput. I can already get gemma-4-E2B-it-GGUF (8-bit) to give me ~5 tokens per second on my Intel i9 CPU. How much further can I push this if I get an RTX3090?

If you are running on CPU, how many TPS were you able to squeeze out of Gemma4 (any quant, any model)?

And on an RTX3090, how far were you able to push it?