Why is gemma4 using so much RAM?

Posted by BestSeaworthiness283@reddit | LocalLLaMA

I'm sorry if this is a really beginner question, but I'm trying to understand how LLMs work under the hood.

From my testing, I have observed that running gemma4:e4b uses about 4 GB of VRAM and 8 GB of RAM. For context, I have an RTX 4060 with 8 GB of VRAM. From my understanding, the model's chunks can't all be loaded into VRAM, so the rest get offloaded to system RAM.
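To make my mental model concrete, here is a rough sketch of how I picture the split. Everything in it is an assumption on my part: the parameter count, bits per weight, layer count, and VRAM budget are placeholder numbers, not real figures for gemma4:e4b, and the function names are made up. It also assumes all layers are the same size and ignores the KV cache and runtime overhead, which I believe add to the total.

```python
def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def split_layers(total_gib: float, n_layers: int, vram_budget_gib: float):
    """Return (layers_on_gpu, gib_on_gpu, gib_in_ram).

    Assumes every layer is the same size and that whole layers are
    placed on the GPU until the VRAM budget runs out.
    """
    per_layer = total_gib / n_layers
    on_gpu = min(n_layers, int(vram_budget_gib // per_layer))
    gpu_gib = on_gpu * per_layer
    return on_gpu, gpu_gib, total_gib - gpu_gib


if __name__ == "__main__":
    # Placeholder values only, NOT the real gemma4:e4b numbers.
    total = weights_gib(n_params=8e9, bits_per_weight=4)
    layers, gpu, ram = split_layers(total, n_layers=32, vram_budget_gib=3.0)
    print(f"weights: {total:.2f} GiB, {layers} layers on GPU")
    print(f"GPU: {gpu:.2f} GiB, RAM: {ram:.2f} GiB")
```

If this picture is right, whatever does not fit in the VRAM budget ends up in system RAM, which is the behaviour I think I am seeing.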

What do you think the problem is?