On my RTX 4060 8GB laptop, I can run Gemma 3n E4B at Q6_K_XL with the mmproj loaded at only ~6GB of VRAM usage, despite sources recommending Q4_K_M for my hardware. What is going on?

Posted by ProducerOwl@reddit | LocalLLaMA | 1 comment

I can set my context length as high as 64k and VRAM usage is still nowhere near the maximum. My TPS is also 40+.
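For reference, a llama.cpp launch matching the setup described above might look something like this (a sketch only; the GGUF filenames are placeholders, not the exact files used):

```shell
# Hypothetical llama-server invocation: Q6_K_XL quant with the
# multimodal projector loaded and a 64k context window.
llama-server \
  -m gemma-3n-E4B-it-Q6_K_XL.gguf \
  --mmproj mmproj-gemma-3n-E4B.gguf \
  --ctx-size 65536 \
  --n-gpu-layers 99
```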