Is 200k context realistic on Gemma 31B locally? LM Studio keeps crashing

Posted by Open_Gur_4733@reddit | LocalLLaMA | View on Reddit | 9 comments

Hi everyone,

I’m currently running Gemma 4 31B locally on my machine, and I’m running into stability issues when increasing the context size.

My setup:

I’m mainly using it with OpenCode for development.

Issue:
When I push the context window to around 200k tokens, LM Studio eventually crashes after some time. From what I can tell, it looks like Gemma is gradually consuming all available VRAM.

Has anyone experienced similar issues with large context sizes on Gemma (or other large models)?
Is this expected behavior, or am I missing some configuration/optimization?

Any tips or feedback would be really appreciated