Gemma 4 26B A4B is still fully capable at 245283/262144 (94%) context!

Posted by cviperr33@reddit | LocalLLaMA | View on Reddit | 84 comments

It solved an issue with a script that pulls realtime data from nvidia-smi; Gemini 3.1 failed to fix it in a fresh session, lol.
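The post doesn't share the actual script, but a minimal sketch of that kind of nvidia-smi poller might look like this. The query fields, function names, and output format here are my own illustration, not from the post:

```python
import subprocess

# Fields asked of nvidia-smi; --format=csv,noheader,nounits gives bare numbers.
QUERY = "utilization.gpu,memory.used,memory.total,temperature.gpu"

def parse_smi_csv(line: str) -> dict:
    """Parse one CSV line from nvidia-smi's csv,noheader,nounits output."""
    util, mem_used, mem_total, temp = [f.strip() for f in line.split(",")]
    return {
        "gpu_util_pct": int(util),
        "mem_used_mib": int(mem_used),
        "mem_total_mib": int(mem_total),
        "temp_c": int(temp),
    }

def poll_gpu() -> dict:
    """Call nvidia-smi once and return stats for the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout.strip().splitlines()[0]
    return parse_smi_csv(out)

if __name__ == "__main__":
    print(poll_gpu())
```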

Kinda mind blowing how in 2026 we can already have stable 200k+ context local models. I tested it out by pasting in as many reddit posts, random documentation, and raw files from the llama.cpp repo as possible, so I could bump the context as high as I could and see how it affects my VRAM. But throughout this testing gemma still had its mind intact!

At 245283/262144 (94%) context, if I ask it what a given user said, it matches the quote perfectly and answers within 2-5 seconds.
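A repeatable way to run this kind of recall check is a needle-in-a-haystack probe against the running server. Here's a minimal Python sketch, assuming llama-server's OpenAI-compatible /v1/chat/completions endpoint on the host/port from the settings later in the post; the filler text, needle string, and helper names are all illustrative:

```python
import json
import urllib.request

def build_haystack(filler: str, needle: str, n_copies: int, position: int) -> str:
    """Bury `needle` among n_copies of filler, at the given chunk index."""
    chunks = [filler] * n_copies
    chunks.insert(position, needle)
    return "\n\n".join(chunks)

def ask(server: str, prompt: str) -> str:
    """One-shot chat completion against llama-server's OpenAI-compatible API."""
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }).encode()
    req = urllib.request.Request(
        f"{server}/v1/chat/completions", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    needle = "NOTE: the secret passphrase is 'violet-kumquat-42'."
    hay = build_haystack("Lorem ipsum dolor sit amet. " * 40, needle,
                         n_copies=2000, position=1500)
    reply = ask("http://localhost:8080",
                hay + "\n\nWhat is the secret passphrase?")
    print("PASS" if "violet-kumquat-42" in reply else "FAIL", reply[:200])
```

Sweeping `position` from early to late chunks shows whether recall degrades at particular depths rather than just at the full-context limit.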

From previous tests, I had to decrease the temp and bump the repeat penalty to 1.18 so it doesn't fall into a loop of self-questioning. Above 100k it started looping on its own thoughts and arguing with itself, and instead of deciding to print one final answer it just kind of went on forever, so these settings helped a lot!

Using the latest llama.cpp, which gets new updates almost every hour, and the latest Unsloth GGUF, which was updated 2-6 hours ago, so redownload!

Model: gemma-4-26B-A4B-it-UD-IQ4_NL.gguf (Unsloth)
These are my current settings for llama.cpp, which I start with a PowerShell script:

# --- [2. OPTIMIZATION PARAMETERS] ---
$ContextSize = "262144"
$GpuLayers = "99"               # offload all layers to the GPU
$Temperature = "0.7"
$TopP = "0.95"
$TopK = "40"
$MinP = "0.05"
$RepeatPenalty = "1.17"
# --- [3. THE ARGUMENT CONSTRUCTION] ---
$ArgumentList = @(
    "-m", $ModelPath,
    "--mmproj", $MMProjPath,    # multimodal projector file
    "-ngl", $GpuLayers,
    "-c", $ContextSize,
    "-fa", "1",                 # flash attention
    "--cache-ram", "2048",
    "-ctxcp", "2",
    "-ctk", "q8_0",             # quantize the K cache to q8_0
    "-b", "512",                # smaller batch for less activation overhead
    "-ub", "512",
    "-ctv", "q8_0",             # quantize the V cache to q8_0
    "--temp", $Temperature,
    "--top-p", $TopP,
    "--top-k", $TopK,
    "--min-p", $MinP,
    "--repeat-penalty", $RepeatPenalty,
    "--host", "0.0.0.0",        # listen on all interfaces
    "--port", "8080",
    "--jinja",                  # use the model's built-in chat template
    "--metrics"                 # expose a metrics endpoint
)
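Since the script passes --metrics, a running server can be sanity-checked from Python. This sketch assumes llama-server's /health and Prometheus-style /metrics endpoints; the exact metric name queried is an assumption on my part, so inspect the raw /metrics output to see what your build actually exports:

```python
import urllib.request
from typing import Optional

def parse_prom_metric(text: str, name: str) -> Optional[float]:
    """Pull a single sample value out of Prometheus exposition-format text."""
    for line in text.splitlines():
        # A sample line is either `name value` or `name{labels} value`.
        if line.startswith(name + " ") or line.startswith(name + "{"):
            return float(line.rsplit(" ", 1)[1])
    return None

def check_server(base: str = "http://localhost:8080") -> None:
    """Print health status and one metric from a local llama-server."""
    with urllib.request.urlopen(f"{base}/health") as r:
        print("health:", r.read().decode())
    with urllib.request.urlopen(f"{base}/metrics") as r:
        text = r.read().decode()
    # Metric name below is an assumption; check the raw /metrics dump first.
    print("tokens predicted:",
          parse_prom_metric(text, "llamacpp:tokens_predicted_total"))

if __name__ == "__main__":
    check_server()
```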

What else can I test? Honestly, I've run out of ideas to crash it! It just gulps down whatever I throw at it.