Fix: OpenClaw + Ollama local models silently timing out? The slug generator is blocking your agent (and 4 other fixes)

Posted by After-Confection-592@reddit | LocalLLaMA

I spent a full day debugging why Gemma 4 26B (and E4B) would never respond through OpenClaw on Telegram, even though ollama run gemma4 worked perfectly fine. Sharing everything I found.

Hardware: Mac Studio M4 Max, 128GB unified memory

Setup: OpenClaw 2026.4.2 + Ollama 0.20.2 + Gemma 4 26B-A4B Q8_0

The Symptoms

  * The agent never replies on Telegram; requests die silently with no error
  * ollama run gemma4 in a terminal works fine
  * Some replies silently come from the fallback model (Sonnet) instead of Gemma
  * Once wedged, even a plain curl to Ollama hangs

Root Cause #1: The Slug Generator Jams Ollama

This was the big one. OpenClaw has a session-memory hook that runs a "slug generator" to name session files. It sends a request to Ollama with a hardcoded 15s timeout. The model can't process OpenClaw's system prompt in 15s, so:

  1. OpenClaw times out and abandons the request
  2. Ollama keeps processing the abandoned request
  3. The main agent's request queues behind it
  4. Ollama is now stuck. Even curl to Ollama hangs
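
You can confirm the wedged state from a second terminal. A quick probe (assuming Ollama's default port) against an endpoint that should always return instantly:

```shell
# /api/tags just lists installed models and normally returns in
# milliseconds. If even this times out, Ollama's queue is jammed.
curl --max-time 5 http://localhost:11434/api/tags \
  || echo "Ollama is not responding"
```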

This is a known issue but the workaround isn't documented anywhere:

openclaw hooks disable session-memory

Root Cause #2: 38K Character System Prompt

OpenClaw injects ~38,500 characters of system prompt (identity, tools, bootstrap files) on every request. Cloud APIs process this in milliseconds; local models need 40-60s just for the prefill.
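
The 40-60s figure checks out on the back of an envelope, assuming ~4 characters per token and a prefill rate on the order of 200 tokens/s for a model this size on Apple Silicon (both are ballpark assumptions, not measured constants):

```shell
# Rough prefill-time estimate for the injected system prompt.
chars=38500
tokens=$((chars / 4))   # ~4 chars/token is a rough heuristic
rate=200                # assumed prefill tokens/sec (ballpark)
echo "~${tokens} tokens, ~$((tokens / rate))s prefill"
# → ~9625 tokens, ~48s prefill
```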

Fix: Skip bootstrap file injection to cut it in half:

{
  "agents": {
    "defaults": {
      "skipBootstrap": true,
      "bootstrapTotalMaxChars": 500
    }
  }
}

This brought the system prompt from 38K down to ~19K chars.

Root Cause #3: Hidden 60s Idle Timeout

OpenClaw has a DEFAULT_LLM_IDLE_TIMEOUT_MS of 60,000 (60 seconds). If the model doesn't produce a first token within that window, OpenClaw kills the connection and silently falls back to your fallback model (Sonnet, in my case). The config key that overrides it is undocumented:

{
  "agents": {
    "defaults": {
      "llm": {
        "idleTimeoutSeconds": 300
      }
    }
  }
}
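
To check whether you're actually clearing the first-token window, you can measure time-to-first-byte with curl; for a streaming request that approximates time-to-first-token (a sketch, using the model name from the setup above):

```shell
# %{time_starttransfer} = seconds until the first response byte,
# which for stream=true is roughly the first-token latency.
curl -s -o /dev/null -w 'time to first byte: %{time_starttransfer}s\n' \
  http://localhost:11434/api/generate \
  -d '{"model":"gemma4:26b-a4b-it-q8_0","prompt":"hi","stream":true}'
```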

Root Cause #4: Ollama Processes Requests Serially

Ollama handles one request per slot, so anything queued behind an abandoned slug-generator request just waits. Raising the slot count helps, but abandoned requests still hold their slots until they finish, so treat this as mitigation, not a fix for Root Cause #1. Add this to your Ollama plist/service config:

OLLAMA_NUM_PARALLEL=4
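
On macOS specifically, the Ollama menu-bar app doesn't read your shell profile, so exporting the variable in ~/.zshrc does nothing; it has to be set via launchctl and the app restarted (this is the approach Ollama documents for the Mac app):

```shell
# Make the variable visible to Ollama.app, then quit and relaunch it.
launchctl setenv OLLAMA_NUM_PARALLEL 4
```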

Root Cause #5: Thinking Mode

Gemma 4 defaults to a thinking/reasoning phase that adds 20-30s before the first token. Disable it:

{
  "agents": {
    "defaults": {
      "thinkingDefault": "off"
    }
  }
}

Full Working Config

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/gemma4:26b-a4b-it-q8_0",
        "fallbacks": ["anthropic/claude-sonnet-4-6"]
      },
      "thinkingDefault": "off",
      "timeoutSeconds": 600,
      "skipBootstrap": true,
      "bootstrapTotalMaxChars": 500,
      "llm": {
        "idleTimeoutSeconds": 300
      }
    }
  }
}

Pin the model in memory so it doesn't unload between requests:

curl http://localhost:11434/api/generate -d '{"model":"gemma4:26b-a4b-it-q8_0","keep_alive":-1,"options":{"num_ctx":16384}}'
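
You can verify the pin took effect: /api/ps lists the models currently resident in memory along with their keep-alive expiry (a quick check, assuming the default port):

```shell
# An empty model list here means nothing is loaded; with keep_alive
# set to -1 the pinned model should always appear.
curl -s http://localhost:11434/api/ps
```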

Result

The first-message delay is the tradeoff for running completely local. After that initial prefill, the KV cache makes it snappy. Worth it if you value privacy and zero cost.

Hope this saves someone a day of debugging.