Qwen3.6-35b stuck in infinite loop
Posted by ConfidentSolution737@reddit | LocalLLaMA | View on Reddit | 8 comments
Has anyone else faced the issue where the model keeps responding with a repeated text/tool call without ever stopping?
Using this attached config.
MoistApplication5759@reddit
The repeat_penalty helps but won't fully solve it — infinite tool call loops are a fundamental issue with reasoning models that don't have a hard stopping condition outside the model itself.
Beyond sampling params, worth adding an external loop guard: a max tool call count per run, or a budget cap that kills the run if it exceeds N steps. That way it can't spiral regardless of how the model is behaving.
We built SupraWall for exactly this kind of enforcement — hard caps on tool call counts, execution budgets, and blocked categories before they execute. Works as a wrapper around local agent setups like llama.cpp-based servers: github.com/wiserautomation/SupraWall
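The external loop guard described above can be sketched in a few lines. This is a minimal illustration, not SupraWall's actual implementation; all names (`run_agent`, `ToolBudgetExceeded`, the `step_fn` protocol) are hypothetical.

```python
# Minimal external loop guard: hard-cap tool calls per run, independent of
# whatever the model decides to do. All names here are illustrative.

class ToolBudgetExceeded(Exception):
    """Raised when an agent run exceeds its tool-call budget."""

def run_agent(step_fn, max_tool_calls=10):
    """Drive an agent loop. step_fn() returns ("tool", payload) to request
    another tool call, or ("final", answer) to finish the run."""
    tool_calls = 0
    while True:
        kind, value = step_fn()
        if kind == "final":
            return value
        tool_calls += 1
        if tool_calls > max_tool_calls:
            # Kill the run instead of letting the model spiral forever.
            raise ToolBudgetExceeded(
                f"tool-call budget of {max_tool_calls} exceeded"
            )

# Example: a stuck "model" that requests the same tool call endlessly.
def stuck_step():
    return ("tool", {"name": "search", "args": {"q": "same query again"}})

try:
    run_agent(stuck_step, max_tool_calls=5)
except ToolBudgetExceeded as e:
    print(e)
```

The point is that the stopping condition lives outside the model, so no sampling setting can defeat it.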
Holiday_Bowler_2097@reddit
Try the llama.cpp Vulkan backend. I heard Nvidia admitted a bug in CUDA 12.? — check Unsloth's guide for the broken CUDA versions for Qwen3.6 35b.
ConfidentSolution737@reddit (OP)
My CUDA version is 13.0, which is safe (13.2 had issues).
RedAdo2020@reddit
Can't help you with your problem, but I thought the batch size has to be larger than or equal to the ubatch size.
Long_comment_san@reddit
You forgot presence penalty
Factemius@reddit
Try with bare minimum args
Ok-Mongoose-3614@reddit
Try temp at 1
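Pulling the sampling suggestions from this thread together: a sketch of a llama.cpp `llama-server` `/completion` request body that sets repeat penalty, presence penalty, and temperature. The field names are real llama.cpp server parameters, but the values are starting points to tune, not verified settings for this model.

```python
import json

# Request body for llama.cpp's llama-server /completion endpoint, combining
# the sampling knobs suggested in this thread. Values are assumptions to
# experiment with, not known-good settings for Qwen3.6-35b.
payload = {
    "prompt": "...",
    "temperature": 1.0,       # as suggested above; lower it if output degrades
    "repeat_penalty": 1.1,    # discourages verbatim repetition
    "presence_penalty": 0.5,  # penalizes tokens that have already appeared
    "n_predict": 512,         # hard cap on generated tokens per request
}

# You would POST this JSON to a running server, e.g.:
#   curl http://localhost:8080/completion -d @payload.json
print(json.dumps(payload))
```

`n_predict` is worth setting even if the sampling penalties help, since it bounds each generation regardless of the model's behavior.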
Shoddy_Cook_864@reddit
Try this project out. It's a free, open-source project that lets you use large models like Kimi K2 with Claude Code for free by utilizing NVIDIA Cloud.
Github link: https://github.com/Ujwal397/Arbiter/