Qwen3.6-35b stuck in infinite loop
Posted by ConfidentSolution737@reddit | LocalLLaMA | View on Reddit | 8 comments
Has anyone else faced the issue where the model keeps responding with a repeated text/tool call without ever stopping?
Using this attached config.
MoistApplication5759@reddit
The repeat_penalty helps but won't fully solve it — infinite tool call loops are a fundamental issue with reasoning models that don't have a hard stopping condition outside the model itself.
Beyond sampling params, worth adding an external loop guard: a max tool call count per run, or a budget cap that kills the run if it exceeds N steps. That way it can't spiral regardless of how the model is behaving.
We built SupraWall for exactly this kind of enforcement — hard caps on tool call counts, execution budgets, and blocked categories before they execute. Works as a wrapper around local agent setups like llama.cpp-based servers: github.com/wiserautomation/SupraWall
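The external loop guard described above can be sketched in a few lines. This is a minimal illustration, not SupraWall's actual implementation; all names (`run_agent`, `ToolBudgetExceeded`, the `step_fn` protocol) are hypothetical.

```python
# Minimal external loop guard: hard-cap tool calls per run, independent of
# whatever the model decides to do. All names here are illustrative.

class ToolBudgetExceeded(Exception):
    """Raised when an agent run exceeds its tool-call budget."""

def run_agent(step_fn, max_tool_calls=10):
    """Drive an agent loop. step_fn() returns ("tool", payload) to request
    another tool call, or ("final", answer) to finish the run."""
    tool_calls = 0
    while True:
        kind, value = step_fn()
        if kind == "final":
            return value
        tool_calls += 1
        if tool_calls > max_tool_calls:
            # Kill the run instead of letting the model spiral forever.
            raise ToolBudgetExceeded(
                f"tool-call budget of {max_tool_calls} exceeded"
            )

# Example: a stuck "model" that requests the same tool call endlessly.
def stuck_step():
    return ("tool", {"name": "search", "args": {"q": "same query again"}})

try:
    run_agent(stuck_step, max_tool_calls=5)
except ToolBudgetExceeded as e:
    print(e)
```

The point is that the stopping condition lives outside the model, so no sampling setting can defeat it.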
Holiday_Bowler_2097@reddit
Try the llama.cpp Vulkan backend. I heard Nvidia admitted a bug in CUDA 12.? — check Unsloth's guide for the broken CUDA versions for Qwen3.6 35b.
ConfidentSolution737@reddit (OP)
My CUDA version is 13.0, which is safe (13.2 had issues).
RedAdo2020@reddit
Can't help you with your problem, but I thought the batch size has to be larger than or equal to the ubatch size.
Long_comment_san@reddit
You forgot presence penalty
Factemius@reddit
Try with bare minimum args
Ok-Mongoose-3614@reddit
Try temp at 1
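Pulling the sampling suggestions from this thread together: a sketch of a llama.cpp `llama-server` `/completion` request body that sets repeat penalty, presence penalty, and temperature. The field names are real llama.cpp server parameters, but the values are starting points to tune, not verified settings for this model.

```python
import json

# Request body for llama.cpp's llama-server /completion endpoint, combining
# the sampling knobs suggested in this thread. Values are assumptions to
# experiment with, not known-good settings for Qwen3.6-35b.
payload = {
    "prompt": "...",
    "temperature": 1.0,       # as suggested above; lower it if output degrades
    "repeat_penalty": 1.1,    # discourages verbatim repetition
    "presence_penalty": 0.5,  # penalizes tokens that have already appeared
    "n_predict": 512,         # hard cap on generated tokens per request
}

# You would POST this JSON to a running server, e.g.:
#   curl http://localhost:8080/completion -d @payload.json
print(json.dumps(payload))
```

`n_predict` is worth setting even if the sampling penalties help, since it bounds each generation regardless of the model's behavior.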
Shoddy_Cook_864@reddit
Try this project out. It's a free, open-source project that lets you use large models like Kimi K2 with Claude Code for free by utilizing NVIDIA Cloud.
Github link: https://github.com/Ujwal397/Arbiter/