GLM-5.1 Overthinking?

Posted by Specific-Rub-7250@reddit | LocalLLaMA | View on Reddit | 2 comments

I am running GLM-5.1 UD-Q4_K_XL locally with Claude Code (temp=1.0, top_k=40, top_p=0.95, min_p=0.0, reasoning=on). However, it has a strong tendency to overthink. It often acknowledges the behavior but then continues anyway. Setting a reasoning budget works for the WebUI, but with Claude Code, it just keeps reading half the repo. I didn't have this problem with GLM-4.7. Does anyone else have the same experience?

[-]

Status_Record_1839@reddit

GLM-5.1 has a longer default thinking budget than 4.7. With Claude Code you can try adding a system prompt like "Keep your reasoning brief" or set `num_ctx` lower to cap token generation. The Q4_K_XL quant also tends to ramble more than smaller quants in my experience.

chisleu@reddit

You were likely running 4.7 in a larger quant where it is more reliable.