DeepSeek 3.2 eating the opening think tag on llama.cpp server?
Posted by Winter_Engineer2163@reddit | LocalLLaMA | View on Reddit | 3 comments
Hey guys. Having a weird issue with the new DeepSeek V3.2 Unsloth GGUF via llama-server. The model starts reasoning fine, but the opening `<think>` tag is missing from the output stream. I just see the plain-text reasoning, and then the closing tag at the end.
Because of this, Open WebUI doesn't collapse the thought block. I'm on a 512GB box, command is just `llama-server -m model_name -t 32 --flash-attn on`. Tried toggling reasoning on/off, didn't help.
Is the chat template broken in these specific GGUFs or am I missing a flag?
Winter_Engineer2163@reddit (OP)
Update:
Just tried adding the `--jinja` flag to `llama-server` to force the internal chat template, but no luck. Still getting the same behavior: the reasoning starts as plain text, the opening tag is nowhere to be found, and only the closing `</think>` tag shows up at the end.
Current startup command:
`numactl --interleave=all llama-server -m [model] -t 32 --flash-attn on --no-mmap --numa numactl --jinja --host 0.0.0.0 --port 8080`
Starting to think it's either a specific issue with how these Unsloth shards handle the BOS (Beginning of Sequence) token or some weirdness in how Open WebUI intercepts the initial stream. Any other ideas?
fairydreaming@reddit
Maybe try to run it with recently added `--chat-template-file models/templates/deepseek-ai-DeepSeek-V3.2.jinja`
Winter_Engineer2163@reddit (OP)
Thanks for the tip, fairydreaming! That was exactly what was missing.
I just did a fresh rebuild of llama.cpp from the latest master and updated my startup command to point directly to the template file. It finally stopped eating the opening tag, and Open WebUI is now correctly collapsing the reasoning block.
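For anyone else landing here, a sketch of the working invocation (the `[model]` path is a placeholder, and the template path assumes you're launching from the llama.cpp repo root where `models/templates/` lives):

```shell
# Same flags as before, plus the explicit chat template file suggested above.
# [model] is a placeholder for your GGUF path.
numactl --interleave=all llama-server \
  -m [model] \
  -t 32 --flash-attn on --no-mmap --numa numactl \
  --jinja \
  --chat-template-file models/templates/deepseek-ai-DeepSeek-V3.2.jinja \
  --host 0.0.0.0 --port 8080
```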
Appreciate the help, brother!