TheaterFire

Does 'preserve_thinking' work with openwebui?

Posted by sterby92@reddit | LocalLLaMA | View on Reddit | 33 comments

I'm running qwen3.6-35b with llama.cpp connected to openwebui. And I noticed the model fails the number guessing game test on openwebui while it works perfectly with the llama.cpp web ui. Am I missing something and need to activate it somewhere? Otherwise I guess I'll open an Issue on GH or create a PR. Thanks a lot! 😄


ayylmaonade@reddit

So, I just found this thread because I noticed Qwen's performance had been a little more spotty than usual, then tracked it down to an issue with preserve_thinking. Turns out, it affects any version of Open-WebUI past 0.9.2. I ended up pulling 0.9.2 and I'll be staying on it for the foreseeable future.

HuskyTheSniffer@reddit

Even without preserve thinking, afaik openwebui always injects the thinking from previous turns. See the [openwebui doc](https://docs.openwebui.com/features/chat-conversations/chat-features/reasoning-models/#configuration--behavior).

sterby92@reddit (OP)

Hm, in my tests it doesn't seem to work 🤔

BankjaPrameth@reddit

Just tested with Qwen 3.5 397B, which has no preserve\_thinking support, and it works: https://preview.redd.it/c6u7hb53in0h1.jpeg?width=1206&format=pjpg&auto=webp&s=80e158a866f90a9f8f8b351d757161a194a8e673

HuskyTheSniffer@reddit

Hmm, I tried with gpt oss. I asked it to output the full reasoning of earlier turns exactly, word for word, and it was able to do that.

Synthetic451@reddit

I tried the two-number test in OpenWebUI and it did not work without adding preserve\_thinking to chat\_template\_kwargs.
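
For readers who have not set this before, here is a minimal sketch of what passing `preserve_thinking` through `chat_template_kwargs` might look like in a request body for llama.cpp's OpenAI-compatible `/v1/chat/completions` endpoint. The helper name `build_request` is hypothetical, the model name is borrowed from the log further down the thread, and whether the flag has any effect depends on the model's chat template.

```python
import json


def build_request(messages, model="qwen3.6-35b-a3b", preserve_thinking=True):
    """Build a chat completion body that forwards a flag to the chat template.

    `build_request` is an illustrative helper, not part of any library.
    """
    return {
        "model": model,
        "stream": False,
        "messages": messages,
        # Extra variables the server hands to the model's chat template.
        "chat_template_kwargs": {"preserve_thinking": preserve_thinking},
    }


body = build_request(
    [{"role": "user", "content": "Pick a number and keep it secret."}]
)
print(json.dumps(body, indent=2))
```

The body can then be sent with any HTTP client. In Open WebUI the same key would have to be added to the model's or connection's extra request parameters.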

sterby92@reddit (OP)

For me it also did not work with preserve\_thinking in chat\_template\_kwargs. But in the native llama.cpp WebUI it worked...

nickless07@reddit

Your content looks more like a high-temperature effect than anything 'preserved'. Maybe it will tell you it's locking in a number after 10 more turns too. Can you log the incoming tokens? As for me, it works as expected, with all reasoning content sent back to the model each turn. I even wrote a script to strip the CoT, as it bloated the ctx too much. [https://docs.openwebui.com/features/chat-conversations/chat-features/reasoning-models](https://docs.openwebui.com/features/chat-conversations/chat-features/reasoning-models)
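
The stripping script itself was not shared, so the following is only a guess at its general shape: a small function that removes inline `<think>...</think>` blocks from assistant turns before the history is sent again. The name `strip_reasoning` and the assumption that reasoning arrives as inline `<think>` tags are mine.

```python
import re

# Matches one <think>...</think> block plus any whitespace that follows it.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_reasoning(messages):
    """Return a copy of `messages` with <think> blocks removed from assistant turns."""
    cleaned = []
    for msg in messages:
        if msg.get("role") == "assistant" and isinstance(msg.get("content"), str):
            # Build a new dict so the caller's history is left untouched.
            msg = {**msg, "content": THINK_RE.sub("", msg["content"]).strip()}
        cleaned.append(msg)
    return cleaned


history = [
    {"role": "user", "content": "Pick a number."},
    {"role": "assistant", "content": "<think>I will pick 7.</think>\nDone, I picked one."},
]
print(strip_reasoning(history)[1]["content"])
```

Note that stripping trades context size for memory of earlier reasoning: a model that only "locked in" its number inside the reasoning block can no longer see it on the next turn.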

sterby92@reddit (OP)

It works 100% of the time with the same configuration in the llama.cpp web interface. What version of openwebui are you running? It might have broken recently.

nickless07@reddit

0.92, and the full thing gets sent:

Received request: POST to /v1/chat/completions with body  { "stream": true, "model": "qwen3.6-35b-a3b", "messages": \[ { "role": "user", "content": "hmm do you know whow to deal with \\"8197a522-c63f-4681-8ab0-58c558af5ef9\\" ?" }, { "role": "assistant", "content": "<think>The user is asking about a specific ID: \\"81... <Truncated in logs> ...not have this ID. Let's check.\\nI will call</think>" }, { "role": "user", "content": "hmm" } \], "tools": \[

Let me update and test again.

sterby92@reddit (OP)

It was resolved, see the updated post. TLDR: the provider type of the connection needs to be changed to llama.cpp to support recent changes.

nickless07@reddit

Yeah, thanks. Hehe, I wouldn't have noticed it for quite some time either.

nickless07@reddit

Looks a bit different now.

{ "role": "user", "content": "\[08/05/2026, Friday, 05:09:28 PM\]\\nhmm do you know whow to deal with \\"8197a522-c63f-4681-8ab0-58c558af5ef9\\" ?" }, { "role": "assistant", "content": "<details type=\\"reasoning\\" done=\\"false\\">\\n<summary>T... <Truncated in logs> ... ID. Let\&#x27;s check.\\n\&gt; I will call\\n</details>" }, { "role": "user", "content": "\[11/05/2026, Monday, 06:35:26 PM\]\\nhmm" } \], "tools": \[

I stopped generation mid-thinking, so "role": "assistant" only contains CoT, no finished reply. However, the full content still gets sent. Perhaps a parsing error? https://preview.redd.it/lonfbtu5ej0h1.png?width=2391&format=png&auto=webp&s=597f2ebe859645418f75d1427a304797a649d6cd
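
When comparing logs like these, it can help to classify how each assistant turn carries its earlier reasoning. The sketch below checks a parsed request body for the three shapes mentioned in this thread: inline `<think>` tags, an HTML `<details type="reasoning">` block, and a separate `reasoning_content` field. The function names are hypothetical, and the input is assumed to be the already-decoded JSON body rather than the escaped log text.

```python
def reasoning_form(message):
    """Describe how an assistant message carries earlier reasoning, if at all."""
    if message.get("role") != "assistant":
        return None  # only assistant turns can carry the model's own reasoning
    if message.get("reasoning_content"):
        return "reasoning_content field"
    content = message.get("content")
    if not isinstance(content, str):
        return "none"
    if "<think>" in content:
        return "inline <think> tags"
    if '<details type="reasoning"' in content:
        return "HTML <details> block"
    return "none"


def summarize(body):
    """Map each assistant message index in a request body to its reasoning form."""
    return {
        i: reasoning_form(m)
        for i, m in enumerate(body.get("messages", []))
        if m.get("role") == "assistant"
    }


example = {
    "messages": [
        {"role": "user", "content": "hmm"},
        {"role": "assistant", "content": '<details type="reasoning" done="false">\n...</details>'},
    ]
}
print(summarize(example))
```

If the server's chat template only looks for one of these shapes, a turn reported in a different form may be treated as plain text, which would be consistent with the behavior described here.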

AdamLangePL@reddit

I have forked openwebui and added some features like context compaction and a progress bar with usage and tps speed :) Let me check preserve thinking.

Medium_Chemist_4032@reddit

Some heroes don't wear capes - they eat pierogi

WyattTheSkid@reddit

Polish moment

NoStage9115@reddit

https://preview.redd.it/eerchbaqdj0h1.jpeg?width=1280&format=pjpg&auto=webp&s=9a9d5cfaefd266ef4df445c39e0db2d0a1578b13

AdamLangePL@reddit

Yup!

apetersson@reddit

Thank you for your service! Do you have a link to your fork? Is OWU main kinda slow with rolling out such obvious contributions? My video-input-capable models have been waiting to be usable since forever because OWU does not simply pass the file through.

AdamLangePL@reddit

It was for private use, a bit vibe coded (no time to go deep dive) but works. I will share it later today :)

AdamLangePL@reddit

https://preview.redd.it/bm8874hwui0h1.png?width=1211&format=png&auto=webp&s=3c6bb7c7d169a9e254d9888c7186fb6fe213e5b1 Progress and details above the chat box :)

sterby92@reddit (OP)

Thanks a lot! Would be great to know 😃

TechSwag@reddit

After messing about, I think I see what happened. There was a change to specify what kind of provider type a connection is. Apparently llama.cpp (among others) handles reasoning differently than Open WebUI's "default". You have to switch the provider type to `llama.cpp` so Open WebUI sends the reasoning_content back to llama.cpp properly. [[docs](https://docs.openwebui.com/features/chat-conversations/chat-features/reasoning-models/#path-2--reasoning-captured-into-a-structured-output-array)] After swapping, it looks to work now.
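
To illustrate the difference in message shape, here is a rough sketch that moves inline `<think>` text out of `content` and into a separate `reasoning_content` field on assistant turns. This is not Open WebUI's code; `split_reasoning` is a hypothetical helper, and the assumption that the template reads prior reasoning from `reasoning_content` is inferred from the comment above.

```python
import re

# Captures the text inside the first <think>...</think> block.
THINK_RE = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)


def split_reasoning(message):
    """Return a copy of an assistant message with reasoning in its own field."""
    if message.get("role") != "assistant":
        return message
    content = message.get("content")
    if not isinstance(content, str):
        return message
    match = THINK_RE.search(content)
    if match is None:
        return message  # nothing to move
    return {
        **message,
        "reasoning_content": match.group(1).strip(),
        # Drop only the block that was moved; keep the visible reply.
        "content": THINK_RE.sub("", content, count=1).strip(),
    }


turn = {"role": "assistant", "content": "<think>I pick 7.</think>\nOkay, I have a number."}
print(split_reasoning(turn))
```

With the reasoning kept in a structured field, the server-side template can decide whether to render it again, which is what a flag like `preserve_thinking` would control.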

Synthetic451@reddit

Nice, this worked for me as well. Is this settable via an environment variable?

sterby92@reddit (OP)

This is the SOLUTION! 🥳 🥳 Thanks a lot! I will add it to the post!

Synthetic451@reddit

It used to work just a week ago. Something broke in the latest update

sterby92@reddit (OP)

Looks like it... :/ I hope it's an easy fix 😃

Synthetic451@reddit

Yeah I just tested with the llama.cpp server Web UI and that worked every time. So something definitely broke in OpenWebUI because it used to work reliably there too.

TechSwag@reddit

It did. I just tested it now and it seems to not be working anymore, not sure if it's an Open WebUI or llama.cpp issue. For clarity, I tried this when the first PSA/FYI post gained some traction, and it worked fine. I updated Open WebUI just now and no change. Verified through llama-swap's logs that `preserve_thinking` was set to true. I'll rebuild llama.cpp/llama-swap now just in case.

TechSwag@reddit

Yeah it's not working unfortunately. Maybe I'm hallucinating, but I could've sworn it was working at one point. I was running the `dev` branch for a short period though so maybe it was a change made in `dev` that never got pushed to prod. I did find [a comment](https://github.com/open-webui/open-webui/issues/23175#issuecomment-4285894634) made by a maintainer saying it was "likely fixed in dev", but then reverted in dev due to an issue, and that it should be instead handled externally (which based on my understanding, is not fundamentally possible lmao). It has been brought up to the maintainers though, see below: https://github.com/open-webui/open-webui/issues/23339 https://github.com/open-webui/open-webui/discussions/23895

AltruisticList6000@reddit

Stuff usually works on textgen webui, so maybe check it there too. If it doesn't fail there either, then openwebui probably has some problems.

Digital_Soul_Naga@reddit

try it and let us know

sterby92@reddit (OP)

I mean, that's what I did, I guess 🤔 It seems not to work, but I cannot 100% confirm it yet. And I'm interested to know whether that is expected or something is wrong on my part.