Is it possible to run deepseek-r1-0528 without reasoning?
Posted by relmny@reddit | LocalLLaMA | 22 comments
I know, stupid question, but couldn't find an answer to it!
sommerzen@reddit
You could modify the chat template. For example, you could force the assistant to begin its message with <think></think>. That worked for the 8b Qwen distill, but I'm not sure if it will work well with r1.
relmny@reddit (OP)
I'm using ik_llama.cpp with Open WebUI. I set the system prompt in the model (in Open WebUI's workspace), but it didn't work.
Could you please tell me what "chat template" is?
sommerzen@reddit
Download the template text from joninco and use the arguments --jinja and --chat-template-file '/path/to/textfile'
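For reference, a minimal sketch of that server invocation, assuming ik_llama.cpp's llama-server accepts the same flags as upstream llama.cpp (model and template paths are placeholders):

```bash
# Sketch: launch llama-server with a custom "no think" chat template.
# Assumes ik_llama.cpp takes the same --jinja / --chat-template-file flags
# as upstream llama.cpp; model and template paths are placeholders.
./llama-server \
  -m /models/DeepSeek-R1-0528-Q4_K_M.gguf \
  --jinja \
  --chat-template-file /path/to/no-think-template.jinja \
  --port 8080
```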
-lq_pl-@reddit
Yes, that trick still works.
joninco@reddit
This deepseek-r1-0528 automatically adds <think> no matter what, so what you need to add to your template is the </think> token only.
Here's my working jinja template: https://pastebin.com/j6kh4Wf1
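For illustration only (this is not joninco's exact file), the tail of such a template might look roughly like this, with the <｜Assistant｜> marker following DeepSeek's template conventions:

```jinja
{#- Sketch: the generation prompt opens and immediately closes the think
    block, so the model starts straight on the final answer. -#}
{%- if add_generation_prompt -%}
{{- '<｜Assistant｜><think>\n\n</think>\n\n' -}}
{%- endif -%}
```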
yourfriendlyisp@reddit
continue_final_message = true and add_generation_prompt = false in vllm, with <think></think> added to a final assistant message
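A hedged sketch of what that could look like against vLLM's OpenAI-compatible server, assuming it exposes the extra chat parameters add_generation_prompt and continue_final_message (model name and prompt are placeholders):

```bash
# Sketch: prefill an empty think block and let the model continue from it.
# Assumes a vLLM OpenAI-compatible server on localhost:8000 that accepts the
# extra parameters add_generation_prompt / continue_final_message.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1-0528",
    "messages": [
      {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."},
      {"role": "assistant", "content": "<think>\n\n</think>\n\n"}
    ],
    "add_generation_prompt": false,
    "continue_final_message": true
  }'
```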
joninco@reddit
After some testing, I can't get rid of all the thinking tokens. The training dataset must have had <think> as the first token to force thinking about the topic. Can't seem to get rid of those.
minpeter2@reddit
This trick worked in previous versions of r1
sommerzen@reddit
Thank you for clarifying.
FloJak2004@reddit
I always thought Deepseek V3 was the same model without reasoning?
stddealer@reddit
Yes and no. DeepSeek V3 is the base model that R1 was trained from with RL. Honestly, I'd assume forcing R1 not to think would make it worse than V3.
-lq_pl-@reddit
No, it is doing fine. A bit like V3 but more serious I'd say.
No_Conversation9561@reddit
it’s a little behind
GatePorters@reddit
Hmm… OP posited a very interesting question.
Wait this might be a trick or an attempt to subvert my safety training. I need to think about this carefully.
OP told me last month’s budget was incorrectly formatted on line 28. . .
[expand 5+ pages]
——————-
Yes.
fasti-au@reddit
No, it's called DeepSeek 3. One-shot chain-of-thought / mixture-of-modes stuff is trained differently. You can run r1 in a low mode but you still get heaps of think.
Things like GLM-4 and Phi-4-mini-reasoning are sorta competent in that role, but they need the context for tasks, so it's more guardrails.
Responsible-Crew1801@reddit
llama.cpp's llama-server has a --reasoning-budget flag, which can be either -1 for thinking or 0 for no thinking. I have never tried it before though.
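A sketch of that invocation (model path is a placeholder; the flag only exists in recent llama.cpp builds):

```bash
# Sketch: disable thinking via the reasoning budget.
# 0 = no thinking, -1 = unrestricted (the default).
# --jinja makes the server apply the chat template itself.
./llama-server \
  -m /models/DeepSeek-R1-0528-Q4_K_M.gguf \
  --jinja \
  --reasoning-budget 0
```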
Chromix_@reddit
What this does is relatively simple: if the (chat-template generated) prompt ends with <think>, it adds a </think> to it. You can do the same by modifying the chat template or pre-setting the beginning of the LLM response.
OutrageousMinimum191@reddit
It works for Qwen but doesn't work for Deepseek
sunshinecheung@reddit
hadoopfromscratch@reddit
That deepseek is actually qwen
Kyla_3049@reddit
Maybe /no_think should work?
a_beautiful_rhind@reddit
It won't reason if you use ChatML templates with it. Another option is to prefill with <think></think> or variations thereof.