(llama.cpp) Possible to disable reasoning for some requests (while leaving reasoning on by default)?

Posted by regunakyle@reddit | LocalLLaMA | 19 comments

I am running unsloth/gemma-4-26B-A4B-it-GGUF (gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf) with llama-server, with reasoning enabled.

Is it possible to disable reasoning for some requests only? If yes, how?

I want to leave reasoning on by default, but for some use cases (e.g. a chat bot) I want the model to respond as fast as possible, so I'd like to turn reasoning off just for those requests.
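For context, a per-request override is the kind of thing I'm after. llama-server exposes an OpenAI-compatible `/v1/chat/completions` endpoint and accepts a `chat_template_kwargs` field in the request body, so something like the sketch below might work, assuming this model's chat template actually honors an `enable_thinking` flag (that flag is template-dependent, so I'm not sure it applies here):

```json
{
  "model": "gemma-4-26B-A4B-it-UD-Q4_K_XL",
  "messages": [
    { "role": "user", "content": "Quick answer only: what's 2+2?" }
  ],
  "chat_template_kwargs": { "enable_thinking": false }
}
```

If the template ignores `enable_thinking`, this request would still succeed but the model would reason as usual, which is why I'm asking whether there's a supported per-request switch.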