My fresh experience with the new Qwen 3.6 35B A3B started on a long note.
Posted by -Ellary-@reddit | LocalLLaMA | View on Reddit | 62 comments
Embarrassed_Adagio28@reddit
Why do I never have these overthinking issues when lots of other people do? Qwen 3.6 35b thinks for about 5-10 seconds when I ask it to make a complex HTML page and produces great results. Way better than not thinking and having to ask it to fix the mistakes it would have made if it had thought.
Long_comment_san@reddit
Just a dumb reality check - do we still really need thinking in 2026? It did boost performance quite substantially, but do we really need it nowadays?
MuzafferMahi@reddit
I haven’t tried 3.6 yet, but for 3.5 35b, thinking really improved answer quality. Jackrong's Opus-distilled versions fix the overthinking problem too, although the distilled models perform slightly worse while feeling smarter.
Equivalent-Repair488@reddit
But I have heard other people around here talk down the QWOPUS 3.5 models (the Claude distills by Jackrong), saying they lose quality when they shorten the thinking.
MuzafferMahi@reddit
Yeah, they definitely lose quality, as I said. But I'd rather get 80-90% of the quality in 5 seconds than sometimes wait 3 minutes.
Beginning-Window-115@reddit
Qwen3.5 only overthinks if you don't have tools provided
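For anyone wondering what "tools provided" means in practice: a minimal sketch of an OpenAI-compatible chat payload with a tools array. The function name, schema, and model id below are made up for illustration, not from this thread.

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" format.
# The function name and schema are illustrative placeholders.
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the workspace",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

payload = {
    "model": "qwen3.5-35b",  # placeholder model id
    "messages": [{"role": "user", "content": "Summarize config.yaml"}],
    "tools": tools,
}

print(json.dumps(payload, indent=2))
```

Sending a payload shaped like this to any OpenAI-compatible local server (llama.cpp's llama-server, LM Studio, etc.) is the usual way coding harnesses expose tools to the model.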
-Ellary-@reddit (OP)
I kinda can't even remember when there was a clean non-thinking release.
Lesser-than@reddit
The Qwen code series are always non-thinkers
Kodix@reddit
Have *you* tried it?
In my recent experience, thinking vs non-thinking is a huge difference. Thinking wins.
sleepingsysadmin@reddit
Oh boy does it think.... 2 minutes on my first benchmark.
Boy is it good though. Easy 1 shot.
50 seconds on 2nd benchmark.
1 shot. Oh ya baby.
TheItalianDonkey@reddit
I'm actually having the opposite experience.
Using roo-code in VS Code, it's using tools correctly but the resulting parsing is meh.
I'm having it check YAML scripts on my HA instance and it's flagging errors that do not exist in them...
sleepingsysadmin@reddit
I haven't used roo code in ages. It became really unreliable long ago. Kilo code was an upgrade at the time anyway, though I basically don't use vscode at all anymore.
I'd bet money it's not the model here.
TheItalianDonkey@reddit
Thanks! I'm still looking for a harness and just using this to test things out, anything you'd suggest? Both in VSCode and out?
Malyaj@reddit
I added some tool plugins in LM Studio and am using vscode just for review now. In roocode and continue, tools were failing a lot, but since I switched to LM Studio for creating/editing files I'm having a good time; integrated a browser MCP as well. One thing I'm going to try is the figma MCP for UI design, I'll see how it goes.
sleepingsysadmin@reddit
Hermes agent is what you need to try first.
Borkato@reddit
Anytime it flags errors that do not exist, particularly misspellings, it’s almost always a tokenizer error, so there may be updates coming if so
Beginning-Window-115@reddit
make sure you are using a harness and it'll be way faster
tremendous_turtle@reddit
Why not just lower your reasoning tokens setting and/or disable reasoning completely?
This is LocalLLaMa, we have control over these things!
-Ellary-@reddit (OP)
Because the model is clearly designed to be used with thinking?
And without thinking it drops to Qwen 3.5 9b level?
ArtfulGenie69@reddit
Yep, and there is no setting that lets you set reasoning to high or low. You are boned with thinking if the tool trick doesn't work to reduce it.
tremendous_turtle@reddit
Sure of course, but then why complain about it thinking so much? Why not just reduce the reasoning budget?
-Ellary-@reddit (OP)
Qwen 3.6 is not trained with a "reasoning budget"; thinking can only be enabled or disabled. The "reasoning budget" trick in llama.cpp cuts the reasoning off in the middle of the process, forcing the model to drop the thinking phase. Based on tests, quality will be close to a regular non-thinking variant.
Ledeste@reddit
I wrote "test" and stopped it after 20 min of overthinking...
-Ellary-@reddit (OP)
It was just simulating the universe to test all the laws of physics.
takoulseum@reddit
Do you pay for these tokens?
Velocita84@reddit
Electricity and time ain't free
takoulseum@reddit
21 posts and 1924 comments — I think you have a problem with your time, then. Good point about electricity, though.
BlueSwordM@reddit
What the fuck are you on about takoulseum?
The person has been here for close to a decade at this point...
Velocita84@reddit
1924 comments over 7 years is approximately 0.75 comments per day, but I'm sure you know more about being terminally online, Mr. 5-day account with 14 comments.
-Ellary-@reddit (OP)
AI evil bots are evolving, wow.
ambient_temp_xeno@reddit
A bit.
sine120@reddit
So Qwen3.6 is basically just 3.5 with reasoning level set to high?
ArtfulGenie69@reddit
They are so hell bent on getting that bench score. I hope that a tool in its prompt stops some of the thinking still. I hate that behavior.
raysar@reddit
Maybe they found the best thinking time for max performance. More thinking is not automatically more performance.
Intelligent_Ice_113@reddit
we are still using Gemma 4?
the__storm@reddit
The tool calling has never worked for me, even with the new template. It's great for single-turn stuff though.
Borkato@reddit
Nope, its tool calls are borked even after all the changes
asfbrz96@reddit
Kv cache on Gemma is crazy tho
LeRobber@reddit
Add a tools call, it will fix that.
Pixer---@reddit
To be honest, working on larger codebases like llama.cpp, the model actually thinks now and contemplates what to do next. Nowhere near Opus, but at least it has a more agentic thinking approach.
Septerium@reddit
I feel that sometimes Qwen 3.5 does not think enough... especially when it has already been fed a lot of context. Sometimes this is good, sometimes it is not.
Unlucky-Message8866@reddit
I confirm; from my 30 min of testing I can also see it does better at searching and reading web/docs.
juaps@reddit
It would just be funnier if you left (thinking...) in the last two
Goldandsilverape99@reddit
Noted this in another post, qwen 3.6 needed 2x tokens for the Resonance Chamber puzzle in Indiana Jones and the Great Circle compared to qwen 3.5.
Exciting_Variation56@reddit
What if you used the caveman skill?
soyalemujica@reddit
From what I tested, Caveman is not good with low-quant models
Goldandsilverape99@reddit
I tried to convert a caveman skill md I found (https://github.com/JuliusBrussee/caveman/blob/main/skills/caveman/SKILL.md) into a prompt and added the puzzle (For this word puzzle, with the following phrases: "The lord's", "The Secret", "Oath", "Heed", "Of the name" and "Protect". How would you combine all the exact phrases, the best possible way, to solve the puzzle?). It used 2700 tokens and gave an incorrect answer (failed). Qwen 3.5 normally solves this in about 6500 tokens; Qwen 3.6 needed 15000 tokens.
Exciting_Variation56@reddit
Thank you
leonbollerup@reddit
just disable thinking...
{%- set enable_thinking = false %} in the jinja template...
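For context, a toy Jinja2 sketch of how a variable like that typically gates the thinking block. The real Qwen chat template is far larger; this stand-in only illustrates the mechanism that the manual template edit flips.

```python
from jinja2 import Template

# Toy stand-in for a chat template's thinking switch. The real template
# emits a full <think>...</think> section; here we only show the gate.
tmpl = Template(
    "{%- set enable_thinking = enable_thinking | default(true) -%}"
    "{{ '<think>' if enable_thinking else '' }}{{ content }}"
)

with_think = tmpl.render(content="hi", enable_thinking=True)
without = tmpl.render(content="hi", enable_thinking=False)
print(with_think)  # <think>hi
print(without)     # hi
```

Hardcoding `{%- set enable_thinking = false %}` at the top of the template, as above, simply overrides that default for every request.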
-Ellary-@reddit (OP)
Because the model is clearly designed to be used with thinking?
And without thinking it drops to Qwen 3.5 9b level?
leonbollerup@reddit
Not in my tests... not even close.
I'll drop the prompts I use to test with. Test for yourself, and use e.g. Opus or ChatGPT to verify the results.
JLeonsarmiento@reddit
leonbollerup@reddit
This is an AI test - dont ask any questions
—
You are a reasoning engine with perfect memory.
Perform ALL of the following tasks in one coherent answer, keeping every section internally consistent.
DO NOT SKIP ANY PART
Do NOT summarize — fully expand every part.
1. MULTI-STEP CHAIN-OF-THOUGHT SIMULATION
Simulate a 12-step reasoning process about planning a new city on Mars.
Each step must reference details from previous steps, include calculations, trade-offs, and decisions.
Each step must be at least 8 sentences long and contain a mix of technical engineering detail and speculative future design.
2. MATHEMATICAL DERIVATION
Derive — in fully expanded detail — an approximate formula for the energy requirements of the city’s dome climate system.
Show symbolic math, then transform it into a numeric estimate.
Explain assumptions, and compute final numbers using intermediate steps with explicit values.
3. LARGE STRUCTURED OUTPUT
Generate a table with 8 rows and 10 columns, containing engineering subsystem specifications:
The rows must reference details invented in section 1 and 2, and must be cross-consistent.
4. CODE GENERATION
Write a fully commented Python simulation (minimum 150 lines) that:
5. STORY CONTINUATION
Write a short narrative (minimum 8 paragraphs) from the perspective of a Martian engineer debugging the climate system during a solar storm.
The story must reference the systems, math, and code previously generated.
6. SELF-CRITIQUE
Finally, include a self-diagnosis section analyzing:
leonbollerup@reddit
Question:
A city is planning to replace its diesel bus fleet with electric buses over the next 10 years. The city currently operates 120 buses, each driving an average of 220 km per day. A diesel bus consumes 0.38 liters of fuel per km, while an electric bus consumes 1.4 kWh per km.
Relevant data:
Tasks:
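The arithmetic in this prompt is easy to sanity-check. A quick sketch using only the figures quoted above (whole-fleet daily totals, ignoring the 10-year transition schedule):

```python
# Figures straight from the prompt above.
buses = 120
km_per_day = 220            # average per bus
diesel_l_per_km = 0.38
electric_kwh_per_km = 1.4

fleet_km_per_day = buses * km_per_day                          # 26,400 km/day
diesel_l_per_day = fleet_km_per_day * diesel_l_per_km          # ~10,032 L/day, all-diesel
electric_kwh_per_day = fleet_km_per_day * electric_kwh_per_km  # ~36,960 kWh/day, all-electric

print(f"{diesel_l_per_day:,.0f} L/day vs {electric_kwh_per_day:,.0f} kWh/day")
```

A model that can't reproduce these baseline numbers in its reasoning is unlikely to get the multi-year transition tasks right either.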
Big_Mix_4044@reddit
I also noticed it became chunkier, so less context is available in the first place.
LegacyRemaster@reddit
At 170 tokens/sec... it can think, no problem
Cute_Obligation2944@reddit
--reasoning off
-Ellary-@reddit (OP)
--gemma 4 on
shing3232@reddit
control the effort of thinking? I think this might help
-Ellary-@reddit (OP)
It just cuts the thinking off in the middle; performance will usually be close to not thinking at all.
Potential-Gold5298@reddit
Need more ~~gold~~ thinking.
-Ellary-@reddit (OP)
"Please connect your NVIDIA power reactor to PC for thinking phase."