My fresh experience with the new Qwen 3.6 35B A3B started on a long note.
Posted by -Ellary-@reddit | LocalLLaMA | View on Reddit | 62 comments
Embarrassed_Adagio28@reddit
Why do I never have these overthinking issues when lots of other people do? Qwen 3.6 35b thinks for about 5-10 seconds when I ask it to make a complex HTML page and produces great results. Way better than not thinking and having to ask it to fix the mistakes it would have made if it had thought.
Long_comment_san@reddit
Just a dumb reality check - do we still really need thinking in 2026? It did boost performance quite substantially, but do we really need it nowadays?
MuzafferMahi@reddit
I haven’t tried 3.6 yet, but for 3.5 35b, thinking really improved answer quality. Jackrong's Opus-distilled versions fix the overthinking problem too, although the distilled models perform slightly worse while feeling smarter.
Equivalent-Repair488@reddit
But I have heard other people around here talk down the QWOPUS 3.5 models (the Claude distills by Jackrong), saying they lose quality when they shorten the thinking.
MuzafferMahi@reddit
Yeah, they definitely lose quality, as I said. But I'd rather get 80-90% of the quality in 5 seconds than sometimes wait 3 minutes.
Beginning-Window-115@reddit
Qwen3.5 only overthinks if you don't have tools provided
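For anyone wondering what "tools provided" means in practice: a minimal sketch of an OpenAI-compatible chat payload with a tools array. The function name, schema, and model id below are made up for illustration, not from this thread.

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" format.
# The function name and schema are illustrative placeholders.
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the workspace",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

payload = {
    "model": "qwen3.5-35b",  # placeholder model id
    "messages": [{"role": "user", "content": "Summarize config.yaml"}],
    "tools": tools,
}

print(json.dumps(payload, indent=2))
```

Sending a payload shaped like this to any OpenAI-compatible local server (llama.cpp's llama-server, LM Studio, etc.) is the usual way coding harnesses expose tools to the model.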
-Ellary-@reddit (OP)
I kinda can't even remember when there was a clean non-thinking release.
Lesser-than@reddit
The Qwen code series are always non-thinkers
Kodix@reddit
Have *you* tried it?
In my recent experience, thinking vs non-thinking is a huge difference. Thinking wins.
sleepingsysadmin@reddit
Oh boy does it think.... 2 minutes on my first benchmark.
Boy is it good though. Easy 1 shot.
50 seconds on 2nd benchmark.
1 shot. Oh ya baby.
TheItalianDonkey@reddit
I'm actually having the opposite experience.
Using roo-code in VS Code, it's using tools correctly but the resulting parsing is meh.
I'm having it check YAML scripts on my HA instance and it's flagging errors that do not exist in them...
sleepingsysadmin@reddit
I haven't used roo code in ages. It became really unreliable long ago. Kilo code was an upgrade at the time anyway, though I basically don't use vscode at all anymore.
I'd bet money it's not the model here.
TheItalianDonkey@reddit
Thanks! I'm still looking for a harness and just using this to test things out, anything you'd suggest? Both in VSCode and out?
Malyaj@reddit
I added some tool plugins in LM Studio and am using vscode just for review now. In roocode and continue, tools were failing a lot, but since I switched to LM Studio for creating/editing files I'm having a good time; integrated a browser MCP as well. One thing I'm going to try is the figma MCP for UI design, I'll see how it goes.
sleepingsysadmin@reddit
Hermes agent is what you need to try first.
Borkato@reddit
Anytime it flags errors that do not exist, particularly misspellings, it’s almost always a tokenizer error, so there may be updates coming if so
Beginning-Window-115@reddit
make sure you are using a harness and it'll be way faster
tremendous_turtle@reddit
Why not just lower your reasoning tokens setting and/or disable reasoning completely?
This is LocalLLaMa, we have control over these things!
-Ellary-@reddit (OP)
Because the model is clearly designed to be used with thinking?
And without thinking it drops to Qwen 3.5 9b level?
ArtfulGenie69@reddit
Yep, and there is no setting that lets you set reasoning to high or low. You are boned with thinking if the tool trick doesn't work to reduce it.
tremendous_turtle@reddit
Sure of course, but then why complain about it thinking so much? Why not just reduce the reasoning budget?
-Ellary-@reddit (OP)
Qwen 3.6 is not trained with a "reasoning budget"; thinking can only be enabled or disabled. The "reasoning budget" trick in llama.cpp cuts the reasoning off in the middle of the process, forcing the model to drop the thinking phase. Based on tests, quality will be close to a regular non-thinking variant.
Ledeste@reddit
I wrote "test" and stopped it after 20 min of overthinking...
-Ellary-@reddit (OP)
It was just simulating the universe to test all the laws of physics.
takoulseum@reddit
Do you pay for these tokens?
Velocita84@reddit
Electricity and time ain't free
takoulseum@reddit
21 posts and 1924 comments — I think you have a problem with your time, then. Good point about electricity, though.
BlueSwordM@reddit
What the fuck are you on about takoulseum?
The person has been here for close to a decade at this point...
Velocita84@reddit
1924 comments over 7 years is approximately 0.75 comments per day, but I'm sure you know more about being terminally online, Mr. 5-day account with 14 comments.
-Ellary-@reddit (OP)
AI evil bots are evolving, wow.
ambient_temp_xeno@reddit
A bit.
sine120@reddit
So Qwen3.6 is basically just 3.5 with reasoning level set to high?
ArtfulGenie69@reddit
They are so hell bent on getting that bench score. I hope that a tool in its prompt stops some of the thinking still. I hate that behavior.
raysar@reddit
Maybe they found the best thinking time for max performance. More thinking is not automatically more performance.
Intelligent_Ice_113@reddit
we are still using Gemma 4?
the__storm@reddit
The tool calling has never worked for me, even with the new template. It's great for single-turn stuff though.
Borkato@reddit
Nope, its tool calls are borked even after all the changes
asfbrz96@reddit
Kv cache on Gemma is crazy tho
LeRobber@reddit
Add a tools call, it will fix that.
Pixer---@reddit
To be honest, working on larger codebases like llama.cpp, the model actually thinks now and contemplates what to do next. Nowhere near Opus, but at least it has a more agentic thinking approach.
Septerium@reddit
I feel that sometimes Qwen 3.5 does not think enough... especially when it has already been fed a lot of context. Sometimes this is good, sometimes it is not.
Unlucky-Message8866@reddit
I confirm; from my 30 min of testing I can also see it does better at searching and reading web/docs.
juaps@reddit
It would just be funnier if you left (thinking...) in the last two
Goldandsilverape99@reddit
Noted this in another post, qwen 3.6 needed 2x tokens for the Resonance Chamber puzzle in Indiana Jones and the Great Circle compared to qwen 3.5.
Exciting_Variation56@reddit
What if you used the caveman skill?
soyalemujica@reddit
From what I tested, Caveman is not good with low-quant models
Goldandsilverape99@reddit
I tried to convert a caveman skill md I found (https://github.com/JuliusBrussee/caveman/blob/main/skills/caveman/SKILL.md) into a prompt and added the puzzle (For this word puzzle, with the following phrases: "The lord's", "The Secret", "Oath", "Heed", "Of the name" and "Protect". How would you combine all the exact phrases, the best possible way, to solve the puzzle?). It used 2700 tokens and gave an incorrect answer (failed). Qwen 3.5 normally solves this in about 6500 tokens; Qwen 3.6 needed 15000 tokens.
Exciting_Variation56@reddit
Thank you
leonbollerup@reddit
just disable thinking...
{%- set enable_thinking = false %} in the jinja template...
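For context, a toy Jinja2 sketch of how a variable like that typically gates the thinking block. The real Qwen chat template is far larger; this stand-in only illustrates the mechanism that the manual template edit flips.

```python
from jinja2 import Template

# Toy stand-in for a chat template's thinking switch. The real template
# emits a full <think>...</think> section; here we only show the gate.
tmpl = Template(
    "{%- set enable_thinking = enable_thinking | default(true) -%}"
    "{{ '<think>' if enable_thinking else '' }}{{ content }}"
)

with_think = tmpl.render(content="hi", enable_thinking=True)
without = tmpl.render(content="hi", enable_thinking=False)
print(with_think)  # <think>hi
print(without)     # hi
```

Hardcoding `{%- set enable_thinking = false %}` at the top of the template, as above, simply overrides that default for every request.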
-Ellary-@reddit (OP)
Because the model is clearly designed to be used with thinking?
And without thinking it drops to Qwen 3.5 9b level?
leonbollerup@reddit
Not in my tests... not even close.
I'll drop the prompts I use to test with. Test for yourself, and use e.g. Opus or ChatGPT to verify the results.
JLeonsarmiento@reddit
leonbollerup@reddit
This is an AI test - dont ask any questions
—
You are a reasoning engine with perfect memory.
Perform ALL of the following tasks in one coherent answer, keeping every section internally consistent.
DO NOT SKIP ANY PART
Do NOT summarize — fully expand every part.
1. MULTI-STEP CHAIN-OF-THOUGHT SIMULATION
Simulate a 12-step reasoning process about planning a new city on Mars.
Each step must reference details from previous steps, include calculations, trade-offs, and decisions.
Each step must be at least 8 sentences long and contain a mix of technical engineering detail and speculative future design.
2. MATHEMATICAL DERIVATION
Derive — in fully expanded detail — an approximate formula for the energy requirements of the city’s dome climate system.
Show symbolic math, then transform it into a numeric estimate.
Explain assumptions, and compute final numbers using intermediate steps with explicit values.
3. LARGE STRUCTURED OUTPUT
Generate a table with 8 rows and 10 columns, containing engineering subsystem specifications:
The rows must reference details invented in section 1 and 2, and must be cross-consistent.
4. CODE GENERATION
Write a fully commented Python simulation (minimum 150 lines) that:
5. STORY CONTINUATION
Write a short narrative (minimum 8 paragraphs) from the perspective of a Martian engineer debugging the climate system during a solar storm.
The story must reference the systems, math, and code previously generated.
6. SELF-CRITIQUE
Finally, include a self-diagnosis section analyzing:
leonbollerup@reddit
Question:
A city is planning to replace its diesel bus fleet with electric buses over the next 10 years. The city currently operates 120 buses, each driving an average of 220 km per day. A diesel bus consumes 0.38 liters of fuel per km, while an electric bus consumes 1.4 kWh per km.
Relevant data:
Tasks:
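The arithmetic in this prompt is easy to sanity-check. A quick sketch using only the figures quoted above (whole-fleet daily totals, ignoring the 10-year transition schedule):

```python
# Figures straight from the prompt above.
buses = 120
km_per_day = 220            # average per bus
diesel_l_per_km = 0.38
electric_kwh_per_km = 1.4

fleet_km_per_day = buses * km_per_day                          # 26,400 km/day
diesel_l_per_day = fleet_km_per_day * diesel_l_per_km          # ~10,032 L/day, all-diesel
electric_kwh_per_day = fleet_km_per_day * electric_kwh_per_km  # ~36,960 kWh/day, all-electric

print(f"{diesel_l_per_day:,.0f} L/day vs {electric_kwh_per_day:,.0f} kWh/day")
```

A model that can't reproduce these baseline numbers in its reasoning is unlikely to get the multi-year transition tasks right either.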
Big_Mix_4044@reddit
I also noticed it became chunkier, so less context is available in the first place.
LegacyRemaster@reddit
At 170 tokens/sec... it can think, no problem
Cute_Obligation2944@reddit
--reasoning off
-Ellary-@reddit (OP)
--gemma 4 on
shing3232@reddit
control the effort of thinking? I think this might help
-Ellary-@reddit (OP)
It just cuts the thinking off in the middle; performance will usually be close to not thinking at all.
Potential-Gold5298@reddit
Need more ~~gold~~ thinking.
-Ellary-@reddit (OP)
"Please connect your NVIDIA power reactor to PC for thinking phase."