qwen3.6 performance jump is real, just make sure you have it properly configured
Posted by onil_gova@reddit | LocalLLaMA | View on Reddit | 300 comments
I've been running workloads that I typically only trust Opus and Codex with, and I can confirm 3.6 is really capable. Of course, it's not at the level of those models, but it's definitely crossing the barrier of usefulness. Plus the speed is amazing running this on an M5 Max 128GB: ~3K t/s prompt processing, ~100 t/s generation.
Just ensure you have `preserve_thinking` turned on. Check out details [here](https://www.reddit.com/r/LocalLLaMA/s/oy3jLNbSkB).
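If it helps, here's a minimal sketch of where the flag goes, assuming an OpenAI-compatible local server and assuming your backend accepts `preserve_thinking` as an extra request field (the exact key name may differ per backend; check its docs):

```python
# Sketch only: pass preserve_thinking through an OpenAI-compatible endpoint.
# The field name, base_url, and model name are assumptions for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="qwen3.6-35b-a3b",  # whatever name your server exposes
    messages=[{"role": "user", "content": "Plan the refactor before editing."}],
    extra_body={"preserve_thinking": True},  # keeps reasoning across turns
)
print(resp.choices[0].message.content)
```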
Writer_IT@reddit
Is it really better than the 122B? This seems so over-the-top "too good to be true" that it feels unrealistic.
silentsnake@reddit
3.6 35B FP8 > 3.5 122B INT4
Writer_IT@reddit
Well, that's a first. I still remember the rule of thumb from Llama times: always choose the bigger model at lower quantization, especially around q4. With MoE architectures and tool calling, that's apparently no longer the principle.
gxcreator@reddit
It's not that simple, we don't have larger 3.6 model yet, so it is OLD Q4 vs NEW FP8
floconildo@reddit
I'm not sure 35B FP8 > 122B INT4 holds true. I think mileage may vary, especially if we're talking complex reasoning, multi turn usage and tool usage.
Personally I went back to 3.5 122B (Q6 though) because 3.6 35B straight up got facts mixed up during a research round. Can't wait for 3.6 122B to release though.
Writer_IT@reddit
OK, after thorough testing while vibe coding, I feel the 3.6 FP8 does NOT win against the 122B NVFP4. It struggles to be efficient at developing and following a plan, compared to its older brother. Still a good model for the size, but it doesn't overturn the general rule about having more parameters.
mycall@reddit
Would you say that Qwen3-Coder-Next Q8_0 would do better than 3.6 for coding?
Realistic-Elephant-6@reddit
Unfortunately, no, not in my experience. (Qwen3-Coder-Next is on par with 122B-int4 on consistency and code quality, and beats it on speed.)
mycall@reddit
Now we are both waiting for 3.6-coder
Writer_IT@reddit
Honestly, I didn't use it enough to compare. Q3.5-122B is the first local model that I used for heavy vibe coding. However, from this thread alone, I feel these matters are subjective enough that you should try it yourself and see how it feels, if only for the multimodal support.
Realistic-Elephant-6@reddit
Can not confirm on vLLM. 35B keeps making mistakes that neither qwen3-coder-next nor 122B int4 Autoround would ever make... Unless you have some magic sauce -- care to share it? I am really trying to like the 35B model since it would fit nicely with all the other crap into my RAM on the GX10.
stefan_evm@reddit
can 200% confirm. The 122B model degrades more under INT4 quantization than the 35B model does under FP8. 3.6 35B is much much better
AlwaysLateToThaParty@reddit
Yeah nah. I use the qwen3.5 122b/10a heretic mxfp4_MOE quant, and the full quantization of qwen3.6 A35/3a simply isn't as consistent.
gpalmorejr@reddit
The 3.6-35B-A3B seems to be EXTREMELY resistant to quantization losses in general. I use a 2 bit quant and it seems to be doing quite well. Whatever they did seems to be working. I'll take it.
power97992@reddit
What q2 is okay? Maybe it’s worth a try
ea_man@reddit
Oh yeah, and guess what? It's damn fast and stays in ~10GB, so you can even use it for autocomplete.
power97992@reddit
Uh, it says 12.49gb in lm studio
ea_man@reddit
Don't use that one, bro: Qwen3.6-35B-A3B-UD-IQ2_XXS.gguf
10.8 GB
power97992@reddit
I didn't find it in LM Studio; maybe I can find it on Hugging Face. 10.8 is still too big… I need like 10GB.
ea_man@reddit
https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF , that's how small an IQ2 gets.
You have a 12GB GPU? Maybe you should find ways to not waste VRAM with your desktop.
power97992@reddit
Less than that…. I have like 10.5 max
ea_man@reddit
Well stick to models like Omnicode / 9B then if you want to load them in 10.5GB.
gpalmorejr@reddit
For mega-beginners I usually recommend just using the built-in download menu, especially if you aren't hardware-constrained or tweaking things a lot.
But if you are hardware-constrained, or need specific updates, or are tweaking things, or are just generally a little tech-savvy and able to work with files:
I always get the ones from Hugging Face directly. LM Studio does usually have the updated versions or downloads them from HF anyway, but for some reason they are always bigger. Part of it is that the LM Studio download usually includes the image embedder (mmproj), but the size difference is usually too big to be accounted for by that alone. Unsloth's pages on Hugging Face are well laid out and make everything easy to parse, and you can look at the download links on the card all in one place.
You will have to go in and add the folder yourself. Just go to where your LM Studio instance stores everything and find the "Models" folder. Add a folder for the "group", such as where it came from, like "Hugging Face", or, if you are like me, "Unsloth" for the creator. Then inside that folder make a folder named exactly the same as the model file (although there is room to change it and the file name; some workflows care, LM Studio doesn't). Then place your model file and mmproj in that folder. LM Studio will find it automatically. You don't even have to restart it; just open the load menu again.
If you download multiple quants, you can put them in the same folder and LM Studio will automatically organize your model menu accordingly and let you select a specific quant there. LM Studio generally groups them under the folder name (although I have experienced some changes in this behavior) and will call them whatever you name the folder in the interface.
Also, the LM Studio Community quants are sometimes behind Unsloth for updates to the models and such, although they are usually pretty close. But the Unsloth quants do usually perform much better for a given file/RAM size.
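If you'd rather script the folder layout described above, here's a rough sketch; the models root (newer LM Studio builds default to something like ~/.lmstudio/models) and the file names are assumptions, so adjust them to your install:

```python
# Sketch of the layout above: <models root>/<group>/<model folder>/<files>.
# Paths and file names are assumptions; point them at your real downloads.
from pathlib import Path
import shutil

models_root = Path.home() / ".lmstudio" / "models"       # assumed default
dest = models_root / "unsloth" / "Qwen3.6-35B-A3B-UD-IQ2_XXS"
dest.mkdir(parents=True, exist_ok=True)

downloads = Path.home() / "Downloads"
for name in ("Qwen3.6-35B-A3B-UD-IQ2_XXS.gguf", "mmproj-F16.gguf"):
    src = downloads / name
    if src.exists():
        # LM Studio finds it on the next open of the load menu, no restart needed
        shutil.move(str(src), str(dest / name))
```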
gpalmorejr@reddit
I suspect if you push it that'll get weird, but so far it is basically on parity with 3.5 Q4_K_M. I keep 3.6 Q4_K_M around just in case I want some extra assurance, but it seems to be working so far. Plus it gives me way more room for context-cache build-ups from RooCode, since it can be a little lazy about clearing memory sometimes, which has actually resulted in smoother flow overall.
KallistiTMP@reddit
Wonder if it's coherent enough to run as a draft model for spec decoding @ Q2 or even Q1 alongside the FP8 model.
gpalmorejr@reddit
As a drafting model? Seems huge for that. But then again, that's outside of my ability to test on my hardware.
power97992@reddit
Yeah, I wish I had more RAM…
gpalmorejr@reddit
I feel that lol. What are you at? What's your setup?
power97992@reddit
I dont have much vram on my laptop, less than 11GB.
gpalmorejr@reddit
Oof. That's tight.
power97992@reddit
Q2_K is too large; maybe Q2_XXS will work, or Q1.58.
gpalmorejr@reddit
Or you can offload the MoE experts to the CPU/RAM. It'll be slower but it'll be fine. Dedicated GPU or integrated?
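If you go that route, here's a rough llama-cpp-python sketch of partial offload; the file name and layer count are placeholders to tune for your box, and this is the generic layer-split knob, not an expert-specific one:

```python
# Rough sketch: partial GPU offload with llama-cpp-python. Layers that don't
# fit on the GPU stay in system RAM. File name and numbers are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.6-35B-A3B-UD-IQ2_XXS.gguf",
    n_gpu_layers=20,   # raise/lower until it fits; -1 offloads every layer
    n_ctx=8192,
)
out = llm("Q: Name one planet. A:", max_tokens=8)
print(out["choices"][0]["text"])
```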
power97992@reddit
Dude, I know you can offload it to normal RAM, but my system has unified RAM; there is nowhere to offload it...
gpalmorejr@reddit
Oh. You didn't say that before; you said 11GB of VRAM. If you are on unified memory, then I believe you don't actually have to offload at all; there are some other methods for allocating compute. But yeah, that clarification makes a big difference.
power97992@reddit
My total free ram is less than 11gb, there is no way to increase that, since i need to run my OS..
gpalmorejr@reddit
I know.... I agree with you.... You'll probably have to use a slightly smaller model unfortunately, but the cool part is that those unified Apple devices are pretty good at Model loading. They aren't as good as GPU only compute but they are pretty good.
power97992@reddit
I will just use the API. I tried Qwen 3.5 9B and it wasn't great.
gpalmorejr@reddit
Yeah, 9B is good as long as your expectations aren't full enterprise coding and long-context logic. And since it isn't MoE at A3B and is instead dense at 9B, it is slower as well. I feel you though. That is why I run my model on a large computer at home and use it remotely from my laptop wherever I'm at.
gpalmorejr@reddit
I tried to run IQ2_XXS and it wouldn't load on 16GB. Too tight with the OS in RAM.
mrrizzle@reddit
Unsloth?
gpalmorejr@reddit
Yup
Cold_Tree190@reddit
How much VRAM would you need to run it in FP8, do you know?
Wise-Hunt7815@reddit
256K context, about 48G
whoisraiden@reddit
If you want it all in VRAM, at least 40 GB for good context size.
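The rough arithmetic behind those two answers, as a sketch; the cache and overhead terms are loose assumptions, since they depend on context length and architecture details not given here:

```python
# Back-of-the-envelope FP8 VRAM estimate for a 35B-parameter model.
# KV-cache and overhead figures are loose assumptions, not measurements.
params = 35e9
weights_gb = params * 1 / 1e9    # FP8 = 1 byte per parameter -> ~35 GB
kv_cache_gb = 5.0                # grows with context; much bigger at 256K
overhead_gb = 2.0                # activations, runtime buffers
print(f"~{weights_gb + kv_cache_gb + overhead_gb:.0f} GB")  # lands in the 40-48 GB range above
```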
romasalah@reddit
I was wondering exactly that. I'm running around 70GB of VRAM and only 3.5-122B at Q3 fits in it, so I'm wondering if moving to 3.6-35B would be better.
is-this-a-nick@reddit
how does it compare to 397 Q3?
Goldandsilverape99@reddit
The Qwen3.6 35B-A3B has lost maybe a very little in "intelligence", but has improved on some of the agentic and tool-calling benches.
Realistic-Elephant-6@reddit
That's only because the tool calling template has finally been fixed. With the correct tool calling template the 122B is significantly better (just a bit slower due to a larger active core)
Single_Ring4886@reddit
What about pure programming without agents?
Karyo_Ten@reddit
But you need agents to write tests and run them and read docs
ShadyShroomz@reddit
2 years ago I was asking chatgpt to generate individual functions for me and then i'd copy paste them into my code.
you dont need agents...
Karyo_Ten@reddit
That's such a waste of time though.
Alarming_Pick626@reddit
Not really, this is probably the best way to use these tools. Even the best models from Anthropic start to fail on real projects.
Karyo_Ten@reddit
I don't see how being a glorified Ctrl+C Ctrl+V monkey is the best way to use anything.
Shiny-Squirtle@reddit
...and context. When you ask it to refactor a previously generated function, it generates the whole thing again.
AI_Enhancer@reddit
Not saying this is the best possible workflow, but just saying that for this specific problem, you can just remove messages and they get cleared from context. In LM Studio at least. So if the model is just advancing or fixing the same code, you can just delete these messages and then paste the new fixed code back into the convo if needed.
decoy_258@reddit
what is the correct modern alternative, assuming local inference?
BannedGoNext@reddit
You don't need agents if YOU are the agent ;). In Soviet Russia agent is you!
unjustifiably_angry@reddit
This is a criminally slow workflow.
Great_Guidance_8448@reddit
That's stone age stuff. With agents you have it analyze your code, identify potential bugs, suggest refactors, etc. You are really limiting yourself with the "individual functions" generation.
thewhzrd@reddit
This is the way.
Realistic-Elephant-6@reddit
In my experience, it's not. It is significantly dumber and makes a lot of mistakes that 122B doesn't make.
AlwaysLateToThaParty@reddit
No, it isn't.
Icy_Butterscotch6661@reddit
I've been using it with hermes-agent and getting pretty frustrated with it. It keeps doing extra shit I didn't ask for, despite instructions telling it otherwise. For example, I asked it to look up how to install opencode and it went ahead and installed it instead of answering. I think the 27B / carnice-27b fine-tune didn't have this issue, but I'm not certain anymore.
leap966@reddit
Haha 😄 Now I need to ask Qwen3.6 how to install OpenCode.
McSendo@reddit
That's my concern about those agent harnesses. "Don't search for porn"; searches for porn.
howardhus@reddit
yep.. feels like the usual hype that youtubers need to get clicks
[thumbnail of shocked face] "You have to try this new model RIGHT NOW! (but let's drink some coffee first)"
Possible-Pirate9097@reddit
Network Chuck?
Ka_Trewq@reddit
He has been for a while. And also, he's praying now. I'm not kidding.
tmvr@reddit
He's really leaning into every grift that is current isn't he :))
Savantskie1@reddit
I think he started most of them
mybruhhh@reddit
Models trained with quality data have been shown to outperform models 10x their size time and time again
Refefer@reddit
I've run both through our new product to evaluate as a replacement for the 122B model at UD FP4 quants. It is surprisingly good for a small model, but in my tests it isn't as good or as efficient at agentic tasks as the 122B. Still, I plan to use it for tasks which are a bit narrower / more constrained and don't require a lot of world knowledge.
Ranmark@reddit
I tested it on nicklothian's bench a few times. One time it actually came out above the dense 27B model and got the same result as the 122B MoE, but I wasn't able to recreate this even once. The 27B and 122B are much more stable in that regard.
zyxwvu54321@reddit
why is the 27B listed twice? And I am not getting any better results than 3.5 35B in my limited testing.
Realistic-Elephant-6@reddit
Yeah, this model is very much in "YMMV" range for me. Might be the programming languages we use 🤷♂️
zyxwvu54321@reddit
In my further testing, it is better at vision than 3.5 35B. Other than that, there still isn't that much of a noticeable difference.
KaMaFour@reddit
Reasoning vs no reasoning
kuhunaxeyive@reddit
Qwen3.6 is good for programming, yes, but not so good at writing natural, concise text. It sometimes inserts weird phrases and creates convoluted sentences, even at Q8. For text, Gemma-4-31B has much higher-level phrasing that I can trust for European languages.
Also, Qwen3.6 doesn't pass the car-washing test reliably. Gemma-4 nails it every time in seconds, even in non-thinking mode at Q5.
Salt-Willingness-513@reddit
Gemma 4 is amazing for Swiss German, down to the e4b. Crazy how good Gemma 4 is at European languages. I never saw anything comparable apart from Claude and Gemini.
Realistic-Elephant-6@reddit
Can it transcribe Schwitzerdytsch from audio though?
Salt-Willingness-513@reddit
It indeed can
Realistic-Elephant-6@reddit
Bärndytsch (Bernese German) too? ;)
Salt-Willingness-513@reddit
Züridütsch (Zurich German), but it didn't understand that haha
ayylmaonade@reddit
I've found it wonderful to talk to, even better than 3.5 which I already liked because it reminded me of how Claude talks. But then again, I do have an instruction in my system prompt to use natural language. Might be worth trying:
Realistic-Elephant-6@reddit
It really does sound a lot like Sonnet, right? Even Sonnet says so, LOL
ayylmaonade@reddit
Lol yeah, it's almost indistinguishable for me.
Realistic-Elephant-6@reddit
Gemma-4-31B is a dense model. You are comparing pears to bananas here.
Queasy-Contract9753@reddit
I've experienced the same. Much of my work is feeding in very long journeys and discussing them, and summarisation. This model struggles very hard with stream-of-consciousness style writing. Normally my go-to is DeepSeek or Gemini Flash, but I have found Gemma 4 31B up to it. The 26B-A4 isn't bad either but misses minor points more often.
-Ellary-@reddit
Yeah, Gemma 4 is a general model, for any kind of activity.
Qwen 3.6 is a pure agentic model; there's no point in "talking" to it.
Southern-Expert22@reddit
Bro, it's blitzing code at like 40 t/s with tools, search, scripting, etc. I'd put it at like Sonnet 4.6 level, or like GPT 5.3. Opus is brain-dead atm but hopefully it comes back.
Realistic-Elephant-6@reddit
Nah, Sonnet *4.6* is still quite a bit better than 35b (Though Sonnet 3.5 isn't, which is wild. We are getting a tiny, free multimodal local MoE model, delivering the quality that was SOTA a year ago...) . But 3.5-122B-A10B is very much in the same range as the new Sonnet, which is why I can barely wait for 3.6-122B...
CooperDK@reddit
That is a very small jump though.
MushroomGecko@reddit
> Be Qwen
> Release new medium-sized model that competes with previous flagship
> Repeat
Foreign-Beginning-49@reddit
I get scared because they are so good and there's no way this can last. They will become too powerful to give away for free right? The agentic gains in the last 12 months have been dizzying.
Cerevox@reddit
They are giving it away to undercut western models. As long as google/claude keep producing models, china will keep making better free models to trash their business model.
ZenaMeTepe@reddit
Could be the first time normal people benefit from such economic warfare.
BewareOfHorses@reddit
Normal people have always benefited from it. There's a reason every man and his dog in the third world has a phone and internet access.
unjustifiably_angry@reddit
Well, normally there's a long-term cost that isn't apparent at first, like all your jobs going overseas and your kids' lives being ruined so you could benefit from cheap tat for a few decades.
For AI there's really no downside in this case for average people, at least not one that's immediately obvious. The US economy might tank in the long run because it's being propped up by AI companies right now, but if you're not American you have absolutely zero fucks to give about that situation.
charnet3d@reddit
The "downside" is that you have to adapt or die. A programmer in this era 2026-2030 and beyond who doesn't use agentic coding is like farmer who never uses tractors and does everything manually, or an accountant who insists on using mental math. They will be uncompetitive and left behind.
Imaginary-Unit-3267@reddit
If I remember correctly, there's actually some hunter-gatherers in Africa who have phones.
Ardalok@reddit
Competition has always been good for the market.
unjustifiably_angry@reddit
To a point. Eventually one company makes an objectively inferior product that they can sell much cheaper and the competitors are forced to follow suit, and then it repeats and repeats until you have the average American diet.
layer4down@reddit
The West knocks communism and even socialism but it seems to have its benefits considering we’re their number one customer.
InevitableMaw@reddit
This is literally capitalism at work. Different entities competing, leveraging their strengths. Meta was open-sourcing models for the same reasons the Chinese firms are.
layer4down@reddit
In theory, Google gets to sell compute as well, for running Gemini on TPUs. At least we see this in Azure and AWS for sure.
Awongy00@reddit
China is far more capitalist and consumerist than the West in many aspects.
dto_lurker@reddit
Are people who run their own AI models (me included) "normal"?
darkwalker247@reddit
the normal is using GPT for tasks that a sub-30b model could do easily with a fraction of the resources 😔
Outpost_Underground@reddit
Should be
Outrageous_Mail_8381@reddit
You do, until you don't, when consolidation and profit-seeking come in. You could argue the same is currently happening in the EV space, where there's tremendous competition atm in China, with dozens of different brands fighting it out to build market share. It won't last forever.
KallistiTMP@reddit
And I mean because they legitimately do have an impending population crisis and a strong national interest in developing robust social infrastructure.
It really is largely just common fucking sense for research to be conducted openly, and supported by public funding and infrastructure.
The western model is so counterproductive to innovation and research that the west is sweating and barely keeping a 3-month lead, despite having something like 100 times the budget. And yes, I know that China has smuggled H100s and probably a fair deal of B200s and all that, but their entire country's compute combined is smaller than what just Anthropic or Grok or Meta has.
Their power grid and water systems are worlds better too. And they don't have a mass anti-AI movement because their infrastructure is overbuilt for public and industrial needs instead of pillaged for private profit to the point that there isn't enough power and water for both the datacenter and the humans living next to it, and the public trusts that it will be used to improve people's lives.
Meanwhile the main thing the US is competing on is how many mass layoffs can be enabled, how effectively it can bomb third world countries, and how many ads they can cram into the damn thing.
China is doing what they've always done. They bet on the American capitalists being greedy and shortsighted. And they were absolutely right.
Foreign-Beginning-49@reddit
I guess it's not a win-win then; that's what bothers me about it. Makes sense, it's low-key economic warfare. At least it benefits the poors like myself. There is a great sadness in realizing how far we are from truly universal access to intelligence on tap. It's like the French Revolution, when folks weren't allowed to become literate. Perhaps it's the agents themselves that will do a new one...
IamFondOfHugeBoobies@reddit
What you have to realize about China is that they aren't competing against you or the average man. Don't get me wrong, I'm not saying they CARE about you.
But they compete against western billionaires and politicians. In their view, empowering us is weakening them. Because in Chinese politics the idea of individualism is the worst poison to state power they can imagine.
Thus empowering the average citizen = Poisoning the state.
This has held true to some extent, just look at tik-tok, at how they helped dumb down America by subtly supporting any policy that made the rural poor less well off and educated since the Clinton era.
Now they view citizens having power, non-state controlled AI as another step in that. In their mind it makes us harder to control thus weakening the state.
The irony of course is that individualism will always trounce collectivism because in collectivism corruption and flaws become hidden and thus never worked on.
It's why Russia failed the initial Ukraine invasion so badly and yet now, several years later, has a far, far more capable military. The flaws could be hidden under a strong state, but then need forced them into the open where they could be fixed.
What I'm saying is: Xi Jinping is as delusional as any other asshole, and we should just be VERY happy it's benefiting us little folks. Xi is truly the greatest supporter of individualism and personal freedoms, even if it is entirely by accident and against his will.
MeowManMeow@reddit
I agreed with all your points except when you started saying that individualism prevents corruption from being hidden.
Authoritarianism promotes and hides corruption, and that can come out of a collective or an individualist system. I mean, half the current US administration are conmen; rampant corruption. The army has never passed an audit, and there are pardons for embezzlement and fraud everywhere.
I think what you are missing is that non-capitalist/imperialist countries have to have an authoritarian government to prevent the USA from exploiting any flaws to get a regime change. Look at all the countries where the USA has had a hand in overthrowing the government (and these are just the ones we know about) by fanning any flaws into rebellions and into a coincidentally US-friendly government.
bnolsen@reddit
He never stated that it prevents it in that post. No system that has humans in it is perfect. Corruption will always exist in some form, but the more centralized a government is, the more centralized and extreme the corruption can be, and probably is.
MeowManMeow@reddit
So you are saying maybe a class-less and state-less society might be the best to prevent corruption as there is no central government?
I 100% agree with you that the amount of societies efforts are put into LLMs is a giant waste, which is why I don’t believe that a few men with huge amounts of money get to dictate what our economy is focused on.
IamFondOfHugeBoobies@reddit
I'm not sure how you could call Trump an individualist. The man idolizes North Korea and forces people to wear shoes that don't fit to stay in his good graces.
Also lol@"imperialist" country. Your privelige is showing buddy. The west isn't "imperalist" in isolation. It's just the imperalists that won imperialism against the eastern imperialists.
The fact that you are aware of all the dirt the U.S. has done pre-Trump, and that you're discussing it, is my point. Individualist countries are not DEVOID of bad things, but they critique, debate, and argue over them.
MeowManMeow@reddit
I’m not calling Trump an individualist. I’m saying the USA is one of if not the most individualist country. In your comment you say that China and Russia is collective and that’s why they are corrupt, yet corruption is rampant in an individualist society.
I am aware of the corruption of Trump and discussing it, but we are also talking about Putin and the corruption of Russia engagement in Ukraine. Yet you say one is hidden and the other isn’t, but how can we be talking and discussing about it if it’s hidden?
In both there is no action against the corruption.
Karyo_Ten@reddit
https://gwern.net/complement
Foreign-Beginning-49@reddit
Definitely needed this thank you. Yall are super informative.
IrisColt@reddit
I understand that reference, heh
mrgalacticpresident@reddit
Re-Read it! Great insight. Bonus that I absolutely love gwern. Thanks for sharing.
The_frozen_one@reddit
I don’t really buy that argument though, is Google engaged in economic warfare? Gemma4 is quite good too. I think it’s “commodify your compliment”, i.e. don’t let products that benefit your product become valuable. If you sell hamburgers, you want hamburger buns to be as cheap and plentiful as possible. If you run a search engine, you want browsers to be fast and free.
ZenaMeTepe@reddit
If you prevent US models from ever becoming profitable, after they'll have incinerated thousands of billions?
dexterlemmer@reddit
The models are supposed to be the commodity. The US hyperscalers also want that mid- to long term.
AFAICT, the products are: 1. Training data; importantly, this includes data about how previous iterations of your model were used. 2. Safety/alignment for manipulating many users and their products via the models they use. 3. Inference datacenters that benefit from economies of scale and the robustness of highly distributed servers. 4. Supercomputers for training the next SOTA model, provided first to giant players that want an edge and to big research projects with massive budgets and new, almost unsolvable problems to spare.
Alibaba (Qwen) competes in 1-3. They would love to compete in 4; however, for the time being, they simply don't have access to either the technology or the expertise.
However, US SOTA expertise trickles down to other Western researchers fast, and the original breakthrough is much more expensive than the future developments it inspires. Once you've made a breakthrough on a hyperscaler supercomputer, it's at most a few months until a Western researcher does better with an open-source model trained on a 1000+ times less powerful supercomputer. And it's to Alibaba's advantage if that Western researcher used Qwen as the base model rather than Gemma.
Thus, by open-sourcing Qwen, Alibaba makes more money on 1-3, and China gets to be less far behind in cutting-edge models for domestic use.
erkinalp@reddit
there's no undercutting, their product is both cheaper and tangibly better
InevitableMaw@reddit
Lol. If I could run either the best Chinese model or the best Claude/GPT locally, I would never touch a Chinese model.
admnb@reddit
Aren't they basically scraping the big models? Like, the reason they can create these capable models is because Western models were created in the first place. They are riding that wave and will continue to do so, forcing the big Western companies to keep running and to keep overextending.
InevitableMaw@reddit
Nothing China is doing is forcing western models to "overextend". They would be investing just as much if the Chinese models didn't exist, because they are primarily competing against each other.
vulgrin@reddit
China also has a central government that is pushing the entire country to use AI, and wants it spread as far as it can.
Both_Opportunity5327@reddit
You also forgot to say that Nvidia will do the same and maybe sponsor other labs, because it keeps the labs training models and therefore buying their equipment.
oxygen_addiction@reddit
The latest Qwen hits about 1/3 of the issues that Opus 4.6/Gpt 5.4 xhigh do on my benchmarks.
So there's a lot of room for growth.
_Erilaz@reddit
Aren't you comparing a mere 35B A3B to the biggest Claude and GPT models in their extra-tryhard reasoning modes?
Most-Trainer-8876@reddit
right? It's literally 35B A3B compared to Opus, which is probably 5T model... which is almost 150 times bigger!
power97992@reddit
Dude, people compare it to the best, not to the worst or to something bad like Llama 4 Scout.
Most-Trainer-8876@reddit
Why in the world would you expect a bee to fly like an eagle?
I get your point though... I'll be waiting for that day: 5T Opus compressed into the size of a 35B. Probably never; instead we will be able to run bigger & smarter models at the same cost as a 35B.
gpalmorejr@reddit
That was my thought. Though studies have shown it isn't 1:1 for intelligence versus size, we are still talking about a 35B A3B model versus models that are estimated at, what, 700B+ now? Isn't one of them estimated at 1.5T? Like... at that rate, a 33% success rate is like winning an F1 race 33% of the time in a Corolla. I don't care that it doesn't match them; I'm impressed it compares at all!
In numbers, we are talking about a model that is 1 to 2 orders of magnitude smaller than the others but performs within 0.2 orders of magnitude. That's insanity.
And yes, the larger models have more latent knowledge and "facts" built in, and their nuance and long-context handling is a little better, but half of that can be leveled by giving Qwen3.5 a web-search plugin or other internet access, and the other part can be managed with some simple project and file management. Not to mention that people who can run the Q8 and F16 versions will probably never have the nuance and loop issues. I barely have them with Q2 and Q4.
TLDR: I'm a nerd. LLMs are cool. Qwen3.6 is impressive. I need better hardware.
power97992@reddit
I doubt it is 1/3 as good as GPT 5.4 / Opus at everything; probably only for specific tasks. Even 3.6 Plus is quite lazy: it outputs simple stuff if you don't specify much and tries to give the simplest answer possible, even with high reasoning and in the API. Whereas Opus and 5.4 produce better outputs even without super-specific prompts; GPT 5.4 and Opus have a much better understanding of requirements than 3.6 Plus. And if 3.6 Plus is only somewhat better than the 35B, then the 35B is probably not great, but enough for some tasks.
gpalmorejr@reddit
Fair. Like you said, finishing the race 1/3 of the time does not mean you won. But for local models, it is really good.
power97992@reddit
Glm 5.1 is pretty good, minimax 2.7 is decent
gpalmorejr@reddit
Yeah, but huge, unfortunately. They are closer to SOTA models in size than to regular open models anyway. So that tracks.
QuinQuix@reddit
You mean it solved about a third of what the big boys can solve?
RedParaglider@reddit
If my toaster can't cook what my Traeger Pro can cook, then what's the fucking point. Fuck that toaster.
QuinQuix@reddit
I wasn't bashing?
RedParaglider@reddit
I know, I was backing up your comment with a stupid joke.
Borkato@reddit
Which is fucking insane.
JackPrince@reddit
With a free model. Iterate the issues with proper boundaries and you basically only pay for the iterations. It is a scaling issue from there on.
QuinQuix@reddit
What would I need to install to run this setup?
I'm kind of struggling not so much with understanding what I'm doing but with limited time.
For example, I'm running Comfy, but then you download the models and you find out you also need workflows to go with them. All the workflows have their own nodes and models incorporated, and downloading those from within Comfy only works partially for nodes and not at all for models.
Googling the models will give you multiple hits and versions for each file and you need to check whether they are malicious or not. You can't really run universal Workflows easily because different models have different requirements.
So you literally lose most of your time cobbling together the files and dependencies to get a workflow running. Which isn't that much fun, but I guess it is what it is.
If I wanted to vibe code something I'm assuming you need to install an agentic coding framework that can load the required models and you probably need to set up a Workflow as back end not entirely dissimilar to what you need to do with comfy?
I wouldn't mind the time investment if I had more time but I don't.
I find tinkering with the models and model settings fun. I don't find scraping together 25 dependencies only to see comfyui crash fun. Lol.
gpalmorejr@reddit
I'm like you. I understand computers but I do not want to spend my life in dependency hell.
I run LM studio (just download and run). I run Qwen3.5-35B-A3B.
I run RooCode extension on VSCodium.
Point RooCode at LM Studio with a drop-down menu. And tada.
There is still tweaking involved, but unlike a lot of CLI tools it is literally the click of a button or the movement of a slider. Documentation for all these tools is good. Telling RooCode to use LM Studio and your chosen model is literally a drop-down menu (although be careful here: if you have another model loaded and select a different one, RooCode will try to load the new one on top, lol). Unfortunately you'll never get away from some configuration, but this is definitely the easiest way for me.
QuinQuix@reddit
I feel you. Dependency hell really is hell.
I recently set up wsl2 to run personaplex.
My god.
Also it's dumb as a rock. So it was a bit of a letdown honestly.
gpalmorejr@reddit
I run Fedora 43 KDE and it has been a bit of a dream, honestly. People get scared of Linux, but I think even though it increases some dependencies, it makes getting them easier. When I needed a bunch of dependencies for a thing RooCode was building, it told me. I literally just typed "sudo dnf install [whatever I thought it was called]" and it installed it and all of its dependencies. If I run a command that I don't have the tools or dependencies for, it'll just ask to install them automatically and then run the command after, so it only takes a single press of the y key if I don't have what is required. And many things are either included in the AppImage, Flatpak, or Snap, or installed automatically when you click install in the package manager, which is set up like an app store and is stupid easy to use. And if not, it'll just tell you and ask to install them automatically again. Yes, it's a little bit of command-line use, but so little that it is actually easier than finding the things you need online. And you'd be surprised how capable the CLI tools for this are. I'm not shilling; I'm just saying I have actually loved the switch because it makes things easier in unexpected ways, but in the ways that scare people. Lol. Yeah, I have to use a terminal command, but also, I used one command and everything automatically found all the dependencies, installed only what I didn't have, updated what I did have if necessary, and so on. But most of the time I just double-click an icon like on Windows, or click install in the app store. And I can update EVERYTHING, including system files, web browsers, libraries, etc., with a single update button. Super nice.
Sorry, that was rambly.
TLDR: I have enjoyed Linux for the exact reasons people avoid it.
Due-Memory-6957@reddit
People said exactly that and then they came and released a 3.6. Why can't you people just stop being doomers for a second?
tecneeq@reddit
The model will never get worse. It will last as long as you find use in it.
swingbear@reddit
So, and this is just my opinion from recent news reports: the open-weight Chinese labs distill their flagship models from the current frontier models by Anthropic etc., so they get close in performance; that's why they are always close but not quite leading the pack. Qwen has become absolutely dominant at distilling that knowledge into their small- to mid-sized models.
So afaik, if they stopped the openweight race they wouldn’t have enough market share for the proprietary game.
AvidCyclist250@reddit
The giving will taper off. It's guaranteed.
Oren_Lester@reddit
It's not for free; it's branding.
Safe-Ad9662@reddit
Apache 2.0! It's free to use.
Oren_Lester@reddit
Modern-age Robin Hood. These are future investments. It's not MIT- or Apache-licensed because they're democratizing AI; it's open source because it's part of a plan. These vast investments are not made for free. It's a way for the Chinese labs to build trust / brands. These models will stop being open source, or become commercially restricted, once they have the lead. But that's my opinion and I am probably wrong.
Safe-Ad9662@reddit
Modern Robin Hood? Please. I’m more like Friar Joo — a monk with a big belly from sitting in front of a console for too long, just enjoying the tech while it's here.
You should probably loosen that tinfoil hat of yours; it seems to be blocking all the rays, including the ones carrying common sense. I couldn't care less about your 'geopolitical theories' or your brand-building paranoia. While you’re busy overanalyzing the world from the depths of your own ego, I’m just here to use what’s available.
Peace ! beach !!
Oren_Lester@reddit
hehe "brand-building paranoia", "geopolitical theories", qwen3.6 is good. You got offended or your agent ?
Safe-Ad9662@reddit
Offended? Far from it. I'm just entertained by your textbook McCarthyism. Richard Nixon would be shedding a tear of joy seeing you in action — to you, every Asian line of code is a spy and every independent user is a 'compromised agent.'
It’s hilarious that you think I need an AI to call out your tinfoil-wrapped delusions. Some of us actually spend our time in the terminal building things, while you're stuck in a 1950s fever dream where 'brand-building' is a grand conspiracy.
If you spent half as much time learning how these models actually work as you do sniffing for 'propaganda,' you might actually contribute something useful. But hey, keep playing the paranoid sentinel. It’s a great look for someone who’s clearly terrified of a world that’s moving faster than his brain can parse.
Now go back to your basement, the Red Scare called and they want their script back
Oren_Lester@reddit
Not terrified, just not naive like you :). It's not about Chinese or not Chinese, but thinking that a frontier lab invests huge amounts of money without a business plan is willful ignorance.
Safe-Ad9662@reddit
Oh, so we’ve moved from conspiracy theories to 'Economics 101'? Groundbreaking. Thanks for explaining that businesses want to make money — I truly had no idea while I was busy compiling their code for my own use.
You call it 'naive,' I call it pragmatism. I'm using the tool while it's sharp and the license is open. You’re standing in the corner crying about the manufacturer's long-term business plan while the rest of us are actually getting work done.
Stick to your 'not naive' philosophy if it makes you feel superior, but while you’re busy overthinking the 'why,' I’m busy with the 'how.' We are not the same. Have a nice life in your bunker
Oren_Lester@reddit
You are more than welcome. Happy you understand there is a reason why it's open source and free.
svantana@reddit
Commoditize your complement. Alibaba is not trying to pivot to LLM serving as their main business. The same goes for Amazon, Nvidia. Maybe some will start to do a 2-tier system like Google.
tecneeq@reddit
> Be Gemma
> Get casually beaten by Qwen just a few days after you delivered your sliding window magnum opus.
BitterProfessional7p@reddit
Partly it is explained by the fact that they jacked up the reasoning tokens 40%. It is more like a Qwen3.5-35B-A3B (xhigh)
Most-Trainer-8876@reddit
I noticed this as well, It thinks for way longer! But I think it's worth it, results speak for themselves.
Gemma 4 and Qwen 3.5 onwards are finally at a level where they can be used as a coding assistant that follows you nicely when given enough information.
mike7seven@reddit
I have yet to test this theory but everyone says adding tools to the models Qwen3.6 and Gemma 4 significantly reduces the extensive thinking.
Most-Trainer-8876@reddit
Yes, way, way less thinking when used with tools, literally one-liner thinking. But apart from that, it thinks longer in general chat.
mike7seven@reddit
Tested it yesterday and I can confirm when tooling is present the thinking/reasoning drops significantly.
Zc5Gwu@reddit
It's still faster than 27b, even with all the tokens it's using.
CriticalCup6207@reddit
Can confirm. "Properly configured" is doing a lot of work in that title. We ran the same evals with default config vs. tuned context window + rope scaling and the gap was significant. The model is genuinely better but you're leaving a lot on the table with out-of-the-box settings. What did your config look like for the jump you saw?
onil_gova@reddit (OP)
Using the recommended settings from the model card,
but specifically calling attention to the preserve-thinking flag being a requirement now.
I'm interested in your findings and settings, care to share?
sleepy_quant@reddit
Huge upgrade over the 2.5 32B 8Q. I've got a similar setup but my 3.6 tuning is still a mess lol. Any chance you could drop your config? Specifically interested in how you're stopping the hallucinations/looping during long coding sessions
power97992@reddit
2.5 came out like around 19-20 months ago …
sleepy_quant@reddit
Tested the 3 dense and it's just not stable compared to 2.5 for my stack. Getting a lot of leaked thinking and hallucinations. I might need to rework my system prompts, but right now it's definitely not hitting the sweet spot.
_risho_@reddit
Using LM Studio on a MacBook, when I send a message to this Qwen model it just spins its wheels for thousands of tokens (sometimes infinitely) and then finally responds after several minutes. The thinking is coherent, it's just very redundant and seems less than necessary. Is this expected, or does LM Studio need to push out an update to use it properly?
onil_gova@reddit (OP)
For your use case, Gemma might be better. This is an agentic-heavy model; the thinking gets way more focused once you put it into an agentic harness.
KubeCommander@reddit
What harness are you running it in? I was considering something like opencode which does pretty well with qwen-next coder
BumblebeeParty6389@reddit
Is 3.5 27B and 3.6 35B really on par with DeepSeek V3.2?
pigeon57434@reddit
on strictly hard stem stuff yes but on basically anything else no way
ortegaalfredo@reddit
Quite obviously not if you do some small tests.
Healthy-Nebula-3603@reddit
Yes
DS 3.2 is very old
BumblebeeParty6389@reddit
It came out 5 months ago 😭
Borkato@reddit
I love this hobby so much
petuman@reddit
Yeah, but it's more or less modification of V3 from Dec 2024, not new pretraining run from ground up.
Healthy-Nebula-3603@reddit
Yes it is old :)
Faktafabriken@reddit
”Very old”
AI is moving FAST!
oxygen_addiction@reddit
On some tasks.
Iory1998@reddit
I can't wait for the 27B!
cafedude@reddit
and coder-80B
Steus_au@reddit
Yeah, this is why we are all waiting for the 122B, as it could reduce Sonnet to tears.
cafedude@reddit
I'd like to see a 3.6-coder (80B).
onil_gova@reddit (OP)
I honestly can't wait for a Sonnet-quality model on my laptop. We'll be able to protect ourselves against the enshittification of Frontier model subscription plans with their bipolar rate limits.
vex_humanssucks@reddit
The context caching piece is what makes this feel different. Previous generations had to re-feed context constantly which tanked throughput -- having the KV cache actually stick means sustained multi-turn performance is finally usable at local scale.
AICyberPro@reddit
Running Qwen3.6 on a 3090 (24GB) via the llama.cpp native binary, the performance jump is real even without an M-series Max. Getting ~100 tok/s on short prompts, ~80 on long ones. The catch is configuration; see the config files in the repo linked below.
Compared to Qwen3.5 on the same card: 3.6 is ~30% slower at peak (101 vs 142 tok/s) but noticeably better at structured coding and reasoning tasks. You're paying a speed tax for capability, which I think is worth it.
Full benchmark breakdown, config files, and the Makefile workflow I use daily: github.com/aminrj/local-llm-ops
Curious if anyone's also seeing the CUDA 13.2 gibberish issue or if it's isolated.
ObjectiveOctopus2@reddit
Artificial analysis benchmarks are artificial
ResidentPositive4122@reddit
This sub when a new SotA jumps on artificial analysis - "this is the worst benchmark possible, stupid number goes up, they don't test emotional erp uncensored uniqueness, reeeeeeee".
This sub when a new open model jumps on artificial analysis - "this is the one!!!111"
Rinse and repeat. Dazed and confused.
DeProgrammer99@reddit
This just in. Different people have different opinions. More at 11.
draconic_tongue@reddit
this "different people" cope ignores how reddit works. consensus functions the same even if people are different. it's not different people, it's the same fucking people because the same people use the website
InevitableMaw@reddit
There is some effect where if an opinion takes over a thread, people with the opposite opinion don't engage as much, which can occasionally produce counter hive minds, but yeah, usually the hive mind will exert itself.
Borkato@reddit
Yeah, they’re goomba fallacying
AvidCyclist250@reddit
TIL. Finally, a word for that.
relmny@reddit
Even if that were true, this is a Local LLM sub...
randylush@reddit
It is absolutely true. You’re right that this is the most likely sub to have bias, but still, the bias is strong
FinBenton@reddit
Normally when a model jumps at that test it's really whatever: ass test, benchmaxxed model, etc. BUT 3.6 is actually a huge jump in local LLMs, so it's a big deal regardless of what the bench says.
NoAge5252@reddit
Hi OP, could you please provide your full setup? I have the same machine and I'm trying to run Qwen 3.6 with hermes-agent, with Qwen running on oMLX, and I'm facing empty tool-call errors. I have updated preserve_thinking to be on. I also read that this might be due to Qwen thinking inside the blocks and not actually making a tool call outside them, and that to address it you should update the hermes config so the model prompt ends the thinking block before making a tool call. But I haven't seen success so far.
onil_gova@reddit (OP)
oMLX doesn't have support for preserve thinking yet. Still waiting on my pull request to get merged: https://github.com/jundot/omlx/pull/814
Dependent-Aardvark32@reddit
Which GPU is best for local use? A100? Or is an RTX 4090 enough? Does anyone have experience?
Embarrassed_Adagio28@reddit
It really is the first fast local model i trust with coding. I get 75 tokens per second with q5 on dual 16gb v100's.
GrungeWerX@reddit
Hmmm. I’ll be testing if it’s actually better than Qwen 3.5 27B this weekend.
trycatch1@reddit
So far in my experience the 27B is much better at Q4. 3.6 35B A3B is almost 4x faster than the 27B on my hardware in t/s, but it wastes so many tokens that in the end the 27B is faster anyway at getting work done. And the 27B is more stable and loops less, so it also wastes fewer of my brain tokens.
DOAMOD@reddit
In my tests, 3.6 A3B rivals 3.5 27B, but it's about 15 seconds slower per problem. For me that's crazy; I can't wait for the 27B...
planemsg@reddit
The same is happening on my end. Seeing this a lot in other comments as well.
redballooon@reddit
Please do, and report back. Whenever there's a new Qwen release, this sub is flooded with posts about how this one is the best thing that ever happened to the world, and by a large margin.
That makes me think Qwen is even better at playing social media than on building foundation models.
Borkato@reddit
I’m a real human (lol) and I only talk about qwen being good because it actually is. I understand that there’s an urge to assume it’s all just bots but sometimes the answer to “why is everyone talking about this” is actually “because it’s good”
relmny@reddit
It happens exactly the same with Gemma...
redballooon@reddit
Deepseek didn't play that game too well after its initial splash.
soyalemujica@reddit
I personally have tested it, and it's at about Qwen-Coder-Next level. I'd say it's also at 27B-dense level in coding capabilities, although Q4 sometimes fails with tool calls.
SkyFeistyLlama8@reddit
Sweet finding. I still keep Coder Next 80B around for more detailed analysis and refactoring but I can barely run it because of the size. Qwen 3.5 35B could handle maybe 80% of what I used Next 80B for. If 3.6 35B can do 95%, then I might get rid of that old behemoth.
riceinmybelly@reddit
Old? Haha yeah we’re not getting any sleep
Still-Wafer1384@reddit
Sorry to ask slightly off topic, how do you rate QCN vs 27B?
soyalemujica@reddit
27B is stronger than QCN for complex reasoning. QCN is good for not so complex coding.
Mayank-eagerwithAI@reddit
The preserve_thinking flag is critical—without it you're basically running a lobotomized version that skips the chain-of-thought reasoning that makes 3.6 competitive. I've seen similar gaps with other reasoning models where the default inference settings strip out the internal monologue. For anyone on Mac Silicon, the oMLX + Pi.dev combo is solid, but watch your context window utilization—8bit quant at 3K prompt processing can start thrashing memory bandwidth past ~24K tokens depending on your batch size.
createthiscom@reddit
It's interesting that kimi k2.5 is listed above qwen 3.5 397b. qwen got a slightly higher score on the aider polyglot. I should probably download both.
_hephaestus@reddit
Which quant for oMLX? Just the mlx community, something from qwen or did you make your own?
onil_gova@reddit (OP)
yeah, mlx community at 8bit
julianmatos@reddit
Can confirm, the jump from 3.2 to 3.6 is noticeable. I've been using it for code review and doc summarization tasks that used to feel like a stretch for local models.
If anyone's wondering whether their setup can handle it before committing to the download, localllm.run is handy for checking hardware compatibility with specific models and quant levels.
Tigew@reddit
I’ve been running this on a 2070 and it’s been insane.
Big_Actuator3772@reddit
lol
jimmytoan@reddit
The preserve_thinking flag being required to unlock the real capability is something a lot of benchmarks are missing - people compare apples to oranges and then wonder why results are inconsistent. Running it with oMLX + Pi.dev sounds smooth on the M5 Max, what's the context window you're hitting before it starts degrading?
onil_gova@reddit (OP)
It was still somewhat useful past 200k, but I definitely started noticing the context rot.
dionisioalcaraz@reddit
is that flag only for agentic use cases?
Fit-Palpitation-7427@reddit
Does it run on a 24Gb 4090?
onil_gova@reddit (OP)
Yes, at Q4. Check out the Unsloth quant size breakdown.
myreala@reddit
Any ideas how to make it stop giving up? I'm using it with opencode and I keep having to prompt "continue"; then it starts again for a few seconds and gives up again.
an0maly33@reddit
I had this problem with gem4 but qwen3.6 hasn't done it yet for me. I use pi primarily. Maybe a difference in the harness?
onil_gova@reddit (OP)
Yeah, same. But in case anyone is running auto-research loops and doesn't want to type "continue" every time, here is a Pi extension I wrote just for that.
ortegaalfredo@reddit
It's a great model, but no way in hell is it better than DeepSeek V2, and it's not even at the level of Qwen 27B.
onil_gova@reddit (OP)
Check out the results breakdown. It's not a sweep across the board. It does beat this model on HLE, for instance.
BustyMeow@reddit
That's average, meaning that some are better or worse.
DOAMOD@reddit
Those of us who actually use the model, and aren't just talking nonsense, said so from day one, while people kept saying this is just benchmaxxing.
StardockEngineer@reddit
27B is in the chart twice?
Economy_Cabinet_7719@reddit
Thinking on and off.
StardockEngineer@reddit
Ah. I didn't catch that from my phone. Thanks.
Bobylein@reddit
Yea just that preserve_thinking does nothing for me in llama.cpp
epicycle@reddit
Did you share your settings somewhere for this? I’m setting up mine to code and interested in folks configs.
korino11@reddit
3.6 is the same shit as 3.5. It's much worse than even DeepSeek. The whole Qwen series cannot remember its own context. What about projects and rules? It can't do anything serious at all. Garbage...
an0maly33@reddit
Been using it for a few days and it has been far above anything else I've used for agentic work. You can't just use the defaults; Unsloth has the correct settings posted for it.
korino11@reddit
I used it a LOT. And you know what? Qwen writes "all done." It rechecked the project 3 times and wrote me that everything is 100% correct. GPT then found TONS of errors and a concept that wasn't completed at all; it was at most 30% done. And not just once! That happened every time...
korino11@reddit
Also, it CANNOT remember its OWN context. It begins to make mistakes at 25% of the context window. It starts to forget about the concept, the tasks...
balerion20@reddit
We have an A100 80GB and are currently using Qwen3.5 27B at BF16 with 262K context for coding purposes. It is good but kinda slow. Considering trying out the FP8 version of 3.6 35B with the same context; has anyone tried it, and do you have any comments?
tmvr@reddit
Well, you have the hardware there, I guess you could go ahead, try it out and tell us what you found? ;)
balerion20@reddit
I will definitely try next week, but since we're working on a project I didn't wanna pull the plug on the Qwen 27B while there are people working. I was just wondering if some people had a chance to compare, but I guess I'm in the wrong since it got downvoted lol.
tmvr@reddit
It was a joke, I don't know why you are getting downvoted either...
balerion20@reddit
Weird really, maybe people thought I was flexing with my company's hardware…
q5sys@reddit
That's most likely it. There's a lot of jealousy on this sub, and if you have a 90 class or higher card there's a bunch of people that will pounce on you.
Sometimes this sub seems to be almost entirely split between people with <=12GB cards... people with 90/enterprise cards... people that converted an old 8x card crypto mining rig into an LLM rig.
kmp11@reddit
It's crazy that 12 months ago Qwen2.5 was all the rage and agents were essentially impossible with that model.
bannert1337@reddit
With this jump from Qwen3.5 35B A3B to Qwen 3.6 35B A3B I would love to see Qwen3.6 27B. It probably would be even better.
Thunderstarer@reddit
Are we getting a dense 3.6?
cosimoiaia@reddit
I tested it over the week after it got GGUF'd. It handled pretty much every task in my workflow (analyze features on a project, create issues on GH, pick up issues, work on the fixes, run/create tests, open PRs), and it also solved, in one session, a truckload of problems on a complex project that even GPT-5 was looping on. I have to say, I'm pretty impressed. It's a great and fast model. I only wish it were European so I wouldn't feel icky when using it.
Borkato@reddit
Didn’t it get released like yesterday??
cosimoiaia@reddit
Wednesday iirc, yesterday they updated it, I still haven't downloaded that one.
planetearth80@reddit
does preserve_thinking work with Ollama?
BrianJThomas@reddit
I tried with Claude code and got hundreds of thousands of tokens generated for a medium size coding task. Is that normal for this model? It generates like 20x the tokens of Gemma 4 for me.
oxygen_addiction@reddit
Opencode or pi-coding-agent
Claude Code poisons non-Anthropic models by default
BrianJThomas@reddit
I read the opencode prompts a while back and they were full of garbage instructions trying to tune model behavior. Is it better now?
oxygen_addiction@reddit
You can customize them. I've switched to pi + claude inspired system prompts from the leak
MaCl0wSt@reddit
how's Pi? been hearing the name recently. I've only used OpenCode when it comes to local LLM coding agents
rpkarma@reddit
It’s great because it starts empty, basically. Just a couple of tools for file read, write, edit, and bash. Barely anything in the base prompt. Configure it yourself and build what you need :)
am2549@reddit
What do you think of Goose? And where did you get the system prompts, made them yourself?
SmartCustard9944@reddit
No, OpenCode is still bad, it doesn’t structure the context as well as Claude Code.
Also, the context poisoning is just an unsubstantiated claim. For me Claude Code works really well with Gemma 4.
tecneeq@reddit
Claude context poisoning is a myth, perpetrated by the open-weights cabal to cripple the free markets?
h310dOr@reddit
I personally used Qwen to rewrite them... much better. Kinda weird though that they would be so spammy and full of repetitions.
BrianJThomas@reddit
Yeah I actually rewrote it a while back. I’ll have to revisit. Thanks for the info.
Western_Objective209@reddit
claude code is the only one that has a team that runs evals on their prompts. there's no poisoning going on, it's just not tuned for other models
Kodix@reddit
That's been my experience so far, in limited tests. The results are *more reliable* than Gemma 4 (which often requires secondary bugfixing passes), but each task takes a longer time due to the reasoning.
silentsnake@reddit
Turn on preserve_thinking otherwise it will yap non stop every round
t4a8945@reddit
So on this graph, Qwen 3.5 35B-A3B is better than Qwen 3.5 122B-A10B.
Yeah that invalidates anything else.
BustyMeow@reddit
Not Qwen3.6-35B-A3B (Thinking)?
t4a8945@reddit
Oh yes, my bad, you're right. Didn't see the lightbulb icons.
whyyoudidit@reddit
which cloud is offering this for cheap?
Ell2509@reddit
Is minimax m2.7 not on there?
hoschidude@reddit
Qwen 3.5 27B is still way better. No idea what this benchmark is saying.
Technical-Earth-3254@reddit
These insane benchmark jumps for 0.1 version increments are counterproductive in the long run. Expectations keep going up, and while the models are good, they can't keep up with what people expect from them.
KaMaFour@reddit
"models improving is bad actually"
Thedudely1@reddit
It really is a good model based on my limited tests so far. Using Unsloth's Q3_K_XL. It can't compete with DS 3.2 in terms of raw breadth of knowledge and facts, but it is great at following instructions and at writing a ray-casting engine in a niche Java derivative, which 3.5 could not do reliably in my experience. It is definitely a significant improvement over 3.5, no doubt. But it's also still a 35B MoE model. It is very close to the dense 27B 3.5 model.
JohnMason6504@reddit
Can confirm, running 3.6 8bit on a much more modest box, single 4090 48GB mod with 64GB DDR5, and the jump on code tasks is real. Where 3.5 would start looping on a refactor around 6K context, 3.6 holds discipline past 16K in my logs. preserve_thinking is not optional, turning it off costs about 8 points on HumanEval-plus internally. Also worth flagging for people on Pi.dev style setups, the MLX 8bit path on M-series is different from GGUF Q8_0 on llama.cpp, the MLX one gives you cleaner quantization for thinking tokens specifically. If you are on NVIDIA, use AWQ 8bit through vLLM, not GGUF Q8. The quality floor is meaningfully different.
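For anyone who wants the vLLM/AWQ path spelled out, a minimal sketch; the repo id is hypothetical, so substitute the AWQ checkpoint you actually use:

```python
# Minimal vLLM sketch for the AWQ route mentioned above.
# The model id is hypothetical; point it at a real AWQ checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3.6-35B-A3B-AWQ", quantization="awq")
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Write a binary search in Python."], params)
print(outputs[0].outputs[0].text)
```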
port888@reddit
In LM Studio, I've been getting "Error rendering prompt with jinja template: "Unknown StringValue filter: safe"" whenever I use any of the Qwen 3.6 models. The fix is to remove `| safe` from the prompt template jinja, usually at line 122. It's been perfect ever since.
Reference: https://ianlpaterson.com/blog/lm-studio-fix-cannot-truncate-prompt-n-keep-n-ctx/
Long_comment_san@reddit
apeapebanana@reddit
THAT'S A LOT OF NUTS!!
Blues520@reddit
That's nuts
Jealous-Astronaut457@reddit
Qwen 3.5 35B better than Qwen 3.5 122B, what a strange ranking
tecneeq@reddit
qwen 3.5 35b has 37 points, qwen 3.5 122b has 42. What are you talking about?
Jealous-Astronaut457@reddit
I was wrong; I was looking at the non-thinking one.
MushroomGecko@reddit