qwen3.6 performance jump is real, just make sure you have it properly configured
Posted by onil_gova@reddit | LocalLLaMA | View on Reddit | 300 comments
I've been running workloads that I typically only trust Opus and Codex with, and I can confirm 3.6 is really capable. Of course, it's not at the level of those models, but it's definitely crossing the barrier of usefulness. Plus the speed is amazing running this on an M5 Max 128GB: ~3K t/s prompt processing, ~100 t/s generation.
Just ensure you have `preserve_thinking` turned on. Check out details [here](https://www.reddit.com/r/LocalLLaMA/s/oy3jLNbSkB).
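If it helps, here's a minimal sketch of where the flag goes, assuming an OpenAI-compatible local server and assuming your backend accepts `preserve_thinking` as an extra request field (the exact key name may differ per backend; check its docs):

```python
# Sketch only: pass preserve_thinking through an OpenAI-compatible endpoint.
# The field name, base_url, and model name are assumptions for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="qwen3.6-35b-a3b",  # whatever name your server exposes
    messages=[{"role": "user", "content": "Plan the refactor before editing."}],
    extra_body={"preserve_thinking": True},  # keeps reasoning across turns
)
print(resp.choices[0].message.content)
```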
Writer_IT@reddit
Is it really better than the 122B? This seems so over-the-top "too good to be true" that it feels unrealistic.
silentsnake@reddit
3.6 35B FP8 > 3.5 122B INT4
Writer_IT@reddit
Well, that's a first. I still remember the rule of thumb from Llama times: always choose the bigger model at lower quantization, especially around q4. With MoE architectures and tool calling, that's apparently no longer the principle.
gxcreator@reddit
It's not that simple, we don't have larger 3.6 model yet, so it is OLD Q4 vs NEW FP8
floconildo@reddit
I'm not sure 35B FP8 > 122B INT4 holds true. I think mileage may vary, especially if we're talking complex reasoning, multi turn usage and tool usage.
Personally I went back to 3.5 122B (Q6 though) because 3.6 35B straight up got facts mixed up during a research round. Can't wait for 3.6 122B to release though.
Writer_IT@reddit
OK, after thorough testing while vibe coding, I feel the 3.6 FP8 does NOT win against the 122B NVFP4. It struggles to be efficient at developing and following a plan, compared to its older brother. Still a good model for the size, but it doesn't overturn the general rule about having more parameters.
mycall@reddit
Would you say that Qwen3-Coder-Next Q8_0 would do better than 3.6 for coding?
Realistic-Elephant-6@reddit
Unfortunately, no, not in my experience. (Qwen3-Coder-Next is on par with 122B-int4 on consistency and code quality, and beats it on speed.)
mycall@reddit
Now we are both waiting for 3.6-coder
Writer_IT@reddit
Honestly, I didn't use it enough to compare. Q3.5-122B is the first local model that I used for heavy vibe coding. However, from this thread alone, I feel these matters are subjective enough that you should try it yourself and see how it feels, if only for the multimodal support.
Realistic-Elephant-6@reddit
Can not confirm on vLLM. 35B keeps making mistakes that neither qwen3-coder-next nor 122B int4 Autoround would ever make... Unless you have some magic sauce -- care to share it? I am really trying to like the 35B model since it would fit nicely with all the other crap into my RAM on the GX10.
stefan_evm@reddit
can 200% confirm. The 122B model degrades more under INT4 quantization than the 35B model does under FP8. 3.6 35B is much much better
AlwaysLateToThaParty@reddit
Yeah nah. I use the qwen3.5 122b/10a heretic mxfp4_MOE quant, and the full quantization of qwen3.6 A35/3a simply isn't as consistent.
gpalmorejr@reddit
The 3.6-35B-A3B seems to be EXTREMELY resistant to quantization losses in general. I use a 2 bit quant and it seems to be doing quite well. Whatever they did seems to be working. I'll take it.
power97992@reddit
What q2 is okay? Maybe it’s worth a try
ea_man@reddit
Oh yeah, and guess what? It's damn fast and stays in ~10GB, so you can even use it for autocomplete.
power97992@reddit
Uh, it says 12.49gb in lm studio
ea_man@reddit
Don't use that one, bro: Qwen3.6-35B-A3B-UD-IQ2_XXS.gguf
10.8 GB
power97992@reddit
I didn't find it in LM Studio; maybe I can find it on Hugging Face. 10.8 is still too big… I need like 10GB.
ea_man@reddit
https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF , that's how small an IQ2 gets.
You have a 12GB GPU? Maybe you should find ways to not waste VRAM with your desktop.
power97992@reddit
Less than that…. I have like 10.5 max
ea_man@reddit
Well stick to models like Omnicode / 9B then if you want to load them in 10.5GB.
gpalmorejr@reddit
For mega-beginners I usually recommend just using the built-in download menu, especially if you aren't hardware-constrained or tweaking things a lot.
But if you are hardware-constrained, or need specific updates, or are tweaking things, or are just generally a little tech-savvy and able to work with files:
I always get the ones from Hugging Face directly. LM Studio does usually have the updated versions or downloads them from HF anyway, but for some reason they are always bigger. Part of it is that the LM Studio download usually includes the image embedder (mmproj), but the size difference is usually too big to be accounted for by that alone. Unsloth's pages on Hugging Face are well laid out and make everything easy to parse, and you can look at the download links on the card all in one place.
You will have to go in and add the folder yourself. Just go to where your LM Studio instance stores everything and find the "Models" folder. Add a folder for the "group", such as where it came from, like "Hugging Face", or, if you are like me, "Unsloth" for the creator. Then inside that folder make a folder named exactly the same as the model file (although there is room to change it and the file name; some workflows care, LM Studio doesn't). Then place your model file and mmproj in that folder. LM Studio will find it automatically. You don't even have to restart it; just open the load menu again.
If you download multiple quants, you can put them in the same folder and LM Studio will automatically organize your model menu accordingly and let you select a specific quant there. LM Studio generally groups them under the folder name (although I have experienced some changes in this behavior) and will call them whatever you name the folder in the interface.
Also, the LM Studio Community quants are sometimes behind Unsloth for updates to the models and such, although they are usually pretty close. But the Unsloth quants do usually perform much better for a given file/RAM size.
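If you'd rather script the folder layout described above, here's a rough sketch; the models root (newer LM Studio builds default to something like ~/.lmstudio/models) and the file names are assumptions, so adjust them to your install:

```python
# Sketch of the layout above: <models root>/<group>/<model folder>/<files>.
# Paths and file names are assumptions; point them at your real downloads.
from pathlib import Path
import shutil

models_root = Path.home() / ".lmstudio" / "models"       # assumed default
dest = models_root / "unsloth" / "Qwen3.6-35B-A3B-UD-IQ2_XXS"
dest.mkdir(parents=True, exist_ok=True)

downloads = Path.home() / "Downloads"
for name in ("Qwen3.6-35B-A3B-UD-IQ2_XXS.gguf", "mmproj-F16.gguf"):
    src = downloads / name
    if src.exists():
        # LM Studio finds it on the next open of the load menu, no restart needed
        shutil.move(str(src), str(dest / name))
```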
gpalmorejr@reddit
I suspect if you push it that'll get weird, but so far it is basically on parity with 3.5 Q4_K_M. I keep 3.6 Q4_K_M around just in case I want some extra assurance, but it seems to be working so far. Plus it gives me way more room for context-cache build-ups from RooCode, since it can be a little lazy about clearing memory sometimes, which has actually resulted in smoother flow overall.
KallistiTMP@reddit
Wonder if it's coherent enough to run as a draft model for spec decoding @ Q2 or even Q1 alongside the FP8 model.
gpalmorejr@reddit
As a drafting model? Seems huge for that. But then again, that's outside of my ability to test on my hardware.
power97992@reddit
Yeah, I wish I had more RAM…
gpalmorejr@reddit
I feel that lol. What are you at? What's your setup?
power97992@reddit
I dont have much vram on my laptop, less than 11GB.
gpalmorejr@reddit
Oof. That's tight.
power97992@reddit
Q2_K is too large; maybe Q2_XXS will work, or Q1.58.
gpalmorejr@reddit
Or you can offload the MoE experts to the CPU/RAM. It'll be slower but it'll be fine. Dedicated GPU or integrated?
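If you go that route, here's a rough llama-cpp-python sketch of partial offload; the file name and layer count are placeholders to tune for your box, and this is the generic layer-split knob, not an expert-specific one:

```python
# Rough sketch: partial GPU offload with llama-cpp-python. Layers that don't
# fit on the GPU stay in system RAM. File name and numbers are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.6-35B-A3B-UD-IQ2_XXS.gguf",
    n_gpu_layers=20,   # raise/lower until it fits; -1 offloads every layer
    n_ctx=8192,
)
out = llm("Q: Name one planet. A:", max_tokens=8)
print(out["choices"][0]["text"])
```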
power97992@reddit
Dude, I know you can offload it to normal RAM, but my system has unified RAM; there is nowhere to offload it...
gpalmorejr@reddit
Oh. You didn't say that before; you said 11GB of VRAM. If you are on unified memory, then I believe you don't actually have to offload at all; there are some other methods for allocating compute. But yeah, that clarification makes a big difference.
power97992@reddit
My total free ram is less than 11gb, there is no way to increase that, since i need to run my OS..
gpalmorejr@reddit
I know.... I agree with you.... You'll probably have to use a slightly smaller model unfortunately, but the cool part is that those unified Apple devices are pretty good at Model loading. They aren't as good as GPU only compute but they are pretty good.
power97992@reddit
I will just use the API. I tried Qwen 3.5 9B and it wasn't great.
gpalmorejr@reddit
Yeah, 9B is good as long as your expectations aren't full enterprise coding and long-context logic. And since it isn't MoE at A3B and is instead dense at 9B, it is slower as well. I feel you though. That is why I run my model on a large computer at home and use it remotely from my laptop wherever I'm at.
gpalmorejr@reddit
I tried to run IQ2_XXS and it wouldn't load on 16GB. Too tight with the OS in RAM.
mrrizzle@reddit
Unsloth?
gpalmorejr@reddit
Yup
Cold_Tree190@reddit
How much VRAM would you need to run it in FP8, do you know?
Wise-Hunt7815@reddit
256K context, about 48G
whoisraiden@reddit
If you want it all in VRAM, at least 40 GB for good context size.
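The rough arithmetic behind those two answers, as a sketch; the cache and overhead terms are loose assumptions, since they depend on context length and architecture details not given here:

```python
# Back-of-the-envelope FP8 VRAM estimate for a 35B-parameter model.
# KV-cache and overhead figures are loose assumptions, not measurements.
params = 35e9
weights_gb = params * 1 / 1e9    # FP8 = 1 byte per parameter -> ~35 GB
kv_cache_gb = 5.0                # grows with context; much bigger at 256K
overhead_gb = 2.0                # activations, runtime buffers
print(f"~{weights_gb + kv_cache_gb + overhead_gb:.0f} GB")  # lands in the 40-48 GB range above
```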
romasalah@reddit
I was wondering exactly that. I'm running around 70GB of VRAM and only 3.5-122B at Q3 fits in it, so I'm wondering if moving to 3.6-35B would be better.
is-this-a-nick@reddit
how does it compare to 397 Q3?
Goldandsilverape99@reddit
The Qwen3.6 35B-A3B has lost maybe a very little in "intelligence", but has improved on some of the agentic and tool-calling benches.
Realistic-Elephant-6@reddit
That's only because the tool calling template has finally been fixed. With the correct tool calling template the 122B is significantly better (just a bit slower due to a larger active core)
Single_Ring4886@reddit
What about pure programming without agents?
Karyo_Ten@reddit
But you need agents to write tests and run them and read docs
ShadyShroomz@reddit
2 years ago I was asking chatgpt to generate individual functions for me and then i'd copy paste them into my code.
you dont need agents...
Karyo_Ten@reddit
That's such a waste of time though.
Alarming_Pick626@reddit
Not really, this is probably the best way to use these tools. Even the best models from Anthropic start to fail on real projects.
Karyo_Ten@reddit
I don't see how being a glorified Ctrl+C Ctrl+V monkey is the best way to use anything.
Shiny-Squirtle@reddit
...and context. When you ask it to refactor a previously generated function, it generates the whole thing again.
AI_Enhancer@reddit
Not saying this is the best possible workflow, but just saying that for this specific problem, you can just remove messages and they get cleared from context. In LM Studio at least. So if the model is just advancing or fixing the same code, you can just delete these messages and then paste the new fixed code back into the convo if needed.
decoy_258@reddit
what is the correct modern alternative, assuming local inference?
BannedGoNext@reddit
You don't need agents if YOU are the agent ;). In Soviet Russia agent is you!
unjustifiably_angry@reddit
This is a criminally slow workflow.
Great_Guidance_8448@reddit
That's stone age stuff. With agents you have it analyze your code, identify potential bugs, suggest refactors, etc. You are really limiting yourself with the "individual functions" generation.
thewhzrd@reddit
This is the way.
Realistic-Elephant-6@reddit
In my experience, it's not. It is significantly dumber and makes a lot of mistakes that 122B doesn't make.
AlwaysLateToThaParty@reddit
No, it isn't.
Icy_Butterscotch6661@reddit
I've been using it with hermes-agent and getting pretty frustrated with it. It keeps doing extra shit I didn't ask for, despite instructions telling it otherwise. For example, I asked it to look up how to install opencode and it went ahead and installed it instead of answering. I think the 27B / carnice-27b fine-tune didn't have this issue, but I'm not certain anymore.
leap966@reddit
Haha 😄 Now I need to ask Qwen3.6 how to install OpenCode.
McSendo@reddit
That's my concern about those agent harnesses. "Don't search for porn"; searches for porn.
howardhus@reddit
yep.. feels like the usual hype that youtubers need to get clicks
[thumbnail of shocked face] "You have to try this new model RIGHT NOW! (but let's drink some coffee first)"
Possible-Pirate9097@reddit
Network Chuck?
Ka_Trewq@reddit
He has been for a while. And also, he's praying now. I'm not kidding.
tmvr@reddit
He's really leaning into every grift that is current isn't he :))
Savantskie1@reddit
I think he started most of them
mybruhhh@reddit
Models trained with quality data have been shown to outperform models 10x their size time and time again
Refefer@reddit
I've run both through our new product to evaluate as a replacement for the 122B model at UD FP4 quants. It is surprisingly good for a small model, but in my tests it isn't as good or as efficient at agentic tasks as the 122B. Still, I plan to use it for tasks which are a bit narrower / more constrained and don't require a lot of world knowledge.
Ranmark@reddit
I tested it on nicklothian's bench a few times. One time it actually came out above the dense 27B model and got the same result as the 122B MoE, but I wasn't able to recreate this even once. The 27B and 122B are much more stable in that regard.
zyxwvu54321@reddit
why is the 27B listed twice? And I am not getting any better results than 3.5 35B in my limited testing.
Realistic-Elephant-6@reddit
Yeah, this model is very much in "YMMV" range for me. Might be the programming languages we use 🤷♂️
zyxwvu54321@reddit
In my further testing, it is better at vision than 3.5 35B. Other than that, there still isn't that much of a noticeable difference.
KaMaFour@reddit
Reasoning vs no reasoning
kuhunaxeyive@reddit
Qwen3.6 is good for programming, yes, but not so good at writing natural, concise text. It sometimes inserts weird phrases and creates convoluted sentences, even at Q8. For text, Gemma-4-31B has much higher-level phrasing that I can trust for European languages.
Also, Qwen3.6 doesn't pass the car-washing test reliably. Gemma-4 nails it every time in seconds, even in non-thinking mode at Q5.
Salt-Willingness-513@reddit
Gemma 4 is amazing for Swiss German, down to the e4b. Crazy how good Gemma 4 is at European languages. I never saw anything comparable apart from Claude and Gemini.
Realistic-Elephant-6@reddit
Can it transcribe Schwitzerdytsch from audio though?
Salt-Willingness-513@reddit
It indeed can
Realistic-Elephant-6@reddit
Bärndytsch (Bernese German) too? ;)
Salt-Willingness-513@reddit
Züridütsch (Zurich German), but it didn't understand that haha
ayylmaonade@reddit
I've found it wonderful to talk to, even better than 3.5 which I already liked because it reminded me of how Claude talks. But then again, I do have an instruction in my system prompt to use natural language. Might be worth trying:
Realistic-Elephant-6@reddit
It really does sound a lot like Sonnet, right? Even Sonnet says so, LOL
ayylmaonade@reddit
Lol yeah, it's almost indistinguishable for me.
Realistic-Elephant-6@reddit
Gemma-4-31B is a dense model. You are comparing pears to bananas here.
Queasy-Contract9753@reddit
I've experienced the same. Much of my work is feeding in very long journeys and discussing them, and summarisation. This model struggles very hard with stream-of-consciousness style writing. Normally my go-to is DeepSeek or Gemini Flash, but I have found Gemma 4 31B up to it. The 26B-A4 isn't bad either but misses minor points more often.
-Ellary-@reddit
Yeah, Gemma 4 is a general model, for any kind of activity.
Qwen 3.6 is a pure agentic model; there's no point in "talking" to it.
Southern-Expert22@reddit
Bro, it's blitzing code at like 40 t/s with tools, search, scripting, etc. I'd put it at like Sonnet 4.6 level, or like GPT 5.3. Opus is brain-dead atm but hopefully it comes back.
Realistic-Elephant-6@reddit
Nah, Sonnet *4.6* is still quite a bit better than 35b (Though Sonnet 3.5 isn't, which is wild. We are getting a tiny, free multimodal local MoE model, delivering the quality that was SOTA a year ago...) . But 3.5-122B-A10B is very much in the same range as the new Sonnet, which is why I can barely wait for 3.6-122B...
CooperDK@reddit
That is a very small jump though.
MushroomGecko@reddit
> Be Qwen
> Release new medium-sized model that competes with previous flagship
> Repeat
Foreign-Beginning-49@reddit
I get scared because they are so good and there's no way this can last. They will become too powerful to give away for free right? The agentic gains in the last 12 months have been dizzying.
Cerevox@reddit
They are giving it away to undercut western models. As long as google/claude keep producing models, china will keep making better free models to trash their business model.
ZenaMeTepe@reddit
Could be the first time normal people benefit from such economic warfare.
BewareOfHorses@reddit
Normal people have always benefited from it. There's a reason every man and his dog in the third world has a phone and internet access.
unjustifiably_angry@reddit
Well, normally there's a long-term cost that isn't apparent at first, like all your jobs going overseas and your kids' lives being ruined so you could benefit from cheap tat for a few decades.
For AI there's really no downside in this case for average people, at least not one that's immediately obvious. The US economy might tank in the long run because it's being propped up by AI companies right now, but if you're not American you have absolutely zero fucks to give about that situation.
charnet3d@reddit
The "downside" is that you have to adapt or die. A programmer in this era 2026-2030 and beyond who doesn't use agentic coding is like farmer who never uses tractors and does everything manually, or an accountant who insists on using mental math. They will be uncompetitive and left behind.
Imaginary-Unit-3267@reddit
If I remember correctly, there's actually some hunter-gatherers in Africa who have phones.
Ardalok@reddit
Competition has always been good for the market.
unjustifiably_angry@reddit
To a point. Eventually one company makes an objectively inferior product that they can sell much cheaper and the competitors are forced to follow suit, and then it repeats and repeats until you have the average American diet.
layer4down@reddit
The West knocks communism and even socialism but it seems to have its benefits considering we’re their number one customer.
InevitableMaw@reddit
This is literally capitalism at work. Different entities competing, leveraging their strengths. Meta was open-sourcing models for the same reasons the Chinese firms are.
layer4down@reddit
In theory, Google gets to sell compute as well, for running Gemini on TPUs. At least we see this in Azure and AWS for sure.
Awongy00@reddit
China is far more capitalist and consumerist than the West in many aspects.
dto_lurker@reddit
Are people who run their own AI models (me included) "normal"?
darkwalker247@reddit
the normal is using GPT for tasks that a sub-30b model could do easily with a fraction of the resources 😔
Outpost_Underground@reddit
Should be
Outrageous_Mail_8381@reddit
You do, until you don't, when consolidation and profit-seeking come in. You could argue the same is currently happening in the EV space, where there's tremendous competition atm in China, with dozens of different brands fighting it out to build market share. It won't last forever.
KallistiTMP@reddit
And I mean because they legitimately do have an impending population crisis and a strong national interest in developing robust social infrastructure.
It really is largely just common fucking sense for research to be conducted openly, and supported by public funding and infrastructure.
The western model is so counterproductive to innovation and research that the west is sweating and barely keeping a 3-month lead, despite having something like 100 times the budget. And yes, I know that China has smuggled H100s and probably a fair deal of B200s and all that, but their entire country's compute combined is smaller than what just Anthropic or Grok or Meta has.
Their power grid and water systems are worlds better too. And they don't have a mass anti-AI movement because their infrastructure is overbuilt for public and industrial needs instead of pillaged for private profit to the point that there isn't enough power and water for both the datacenter and the humans living next to it, and the public trusts that it will be used to improve people's lives.
Meanwhile the main thing the US is competing on is how many mass layoffs can be enabled, how effectively it can bomb third world countries, and how many ads they can cram into the damn thing.
China is doing what they've always done. They bet on the American capitalists being greedy and shortsighted. And they were absolutely right.
Foreign-Beginning-49@reddit
I guess it's not a win-win then; that's what bothers me about it. Makes sense, it's low-key economic warfare. At least it benefits the poors like myself. There is a great sadness in realizing how far we are from truly universal access to intelligence on tap. It's like the French Revolution, when folks weren't allowed to become literate. Perhaps it's the agents themselves that will do a new one...
IamFondOfHugeBoobies@reddit
What you have to realize about China is that they aren't competing against you or the average man. Don't get me wrong, I'm not saying they CARE about you.
But they compete against western billionaires and politicians. In their view, empowering us is weakening them. Because in Chinese politics the idea of individualism is the worst poison to state power they can imagine.
Thus empowering the average citizen = Poisoning the state.
This has held true to some extent, just look at tik-tok, at how they helped dumb down America by subtly supporting any policy that made the rural poor less well off and educated since the Clinton era.
Now they view citizens having power, non-state controlled AI as another step in that. In their mind it makes us harder to control thus weakening the state.
The irony of course is that individualism will always trounce collectivism because in collectivism corruption and flaws become hidden and thus never worked on.
It's why Russia failed the initial Ukraine invasion so badly and yet now, several years later, has a far, far more capable military. The flaws could be hidden under a strong state, but then need forced them into the open where they could be fixed.
What I'm saying is: Xi Jinping is as delusional as any other asshole, and we should just be VERY happy it's benefiting us little folks. Xi is truly the greatest supporter of individualism and personal freedoms, even if it is entirely by accident and against his will.
MeowManMeow@reddit
I agreed with all your points except when you started saying that individualism prevents corruption from being hidden.
Authoritarianism promotes and hides corruption, and that can come out of a collective or an individualist system. I mean, half the current US administration are conmen; rampant corruption. The army has never passed an audit, and there are pardons for embezzlement and fraud everywhere.
I think what you are missing is that non-capitalist/imperialist countries have to have an authoritarian government to prevent the USA from exploiting any flaws to get a regime change. Look at all the countries where the USA has had a hand in overthrowing the government (and these are just the ones we know about) by fanning any flaws into rebellions and into a coincidentally US-friendly government.
bnolsen@reddit
He never stated that it prevents it in that post. No system that has humans in it is perfect. Corruption will always exist in some form, but the more centralized a government is, the more centralized and extreme the corruption can be, and probably is.
MeowManMeow@reddit
So you are saying maybe a class-less and state-less society might be the best to prevent corruption as there is no central government?
I 100% agree with you that the amount of societies efforts are put into LLMs is a giant waste, which is why I don’t believe that a few men with huge amounts of money get to dictate what our economy is focused on.
IamFondOfHugeBoobies@reddit
I'm not sure how you could call Trump an individualist. The man idolizes North Korea and forces people to wear shoes that don't fit to stay in his good graces.
Also lol@"imperialist" country. Your privelige is showing buddy. The west isn't "imperalist" in isolation. It's just the imperalists that won imperialism against the eastern imperialists.
The fact that you are aware of all the dirt the U.S. has done pre-Trump, and that you're discussing it, is my point. Individualist countries are not DEVOID of bad things, but they critique, debate, and argue over them.
MeowManMeow@reddit
I’m not calling Trump an individualist. I’m saying the USA is one of if not the most individualist country. In your comment you say that China and Russia is collective and that’s why they are corrupt, yet corruption is rampant in an individualist society.
I am aware of the corruption of Trump and discussing it, but we are also talking about Putin and the corruption of Russia engagement in Ukraine. Yet you say one is hidden and the other isn’t, but how can we be talking and discussing about it if it’s hidden?
In both there is no action against the corruption.
Karyo_Ten@reddit
https://gwern.net/complement
Foreign-Beginning-49@reddit
Definitely needed this thank you. Yall are super informative.
IrisColt@reddit
I understand that reference, heh
mrgalacticpresident@reddit
Re-Read it! Great insight. Bonus that I absolutely love gwern. Thanks for sharing.
The_frozen_one@reddit
I don’t really buy that argument though, is Google engaged in economic warfare? Gemma4 is quite good too. I think it’s “commodify your compliment”, i.e. don’t let products that benefit your product become valuable. If you sell hamburgers, you want hamburger buns to be as cheap and plentiful as possible. If you run a search engine, you want browsers to be fast and free.
ZenaMeTepe@reddit
If you prevent US models from ever becoming profitable, after they'll have incinerated thousands of billions?
dexterlemmer@reddit
The models are supposed to be the commodity. The US hyperscalers also want that mid- to long term.
AFAICT, the products are: 1. Training data; importantly, this includes data about how previous iterations of your model were used. 2. Safety/alignment for manipulating many users and their products via the models they use. 3. Inference datacenters that benefit from economies of scale and the robustness of highly distributed servers. 4. Supercomputers for training the next SOTA model, provided first to giant players that want an edge and to big research projects with massive budgets and new, almost unsolvable problems to spare.
Alibaba (Qwen) competes in 1-3. They would love to compete in 4; however, for the time being, they simply don't have access to either the technology or the expertise.
However, US SOTA expertise trickles down to other Western researchers fast, and the original breakthrough is much more expensive than the future developments it inspires. Once you've made a breakthrough on a hyperscaler supercomputer, it's at most a few months until a Western researcher does better with an open-source model trained on a 1000+ times less powerful supercomputer. And it's to Alibaba's advantage if that Western researcher used Qwen as the base model rather than Gemma.
Thus, by open-sourcing Qwen, Alibaba makes more money on 1-3, and China gets to be less far behind in cutting-edge models for domestic use.
erkinalp@reddit
there's no undercutting, their product is both cheaper and tangibly better
InevitableMaw@reddit
Lol. If I could run either the best Chinese model or the best Claude/GPT locally, I would never touch a Chinese model.
admnb@reddit
Aren't they basically scraping the big models? Like, the reason they can create these capable models is because Western models were created in the first place. They are riding that wave and will continue to do so, forcing the big Western companies to keep running and to keep overextending.
InevitableMaw@reddit
Nothing China is doing is forcing western models to "overextend". They would be investing just as much if the Chinese models didn't exist, because they are primarily competing against each other.
vulgrin@reddit
China also has a central government that is pushing the entire country to use AI, and wants it spread as far as it can.
Both_Opportunity5327@reddit
You also forgot to say that Nvidia will do the same and maybe sponsor other labs, because it keeps the labs training models and therefore buying their equipment.
oxygen_addiction@reddit
The latest Qwen hits about 1/3 of the issues that Opus 4.6/Gpt 5.4 xhigh do on my benchmarks.
So there's a lot of room for growth.
_Erilaz@reddit
Aren't you comparing a mere 35B A3B to the biggest Claude and GPT models in their extra-tryhard reasoning modes?
Most-Trainer-8876@reddit
right? It's literally 35B A3B compared to Opus, which is probably 5T model... which is almost 150 times bigger!
power97992@reddit
Dude, people compare it to the best, not to the worst or to something bad like Llama 4 Scout.
Most-Trainer-8876@reddit
Why in the world would you expect a bee to fly like an eagle?
I get your point though... I'll be waiting for that day: 5T Opus compressed into the size of a 35B. Probably never; instead we will be able to run bigger & smarter models at the same cost as a 35B.
gpalmorejr@reddit
That was my thought. Though studies have shown it isn't 1:1 for intelligence versus size, we are still talking about a 35B A3B model versus models that are estimated at, what, 700B+ now? Isn't one of them estimated at 1.5T? Like... at that rate, a 33% success rate is like winning an F1 race 33% of the time in a Corolla. I don't care that it doesn't match them; I'm impressed it compares at all!
In numbers, we are talking about a model that is 1 to 2 orders of magnitude smaller than the others but performs within 0.2 orders of magnitude. That's insanity.
And yes, the larger models have more latent knowledge and "facts" built in, and their nuance and long-context handling is a little better, but half of that can be leveled by giving Qwen3.5 a web-search plugin or other internet access, and the other part can be managed with some simple project and file management. Not to mention that people who can run the Q8 and F16 versions will probably never have the nuance and loop issues. I barely have them with Q2 and Q4.
TLDR: I'm a nerd. LLMs are cool. Qwen3.6 is impressive. I need better hardware.
power97992@reddit
I doubt it is 1/3 as good as GPT 5.4 / Opus at everything; probably only for specific tasks. Even 3.6 Plus is quite lazy: it outputs simple stuff if you don't specify much and tries to give the simplest answer possible, even with high reasoning and in the API. Whereas Opus and 5.4 produce better outputs even without super-specific prompts; GPT 5.4 and Opus have a much better understanding of requirements than 3.6 Plus. And if 3.6 Plus is only somewhat better than the 35B, then the 35B is probably not great, but enough for some tasks.
gpalmorejr@reddit
Fair. Like you said, finishing the race 1/3 of the time does not mean you won. But for local models, it is really good.
power97992@reddit
Glm 5.1 is pretty good, minimax 2.7 is decent
gpalmorejr@reddit
Yeah, but huge, unfortunately. They are closer to SOTA models in size than to regular open models anyway. So that tracks.
QuinQuix@reddit
You mean it solved about a third of what the big boys can solve?
RedParaglider@reddit
If my toaster can't cook what my Traeger Pro can cook, then what's the fucking point. Fuck that toaster.
QuinQuix@reddit
I wasn't bashing?
RedParaglider@reddit
I know, I was backing up your comment with a stupid joke.
Borkato@reddit
Which is fucking insane.
JackPrince@reddit
With a free model. Iterate the issues with proper boundaries and you basically only pay for the iterations. It is a scaling issue from there on.
QuinQuix@reddit
What would I need to install to run this setup?
I'm kind of struggling not so much with understanding what I'm doing but with limited time.
For example, I'm running Comfy, but then you download the models and you find out you also need workflows to go with them. All the workflows have their own nodes and models incorporated, and downloading those from within Comfy only works partially for nodes and not at all for models.
Googling the models will give you multiple hits and versions for each file and you need to check whether they are malicious or not. You can't really run universal Workflows easily because different models have different requirements.
So you literally lose most of your time cobbling together the files and dependencies to get a workflow running. Which isn't that much fun, but I guess it is what it is.
If I wanted to vibe code something I'm assuming you need to install an agentic coding framework that can load the required models and you probably need to set up a Workflow as back end not entirely dissimilar to what you need to do with comfy?
I wouldn't mind the time investment if I had more time but I don't.
I find tinkering with the models and model settings fun. I don't find scraping together 25 dependencies only to see comfyui crash fun. Lol.
gpalmorejr@reddit
I'm like you. I understand computers but I do not want to spend my life in dependency hell.
I run LM studio (just download and run). I run Qwen3.5-35B-A3B.
I run RooCode extension on VSCodium.
Point RooCode at LM Studio with a drop-down menu. And tada.
There is still tweaking involved, but unlike a lot of CLI tools it is literally the click of a button or the movement of a slider. Documentation for all these tools is good. Telling RooCode to use LM Studio and your chosen model is literally a drop-down menu (although be careful here: if you have another model loaded and select a different one, RooCode will try to load the new one on top, lol). Unfortunately you'll never get away from some configuration, but this is definitely the easiest way for me.
QuinQuix@reddit
I feel you. Dependency hell really is hell.
I recently set up wsl2 to run personaplex.
My god.
Also it's dumb as a rock. So it was a bit of a letdown honestly.
gpalmorejr@reddit
I run Fedora 43 KDE and it has been a bit of a dream, honestly. People get scared of Linux, but I think even though it increases some dependencies, it makes getting them easier. When I needed a bunch of dependencies for a thing RooCode was building, it told me. I literally just typed "sudo dnf install [whatever I thought it was called]" and it installed it and all of its dependencies. If I run a command that I don't have the tools or dependencies for, it'll just ask to install them automatically and then run the command after, so it only takes a single press of the y key if I don't have what is required. And many things are either included in the AppImage, Flatpak, or Snap, or installed automatically when you click install in the package manager, which is set up like an app store and is stupid easy to use. And if not, it'll just tell you and ask to install them automatically again. Yes, it's a little bit of command-line use, but so little that it is actually easier than finding the things you need online. And you'd be surprised how capable the CLI tools for this are. I'm not shilling; I'm just saying I have actually loved the switch because it makes things easier in unexpected ways, but in the ways that scare people. Lol. Yeah, I have to use a terminal command, but also, I used one command and everything automatically found all the dependencies, installed only what I didn't have, updated what I did have if necessary, and so on. But most of the time I just double-click an icon like on Windows, or click install in the app store. And I can update EVERYTHING, including system files, web browsers, libraries, etc., with a single update button. Super nice.
Sorry, that was rambly.
TLDR: I have enjoyed Linux for the exact reasons people avoid it.
Due-Memory-6957@reddit
People said exactly that and then they came and released a 3.6. Why can't you people just stop being doomers for a second?
tecneeq@reddit
The model will never get worse. It will last as long as you find use in it.
swingbear@reddit
So, and this is just my opinion from recent news reports: the open-weight Chinese labs distill their flagship models from the current frontier models by Anthropic etc., so they get close in performance; that's why they are always close but not quite leading the pack. Qwen has become absolutely dominant at distilling that knowledge into their small- to mid-sized models.
So afaik, if they stopped the openweight race they wouldn’t have enough market share for the proprietary game.
AvidCyclist250@reddit
The giving will taper off. It's guaranteed.
Oren_Lester@reddit
It's not for free; it's branding.
Safe-Ad9662@reddit
Apache 2.0! It's free to use.
Oren_Lester@reddit
Modern-age Robin Hood. These are future investments. It's not MIT- or Apache-licensed because they're democratizing AI; it's open source because it's part of a plan. These vast investments are not made for free. It's a way for the Chinese labs to build trust / brands. These models will stop being open source, or become commercially restricted, once they have the lead. But that's my opinion and I am probably wrong.
Safe-Ad9662@reddit
Modern Robin Hood? Please. I’m more like Friar Joo — a monk with a big belly from sitting in front of a console for too long, just enjoying the tech while it's here.
You should probably loosen that tinfoil hat of yours; it seems to be blocking all the rays, including the ones carrying common sense. I couldn't care less about your 'geopolitical theories' or your brand-building paranoia. While you’re busy overanalyzing the world from the depths of your own ego, I’m just here to use what’s available.
Peace ! beach !!
Oren_Lester@reddit
hehe "brand-building paranoia", "geopolitical theories", qwen3.6 is good. You got offended or your agent ?
Safe-Ad9662@reddit
Offended? Far from it. I'm just entertained by your textbook McCarthyism. Richard Nixon would be shedding a tear of joy seeing you in action — to you, every Asian line of code is a spy and every independent user is a 'compromised agent.'
It’s hilarious that you think I need an AI to call out your tinfoil-wrapped delusions. Some of us actually spend our time in the terminal building things, while you're stuck in a 1950s fever dream where 'brand-building' is a grand conspiracy.
If you spent half as much time learning how these models actually work as you do sniffing for 'propaganda,' you might actually contribute something useful. But hey, keep playing the paranoid sentinel. It’s a great look for someone who’s clearly terrified of a world that’s moving faster than his brain can parse.
Now go back to your basement, the Red Scare called and they want their script back
Oren_Lester@reddit
Not terrified, just not naive like you :). It's not about Chinese or not Chinese, but thinking that a frontier lab invests huge amounts of money without a business plan is willful ignorance.
Safe-Ad9662@reddit
Oh, so we’ve moved from conspiracy theories to 'Economics 101'? Groundbreaking. Thanks for explaining that businesses want to make money — I truly had no idea while I was busy compiling their code for my own use.
You call it 'naive,' I call it pragmatism. I'm using the tool while it's sharp and the license is open. You’re standing in the corner crying about the manufacturer's long-term business plan while the rest of us are actually getting work done.
Stick to your 'not naive' philosophy if it makes you feel superior, but while you’re busy overthinking the 'why,' I’m busy with the 'how.' We are not the same. Have a nice life in your bunker
Oren_Lester@reddit
You are more than welcome. Happy you understand there is a reason why it's open source and free.
svantana@reddit
Commoditize your complement. Alibaba is not trying to pivot to LLM serving as their main business. The same goes for Amazon, Nvidia. Maybe some will start to do a 2-tier system like Google.
tecneeq@reddit
> Be Gemma
> Get casually beaten by Qwen just a few days after you delivered your sliding window magnum opus.
BitterProfessional7p@reddit
Partly it is explained by the fact that they jacked up the reasoning tokens 40%. It is more like a Qwen3.5-35B-A3B (xhigh)
Most-Trainer-8876@reddit
I noticed this as well, It thinks for way longer! But I think it's worth it, results speak for themselves.
Gemma 4 and Qwen 3.5 onwards are finally at a level where they can be used as a coding assistant that follows you nicely when given enough information.
mike7seven@reddit
I have yet to test this theory but everyone says adding tools to the models Qwen3.6 and Gemma 4 significantly reduces the extensive thinking.
Most-Trainer-8876@reddit
Yes, way, way less thinking when used with tools, literally one-liner thinking. But apart from that, it thinks longer in general chat.
mike7seven@reddit
Tested it yesterday and I can confirm when tooling is present the thinking/reasoning drops significantly.
Zc5Gwu@reddit
It's still faster than 27b, even with all the tokens it's using.
CriticalCup6207@reddit
Can confirm. "Properly configured" is doing a lot of work in that title. We ran the same evals with default config vs. tuned context window + rope scaling and the gap was significant. The model is genuinely better but you're leaving a lot on the table with out-of-the-box settings. What did your config look like for the jump you saw?
onil_gova@reddit (OP)
Using the recommended settings from the model card,
but specifically calling attention to the preserve-thinking flag being a requirement now.
I'm interested in your findings and settings, care to share?
sleepy_quant@reddit
Huge upgrade over the 2.5 32B 8Q. I've got a similar setup but my 3.6 tuning is still a mess lol. Any chance you could drop your config? Specifically interested in how you're stopping the hallucinations/looping during long coding sessions
power97992@reddit
2.5 came out like around 19-20 months ago …
sleepy_quant@reddit
Tested the 3 dense and it's just not stable compared to 2.5 for my stack. Getting a lot of leaked thinking and hallucinations. I might need to rework my system prompts, but right now it's definitely not hitting the sweet spot.
_risho_@reddit
Using LM Studio on a MacBook, when I send a message to this Qwen model it just spins its wheels for thousands of tokens (sometimes infinitely) and then finally responds after several minutes. The thinking is coherent, it's just very redundant and seems less than necessary. Is this expected, or does LM Studio need to push out an update to use it properly?
onil_gova@reddit (OP)
For your use case, Gemma might be better. This is an agentic-heavy model; the thinking gets way more focused once you put it into an agentic harness.
KubeCommander@reddit
What harness are you running it in? I was considering something like opencode which does pretty well with qwen-next coder
BumblebeeParty6389@reddit
Is 3.5 27B and 3.6 35B really on par with DeepSeek V3.2?
pigeon57434@reddit
on strictly hard stem stuff yes but on basically anything else no way
ortegaalfredo@reddit
Quite obviously not if you do some small tests.
Healthy-Nebula-3603@reddit
Yes
DS 3.2 is very old
BumblebeeParty6389@reddit
It came out 5 months ago 😭
Borkato@reddit
I love this hobby so much
petuman@reddit
Yeah, but it's more or less modification of V3 from Dec 2024, not new pretraining run from ground up.
Healthy-Nebula-3603@reddit
Yes it is old :)
Faktafabriken@reddit
”Very old”
AI is moving FAST!
oxygen_addiction@reddit
On some tasks.
Iory1998@reddit
I can't wait for the 27B!
cafedude@reddit
and coder-80B
Steus_au@reddit
Yeah, this is why we are all waiting for the 122B, as it could reduce Sonnet to tears.
cafedude@reddit
I'd like to see a 3.6-coder (80B).
onil_gova@reddit (OP)
I honestly can't wait for a Sonnet-quality model on my laptop. We'll be able to protect ourselves against the enshittification of Frontier model subscription plans with their bipolar rate limits.
vex_humanssucks@reddit
The context caching piece is what makes this feel different. Previous generations had to re-feed context constantly which tanked throughput -- having the KV cache actually stick means sustained multi-turn performance is finally usable at local scale.
AICyberPro@reddit
Running Qwen3.6 on a 3090 (24GB) via the llama.cpp native binary, the performance jump is real even without an M-series Max. Getting ~100 tok/s on short prompts, ~80 on long ones. The catch is configuration; see the config files in the repo linked below.
Compared to Qwen3.5 on the same card: 3.6 is ~30% slower at peak (101 vs 142 tok/s) but noticeably better at structured coding and reasoning tasks. You're paying a speed tax for capability, which I think is worth it.
Full benchmark breakdown, config files, and the Makefile workflow I use daily: github.com/aminrj/local-llm-ops
Curious if anyone's also seeing the CUDA 13.2 gibberish issue or if it's isolated.
ObjectiveOctopus2@reddit
Artificial analysis benchmarks are artificial
ResidentPositive4122@reddit
This sub when a new SotA jumps on artificial analysis - "this is the worst benchmark possible, stupid number goes up, they don't test emotional erp uncensored uniqueness, reeeeeeee".
This sub when a new open model jumps on artificial analysis - "this is the one!!!111"
Rinse and repeat. Dazed and confused.
DeProgrammer99@reddit
This just in. Different people have different opinions. More at 11.
draconic_tongue@reddit
this "different people" cope ignores how reddit works. consensus functions the same even if people are different. it's not different people, it's the same fucking people because the same people use the website
InevitableMaw@reddit
There is some effect where if an opinion takes over a thread, people with the opposite opinion don't engage as much, which can occasionally produce counter hive minds, but yeah, usually the hive mind will exert itself.
Borkato@reddit
Yeah, they’re goomba fallacying
AvidCyclist250@reddit
TIL. Finally, a word for that.
relmny@reddit
Even if that were true, this is a Local LLM sub...
randylush@reddit
It is absolutely true. You’re right that this is the most likely sub to have bias, but still, the bias is strong
FinBenton@reddit
Normally when a model jumps at that test it's really whatever: ass test, benchmaxxed model, etc. BUT 3.6 is actually a huge jump in local LLMs, so it's a big deal regardless of what the bench says.
NoAge5252@reddit
Hi OP, could you please provide your full setup? I have the same machine and I'm trying to run Qwen 3.6 with hermes-agent, with Qwen running on oMLX, and I'm facing empty tool-call errors. I have updated preserve_thinking to be on. I also read that this might be due to Qwen thinking inside the blocks and not actually making a tool call outside them, and that to address it you should update the hermes config so the model prompt ends the thinking block before making a tool call. But I haven't seen success so far.
onil_gova@reddit (OP)
oMLX doesn't have support for preserve thinking yet. Still waiting on my pull request to get merged: https://github.com/jundot/omlx/pull/814
Dependent-Aardvark32@reddit
Which GPU is best for local use? A100? Or is an RTX 4090 enough? Does anyone have experience?
Embarrassed_Adagio28@reddit
It really is the first fast local model i trust with coding. I get 75 tokens per second with q5 on dual 16gb v100's.
GrungeWerX@reddit
Hmmm. I’ll be testing if it’s actually better than Qwen 3.5 27B this weekend.
trycatch1@reddit
So far in my experience the 27B is much better at Q4. 3.6 35B A3B is almost 4x faster than the 27B on my hardware in t/s, but it wastes so many tokens that in the end the 27B is faster anyway at getting work done. And the 27B is more stable and loops less, so it also wastes fewer of my brain tokens.
DOAMOD@reddit
In my tests, 3.6 A3B rivals 3.5 27B, but it's about 15 seconds slower per problem. For me that's crazy; I can't wait for the 27B...
planemsg@reddit
The same is happening on my end. Seeing this a lot in other comments as well.
redballooon@reddit
Please do, and report back. Whenever there's a new Qwen release, this sub is flooded with posts about how this one is the best thing that ever happened to the world, and by a large margin.
That makes me think Qwen is even better at playing social media than on building foundation models.
Borkato@reddit
I’m a real human (lol) and I only talk about qwen being good because it actually is. I understand that there’s an urge to assume it’s all just bots but sometimes the answer to “why is everyone talking about this” is actually “because it’s good”
relmny@reddit
It happens exactly the same with Gemma...
redballooon@reddit
Deepseek didn't play that game too well after its initial splash.
soyalemujica@reddit
I personally have tested it, and it's at about Qwen-Coder-Next level. I'd say it's also at 27B-dense level in coding capabilities, although Q4 sometimes fails with tool calls.
SkyFeistyLlama8@reddit
Sweet finding. I still keep Coder Next 80B around for more detailed analysis and refactoring but I can barely run it because of the size. Qwen 3.5 35B could handle maybe 80% of what I used Next 80B for. If 3.6 35B can do 95%, then I might get rid of that old behemoth.
riceinmybelly@reddit
Old? Haha yeah we’re not getting any sleep
Still-Wafer1384@reddit
Sorry to ask slightly off topic, how do you rate QCN vs 27B?
soyalemujica@reddit
27B is stronger than QCN for complex reasoning. QCN is good for not so complex coding.
Mayank-eagerwithAI@reddit
The preserve_thinking flag is critical—without it you're basically running a lobotomized version that skips the chain-of-thought reasoning that makes 3.6 competitive. I've seen similar gaps with other reasoning models where the default inference settings strip out the internal monologue. For anyone on Mac Silicon, the oMLX + Pi.dev combo is solid, but watch your context window utilization—8bit quant at 3K prompt processing can start thrashing memory bandwidth past ~24K tokens depending on your batch size.
createthiscom@reddit
It's interesting that kimi k2.5 is listed above qwen 3.5 397b. qwen got a slightly higher score on the aider polyglot. I should probably download both.
_hephaestus@reddit
Which quant for oMLX? Just the mlx community, something from qwen or did you make your own?
onil_gova@reddit (OP)
yeah, mlx community at 8bit
julianmatos@reddit
Can confirm, the jump from 3.2 to 3.6 is noticeable. I've been using it for code review and doc summarization tasks that used to feel like a stretch for local models.
If anyone's wondering whether their setup can handle it before committing to the download, localllm.run is handy for checking hardware compatibility with specific models and quant levels.
Tigew@reddit
I’ve been running this on a 2070 and it’s been insane.
Big_Actuator3772@reddit
lol
jimmytoan@reddit
The preserve_thinking flag being required to unlock the real capability is something a lot of benchmarks are missing - people compare apples to oranges and then wonder why results are inconsistent. Running it with oMLX + Pi.dev sounds smooth on the M5 Max, what's the context window you're hitting before it starts degrading?
onil_gova@reddit (OP)
It was still somewhat useful past 200k, but I definitely started noticing the context rot.
dionisioalcaraz@reddit
is that flag only for agentic use cases?
Fit-Palpitation-7427@reddit
Does it run on a 24Gb 4090?
onil_gova@reddit (OP)
Yes, at Q4. Check out the Unsloth quant size breakdown.
myreala@reddit
Any ideas how to make it stop giving up? I'm using it with opencode and I keep having to prompt "continue"; then it starts again for a few seconds and gives up again.
an0maly33@reddit
I had this problem with gem4 but qwen3.6 hasn't done it yet for me. I use pi primarily. Maybe a difference in the harness?
onil_gova@reddit (OP)
Yeah, same. But in case anyone is running auto-research loops and doesn't want to type "continue" every time, here is a Pi extension I wrote just for that.
ortegaalfredo@reddit
It's a great model, but no way in hell is it better than DeepSeek V2, and it's not even at the level of Qwen 27B.
onil_gova@reddit (OP)
Check out the results breakdown. It's not a sweep across the board. It does beat this model on HLE, for instance.
BustyMeow@reddit
That's average, meaning that some are better or worse.
DOAMOD@reddit
Those of us who actually use the model, and aren't just talking nonsense, said so from day one, while people kept saying this is just benchmaxxing.
StardockEngineer@reddit
27B is in the chart twice?
Economy_Cabinet_7719@reddit
Thinking on and off.
StardockEngineer@reddit
Ah. I didn't catch that from my phone. Thanks.
Bobylein@reddit
Yea just that preserve_thinking does nothing for me in llama.cpp
epicycle@reddit
Did you share your settings somewhere for this? I’m setting up mine to code and interested in folks configs.
korino11@reddit
3.6 is the same shit as 3.5. It's much worse than even DeepSeek. The whole Qwen series cannot remember its own context. What about projects and rules? It can't do anything serious at all. Garbage...
an0maly33@reddit
Been using it for a few days and it has been far above anything else I've used for agentic work. You can't just use the defaults; Unsloth has the correct settings posted for it.
korino11@reddit
I used it a LOT. And you know what? Qwen writes "all done." It rechecked the project 3 times and wrote me that everything is 100% correct. GPT then found TONS of errors and a concept that wasn't completed at all; it was at most 30% done. And not just once! That happened every time...
korino11@reddit
Also, it CANNOT remember its OWN context. It begins to make mistakes at 25% of the context window. It starts to forget about the concept, the tasks...
balerion20@reddit
We have an A100 80GB and are currently using Qwen3.5 27B at BF16 with 262K context for coding purposes. It is good but kinda slow. Considering trying out the FP8 version of 3.6 35B with the same context; has anyone tried it, and do you have any comments?
tmvr@reddit
Well, you have the hardware there, I guess you could go ahead, try it out and tell us what you found? ;)
balerion20@reddit
I will definitely try next week, but since we're working on a project I didn't wanna pull the plug on the Qwen 27B while there are people working. I was just wondering if some people had a chance to compare, but I guess I'm in the wrong since it got downvoted lol.
tmvr@reddit
It was a joke, I don't know why you are getting downvoted either...
balerion20@reddit
Weird really, maybe people thought I was flexing with my company's hardware…
q5sys@reddit
That's most likely it. There's a lot of jealousy on this sub, and if you have a 90 class or higher card there's a bunch of people that will pounce on you.
Sometimes this sub seems to be almost entirely split between people with <=12GB cards... people with 90/enterprise cards... people that converted an old 8x card crypto mining rig into an LLM rig.
kmp11@reddit
It's crazy that 12 months ago Qwen2.5 was all the rage and agents were essentially impossible with that model.
bannert1337@reddit
With this jump from Qwen3.5 35B A3B to Qwen 3.6 35B A3B I would love to see Qwen3.6 27B. It probably would be even better.
Thunderstarer@reddit
Are we getting a dense 3.6?
cosimoiaia@reddit
I tested it over the week after it got GGUF'd. It handled pretty much every task in my workflow (analyze features on a project, create issues on GH, pick up issues, work on the fixes, run/create tests, open PRs), and it also solved, in one session, a truckload of problems on a complex project that even GPT-5 was looping on. I have to say, I'm pretty impressed. It's a great and fast model. I only wish it were European so I wouldn't feel icky when using it.
Borkato@reddit
Didn’t it get released like yesterday??
cosimoiaia@reddit
Wednesday iirc, yesterday they updated it, I still haven't downloaded that one.
planetearth80@reddit
does preserve_thinking work with Ollama?
BrianJThomas@reddit
I tried with Claude code and got hundreds of thousands of tokens generated for a medium size coding task. Is that normal for this model? It generates like 20x the tokens of Gemma 4 for me.
oxygen_addiction@reddit
Opencode or pi-coding-agent
Claude Code poisons non-Anthropic models by default
BrianJThomas@reddit
I read the opencode prompts a while back and they were full of garbage instructions trying to tune model behavior. Is it better now?
oxygen_addiction@reddit
You can customize them. I've switched to pi + claude inspired system prompts from the leak
MaCl0wSt@reddit
how's Pi? been hearing the name recently. I've only used OpenCode when it comes to local LLM coding agents
rpkarma@reddit
It’s great because it starts empty, basically. Just a couple of tools for file read, write, edit, and bash. Barely anything in the base prompt. Configure it yourself and build what you need :)
am2549@reddit
What do you think of Goose? And where did you get the system prompts, made them yourself?
SmartCustard9944@reddit
No, OpenCode is still bad, it doesn’t structure the context as well as Claude Code.
Also, the context poisoning is just an unsubstantiated claim. For me Claude Code works really well with Gemma 4.
tecneeq@reddit
Claude context poisoning is a myth, perpetrated by the open-weights cabal to cripple the free markets?
h310dOr@reddit
I personally used Qwen to rewrite them... much better. Kinda weird though that they would be so spammy and full of repetitions.
BrianJThomas@reddit
Yeah I actually rewrote it a while back. I’ll have to revisit. Thanks for the info.
Western_Objective209@reddit
claude code is the only one that has a team that runs evals on their prompts. there's no poisoning going on, it's just not tuned for other models
Kodix@reddit
That's been my experience so far, in limited tests. The results are *more reliable* than Gemma 4 (which often requires secondary bugfixing passes), but each task takes a longer time due to the reasoning.
silentsnake@reddit
Turn on preserve_thinking otherwise it will yap non stop every round
t4a8945@reddit
So on this graph, Qwen 3.5 35B-A3B is better than Qwen 3.5 122B-A10B.
Yeah that invalidates anything else.
BustyMeow@reddit
Not Qwen3.6-35B-A3B (Thinking)?
t4a8945@reddit
Oh yes, my bad, you're right. Didn't see the lightbulb icons.
whyyoudidit@reddit
which cloud is offering this for cheap?
Ell2509@reddit
Is minimax m2.7 not on there?
hoschidude@reddit
Qwen 3.5 27B is still way better. No idea what this benchmark is saying.
Technical-Earth-3254@reddit
These insane benchmark jumps for 0.1 version increments are counterproductive in the long run. Expectations keep going up, and while the models are good, they can't keep up with what people expect from them.
KaMaFour@reddit
"models improving is bad actually"
Thedudely1@reddit
It really is a good model based on my limited tests so far. Using Unsloth's Q3_K_XL. It can't compete with DS 3.2 in terms of raw breadth of knowledge and facts, but it is great at following instructions and at writing a ray-casting engine in a niche Java derivative, which 3.5 could not do reliably in my experience. It is definitely a significant improvement over 3.5, no doubt. But it's also still a 35B MoE model. It is very close to the dense 27B 3.5 model.
JohnMason6504@reddit
Can confirm, running 3.6 8bit on a much more modest box, single 4090 48GB mod with 64GB DDR5, and the jump on code tasks is real. Where 3.5 would start looping on a refactor around 6K context, 3.6 holds discipline past 16K in my logs. preserve_thinking is not optional, turning it off costs about 8 points on HumanEval-plus internally. Also worth flagging for people on Pi.dev style setups, the MLX 8bit path on M-series is different from GGUF Q8_0 on llama.cpp, the MLX one gives you cleaner quantization for thinking tokens specifically. If you are on NVIDIA, use AWQ 8bit through vLLM, not GGUF Q8. The quality floor is meaningfully different.
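For anyone who wants the vLLM/AWQ path spelled out, a minimal sketch; the repo id is hypothetical, so substitute the AWQ checkpoint you actually use:

```python
# Minimal vLLM sketch for the AWQ route mentioned above.
# The model id is hypothetical; point it at a real AWQ checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3.6-35B-A3B-AWQ", quantization="awq")
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Write a binary search in Python."], params)
print(outputs[0].outputs[0].text)
```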
port888@reddit
In LM Studio, I've been getting "Error rendering prompt with jinja template: "Unknown StringValue filter: safe"" whenever I use any of the Qwen 3.6 models. The fix is to remove `| safe` from the prompt template jinja, usually at line 122. It's been perfect ever since.
Reference: https://ianlpaterson.com/blog/lm-studio-fix-cannot-truncate-prompt-n-keep-n-ctx/
Long_comment_san@reddit
apeapebanana@reddit
THAT'S A LOT OF NUTS!!
Blues520@reddit
That's nuts
Jealous-Astronaut457@reddit
Qwen 3.5 35B better than Qwen 3.5 122B, what a strange ranking
tecneeq@reddit
qwen 3.5 35b has 37 points, qwen 3.5 122b has 42. What are you talking about?
Jealous-Astronaut457@reddit
I was wrong; I was looking at the non-thinking one.
MushroomGecko@reddit