Qwen 3.5B is so impressive, it found multiple bugs Claude Opus 4.7 couldn't
Posted by ArugulaAnnual1765@reddit | LocalLLaMA | 17 comments

Just wanted to start off with how absolutely blown away I am by this new model. I am running the bartowski/Qwen_Qwen3.6-35B-A3B-GGUF IQ4_XS quant on my 5090 with the full 256k context.
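For anyone who wants to replicate something similar, here's a minimal sketch of a comparable setup with llama-cpp-python (the filename and parameters are illustrative assumptions, not my exact launch config):

```python
# Minimal sketch of a comparable local setup using llama-cpp-python.
# The model filename and parameters are illustrative, not an exact config.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen_Qwen3.6-35B-A3B-IQ4_XS.gguf",  # bartowski IQ4_XS quant
    n_ctx=262144,     # the "full 256k context" = 262,144 tokens
    n_gpu_layers=-1,  # offload every layer to the 5090
    flash_attn=True,  # helps keep long-context attention memory manageable
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Look for any bugs or issues in this repo."}],
)
print(out["choices"][0]["message"]["content"])
```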
I am damn impressed! I asked it a very broad question: just look for any bugs or issues.
With that huge context window, I noticed it dumping entire relevant files into its context, which it could easily handle. It filled up to ~150k tokens before dumping its plan, which I am seriously cool with (I like to transfer the plan to a new convo and reset that window anyway).
It was able to find multiple bugs that violated the guidelines set in rules/claude.md.
Running on my 5090, it was blazing along at around 180 tps. My eyes went wide watching the machine work in front of me; it was truly glorious.
In contrast, I gave slowpus 4.7 the same task. After taking literally 10x longer and using up my entire 5hr usage window, it didn't even find half of the legitimate bugs that my local setup found.
I noticed that Claude was MUCH more careful about loading up the context, performing a ton of greps and text searches. Sure, it's much more efficient for Anthropic's servers, but it will never beat half of the codebase being loaded straight into context lmao.
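Roughly the difference between the two approaches looks like this (a hypothetical sketch; obviously neither agent exposes its internals like this):

```python
# Hypothetical sketch of the two context-loading strategies described above.
# Neither Claude Code nor my local agent actually exposes internals like this.
from pathlib import Path

def grep_first_context(repo: Path, pattern: str) -> str:
    """Claude-style: pull in only the lines that match a search."""
    hits = []
    for f in repo.rglob("*.py"):
        for i, line in enumerate(f.read_text(errors="ignore").splitlines(), 1):
            if pattern in line:
                hits.append(f"{f}:{i}: {line}")
    return "\n".join(hits)  # cheap on tokens, but misses surrounding code

def dump_files_context(repo: Path) -> str:
    """Big-context style: shove entire relevant files into the prompt."""
    parts = []
    for f in repo.rglob("*.py"):
        parts.append(f"# ===== {f} =====\n{f.read_text(errors='ignore')}")
    return "\n\n".join(parts)  # token-hungry, but the model sees everything
```

Grep-first is cheaper per request, which matters when you're paying for Anthropic's GPUs; dumping whole files only works when you have 256k of local context to burn.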
Overall, the past 6 months have felt like riding on top of a rocket. It was so useless months ago; now it's super smart and insanely fast. My mind is literally blown rn.
Hodler-mane@reddit
Why are people even trying to compare these models? Qwen 3.5/3.6B is great at following instructions and tool calling... but it's no Opus, it's no Sonnet, and it probably gets beaten by Haiku.
Now go write a full-stack application with Qwen 3.5B and find a million issues with it that you wouldn't have with Opus.
Simple_Library_2700@reddit
3.5 27B does without a doubt beat out Haiku, so I’d imagine 3.6 does as well.
But to say it beats opus is very funny and simply untrue.
ArugulaAnnual1765@reddit (OP)
It runs literal circles around slowpus, iterating through multiple prompts before slowpus even finishes reasoning on the first.
Simple_Library_2700@reddit
Yes, I also have a 5090 and love running local because it is fast. But when I need something done right I’m still turning to the cloud model every time.
Prudent-Ad4509@reddit
I'd say there are certain usages and cases where it would. But the same can be said for any highly specialized LLM done right.
No-Anchovies@reddit
Because some people actually build massive full-stack applications... script by script (such surprise, much wonder). Not everyone is out there vibecoding slop.
ArugulaAnnual1765@reddit (OP)
lol exactly, I had it work on a ticket, not the entire fucking codebase lmao
ArugulaAnnual1765@reddit (OP)
The fact that it's even in the same conversation blows me away. Also, at 180 tps it blazes right past slowpus; by the time slowpus is done reasoning and out of usage, my local setup would have already gone through multiple iterations anyway, producing a better result.
My Claude Pro sub expired on the 18th and I haven't had an issue with usage warnings since LMAO
Glum_Act122@reddit
missing the point of local models
mmis1000@reddit
If you think about it, Gemini Flash/Pro is supposed to be a 10x or 30x bigger model, and it can't even follow instructions over a long run. It requires Antigravity to prepend every single output with a hint to force it to use tools properly...
Elegant_Tech@reddit
If you are doing basic coding with popular languages, you might not see much difference. Like you said, you have to get bigger in scope with a more complex tech stack that hits the knowledge gaps.
xXy4bb4d4bb4d00Xx@reddit
I've been using Qwen + opencode for months now, and it consistently delivers better results.
Out of interest, I have given the same task to Opus / Codex a few times, and it's never really done anything that Qwen couldn't; in at least one case Opus failed completely.
I understand things are moving very quickly and many people do not have the time/capability to look at all the various options, but I am quite convinced that the API-based models are DOA.
ArugulaAnnual1765@reddit (OP)
100%. I can get the same results now, it's 10x faster, and usage is unlimited. Why exactly would I go back to paying $20 a month? LOL
Techngro@reddit
Is this only possible for 32GB VRAM setups (you mentioned 256k context)? I'm considering getting a 24GB GPU because I know my 4080 Super won't cut it with 16GB. I've been paying for Claude Max for a while now, but am leaning towards going mostly local.
ArugulaAnnual1765@reddit (OP)
You might be able to get away with it if you use a smaller quant and a more heavily quantized KV cache (I have mine at q8).
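Rough napkin math on why the KV cache quant matters at long context (the layer/head numbers below are assumptions for illustration, not the model's actual config; check the GGUF metadata for real values):

```python
# Back-of-the-envelope KV cache sizing. The layer/head numbers are
# ASSUMED for illustration; check your GGUF metadata for the real values.
n_layers   = 48      # assumed transformer layer count
n_kv_heads = 4       # assumed GQA key/value heads
head_dim   = 128     # assumed dimension per head
n_ctx      = 262144  # 256k context

def kv_cache_gib(bytes_per_elem: float) -> float:
    # 2x for keys and values
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem / 2**30

print(f"f16 KV cache: {kv_cache_gib(2.0):.1f} GiB")  # 24.0 GiB
print(f"q8  KV cache: {kv_cache_gib(1.0):.1f} GiB")  # 12.0 GiB
```

q8_0 is actually a shade over 1 byte per element in practice, but the point stands: roughly halving the KV cache is often the difference between a long context fitting in 24GB or not.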
egomarker@reddit
Cool story but no.
rarogcmex@reddit
Have you verified that the bugs really do exist? I believe Anthropic deliberately made their models numb on cybersec.