I'm Not a Dev But I Use Qwen 3.6 35b to Code
Posted by thejacer@reddit | LocalLLaMA | 73 comments
Full disclosure: I used to program a bit, but I was garbage at it so I found a new career. This was eons ago so I'm not a dev, obviously.
There have been a few posts over the last couple of days highlighting struggles with these small models and coding, so I wanted to share what worked for me. This isn't a "use this harness" or "this agent did the thing" kind of post. Keep in mind, I'm not a dev, and I never learned modern development strategies or anything like that, so if this is obvious to some of you actual programmers, just forgive me and move on. If it sounds stupid... well, it works, so...
The thing that changed vibe-coding for me was having the LLM write and run very thorough tests. I don't know if I was doing something wrong before, but the LLMs didn't recommend this (GLM 5, Kimi K2.5, Gemini 3.0 Pro, Claude Sonnet...). More and more I noticed people mentioning tests and iterative development that I just couldn't get my system to do. Turns out, once I prompted the LLM to write tests, it would, and then it runs those tests after every change and makes corrections. With this I've managed to get substantially better work done with Qwen 3.6 35b than even Kimi K2.5 (prior to tests, obviously...).
Previously I would ask the LLM to add a feature or fix something, and something else would end up broken or modified in some way. This held true for Claude Sonnet 4.5 and Kimi K2.5, while Qwen3.5 122b, 27b and 35b were absolutely useless. Since incorporating these tests, I've got working features that Kimi K2.5 (via the Moonshot API) kept half-assing, and it's been done with Qwen 3.6 35b.
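For anyone curious what this workflow looks like concretely, here's a minimal made-up sketch of the kind of regression test you can ask the model to write and then re-run after every edit (the `slugify` function and test names are invented for illustration, not from OP's project):

```python
# Hypothetical example: a tiny feature plus the regression tests the
# model is asked to write and re-run after every change it makes.

def slugify(title: str) -> str:
    """Turn a post title into a URL slug, e.g. "Meal Plan" -> "meal-plan"."""
    return "-".join(title.lower().split())

def test_slugify_basic():
    assert slugify("Meal Plan") == "meal-plan"

def test_slugify_idempotent():
    # Running slugify twice must give the same result as running it once,
    # so later "fixes" can't quietly change existing behavior.
    assert slugify(slugify("Meal Plan")) == slugify("Meal Plan")

if __name__ == "__main__":
    test_slugify_basic()
    test_slugify_idempotent()
```

The point isn't these particular tests; it's that the model re-runs the whole suite after every edit and self-corrects when something it didn't mean to touch starts failing.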
OneSlash137@reddit
I am a dev… I don’t use that to code…
People using it to code are coping.
I can’t get qwen to finish two tasks in a row. There’s zero percent chance I’m trusting it to check its own work.
DocMadCow@reddit
I'm another DEV and I've been using Qwen 3.6 27b and 35b to do some heavy lifting for me. The other day I used it for a long running task where Copilot using Opus 4.6 and GPT 5.5 were shitting the bed. If you haven't tried Pi.dev yet I suggest you give it a shot before you give up.
OneSlash137@reddit
You don’t have as much experience as you think if you’re putting qwen in the driver seat and are happy with the results.
DocMadCow@reddit
I'm happy with the results as I do most of my code by hand with some auto complete. But for long running tasks like reviewing old code bases it does quite well.
OneSlash137@reddit
I’m sure it can. The solutions to any issues it identifies, assuming no false positives, will be questionable at best, not scalable, and in most cases it will break something else.
It does not have the reasoning required to examine the code, ID fixes, and then play out the downstream impacts. It’s a constant game of whack-a-mole.
DocMadCow@reddit
All AI code should be code reviewed as being the person that generated it you are taking responsibility for it. I also find it handy to sometimes do a second pass with a different model to verify the work done.
OneSlash137@reddit
Using one lobotomized model to check the work of another one is just about the dumbest thing I’ve heard.
DocMadCow@reddit
Coming into a group of people running local LLMs and shitting all over them is just troll behavior. You should join a community that better suits your interests.
mlhher@reddit
This is another of these bad harnesses. It is a slight step up from the miserable OpenCode experience, which I assume makes people think it is "good".
It is still terrible for local models compared to what you can actually get out of them with a harness built properly, not one that just slaps another UI on top and changes some code paths.
ProfessionalSpend589@reddit
You’ve commented 3 times and basically said "you’re holding it wrong" every time.
I’m not gonna ask you what the proper way is - it seems you don’t know either.
mlhher@reddit
Nobody has asked. People have just jumped onto "NO IT DIDNT WORK FOR ME SO EVERYONE ELSE IS LYING". I think this is a case of people just having their egos attached to it.
> it seems you don’t know either.
I have built my own harness yes. But I am not trying to promote it which is why I have not posted it. Though people have told me that "you solved local coding for me" and "the same model seems smarter with xx" (I think it should even be visible in my reddit history).
Contrary to most people it seems I am open to testing instead of blindly following.
JeuTheIdit@reddit
If you promote a harness: Complaints and downvotes.
If you don't promote a harness: Complaints and downvotes.
People fucking suck bro just move on... Not worth the trouble.
mlhher@reddit
I am just trying to explain what I learned. The goal shouldn't be people insulting each other. It should be learning and furthering one's own (potentially flawed) perspective. What should be avoided at all costs is just jumping to conclusions because "hey I might be doing something wrong" hurts the ego.
> People fucking suck bro just move on
This does not mean one should level down to the same playing field.
ProfessionalSpend589@reddit
I didn’t recognise you, but your project is in my bookmarks, because it seemed interesting.
Although my LLM (distributed over network) executes so slow I haven’t had problems with opencode yet.
JeuTheIdit@reddit
You have more patience than me but are ultimately correct lol.
Sidenote - Been testing late recently and I have been loving it so far! Appreciate all the work you put into it.
OneSlash137@reddit
No, I didn’t say anyone was lying. I said they’re too ignorant to know better. There’s a difference.
sagiroth@reddit
What tool would you suggest in that case, then? The answer "I don't use local models" is not acceptable, since it goes against the point of you even being here in the first place.
mlhher@reddit
I do use Qwen3.6-35B-A3B for nearly all of my dev work. At 65k context. With virtually no guidance needed. So yes I do use local models.
I use my own harness as already stated. https://github.com/mlhher/late
Don't believe me? Then don't try it, if you feel like I am another of these "fork OpenCode, change UI" guys lol.
sagiroth@reddit
That's a fair point. If it works, then I am happy for you.
hidden2u@reddit
what do you use
mlhher@reddit
Well if I told you now the other guy would say "HAHA GOT U" right? Lol Yes I do use my own harness and I built it specifically because of all this bullshit.
It would be funny if it wasn't quite so sad. But I guess brand loyalty is quite a thing. Even here.
hidden2u@reddit
huh?
mlhher@reddit
I was trying not to post my own harness but I get it may seem weird. Though didn't expect the flat out insults really lol.
I am exclusively using https://github.com/mlhher/late with Qwen3.6-35B-A3B for nearly all of my dev work. No missed tool calls, no exploding context, no prompt re-processing.
sagiroth@reddit
Guy is full of shit, dont feed him
DocMadCow@reddit
Exactly what someone who was trying to push their own unpopular harness would say. Pi.dev is widely used and quite popular.
mlhher@reddit
Of course it is popular. Just like OpenCode is popular. I am not trying to push anything right here (or did I tell you to use a specific harness?).
There is a long list of issues with these tools. If you are not even open to investigating what could be brought out of the models in reality then sure you do you.
sagiroth@reddit
Don't get me wrong, but that sounds like a skill issue or a wrong config.
OneSlash137@reddit
It’s actually a case of you don’t know enough about development to know how bad it is.
mlhher@reddit
If someone says "hey you might be wrong and it works fine for me" and your first pivot is to "YOU ARE BAD" I think I can see where the issue lies.
OneSlash137@reddit
I’m offering my observations as someone who actually has knowledge. The people defending it just don’t and it’s clear. No half competent developer would turn to qwen for even simple tasks.
mlhher@reddit
> who actually has knowledge.
So a dev who builds AI training math and high concurrency search APIs from scratch has no knowledge.
What I find funny is that you don't even know me and just pivoted to say "you have no knowledge" because I disagreed with your opinion lol.
OneSlash137@reddit
A dev who says a local model is capable is regarded. Sorry not sorry.
sagiroth@reddit
I have a suspicion it's a troll account. Any respectable dev would agree the models are capable to a degree. Sure, not enough to replace an entire workflow or steer the wheel like Claude Code, but in my experience it's very close to reaching that level at some point, at least for web development.
mlhher@reddit
> Sure, not to replace an entire workflow or steer the wheel like Claude Code
I would actually go further than that and say yes they are lol. Maybe not for every task but definitely for most by now. But I agree with your point.
> I have suspicion its a troll account.
I think its just a human with an ego slightly too big (specifically for themselves).
Individual_Holiday_9@reddit
AI really brings out the early 2000s slashdot STEMlord vibe among a certain type of Reddit guy
sagiroth@reddit
Well, as a senior software engineer I can tell you it's capable for its weight. Surely not comparable to 1T-param cloud models, but if you have defined tasks and well-written tickets, it's more than enough. Sure, you can't fire and forget like with Claude Code, but if I were forced to use it I would be more than happy to continue.
OneSlash137@reddit
Unless you’re specific enough to define the exact logic every loop in the code should use, it will not write good code. Usable and working, maybe. I already covered this.
Your point is moot.
sagiroth@reddit
I wouldn't agree it needs to be as exact as telling it where to loop or which functions to call, but if your codebase is well structured and you know which files are relevant, it does well. I can't prove it, but I'm personally pretty happy with the output, which, again, I wouldn't trust to vibe code, but to accelerate my work, sure.
thejacer@reddit (OP)
Idk why you're so rude, but it definitely feels like you're in the wrong sub. FYI, I've met several people who did a job for their entire career but were shit at it all the way to retirement. Have a good day!
OneSlash137@reddit
I know. Most of those people report to me… I’m used to carrying all the dead weight.
Lux_Interior9@reddit
You spent a lifetime building a career just for it to be invalidated by AI and regular people. This happened when the newspapers went digital, and even to my friend, who was a videographer. I get why you're so upset. It sucks when amateurs start moving into your space and changing things, but here we are.
Have you chosen a new career path yet or do you plan on retiring?
mlhher@reddit
I am a dev implementing ML and AI algorithms from scratch and high-concurrency payment/search APIs from scratch, and I specifically use Qwen3.6-35B-A3B to code locally, successfully.
> People using it to code are coping.
People trying to use OpenCode or Claude Code with local models are rather out of the loop and are missing quite a bit of knowledge about how AI works.
OneSlash137@reddit
I’m not lacking AI knowledge. You lack the development knowledge to see how bad it is.
mlhher@reddit
I am literally developing things autonomously with the Qwen model without guidance. There are no missed tool calls, no ambiguities, no issues, nothing.
A more reasonable response would have been "prove it" if you do not believe me or "please explain" if you are open to learn.
Telling a guy who builds literal AI training and high concurrency search gateways he is lacking in dev knowledge is rather funny.
OneSlash137@reddit
I’m sorry you don’t know as much about making scalable apps as I do. Now go back to Fiverr to sell your next AI course.
mlhher@reddit
"Hey I think you are using this thing wrong"
"NO YOU ARE STUPID"
"I am literally doing the thing right now with it"
"NO YOU ARE WRONG AND STUPID"
very mature exchange.
OneSlash137@reddit
Sorry you don’t like the facts. Imagine spending money on hardware for qwen to avoid paying $20 a month for Claude lol.
Sorry, but anyone that sees actual real-world results from this model knows it’s worthless… unless you’re clueless. Then it’s great because it knows more than you.
jopereira@reddit
I'm not a dev, but I've been following AI development since AlphaGo (2016). I instantly accepted Ray Kurzweil's predictions about what AI will become.
It's very strange, but my guess is 90% of people using AI don't really understand what it is (now), what problems it is supposed to solve, and what is still left outside its reach (now).
I've solved problems by just changing the prompt (same model, same agent) and, many times, just by changing the agent (same model, same prompt), as agents have their own system prompts.
So much we can do with local... we just need to be smart!
joost00719@reddit
If you have the right harness it's really capable.
OneSlash137@reddit
If you have the right harness it might not break. Nothing will make it capable.
mlhher@reddit
If you have the right harness Qwen3.5-35B-A3B already allowed for fully autonomous coding. With 3.6 it is just a slight step up further.
thejacer@reddit (OP)
That's cool. It's working fine for me. I was ~100 user messages into a session over my vacation and added a handful of working features to the meal planning app for my wife using ONLY 35b at Q4_1.
Accurate-Use-3427@reddit
Coolll
TestingTheories@reddit
This is a great post, can I ask what quant, if any, you are using?
thejacer@reddit (OP)
I’m using Q4_1
BringMeTheBoreWorms@reddit
For a simple and quick helper, if you're not using some fancy planning or memory add-on, add to the agents.md something like:
'After every change you make, append a log of what you changed to the end of changes.md'
You can have it write a script that it calls to do the appending so it doesn't waste tokens. It's a simple change tracker that helps so much later on. I did this early on when I was mucking around.
Then at any stage it or you can look up what was changed, when and why for a full audit history.
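A minimal sketch of such an append script, assuming Python is available; the script name, the `changes.md` filename, and the log format are just illustrative conventions, not something from the comment above:

```python
#!/usr/bin/env python3
# Hypothetical change-tracker helper the agent can call instead of
# rewriting changes.md itself (saves tokens). Names are assumptions.
import sys
from datetime import datetime

def log_change(summary: str, path: str = "changes.md") -> None:
    """Append one timestamped bullet to the change log."""
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"- {stamp}: {summary}\n")

if __name__ == "__main__":
    log_change(" ".join(sys.argv[1:]))
```

The agents.md instruction then becomes something like: 'After every change you make, run `python log_change.py "<what you changed and why>"`.'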
vulgrin@reddit
Just remember that you need to review your tests. Even Claude Code can cheerfully write 100% passing tests that do absolutely nothing, test the wrong things, or just simply return "true" with a TODO to flesh out the test someday.
thejacer@reddit (OP)
Thank you for the heads up! I would probably be mostly lost trying to verify the tests but I'll check it out. Generally everything I've worked on has been just little fun projects for my personal amusement, I've got too much going on to try to scale an app or whatever actual devs do lol
vulgrin@reddit
Well the first tip might be to have a different model review the tests and explain them to you. If they match your expectations, cool. If they say TODO: add test someday, the LLM will probably tell you.
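A made-up illustration of the failure mode being described here: both tests below "pass", but only one of them actually exercises the code.

```python
# Hypothetical example of a vacuous test vs. a real one. Both pass,
# but the first would also pass if add() were completely broken.

def add(a: int, b: int) -> int:
    return a + b

def test_add_vacuous():
    # TODO: flesh this out someday -- always passes, checks nothing
    assert True

def test_add_real():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
```

This is why a second model reviewing the tests (or even just skimming them yourself for `assert True` and stray TODOs) catches so much.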
Uncle___Marty@reddit
I'm not gonna lie, when the qwen 3.6 models started dropping this sub became a flood of posts of people gushing about how amazing they were, and it was good reading. It's now a couple weeks(?) after they dropped, these posts are STILL appearing, and I'm still loving it 😉
I'm running a 4-bit quant of 35BA3B and it's so much fun to watch it work. Regarding what you said about frontier LLMs being way too enthusiastic about making changes you didn't ask for: until recently I was only using Gemini to code with, and despite my saying "DO NOT MAKE ANY CHANGES, just talk to me first and discuss my proposed changes", Gemini would still think "Hmmmm, this code could be changed" and break everything. I've NEVER seen Qwen do anything like this. Qwen seems to adore the planning stages more than the actual making-changes part; I've even seen it plan stuff out and then ask "Is it ok to make these changes?" despite me previously telling it to make the changes.
My mind is still blown that Alibaba released some mid-range models that have made a considerable number of people cancel their subs to frontier models and choose a local model instead, one that works virtually the same as (or in some ways better than!) a paid model.
What a time to be alive right? *high fives OP*
dreamai87@reddit
Man, as I was going through your first paragraph I thought you might end by saying people are spamming qwen. Yes, this model is really great. It surprises me every day. Whenever I feel a task would surely be too difficult for qwen to pull off, this model keeps nailing it. I still see it sometimes ignore requests, but that's either lazy prompting or poor typing on my end.
Far_Cat9782@reddit
With the right system prompts/harness it is really SOTA. 27b is my coder and 35b is my everyday.
phein4242@reddit
DevSecNetOps engineer here: out of all the local models I tried, the qwen 3.5/3.6 models produce by far the best-quality code, even when using lower quants.
Running qwen3.6-32b q4 on an RTX A6000 w/ 48GB VRAM.
YourNightmar31@reddit
There is no 32b, either 35 or 27 lol. You should be using qwen 3.6 27B Q5 or Q6 if you have that amount of vram.
qwen_next_gguf_when@reddit
I'm not a dev. I use qwen to produce code that has flaws and spam the internet with it to taint AI crawlers' results.
thejacer@reddit (OP)
I'm just having a blast tinkering with little projects I've thought of over the years lol. I intend to let it work on my Home Assistant dashboard and automations and I'm gonna see if it can add llama.cpp to the Ollama integration.
notlongnot@reddit
Keep having a blast ✊
qwen_next_gguf_when@reddit
If ollama works, then don't bother. I use llama.cpp because I have multiple GPUs with different speeds and I serve multiple models for different purposes.
thejacer@reddit (OP)
I have 2xMi50 running two instances of Qwen3.6 35b serving three discord chat bots and servicing my smart home, opencode and a desktop tool that I had Gemini build for me. I won't use Ollama for anything lol.
-dysangel-@reddit
Yes. Writing out tests can be very helpful. Also, using "plan mode" helps a lot to get things right and build context on what likely needs editing before you even start. If you just set the agent doing something, it often starts editing things before it gets the wider picture.
putrasherni@reddit
good
kmouratidis@reddit
I've used everything from gpt-3.5-turbo onwards for coding, including the first small dedicated coder models (Star, Santa, ...), a bunch of Llama3.1-70B & Qwen2.5-72B, Mistral Small 24B, Qwen3+, Claude Sonnet & Opus 3.5/3.7/4.0/4.5/4.6, GLM 4.5/4.7/5, and probably more. There hasn't been a model I couldn't use effectively for code assistance since 2022. People are debating over trivial things with their overinflated expectations instead of actually doing useful stuff.
LegitimateCopy7@reddit
The real disclosure should be that you only used the LLM to create small components, likely something that can be done within a couple hundred lines of code, with countless working examples online that have been practically hard-baked into the LLM.