Qwen3.6 can code
Posted by Purple-Programmer-7@reddit | LocalLLaMA | 33 comments
Got my 5th error on OpenAI models tonight and said “fuck it, let’s see how Qwen3.6-27b can do”.
Linked it up in opencode. Asked it to do some Svelte 5.
Perfect result.
N=1, and it took longer than the paid APIs would… the next 12 months will be quite interesting
gestapov@reddit
I'm sorry, I'm just a beginner in local LLMs, but is opencode the same as OpenClaw? A local agent?
Purple-Programmer-7@reddit (OP)
OpenCode is a cli coding harness like ClaudeCode.
Different from OpenClaw which, to your point, is an autonomous agent.
Intelligent_Ice_113@reddit
that's how the Chinese are slowly killing OpenAI 🤫🥰
ranting80@reddit
GPT 5.4 is great for planning what you want to do. It thinks a lot better than Opus or Sonnet and critiques Claude a lot with stunning results. I'm definitely going to cancel my Claude sub and try Kimi 2.6 for comparison. Opus 4.7 is a massive let down. I love frontier models for planning. But now, with Qwen 3.6... all my coding is going local.
szansky@reddit
What GPU are u using?
Fabulous_Fact_606@reddit
The Chinese need to start building something like an RTX 3090 with 1000GB of VRAM and sell it cheap. Nvidia is cooked.
EggDroppedSoup@reddit
OpenAI isn't open at all
SnooPaintings8639@reddit
Can't tell if trolling or...
Dany0@reddit
Yes it is! It stands for Open your wallets. And Insides
kmp11@reddit
The next twelve months may see local models shrink 50-90% if researchers can get technology like 1.58-bit models and TurboQuant to work.
exaknight21@reddit
I feel like LLMs went from 1T to 500B to 300B, then 200B to 100B, then 70B, now 27B, all within what I can safely say feels like yesterday. So I think by the end of 2026 we'll have agentic 4B models doing dank stuff.
Can’t wait
3oclockam@reddit
What agent are people using for this? Anyone using Hermes for coding with qwen?
ranting80@reddit
Opencode works like butter.
Kodix@reddit
I'm using hermes a *little* bit for coding. With Qwen3.6-35B. Not directly - it's not what I really want from it - but earlier I made it autonomously fix the out of date OpenViking plugin and it just.. did that, fully, with almost no input from me. And now the OpenViking memory finally works properly.
So yeah, it's capable *enough*.
ranting80@reddit
I just bought a Spark because of this. Models that fit inside this VRAM window and can code everything I need used to be a dream. Qwen 3.6 122b is the model I want to run on it when/if it comes out. Then I can pretty much leave the internet behind.
tuvok86@reddit
noob here: I'm testing locally on a 4090 and can't get opencode to do as well as pi; apparently it's using a lot more tokens because it's sending the thinking blocks back to the backend, but that's not needed?
SnooPaintings8639@reddit
I'm not sure, but I think pi also sends all the tokens (including thinking) back to the API. The difference is that pi keeps the history as-is, while opencode does some 'magic optimization' on the context. That means opencode breaks the cache quite often, causing the entire prompt to be reprocessed again and again, which is slow.
OpenCode is great for external APIs, but I don't think it's the best fit for local inference.
This is very much an opinion, and I'd be happy if someone explained to me what I got wrong here.
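The prefix-cache point is easy to illustrate: a backend can only reuse cached KV state for the longest unchanged token prefix, so any edit early in the context forces recomputation of everything after it. A toy sketch (the token IDs are made up, and this is just the matching logic, not a real cache):

```python
def shared_prefix_len(old_tokens, new_tokens):
    """Length of the common prefix, i.e. how much KV cache is reusable."""
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

history = [101, 7, 7, 9, 42, 13]

# Append-only history (pi-style): the whole old prompt is a cache hit.
appended = history + [55, 56]
print(shared_prefix_len(history, appended))  # 6 -> only 2 new tokens to prefill

# Rewritten history (context 'optimization'): the cache hit stops at the edit.
compacted = [101, 99] + history[3:] + [55, 56]
print(shared_prefix_len(history, compacted))  # 1 -> almost everything reprocessed
```

So even a small rewrite near the start of the context turns the next turn into a near-full prefill, which on local hardware is where all the time goes.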
GroundbreakingMall54@reddit
yeah, the KV cache is a memory monster. fp8 helps but you still sacrifice context for VRAM. either batch smaller or just accept the limit tbh
Maximum-Wishbone5616@reddit
27b fits comfortably even on a mini AI rig like 2x 5090 (FP8/KV16/262k context uses something around 60GB VRAM)
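For anyone wondering where numbers like that come from: the KV cache scales as 2 (K and V) × layers × KV heads × head dim × context length × bytes per value, on top of the weights themselves. A rough sketch with made-up architecture numbers (the real model config may differ, so treat the result as a ballpark, not the actual 60GB figure):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context_len, bytes_per_value):
    """Rough KV cache size: one K and one V tensor per layer, per token."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_value

# Assumed (not actual) 27B-class config: 48 layers, 8 KV heads (GQA),
# head_dim 128, 16-bit KV values, 262144-token context.
gib = kv_cache_bytes(48, 8, 128, 262144, 2) / 2**30
print(f"{gib:.0f} GiB")  # 48 GiB for the cache alone, before weights
```

That's why GQA (fewer KV heads) and KV quantization matter so much more than weight precision once you push the context out to 262k.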
Potential-Leg-639@reddit
What the hell are you talking about? "mini AI rig like 2x 5090"
Glittering-Call8746@reddit
Cheap electricity bill, having 2 5090s running at all times. 😁
SummarizedAnu@reddit
Bro thinks a cheap mini rig is even one 5090.
Most people here are running an RTX 3060 or below. The only ones with 2x 4090s, 2x 5090s, etc. are the few people who are either just rich or are the ones making all these quant/abliterated models for us.
met_MY_verse@reddit
My 16GB RX580 agrees.
SummarizedAnu@reddit
I used 16 GB of RAM and no GPU for 10 years, until 5 months ago.
dkarlovi@reddit
LegitimateCopy7@reddit
Perfect proof
Purple-Programmer-7@reddit (OP)
Not worth my time. If you don’t trust, all good.
LegacyRemaster@reddit
I was completing a merge between 2 scripts and Claude gave me this error. I started Qwen 3.6 27b q8 ---> it corrected and fixed the script, and it found some bugs that Claude had added. I asked Gemini Pro to evaluate the Qwen result and it said 100% OK. Today I'm also evaluating it with Minimax 2.7 Q4 local, and it works very well... just to better understand which workflow to use for validation, whether 100% local or hybrid. Note: the error is clear: they tell you to use the API with ClaudeCode or VSCode and not chat. True. But LM Studio with Qwen's long context on an RTX 6000 96GB did the job "only" using chat.
CalligrapherFar7833@reddit
You asked Gemini to check, but it's trash; ask GPT/Codex to check to actually surface issues.
LegacyRemaster@reddit
surface? code issues.
CalligrapherFar7833@reddit
"Surface issues" means surfacing code issues.
GroundbreakingMall54@reddit
yeah, 120k feels tight, but that's just how fp8 vLLM works. the KV cache chews through VRAM fast. either drop batch size or bite the bullet and use less context
cr0wburn@reddit
It has 260k context! What are you on about!