Is Qwen3.6 current king for local agentic use?
Posted by HornyGooner4402@reddit | LocalLLaMA | 25 comments
I've been testing other models, but nothing even comes close to Qwen3.6 35B A3B for agentic use. The worst I'd get is an occasional loop, while Gemma4 occasionally produced broken tool calls and I couldn't even get GLM 4.7 Flash REAP past 2 or 3 messages before it started looping. All were IQ4_NL quants from Unsloth.
I'm wondering if there are better models around the same size (preferably MoE) that I haven't tried yet. I'm using it for Hermes Agent and Pi and it's not perfect, but it's crazy good for a local model.
the-username-is-here@reddit
Qwen 3.5 122B is much better.
Zc5Gwu@reddit
How so? In benchmarks 27b comes ahead (I’m curious).
tarruda@reddit
Still hoping for 3.6 122B
hurdurdur7@reddit
27b q6 club
Jipok_@reddit
> Gemma4 produced broken tool calls
template problem
LoveMind_AI@reddit
For me, Gemma 4 31B is the best model, but I'm not using it for coding tasks.
jarec707@reddit
SuperGemma4 is worth checking out from what I read
Potential-Leg-639@reddit
Yes
HVACcontrolsGuru@reddit
Qwen is better at coding, while I find Gemma better for general user-facing use. I use both and fine-tune both as well!
A big hidden issue is that the default chat templates cause problems. I redid both the Qwen and Gemma ones to fix tool calling and improve agentic coding. Depending on how you use them, there are some weird app-side things to take into account with the default chat templates.
sir-draknor@reddit
Can you share more about the changes you made to the chat templates? I’m curious to see what you changed!
HVACcontrolsGuru@reddit
Gemma 4: Gemma 4 Chat Template Gist
I need to clean my private work off my Qwen one. I pushed both more towards tool-calling agentic workflows and coding, and I use them for enterprise stuff. I still need to battle-test these more today, but the fixes are well documented!
silentus8378@reddit
I am using the 2bit version from byteshape with opencode and the tool calling is still solid.
DinoAmino@reddit
10/10 astroturfers agree.
TripleSecretSquirrel@reddit
Do you have an alternate suggestion? I’ve been very pleased with Qwen 3.6 27b, but if there’s a legitimately better option for <32gb of VRAM, I’d love to hear it.
twack3r@reddit
Of course not, but for small models, Qwen3.6 27B and 35B-A3B are the right choice at the moment.
Local coding king is GLM5.1
the_derby@reddit
> but most users find that too large to run locally.
yeah, I'm a little short of memory. =)
exaknight21@reddit
I personally have tried 27B Q4_K_XL from Unsloth and 35B-A3B; the MoE is faster and the winner imho. I wish my Mi50 had faster prompt processing: 27B takes hella long but pulls through eventually. I am running on obsolete hardware, though, so it could be that. In any event, I <3 the Qwen team and that’s all that matters.
jacek2023@reddit
You forgot Nemotrons and Devstral
tiebird@reddit
Check out the Nemotron 3 family; they are very good for a lot of use cases and have decent speed.
RANDVR@reddit
In my experience yes. Nothing else comes close to qwen out of all local models I tested.
Creative-Type9411@reddit
It's crazy that it's also faster.
ComfyUser48@reddit
Yes. I believe it's not even THAT far from DeepSeek v4 Flash.
j_tb@reddit
Only if DS4 isn’t an option on your hardware.
Snoo_81913@reddit
Currently this is the best local MoE model in its weight class.
LeMochileiro@reddit
Yes