Qwen 3.6 27B kicks balls
Posted by Character_Split4906@reddit | LocalLLaMA | 15 comments
This is more of a quick appreciation post for Qwen 3.6 27B running locally (8-bit unsloth quant).
I've been using it mainly alongside my 35B model in OpenCode for planning and coding. I also had it set up in Open WebUI (OWUI), but until MTP support arrived about two weeks ago, generation was so slow in OWUI that it was basically unusable for chat. Since then, I have paired the two and have been using Qwen 27B as a daily chat assistant alongside Gemini Pro.
I've been keeping a running mental comparison between the two. For straightforward questions, Gemini handles things fine. But over the weekend I dove into some career advice and company portfolio deep dives, plus some immigration research. Gemini completely fell apart on this. It started hallucinating and fixating on things from earlier messages in the conversation. I think this degradation started over the last couple of weeks or so, and I'd like to hear about others' experience with Gemini lately.
I ended up doing a lot of manual research myself. Then I decided to try the same research with Qwen 3.6 27B. I was genuinely surprised by how much better it performed on both the career/company stuff and the immigration research. The immigration results really stood out because it had to actually go through official documentation and make sense of it rather than just regurgitating something.
Side note: I've also tried Gemma 4 31B, which is great for research and planning, but it's just too slow on my M5 Max (128GB) with an 8-bit quant. Curious to hear folks' opinions on that. Maybe once MTP is enabled for it, I will try it again.
DieselKraken@reddit
Agreed it wins.
poy_esp@reddit
How are you running your 27b model? I'm using Qwen3.6 35b Q4 A3B on llama.cpp on a 3080 + 6800xt but when I try a 27b model, the performance is really slow.
Any idea where I am going wrong?
Character_Split4906@reddit (OP)
I am running it on my M5 Max (128GB) with MTP support and an 8-bit KV cache. For MTP I keep draft-n-max at 4.
poy_esp@reddit
Nice! Do you notice any difference in acceptance rates between n-max 3 and 4?
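For anyone wanting to compare draft lengths on paper first, here is a rough sketch of the standard speculative-decoding estimate. It assumes every drafted token is accepted independently with the same probability, which real workloads only approximate. The function name and the example acceptance rates are hypothetical and are not measurements from this thread.

```python
def expected_tokens_per_step(accept_prob: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass.

    Assumes each drafted token is accepted independently with probability
    accept_prob (a simplification). The count includes the one token the
    main model always contributes on each pass.
    """
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be between 0 and 1")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if accept_prob == 1.0:
        # Every drafted token is accepted, plus the main model's own token.
        return float(draft_len + 1)
    # Geometric series: 1 + p + p^2 + ... + p^draft_len
    return (1.0 - accept_prob ** (draft_len + 1)) / (1.0 - accept_prob)


if __name__ == "__main__":
    # Hypothetical acceptance rates, for illustration only.
    for p in (0.5, 0.7, 0.9):
        for n in (3, 4):
            print(f"p={p} n={n}: {expected_tokens_per_step(p, n):.3f} tokens/step")
```

Under this model the gain from going 3 to 4 shrinks as the acceptance rate drops, so the extra draft token mostly pays off when acceptance is already high. It also ignores the cost of producing the draft tokens, so treat it as an upper bound on the benefit.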
nonlinearsystems@reddit
I don’t expect you to read this article, but I’ve been doing a deep dive on the best local models for my stack. I highly encourage you to try Qwen3-Coder-Next-6bit-MLX
https://echalupa.com/blog/local-llm-benchmark-mac-studio-m3-ultra
Character_Split4906@reddit (OP)
For some reason, MLX models don't work well for me in terms of performance. I found that MLX quants with omlx get stuck in loops or fail at tool calling more often than GGUF with llama.cpp. Also, the TPS is almost the same; in fact, llama.cpp sometimes outperforms!
nonlinearsystems@reddit
It is wild how much performance and benchmarks can differ from person to person! 27b is def a great model but I do think Qwen3 Coder Next is a bit slept on 😅
RedParaglider@reddit
qwen 3 coder next is amazing. I wish we could get another model like that, or a 120/10
nonlinearsystems@reddit
That would be perfect! I do think it’s awesome that we get all these new models to test with. That said, I think some of these “legacy” models have some untapped juice. At least for my personal stack, tool use is my biggest priority.
dinerburgeryum@reddit
Yeah? It was my daily for a while, but I always felt like you could feel that it was a little underbaked. Now that we’re here tho maybe I’ll give it another swing.
nonlinearsystems@reddit
Well, it does depend a lot on your harness and what you are doing with the model. I use a pretty detailed memory system that keeps my models in check. Given that, I need a model that is great at tooling and coding. That is what makes running local models fun though, there are so many different variables.
bdixisndniz@reddit
Gemini is the hallucination king
Character_Split4906@reddit (OP)
Yeah, unfortunately ChatGPT is equally bad and sometimes worse. Not sure how, but it seems that in the last 4-6 weeks both ChatGPT and Gemini have dropped in quality.
hurdurdur7@reddit
27B is a solid assistant.
Character_Split4906@reddit (OP)
Yeah, I am genuinely impressed and happy with some of the work it's able to pull off. OWUI has been a bit of a PITA sometimes for tool calling. It has improved in the latest release but still leaves you wanting more lol