Qwen 3.6 35B A3B Q4 vs Qwen 3.6 27B Q6, on M5 Pro 64GB
Posted by skyyyy007@reddit | LocalLLaMA | 28 comments
Tried to test the two model versions on my own M5 Pro 64GB and curated the results with Claude. Not an expert, so the settings/config might not be the best; do share any results or improvements that could be attempted. Test prompts were generated in Claude for testing purposes.
Qwen3.6 35B A3B vs 27B UD — M5 Pro 64GB benchmark
Hardware: MacBook Pro M5 Pro 18-core · 64GB unified memory · LM Studio · MLX runtime · thinking OFF (/no_think) · 128K context
Specs
| | 35B A3B MLX 4bit | 27B UD MLX 6bit |
|---|---|---|
| Model size | ~21.7GB | ~30.5GB |
| Architecture | MoE — 3B active/token | Dense — 27B active/token |
| RAM at 128K ctx | ~27GB | ~38GB |
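(The RAM figures are roughly weights + KV cache + runtime overhead. If you want to size a different context limit, here's a rough sketch; the layer/head counts below are placeholders, not Qwen3.6's actual config, so read the real values from the model's config.json.)

```typescript
// Rough KV-cache size estimate. The architecture numbers are
// PLACEHOLDERS; read nLayers / nKvHeads / headDim from the
// model's config.json before trusting the output.
function kvCacheGiB(
  ctxLen: number,
  nLayers = 48, // placeholder
  nKvHeads = 4, // placeholder (GQA keeps this small)
  headDim = 128, // placeholder
  bytesPerElem = 2, // fp16 cache; halve for an 8-bit KV cache
): number {
  // 2x because both keys and values are cached per layer
  const bytes = 2 * nLayers * nKvHeads * headDim * ctxLen * bytesPerElem;
  return bytes / 1024 ** 3;
}

// e.g. estimate at the 128K context used in this benchmark
console.log(`~${kvCacheGiB(131072).toFixed(1)} GiB KV cache at 128K`);
```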
Speed
| Test | 35B A3B | 27B UD |
|---|---|---|
| 800 token test | ~72 tok/s · 11s | ~9 tok/s · 32s |
| 1200 token test | ~70 tok/s · 16s | ~9 tok/s · 70s |
| Advantage | 8x faster | baseline |
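If you want to reproduce the tok/s numbers, here's a minimal sketch against LM Studio's OpenAI-compatible local server (default port 1234). The model id below is a guess; use whatever id LM Studio shows for your loaded model.

```typescript
// Minimal tok/s probe against LM Studio's OpenAI-compatible server.
// Assumes Node 18+ (built-in fetch). The model id is a GUESS; copy
// the exact id from LM Studio's local server tab.
const MODEL = "qwen3.6-35b-a3b-mlx-4bit"; // assumption

async function benchOnce(prompt: string): Promise<void> {
  const t0 = performance.now();
  const res = await fetch("http://localhost:1234/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: MODEL,
      messages: [
        { role: "system", content: "/no_think" }, // thinking off, as in the setup above
        { role: "user", content: prompt },
      ],
      max_tokens: 1200,
      temperature: 0.6,
    }),
  });
  const data = await res.json();
  const secs = (performance.now() - t0) / 1000;
  // Note: wall time includes prompt processing, so this slightly
  // underestimates pure generation speed.
  const tokens = data.usage?.completion_tokens ?? 0;
  console.log(
    `${tokens} tokens in ${secs.toFixed(1)}s = ${(tokens / secs).toFixed(1)} tok/s`,
  );
}

benchOnce("Write a React hook that redirects unauthenticated users.").catch(
  console.error,
);
```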
Intelligence — 4-task coding benchmark
| Task | 35B A3B | 27B UD |
|---|---|---|
| Auth hook (useRequireAuth) | 9.5/10 — typed, mounted cleanup (see sketch below) | 8/10 — used `any`, no cleanup |
| Conflict resolution (500ms rules) | 10/10 | 10/10 |
| Delete account (ordered ops) | 10/10 | 10/10 |
| Bug identification (syncBatch) | 10/10 — found 3 bugs + improvements | 7/10 — found 1 bug |
| Overall | 9.8/10 | 8.75/10 |
Test prompt: 4 coding tasks · max_tokens 1200 · temp 0.6 · /no_think system prompt
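For context on the first task's scoring ("typed, mounted cleanup"), this is roughly the shape that was graded for. An illustrative sketch only, not either model's actual output; the AuthUser type, /api/me endpoint, and login route are hypothetical.

```typescript
import { useEffect, useState } from "react";

// Hypothetical shape and helpers, for illustration only
interface AuthUser {
  id: string;
  email: string;
}

async function fetchCurrentUser(): Promise<AuthUser | null> {
  const res = await fetch("/api/me"); // assumed endpoint
  return res.ok ? ((await res.json()) as AuthUser) : null;
}

// The grading criterion: fully typed (no `any`), and the async result
// is discarded if the component unmounted before the fetch resolved.
export function useRequireAuth(loginUrl = "/login"): AuthUser | null {
  const [user, setUser] = useState<AuthUser | null>(null);

  useEffect(() => {
    let mounted = true; // cleanup flag: no setState after unmount
    fetchCurrentUser().then((u) => {
      if (!mounted) return;
      if (u === null) window.location.assign(loginUrl);
      else setUser(u);
    });
    return () => {
      mounted = false;
    };
  }, [loginUrl]);

  return user;
}
```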
Verdict: 35B A3B wins on both speed and quality for coding tasks on 64GB Apple Silicon. 27B is ~8x slower and didn't demonstrate the reasoning-depth advantage expected from a dense model on these tasks.
Wanted to have some numbers/references when I was looking for a Mac to get; hopefully this helps someone out there.
CornerLimits@reddit
27B is a stronger model, and here it also gets a stronger quantization than the 35B... so this makes me think the whole bench is biased.
I prefer user-experience posts to these “benches” that pretend to test the model on something written by Claude and end up spreading bad info.
skyyyy007@reddit (OP)
To be fair, I use this for my own work, and I too would prefer to read other people's posts on this. But there just aren't any out there with these constraints.
Based on all the other benchmarks out there, it's evident from the official numbers that 27B is stronger at full size. I only tested my own use case with my own specs, and I'm looking for advice and tips.
CornerLimits@reddit
I mean that maybe your own use case would have been a better bench than asking Claude to write one, because it's a real test!
skyyyy007@reddit (OP)
Definitely, I'll be looking out for more benchmarks to test; do let me know if you have any 🙏🏻
Long_comment_san@reddit
Dense at a higher quant should absolutely freaking slap MoE in quality. A 35B MoE quantized to Q4 shouldn't hold a candle to a Q6 dense.
skyyyy007@reddit (OP)
I thought so too; I'll be looking for more benchmarks to test this further. Do share if you have any 🙏🏻
havnar-@reddit
Why not the full 8bit? That 0.5 tps doesn’t make much of a difference.
I work with both the A3B and the 27B: one for speed, one for accuracy.
A nice test is this one:
‘Create an animated version of our universe with a sliding bar at the bottom; when I move that sliding bar, the size of the sun increases or decreases, and it shows the effect on the other planets' orbital movement, or whatever else is affected, as numbers.’
skyyyy007@reddit (OP)
I tried the 8-bit one for the A3B, but RAM usage was right at the edge when I ran some coding prompts with just opencode + LM Studio running.
Figured I'd take a reduced one so I can still use the Mac with a Chrome tab or two open.
What context window are you on? I'll give the 27B 8-bit a try and post new results, with that prompt too 👍🏻
Temporary-Roof2867@reddit
Bro, why did you test the 35B at Q4 against the 27B at Q6?
In general, MoEs at low-bit quantization tend to degrade more than dense models. Sure, the Qwen3.6 series is special, but let's at least make them compete on equal terms, with the same quantization.
I tested the Qwen3.6-27B model at IQ_M from Unsloth, and against all my expectations it managed to do things that much larger models can only dream of. Qwen3.6-27B is a magical model, but it requires a lot of VRAM.
Just_Maintenance@reddit
35b uses more memory than 27b
Low-Boysenberry1173@reddit
Did you read his comment? Q4 vs Q6.
imp_12189@reddit
He did, and he said that OP compared different quants because they use about the same amount of memory: 24 GB vs 23 GB. That's the whole point of why OP compared those, so people with limited memory could choose.
Free-Combination-773@reddit
35b uses way less memory for kv cache though
imp_12189@reddit
It's not about actual usage; he simply looked at the Unsloth table and took models that match in size. It's not rocket science, it's a table: https://unsloth.ai/docs/models/qwen3.6
Free-Combination-773@reddit
What a terrible table. They really should have tables for different context limits
skyyyy007@reddit (OP)
Finally someone gets it 🙏🏻
redblood252@reddit
It's MoE, so it can be significantly faster even if it doesn't all fit in VRAM. I have tried both on a 16GB VRAM GPU, quantized at UD-Q6 XL: 35B gives 26 tps, 27B gives 0.5 tps.
Ell2509@reddit
They are struggling to understand you. Minds grasping like a greasy finger on an inflated balloon.
PaceZealousideal6091@reddit
Bro.. he's using apple silicon! It's unified memory.
ComfyUser48@reddit
The 27B version, for me, is a lot better for my work: agentic coding in a large codebase. I'm on an RTX 5090 and getting 45-60 tok/sec, depending on which quant I load.
MK_L@reddit
Same. I'm trying to figure out which model this is comparable to on the frontier side. It seems as capable as older Claude/Codex. Trying to find a test I can run to really pit the 27B against the others.
StardockEngineer@reddit
35b does not beat 27b on quality, come on.
Temporary-Roof2867@reddit
35B is a beautiful model, but 27B is much superior. That's a fact.
MasterLJ@reddit
35B A3B is a mixture-of-experts model with only 3B parameters active across whichever experts are selected. 27B is better at coding by a country mile. 35B A3B can generate tokens much faster, though.
I'm unsure of how you set up or verified your coding tests as most other benchmarks show that 27B is significantly better.
I'm getting what feels to be Opus 4.6 or at least Sonnet 4.6 results from a tuned Qwen 27B running on an H100 at ~140 t/s.
It's getting tasks done to my liking (codewise) and finding issues that even Opus 4.7 Extra High missed (and Opus 4.6 too)
JonDowSmith@reddit
Qwen 3.6 35b is MoE. 27b is dense. Different quantizations? Comparing an apple with a chicken. Makes no sense.
aigemie@reddit
27B is just too slow, sigh.
pulse77@reddit
Your coding benchmark seems too weak... Please do more tests, especially ones where one of the two models fails (if both models pass, the test may be too simple)...
skyyyy007@reddit (OP)
Definitely, I'll be looking for more tests to attempt for sure.