Qwen 3.6 35B A3B Q4 vs Qwen 3.6 27B Q6, on M5 Pro 64GB
Posted by skyyyy007@reddit | LocalLLaMA | 28 comments
Tried to test the two model versions on my own M5 Pro 64GB and curated the results with Claude. Not an expert, so the settings/config might not be the best; do share any results or improvements that could be attempted. Test prompts were generated in Claude for testing purposes.
Qwen3.6 35B A3B vs 27B UD — M5 Pro 64GB benchmark
Hardware: MacBook Pro M5 Pro 18-core · 64GB unified memory · LM Studio · MLX runtime · thinking OFF (/no_think) · 128K context
Specs
| | 35B A3B MLX 4bit | 27B UD MLX 6bit |
|---|---|---|
| Model size | ~21.7GB | ~30.5GB |
| Architecture | MoE — 3B active/token | Dense — 27B active/token |
| RAM at 128K ctx | ~27GB | ~38GB |
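(The RAM figures are roughly weights + KV cache + runtime overhead. If you want to size a different context limit, here's a rough sketch; the layer/head counts below are placeholders, not Qwen3.6's actual config, so read the real values from the model's config.json.)

```typescript
// Rough KV-cache size estimate. The architecture numbers are
// PLACEHOLDERS; read nLayers / nKvHeads / headDim from the
// model's config.json before trusting the output.
function kvCacheGiB(
  ctxLen: number,
  nLayers = 48, // placeholder
  nKvHeads = 4, // placeholder (GQA keeps this small)
  headDim = 128, // placeholder
  bytesPerElem = 2, // fp16 cache; halve for an 8-bit KV cache
): number {
  // 2x because both keys and values are cached per layer
  const bytes = 2 * nLayers * nKvHeads * headDim * ctxLen * bytesPerElem;
  return bytes / 1024 ** 3;
}

// e.g. estimate at the 128K context used in this benchmark
console.log(`~${kvCacheGiB(131072).toFixed(1)} GiB KV cache at 128K`);
```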
Speed
| Test | 35B A3B | 27B UD |
|---|---|---|
| 800 token test | ~72 tok/s · 11s | ~9 tok/s · 32s |
| 1200 token test | ~70 tok/s · 16s | ~9 tok/s · 70s |
| Advantage | 8x faster | baseline |
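If you want to reproduce the tok/s numbers, here's a minimal sketch against LM Studio's OpenAI-compatible local server (default port 1234). The model id below is a guess; use whatever id LM Studio shows for your loaded model.

```typescript
// Minimal tok/s probe against LM Studio's OpenAI-compatible server.
// Assumes Node 18+ (built-in fetch). The model id is a GUESS; copy
// the exact id from LM Studio's local server tab.
const MODEL = "qwen3.6-35b-a3b-mlx-4bit"; // assumption

async function benchOnce(prompt: string): Promise<void> {
  const t0 = performance.now();
  const res = await fetch("http://localhost:1234/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: MODEL,
      messages: [
        { role: "system", content: "/no_think" }, // thinking off, as in the setup above
        { role: "user", content: prompt },
      ],
      max_tokens: 1200,
      temperature: 0.6,
    }),
  });
  const data = await res.json();
  const secs = (performance.now() - t0) / 1000;
  // Note: wall time includes prompt processing, so this slightly
  // underestimates pure generation speed.
  const tokens = data.usage?.completion_tokens ?? 0;
  console.log(
    `${tokens} tokens in ${secs.toFixed(1)}s = ${(tokens / secs).toFixed(1)} tok/s`,
  );
}

benchOnce("Write a React hook that redirects unauthenticated users.").catch(
  console.error,
);
```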
Intelligence — 4-task coding benchmark
| Task | 35B A3B | 27B UD |
|---|---|---|
| Auth hook (useRequireAuth) | 9.5/10 — typed, mounted cleanup (see sketch below) | 8/10 — used `any`, no cleanup |
| Conflict resolution (500ms rules) | 10/10 | 10/10 |
| Delete account (ordered ops) | 10/10 | 10/10 |
| Bug identification (syncBatch) | 10/10 — found 3 bugs + improvements | 7/10 — found 1 bug |
| Overall | 9.8/10 | 8.75/10 |
Test prompt: 4 coding tasks · max_tokens 1200 · temp 0.6 · /no_think system prompt
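For context on the first task's scoring ("typed, mounted cleanup"), this is roughly the shape that was graded for. An illustrative sketch only, not either model's actual output; the AuthUser type, /api/me endpoint, and login route are hypothetical.

```typescript
import { useEffect, useState } from "react";

// Hypothetical shape and helpers, for illustration only
interface AuthUser {
  id: string;
  email: string;
}

async function fetchCurrentUser(): Promise<AuthUser | null> {
  const res = await fetch("/api/me"); // assumed endpoint
  return res.ok ? ((await res.json()) as AuthUser) : null;
}

// The grading criterion: fully typed (no `any`), and the async result
// is discarded if the component unmounted before the fetch resolved.
export function useRequireAuth(loginUrl = "/login"): AuthUser | null {
  const [user, setUser] = useState<AuthUser | null>(null);

  useEffect(() => {
    let mounted = true; // cleanup flag: no setState after unmount
    fetchCurrentUser().then((u) => {
      if (!mounted) return;
      if (u === null) window.location.assign(loginUrl);
      else setUser(u);
    });
    return () => {
      mounted = false;
    };
  }, [loginUrl]);

  return user;
}
```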
Verdict: 35B A3B wins on both speed and quality for coding tasks on 64GB Apple Silicon. 27B is ~8x slower and didn't demonstrate the reasoning-depth advantage expected from a dense model on these tasks.
Wanted to have some numbers/references when I was looking for a Mac to get; hopefully this helps someone out there.
CornerLimits@reddit
27B is a stronger model, and here it also gets a stronger quantization than the 35B... so this makes me think the whole bench is biased.
I prefer user-experience posts to these “benches” that pretend to test the model on something written by Claude and end up spreading bad info.
skyyyy007@reddit (OP)
To be fair, I use this for my own work, and I too would prefer to read other people's posts on this. But there just aren't any out there with these constraints.
Based on all the other benchmarks out there, it's evident from the official numbers that 27B is stronger at full size. I only tested my own use case with my own specs, and I'm looking for advice and tips.
CornerLimits@reddit
I mean that maybe your own use case would have been a better bench than asking Claude to write one, because it's a real test!
skyyyy007@reddit (OP)
Definitely, I'll be looking out for more benchmarks to test; do let me know if you have any 🙏🏻
Long_comment_san@reddit
Dense at a higher quant should absolutely freaking slap MoE in quality. A 35B MoE quantized to Q4 shouldn't hold a candle to a Q6 dense.
skyyyy007@reddit (OP)
I thought so too; I'll be looking for more benchmarks to test this further. Do share if you have any 🙏🏻
havnar-@reddit
Why not the full 8bit? That 0.5 tps doesn’t make much of a difference.
I work with both the A3B and the 27B: one for speed, one for accuracy.
A nice test is this one:
‘Create an animated version of our universe with a sliding bar at the bottom; when I move that sliding bar, the size of the sun increases or decreases, and it shows the effect on the other planets' orbital movement, or whatever else is affected, as numbers.’
skyyyy007@reddit (OP)
I tried the 8-bit one for the A3B, but RAM usage was right at the edge when I ran some coding prompts with just opencode + LM Studio running.
Figured I'd take a reduced one so I can still use the Mac with a Chrome tab or two open.
What context window are you on? I'll give the 27B 8-bit a try and post new results, with that prompt too 👍🏻
Temporary-Roof2867@reddit
Bro, why did you test the 35B at Q4 against the 27B at Q6?
In general, MoEs at low-bit quantization tend to degrade more than dense models. Sure, the Qwen3.6 series is special, but let's at least make them compete on equal terms, with the same quantization.
I tested the Qwen3.6-27B model at IQ_M from Unsloth, and against all my expectations it managed to do things that much larger models can only dream of. Qwen3.6-27B is a magical model, but it requires a lot of VRAM.
Just_Maintenance@reddit
35b uses more memory than 27b
Low-Boysenberry1173@reddit
Did you read his comment? Q4 vs Q6.
imp_12189@reddit
He did, and he said that OP compared different quants because they use about the same amount of memory: 24 GB vs 23 GB. That's the whole point of why OP compared those, so people with limited memory could choose.
Free-Combination-773@reddit
35b uses way less memory for kv cache though
imp_12189@reddit
It's not about actual usage; he simply looked at the Unsloth table and took models that match in size. It's not rocket science, it's a table: https://unsloth.ai/docs/models/qwen3.6
Free-Combination-773@reddit
What a terrible table. They really should have tables for different context limits
skyyyy007@reddit (OP)
Finally someone gets it 🙏🏻
redblood252@reddit
It's MoE, so it can be significantly faster even if it doesn't all fit in VRAM. I have tried both on a 16GB VRAM GPU, quantized at UD-Q6 XL: 35B gives 26 tps, 27B gives 0.5 tps.
Ell2509@reddit
They are struggling to understand you. Minds grasping like a greasy finger on an inflated balloon.
PaceZealousideal6091@reddit
Bro.. he's using apple silicon! It's unified memory.
ComfyUser48@reddit
The 27B version, for me, is a lot better for my work: agentic coding in a large codebase. I'm on an RTX 5090 and getting 45-60 tok/sec, depending on which quant I load.
MK_L@reddit
Same. I'm trying to figure out which model this is comparable to on the frontier side. It seems as capable as older Claude/Codex. Trying to find a test I can run to really pit the 27B against the others.
StardockEngineer@reddit
35b does not beat 27b on quality, come on.
Temporary-Roof2867@reddit
35B is a beautiful model, but 27B is much superior. That's a fact.
MasterLJ@reddit
35B A3B is a mixture-of-experts model with only 3B parameters active across whichever experts are selected. 27B is better at coding by a country mile. 35B A3B can generate tokens much faster, though.
I'm unsure of how you set up or verified your coding tests as most other benchmarks show that 27B is significantly better.
I'm getting what feels to be Opus 4.6 or at least Sonnet 4.6 results from a tuned Qwen 27B running on an H100 at ~140 t/s.
It's getting tasks done to my liking (codewise) and finding issues that even Opus 4.7 Extra High missed (and Opus 4.6 too)
JonDowSmith@reddit
Qwen 3.6 35b is MoE. 27b is dense. Different quantizations? Comparing an apple with a chicken. Makes no sense.
aigemie@reddit
27B is just too slow, sigh.
pulse77@reddit
Your coding benchmark seems too weak... Please do more tests, especially ones where one of the two models fails (if both models pass, the test may be too simple)...
skyyyy007@reddit (OP)
Definitely, I'll be looking for more tests to attempt for sure.