Qwen3.5: 122B-A10B at IQ1 or 27B at Q4?

chris_0611@reddit

I'm running 122B-A10B at Q4 right now: 450T/s PP and 15T/s TG. RTX 3090 + 14900K, 96GB DDR5-6800.

Monad_Maya@reddit

Why not something with higher bpw? Q6 can be a bit better at coding-related stuff, or so has been my experience with other models. Obviously some models quantize better than others.

chris_0611@reddit

Yeah, Q5 still fits. It's a big jump to 80GB RAM though (so not much left for other software running at the same time), and PP drops quite a bit, to 330T/s.
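For a rough sense of why Q5 lands near 80GB, here is a back-of-envelope sketch. The bits-per-weight figures are assumed typical averages for each quant family, not values read from any actual GGUF file, and the estimate ignores KV cache and runtime overhead.

```python
# Rough weight-size estimate: parameters x bits-per-weight / 8.
# APPROX_BPW values are assumptions for illustration; real GGUF files mix
# tensor types, so actual sizes differ by a few GB.
APPROX_BPW = {"Q4": 4.5, "Q5": 5.5, "Q6": 6.5, "Q8": 8.5}


def approx_size_gb(params_billion: float, bpw: float) -> float:
    """Weight size in GB (10^9 bytes), excluding KV cache and overhead."""
    return params_billion * bpw / 8


for name, bpw in APPROX_BPW.items():
    print(f"122B at {name}: ~{approx_size_gb(122, bpw):.0f} GB")
print(f"27B at Q4: ~{approx_size_gb(27, APPROX_BPW['Q4']):.0f} GB")
```

Under these assumptions the 122B model at Q5 comes out in the same ballpark as the figure quoted above, while 27B at Q4 stays well under 24GB before context is counted.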

Monad_Maya@reddit

Facing a similar dilemma between unsloth/MiniMax-M2.5-GGUF (UD_Q4_K_XL) and AesSedai/MiniMax-M2.5-GGUF (IQ4_XS).

overand@reddit

Go team "DDR4 Prices are a bit less crazy, this less-current system is actually MORE useful to me now!" Kinda a bad team name, we can workshop it.

Monad_Maya@reddit

High-capacity, fast DDR5 unfortunately requires a top-tier motherboard platform. DDR4 runs OK even on cheaper motherboards. Consumer hardware is ages behind its server counterparts.

chris_0611@reddit

Won't fit in 24GB VRAM and 96GB RAM with full context. I could maybe try Q5. Right now with IQ4 (all MoE layers on CPU, but with the maximum 256K context), I'm at 21.4GB VRAM and 64.9GB RAM. But it's still fast enough to be actually usable (500T/s PP and 20T/s TG).

gtrak@reddit

Using llama.cpp? I had some issues with context truncation.

Borkato@reddit (OP)

Ugh, I really need to check my ram setup! I think I’m missing a lot there.

overand@reddit

Main thing, especially if it's a desktop: go into the BIOS and enable XMP support for your memory modules. You might see a solid 10-20% improvement in memory bandwidth if your RAM's not running at its rated speed but instead at the "fallback / safe" default.
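To put the XMP point in numbers, a minimal sketch of theoretical peak memory bandwidth follows. It assumes a dual-channel desktop with a 64-bit (8-byte) bus per channel; the DDR4-2666 fallback and DDR4-3200 rated speeds are hypothetical examples, not anyone's actual configuration, and real-world bandwidth is lower than the theoretical peak.

```python
def peak_bandwidth_gbs(mt_per_s: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak in GB/s: transfers per second x bytes per transfer x channels."""
    return mt_per_s * bus_bytes * channels / 1000


# Hypothetical example: a DDR4 kit rated for 3200 MT/s running at a 2666 MT/s fallback.
fallback = peak_bandwidth_gbs(2666)
rated = peak_bandwidth_gbs(3200)
gain_percent = (rated / fallback - 1) * 100

print(f"fallback: {fallback:.1f} GB/s, rated: {rated:.1f} GB/s, gain: {gain_percent:.0f}%")
print(f"DDR5-6800 dual channel: {peak_bandwidth_gbs(6800):.1f} GB/s")
```

Since token generation with weights in system RAM is largely bandwidth-bound, a gain of this size tends to show up fairly directly in TG speed.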

Borkato@reddit (OP)

Thank you for this! Though it turns out it’s just because I’m running DDR4 lol

jacek2023@reddit

My problem is the amount of thinking.

Borkato@reddit (OP)

Same, it’s SUPER verbose.

LicensedTerrapin@reddit

3090 + 64GB here. 122B with 32k context gets 20-25 T/s.

Prudent_Appearance71@reddit

>

megadonkeyx@reddit

For coding I wouldn't touch anything below a Q8.

SectionCrazy5107@reddit

Why is 27B so much slower than the 35B MoE models, even when it fits fully within VRAM?

No_Swimming6548@reddit

Perhaps the answer lies in 27b active vs 3b active parameters
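That point can be put into rough numbers with a bandwidth-bound estimate: each generated token has to read every active weight once, so decode speed is capped at about memory bandwidth divided by active weight bytes. This is a simplified sketch that ignores KV cache reads, compute limits, and kernel efficiency; the 4.5 bpw figure is an assumed average for a Q4-class quant, and 936 GB/s is the published peak for an RTX 3090.

```python
def decode_tps_upper_bound(active_params_billion: float, bpw: float,
                           bandwidth_gbs: float) -> float:
    """Upper bound on tokens/s if each token reads all active weights once."""
    gb_read_per_token = active_params_billion * bpw / 8
    return bandwidth_gbs / gb_read_per_token


BANDWIDTH_GBS = 936.0  # RTX 3090 published peak; real throughput is lower
BPW = 4.5              # assumed average for a Q4-class quant

dense_27b = decode_tps_upper_bound(27, BPW, BANDWIDTH_GBS)  # all 27B active
moe_a3b = decode_tps_upper_bound(3, BPW, BANDWIDTH_GBS)     # ~3B active per token

print(f"27B dense bound:  {dense_27b:.0f} t/s")
print(f"35B-A3B bound:    {moe_a3b:.0f} t/s")
print(f"ratio: {moe_a3b / dense_27b:.1f}x")
```

The absolute numbers are optimistic ceilings, but the ratio between them is the useful part: with roughly nine times fewer active parameters, the MoE model has about nine times less to read per token.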

SectionCrazy5107@reddit

OK, but I definitely see better quality with 27B. Also, on a 3080 20GB, Q6 is giving approx. 6 t/s, while I see 20+ t/s on a V100, which was surprising too. 35B is just flying at 80 t/s on both.

HyperWinX@reddit

Try 35B-A3B too! It's hella cool. Try using the IQ3_XXS quant.

sine120@reddit

The 122B and 35B didn't bench far from each other; I'd guess you'll get a lot less mileage from a Q1.

Borkato@reddit (OP)

That’s kinda insane actually, wow!

guiopen@reddit

27b, it fits nicely on your GPU and the benchmarks put it very close to the 122b one

Schlick7@reddit

Toss RAM offloading for a high quant into the mix. Probably not much performance difference compared to the 27b

Borkato@reddit (OP)

Hmmm, true, like IQ3 with like 30GB in ram memory? 😂 The only thing I hate about that is prompt processing speed!

Schlick7@reddit

Is it really that much worse than running a 27B dense? Guess I've never actually compared it.

jacek2023@reddit

27B is almost unusable on my setup (3x3090)

MrMisterShin@reddit

Really? You should have about 72GB VRAM. You should comfortably fit the model and run at great t/s.

jacek2023@reddit

Not many people here use models locally. I tried all three Qwens today.

Borkato@reddit (OP)

From my earlier tests it does tend to be, but maybe I did something wrong. I’m gonna try it lol

Schlick7@reddit

Nice. Report back

Monad_Maya@reddit

Try the 35B with CPU offloading. If sticking to the ones mentioned in the title then the 27B Q4 easily. In general, I find anything under Q4 (and occasionally even Q4) to be a bit unreliable.
