TheaterFire

Qwen3.5: 122B-A10B at IQ1 or 27B at Q4?

Posted by Borkato@reddit | LocalLLaMA | View on Reddit | 32 comments

Genuine question. I keep trying to push what my 3090 can do 😂



chris_0611@reddit

I'm running 122B-A10B at Q4 right now: 450 T/s PP and 15 T/s TG. RTX 3090 + 14900K, 96GB DDR5-6800.

Monad_Maya@reddit

Why not something with higher bpw? Q6 can be a bit better at coding-related stuff, or so has been my experience with other models. Obviously some models quantize better than others.

chris_0611@reddit

Yeah, Q5 still fits. It's a big jump to 80GB RAM though (so not much left for other software running at the same time), and PP drops quite a bit, to 330 T/s.

Monad_Maya@reddit

Facing a similar dilemma between unsloth/MiniMax-M2.5-GGUF (UD_Q4_K_XL) and AesSedai/MiniMax-M2.5-GGUF (IQ4_XS).

overand@reddit

Go team "DDR4 Prices are a bit less crazy, this less-current system is actually MORE useful to me now!" Kinda a bad team name, we can workshop it.

Monad_Maya@reddit

High-capacity, fast DDR5 unfortunately requires a top-tier motherboard platform. DDR4 runs OK even on the cheaper motherboards. Consumer hardware is ages behind its server counterparts.

chris_0611@reddit

That won't fit in 24GB VRAM and 96GB RAM with full context, though I could maybe try Q5. Right now with IQ4 (all MoE layers on CPU, with the maximum 256K context), I'm at 21.4GB VRAM and 64.9GB RAM, and it's still fast enough to be actually usable (500 T/s PP and 20 T/s TG).
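
For anyone wanting to try the setup described above (MoE expert layers kept on the CPU, everything else on the GPU, 256K context), here is a minimal sketch that just assembles a llama.cpp `llama-server` command line. The model filename is hypothetical, and the flags are the ones recent llama.cpp builds list under `llama-server --help` (`-ngl`, `--cpu-moe`, `-c`); older builds may not have `--cpu-moe`, so check your own build first.

```python
import shlex

def build_llama_server_args(model_path: str, ctx_size: int = 262144) -> list[str]:
    """Assemble (but do not run) a llama-server command line.

    Assumptions: a recent llama.cpp build that supports --cpu-moe,
    and a hypothetical GGUF path supplied by the caller.
    """
    return [
        "llama-server",
        "-m", model_path,    # path to the GGUF file
        "-ngl", "99",        # offload all layers to the GPU...
        "--cpu-moe",         # ...but keep the MoE expert weights in system RAM
        "-c", str(ctx_size), # context size in tokens (262144 = 256K)
    ]

# Hypothetical filename, for illustration only.
args = build_llama_server_args("qwen3.5-122b-a10b-iq4_xs.gguf")
print(shlex.join(args))
```

If VRAM is left over, `--n-cpu-moe N` keeps only the first N layers' experts on the CPU instead of all of them, which usually helps generation speed.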

gtrak@reddit

Using llama.cpp? I had some issues with context truncation.

Borkato@reddit (OP)

Ugh, I really need to check my RAM setup! I think I’m missing a lot there.

overand@reddit

Main thing, especially if it's a desktop: go into the BIOS and enable XMP support for your memory modules. You might see a solid 10-20% improvement in memory bandwidth if your RAM isn't running at its rated speed but at the "fallback / safe" default instead.
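
To put rough numbers on that, here is a quick sketch of theoretical peak memory bandwidth: transfers per second times 8 bytes per 64-bit channel times the number of channels. The DDR4-3200 kit falling back to DDR4-2666 is a made-up example, not anyone's actual setup in this thread; DDR5-6800 is the speed mentioned above. Real-world bandwidth will come in below these peaks.

```python
def peak_bandwidth_gbs(mt_per_s: int, channels: int = 2, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s for 64-bit (8-byte) memory channels."""
    return mt_per_s * bytes_per_transfer * channels / 1000

# Hypothetical DDR4 kit: rated (XMP) speed vs. an assumed fallback speed.
rated = peak_bandwidth_gbs(3200)
fallback = peak_bandwidth_gbs(2666)
gain_pct = (rated / fallback - 1) * 100

print(f"DDR4-3200 dual channel: {rated:.1f} GB/s")
print(f"DDR4-2666 dual channel: {fallback:.1f} GB/s")
print(f"Gain from running at rated speed: {gain_pct:.0f}%")
print(f"DDR5-6800 dual channel: {peak_bandwidth_gbs(6800):.1f} GB/s")
```

Since token generation with CPU-offloaded layers is mostly limited by how fast weights can be read from RAM, that bandwidth gap carries over fairly directly to T/s.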

Borkato@reddit (OP)

Thank you for this! Though it turns out it’s just because I’m running DDR4 lol

jacek2023@reddit

My problem is the amount of thinking.

Borkato@reddit (OP)

Same, it’s SUPER verbose.

LicensedTerrapin@reddit

3090 + 64GB here. 122B with 32K context gets 20-25 T/s.

Prudent_Appearance71@reddit

>

megadonkeyx@reddit

For coding I wouldn't touch anything below Q8.

SectionCrazy5107@reddit

Why is 27B so much slower than the 35B MoE models, even when it fits fully within VRAM?

No_Swimming6548@reddit

Perhaps the answer lies in 27b active vs 3b active parameters
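
A back-of-the-envelope way to see this: token generation is largely memory-bandwidth bound, so an upper limit on T/s is roughly bandwidth divided by the bytes of weights read per token, and only the active parameters are read. The sketch below assumes about 4.85 bits per weight (an approximate Q4_K_M average) and the RTX 3090's 936 GB/s spec bandwidth. It ignores KV cache reads, compute, and overhead, so real speeds land well below these ceilings; the ratio is the point.

```python
def decode_ceiling_tps(active_params_b: float, bpw: float, bandwidth_gbs: float) -> float:
    """Bandwidth-bound upper limit on tokens/s.

    active_params_b: parameters touched per token, in billions
    bpw: average bits per weight of the quant (assumed value)
    bandwidth_gbs: memory bandwidth in GB/s
    """
    gb_read_per_token = active_params_b * bpw / 8
    return bandwidth_gbs / gb_read_per_token

BPW_Q4 = 4.85          # approximate, assumed for both models
RTX_3090_GBS = 936.0   # spec-sheet memory bandwidth

dense_27b = decode_ceiling_tps(27, BPW_Q4, RTX_3090_GBS)  # dense: all 27B active
moe_a3b = decode_ceiling_tps(3, BPW_Q4, RTX_3090_GBS)     # MoE: about 3B active

print(f"27B dense ceiling: {dense_27b:.0f} T/s")
print(f"35B-A3B ceiling:   {moe_a3b:.0f} T/s")
print(f"Ratio: {moe_a3b / dense_27b:.0f}x")
```

At the same quant, the dense model has to read about nine times as many weight bytes per token, which is why it generates so much slower even when both fit entirely in VRAM.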

SectionCrazy5107@reddit

OK, but I definitely see better quality with 27B. Also, on a 3080 20GB, Q6 gives approx 6 t/s, while I see 20+ t/s on a V100, which was surprising too. 35B is just flying at 80 t/s on both.

HyperWinX@reddit

Try 35B A3B too! It's hella cool. Try using the IQ3_XXS quant.

sine120@reddit

The 122B and 35B didn't bench far from each other, I'd guess you'll get a lot less mileage from a Q1.

Borkato@reddit (OP)

That’s kinda insane actually, wow!

guiopen@reddit

27b, it fits nicely on your GPU and the benchmarks put it very close to the 122b one
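
A rough size check backs up the "fits nicely" part. Weight size is approximately parameters times bits per weight divided by 8. The bpw values below are approximate averages for llama.cpp quant types (about 1.56 for IQ1_S, about 4.85 for Q4_K_M), and the 2GB of headroom for KV cache and buffers is an arbitrary assumption, so treat the result as a sketch rather than a measurement.

```python
# Approximate average bits per weight; real GGUF files vary by tensor mix.
APPROX_BPW = {"IQ1_S": 1.56, "Q4_K_M": 4.85}

def weight_size_gb(params_billion: float, bpw: float) -> float:
    """Approximate weight size in GB (10^9 bytes), excluding KV cache."""
    return params_billion * bpw / 8

def fits_in_vram(size_gb: float, vram_gb: float = 24.0, headroom_gb: float = 2.0) -> bool:
    """True if the weights plus an assumed headroom fit in VRAM."""
    return size_gb + headroom_gb <= vram_gb

for name, params_b, quant in [("122B-A10B", 122, "IQ1_S"), ("27B", 27, "Q4_K_M")]:
    size = weight_size_gb(params_b, APPROX_BPW[quant])
    verdict = "fits" if fits_in_vram(size) else "does not fit"
    print(f"{name} @ {quant}: ~{size:.1f} GB, {verdict} in 24 GB with 2 GB headroom")
```

By this estimate the 122B at IQ1 only just squeezes under 24GB before any context is allocated, while the 27B at Q4 leaves several GB free for KV cache.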

Schlick7@reddit

Toss RAM offloading for a high quant into the mix. Probably not much performance difference compared to the 27b

Borkato@reddit (OP)

Hmmm, true, like IQ3 with like 30GB in RAM? 😂 The only thing I hate about that is prompt processing speed!

Schlick7@reddit

Is it really that much worse than running a 27B dense? Guess I've never actually compared it.

jacek2023@reddit

27B is almost unusable on my setup (3x3090)

MrMisterShin@reddit

Really? You should have about 72GB VRAM. You should comfortably fit the model and run at great t/s.

jacek2023@reddit

Not many people here actually run models locally. I tried all three Qwens today.

Borkato@reddit (OP)

From my earlier tests it does tend to be, but maybe I did something wrong. I’m gonna try it lol

Schlick7@reddit

Nice. Report back

Monad_Maya@reddit

Try the 35B with CPU offloading. If you're sticking to the ones mentioned in the title, then the 27B at Q4, easily. In general, I find anything under Q4 (and occasionally even Q4) to be a bit unreliable.