GGUF Quants Arena for MMLU (24GB VRAM + 128GB RAM)
Posted by qwen_next_gguf_when@reddit | LocalLLaMA | 8 comments
Dataset: MMLU subset (DEV+TEST)
llama.cpp settings (3 params only): ctx 8192, seed 42, fa on
Let me know what else you want to see. Thanks.
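For anyone wanting to reproduce, a minimal invocation with just those three settings might look like this. This is a sketch, not the OP's exact command; flag spellings vary between llama.cpp builds (check `llama-cli --help`), and the model filename is just one from the results list below:

```shell
# Only context size, seed, and flash attention are set;
# all sampling parameters are left at llama.cpp defaults.
llama-cli \
  -m Qwen3.5-27B-UD-Q4_K_XL.gguf \
  -c 8192 \
  --seed 42 \
  -fa
```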
Results:
Qwen3.5-27B-UD-Q5_K_XL.gguf 87.33% 12263/14042
Qwen3.5-27B-UD-Q4_K_XL.gguf 87.25% 12252/14042
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled.i1-Q4_K_M.gguf 87.02% 12220/14042
Qwen3-Coder-Next-UD-Q4_K_XL.gguf 84.38% 11849/14042
Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf 83.25% 11690/14042
Qwen3.5-9B-UD-Q8_K_XL.gguf 78.81% 11067/14042
gemma-4-31B-it-UD-Q4_K_XL.gguf 78.36% 11004/14042 errors=1
Qwen3.5-397B-A17B-UD-IQ2_XXS-00001-of-00004.gguf 65.80% 9239/14042
ambient_temp_xeno@reddit
Why rawdog the parameters? What a waste of time.
qwen_next_gguf_when@reddit (OP)
Please advise the best parameters on 27B if you like.
ambient_temp_xeno@reddit
qwen 3.5 27b: --top-p 0.95 --temp 0.6 --top-k 20 --min-p 0.0
gemma 4 31b: --top-p 0.95 --temp 1.0 --top-k 64 --min-p 0.0
The min-p 0.0 isn't optional; otherwise it defaults to 0.05, which is wrong.
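Applied to llama.cpp's CLI, the suggestion above would look something like this (a sketch; combine with whatever context/seed flags you're already using, and note min-p is passed explicitly because llama.cpp's default of 0.05 would otherwise apply):

```shell
# Qwen 3.5 27B recommended sampling, per the comment above:
llama-cli -m Qwen3.5-27B-UD-Q4_K_XL.gguf \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0

# Gemma 4 31B recommended sampling:
llama-cli -m gemma-4-31B-it-UD-Q4_K_XL.gguf \
  --temp 1.0 --top-p 0.95 --top-k 64 --min-p 0.0
```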
Velocita84@reddit
Those are just the parameters recommended by Qwen and Google; you don't need to follow them like they're gospel
ambient_temp_xeno@reddit
Well, I mean, having them as a base for testing is probably a good idea. It's also going to get eye-rolls when people use weird settings and then complain.
qwen_next_gguf_when@reddit (OP)
Will re-test and add to the list. Thanks.
New_Comfortable7240@reddit
Please try Qwen3.5-35B but not distilled, as there's a theory that distillation won't translate to better performance
akumaburn@reddit
In benchmarks, maybe, but in actual usage my experience is that distilled models perform much better.