GGUF Quants Arena for MMLU (24GB VRAM + 128GB RAM)
Posted by qwen_next_gguf_when@reddit | LocalLLaMA | 8 comments
Dataset: MMLU subset (DEV+TEST)
llama.cpp settings (3 params only): ctx 8192, seed 42, fa on
Let me know what else you want to see. Thanks.
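For anyone wanting to reproduce, a minimal invocation with just those three settings might look like this. This is a sketch, not the OP's exact command; flag spellings vary between llama.cpp builds (check `llama-cli --help`), and the model filename is just one from the results list below:

```shell
# Only context size, seed, and flash attention are set;
# all sampling parameters are left at llama.cpp defaults.
llama-cli \
  -m Qwen3.5-27B-UD-Q4_K_XL.gguf \
  -c 8192 \
  --seed 42 \
  -fa
```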
Results:
Qwen3.5-27B-UD-Q5_K_XL.gguf 87.33% 12263/14042
Qwen3.5-27B-UD-Q4_K_XL.gguf 87.25% 12252/14042
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled.i1-Q4_K_M.gguf 87.02% 12220/14042
Qwen3-Coder-Next-UD-Q4_K_XL.gguf 84.38% 11849/14042
Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf 83.25% 11690/14042
Qwen3.5-9B-UD-Q8_K_XL.gguf 78.81% 11067/14042
gemma-4-31B-it-UD-Q4_K_XL.gguf 78.36% 11004/14042 errors=1
Qwen3.5-397B-A17B-UD-IQ2_XXS-00001-of-00004.gguf 65.80% 9239/14042
ambient_temp_xeno@reddit
Why rawdog the parameters? What a waste of time.
qwen_next_gguf_when@reddit (OP)
Please advise the best parameters on 27B if you like.
ambient_temp_xeno@reddit
qwen 3.5 27b: --top-p 0.95 --temp 0.6 --top-k 20 --min-p 0.0
gemma 4 31b: --top-p 0.95 --temp 1.0 --top-k 64 --min-p 0.0
The min-p 0.0 isn't optional; otherwise it defaults to 0.05, which is wrong.
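Applied to llama.cpp's CLI, the suggestion above would look something like this (a sketch; combine with whatever context/seed flags you're already using, and note min-p is passed explicitly because llama.cpp's default of 0.05 would otherwise apply):

```shell
# Qwen 3.5 27B recommended sampling, per the comment above:
llama-cli -m Qwen3.5-27B-UD-Q4_K_XL.gguf \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0

# Gemma 4 31B recommended sampling:
llama-cli -m gemma-4-31B-it-UD-Q4_K_XL.gguf \
  --temp 1.0 --top-p 0.95 --top-k 64 --min-p 0.0
```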
Velocita84@reddit
Those are just the parameters recommended by Qwen and Google; you don't need to follow them like they're gospel
ambient_temp_xeno@reddit
Well, I mean, having them as a base for testing is probably a good idea. It's also going to get eye-rolls when people use weird settings and then complain.
qwen_next_gguf_when@reddit (OP)
Will re-test and add to the list. Thanks.
New_Comfortable7240@reddit
Please try Qwen3.5-35B but not distilled, as there's a theory that distillation won't translate to better performance
akumaburn@reddit
In benchmarks, maybe, but in actual usage my experience is that distilled models perform much better.