Benchmarked 18 models that I can run on my RTX 5080 16GB using Nick Lothian's SQL benchmark
Posted by grumd@reddit | LocalLLaMA | 82 comments
2 days ago there was a very cool post by u/nickl:
https://reddit.com/r/LocalLLaMA/comments/1s7r9wu/comment/odc9xj8/
Highly recommend checking it out!
I've run this benchmark on a bunch of local models that can fit into my RTX 5080, some of them partially offloaded to RAM (I have 96GB, but most will fit if you have 64).
Results:
24: unsloth/Qwen3.5-122B-A10B-GGUF:UD-Q4_K_XL
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟥🟩 🟩🟩🟩🟩🟩
23: bartowski/Qwen_Qwen3.5-27B-GGUF:IQ4_XS
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟥🟩 🟥🟩🟩🟩🟩
23: unsloth/Qwen3.5-122B-A10B-GGUF:UD-IQ3_XXS
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟥🟩 🟥🟩🟩🟩🟩
22: unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q6_K_XL
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟥🟩🟩 🟩🟩🟩🟥🟩 🟥🟩🟩🟩🟩
22: mradermacher/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-i1-GGUF:Q3_K_M
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟥🟩🟥🟩 🟥🟩🟩🟩🟩
21: unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF:UD-Q4_K_S
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟨🟥 🟥🟨🟩🟩🟩
20: unsloth/Qwen3-Coder-Next-GGUF:UD-Q5_K_XL
🟩🟩🟩🟩🟨 🟩🟩🟩🟩🟩 🟩🟩🟨🟩🟩 🟩🟩🟩🟥🟨 🟥🟩🟩🟩🟩
20: mradermacher/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-i1-GGUF:Q6_K
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟥🟩🟩 🟥🟩🟩🟥🟩 🟥🟥🟩🟩🟩
19: unsloth/GLM-4.7-Flash-GGUF:UD-Q6_K_XL
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟥🟩🟩 🟩🟩🟩🟥🟨 🟥🟨🟩🟥🟩
18: unsloth/GLM-4.5-Air-GGUF:Q5_K_M
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟥🟩🟩 🟥🟩🟩🟥🟩 🟨🟨🟥🟩🟨
18: bartowski/nvidia_Nemotron-Cascade-2-30B-A3B-GGUF:Q6_K_L
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟨🟩🟩 🟩🟩🟩🟥🟩 🟨🟨🟥🟨🟨
16: unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_XL
🟩🟩🟩🟩🟨 🟩🟩🟩🟩🟩 🟩🟩🟨🟩🟩 🟥🟨🟩🟥🟨 🟥🟨🟩🟨🟩
16: byteshape/Devstral-Small-2-24B-Instruct-2512-GGUF:IQ3_S
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟥🟩🟨🟩🟩 🟩🟩🟨🟥🟨 🟨🟨🟥🟨🟩
16: mradermacher/Qwen3.5-9B-Claude-4.6-HighIQ-THINKING-i1-GGUF:Q6_K
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟨🟥🟩 🟥🟩🟥🟥🟨 🟥🟩🟥🟩🟨
14: mradermacher/Qwen3.5-9B-Claude-4.6-HighIQ-INSTRUCT-i1-GGUF:Q6_K
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟥🟩🟥🟩🟩 🟩🟨🟥🟥🟨 🟨🟨🟥🟨🟨
14: unsloth/GLM-4.6V-GGUF:Q3_K_S
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟥🟩🟨🟨🟩 🟥🟩🟩🟨🟨 🟨🟨🟨🟨🟨
5: bartowski/Tesslate_OmniCoder-9B-GGUF:Q6_K_L
🟨🟨🟨🟨🟨 🟨🟨🟨🟩🟩 🟩🟨🟨🟩🟨 🟨🟨🟩🟨🟨 🟨🟨🟨🟨🟨
5: unsloth/Qwen3.5-9B-GGUF:UD-Q6_K_XL
🟨🟨🟨🟨🟨 🟨🟨🟨🟩🟩 🟨🟩🟨🟨🟩 🟨🟩🟨🟨🟨 🟨🟨🟨🟨🟨
The biggest surprise, to be honest, is Qwen3.5-9B-Claude-4.6-HighIQ-THINKING, going from 5 green tests with vanilla Qwen3.5-9B to 16. Most of Qwen3.5-9B's errors boiled down to being unable to call the tools with correct formatting. For how small it is, it's a very reliable finetune.
Qwen3.5-122B-A10B is still king for 16GB GPUs because I can offload the experts to RAM. Speed isn't perfect, but the quality is great and I can fit a sizable context into VRAM. Q4_K_XL uses around 68GB of RAM, IQ3_XXS around 33GB, so the smaller quant can be used with 64GB of system RAM.
Note though: these benchmarks mostly test a pretty isolated SQL call. It's a nice quick benchmark to compare two models, even with tool calling, but it's not representative of understanding a larger codebase's context, where larger models will pull ahead.
Ranmark@reddit
New Qwen3.6. Don't look at the time, I have 10-year-old hardware. Had to increase the timeout, of course.
grumd@reddit (OP)
I did the same exact quant and got 3 errors.
Sometimes you just get lucky. Even though this test uses temperature 0.1, there's still randomness to it. For example, 122B Q4_K_XL only has 1 error consistently, but even small changes to my setup will break it and give 2-3 errors. Using a BETTER quant can also give me more errors. I tried 122B Q6 (offloaded to an NVMe SSD) and got 2 or 3 errors (don't remember).
I think getting one error is just luck
Ranmark@reddit
Hey, you should try this one: https://huggingface.co/lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-IQ4_XS-GGUF
It's so good, I'm getting better results than with the new dense 3.6 model. And it's more stable than any other distill / non-distill. Idk what this black magic is.
Ranmark@reddit
Thanks, I'm feeling better now :) I'm now rerunning 27B (IQ3_XS) and it feels MUCH more consistent (23/25 all the time). Looks like there's still a way to go for me (122B is just too much for my hardware). Hope that Alibaba releases a 3.6 27B soon.
Ranmark@reddit
It's strange, but I couldn't repeat the same result. It's regularly failing some queries like q2, q10, q21.
Ranmark@reddit
iq4_nl
idumlupinar@reddit
What would you recommend for the following setup?
128 GB DDR4 RAM
GTX 1050 Ti GPU
AMD 5800X3D CPU
EVGA 850 G2 PSU
Shall I pull the trigger and go for a used RTX 3090, as I believe it's the only available top-tier GPU upgrade for my PSU?
Or can I still do something with my current config?
I am on Windows and I already tried Qwen3-Coder-Next. Seems to be working but very slow (expected)
grumd@reddit (OP)
The 1050 Ti is a dead end unfortunately. 4GB of VRAM at ~100 GB/s is worse than a CPU with DDR5. A used 3090 will be suuper good, or if a 7900 XTX is less expensive, that's also a good option.
Greant_82@reddit
Could you try Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1?
Ranmark@reddit
Thanks for the results. What do you think about apex models? Is it worth using?
grumd@reddit (OP)
Apex? What's that?
Ranmark@reddit
If I got this right, it's a MoE quantization method (Adaptive Precision for EXpert Models) where different parts get different compression based on their importance. The goal is to get the most quality per weight.
These are models from mudler, like mudler/Qwen3.5-122B-A10B-APEX-GGUF.
Atretador@reddit
Qwen3.5-14B-A3B-Claude-Opus-Reasoning-Distilled-4.6-MXFP4_MOE
grumd@reddit (OP)
So a bit worse than Qwopus3.5-9B. Thanks for testing
Atretador@reddit
I was hoping for something better, Qwen3.5 jumps from 9B to 27B, something in between could be nice
grumd@reddit (OP)
35B-A3B is the one in-between (closer to 27B though)
Atretador@reddit
Qwen3.5-14B-A3B-Claude-Opus-Reasoning-Distilled-4.6-MXFP4_MOE.gguf
Big_Trip6677@reddit
What about samuelcardillo/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
https://huggingface.co/samuelcardillo/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
Shot-Buffalo-2603@reddit
How does Qwen3.5-122B-A10B fit on a 16GB GPU? 122B Q4 should be like a ~60GB model? Could you elaborate on this? I'm new and have been trying to understand what models I can run effectively.
jopereira@reddit
You should really read the OP, everything is explained
Shot-Buffalo-2603@reddit
I did read it and I understand the basics of MoE, but I don't understand why "offloading the experts to RAM" would be a good thing. I would think the best-case scenario would be the model sitting in system RAM and the experts being loaded onto GPU VRAM, 10B params at a time, as they're executed, so it would fit and be fast. That doesn't seem to be what is described?
I don't have enough experience to know if I'm completely misunderstanding or if that's what's actually being described.
I also don't understand if that type of swapping happens automatically or if it's something you have to manually configure to work that way.
grumd@reddit (OP)
You just have a misunderstanding of how MoE works, read up on it. It has 10B active parameters and some shared experts that are used for every token, and those need to be in VRAM to be fast. But most of the 122B parameters are "experts", and only a few of them are activated for each token. So the experts can sit in RAM and only a few of them get fetched from RAM at a time, which is fine even at RAM speeds.
Blaze6181@reddit
Do you get decent decode speeds? I'm looking at like 30-35 tok/s but it could just be my ddr4 ram.
grumd@reddit (OP)
1000-1500t/s for prefill, 15-20t/s for generation
Blaze6181@reddit
Sorry I hate to bother you again, but do you mind posting your llama.cpp args? I want to see what I'm doing so wrong.
grumd@reddit (OP)
Could be your DDR4, yeah, but try increasing -ub a lot. When I use the default -ub 512, my pp speed is under 300, but with -ub 2048-4096 it goes up to 1500 t/s.
--no-mmap also gives a solid improvement for pp.
I'll edit this comment with my full command in a minute
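For illustration, those two flags slot into llama-server roughly like this (the model path and value here are just placeholders, not my exact command):
# placeholder model path; -ub raises the prompt-processing batch size,
# --no-mmap loads the weights into RAM instead of streaming them from disk
llama-server -m ./some-model.gguf -ub 2048 --no-mmap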
the__storm@reddit
By "offloading the experts to ram" they mean what you described - grabbing weights from system memory as needed for each token. (It's not the whole 10B; a relatively small fraction of the model (attention, router, sometimes shared experts) will always be activated for each token and is kept in VRAM.)
Shot-Buffalo-2603@reddit
Thanks, i will have to give this a try
Nick-QuickStock@reddit
Why is no one talking about Google Turbo Quant??
grumd@reddit (OP)
There's nothing to talk about
Nick-QuickStock@reddit
Oh? It doesn't allow you to use larger LLMs than you could before? Yeah, nothing to talk about. 🤔
grumd@reddit (OP)
A Turboquant-like vector rotation for the KV cache was already merged into llama.cpp, and you can use q8_0 and q4_0 KV cache quants with better precision than before. It's still not lossless of course, but I've started using q8_0 by default. It won't give you access to larger models, but you can squeeze in more context now.
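If you want to try it, it's just a pair of flags on llama-server, something like this (model path is a placeholder; quantized V cache may also need flash attention enabled, depending on the build):
# quantize both the K and V caches to q8_0
llama-server -m ./some-model.gguf -ctk q8_0 -ctv q8_0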
I just don't understand the purpose of your comment, "why is no one talking". First of all, a ton of people are already talking about it, too much even. Second, if you want discussions about it, start actual discussions with substance.
Direct_Technician812@reddit
please add https://huggingface.co/unsloth/gemma-4-31B-it-GGUF
grumd@reddit (OP)
Nah, it's too big for my 16GB GPU. Maybe I'll try it later with some layer offloading
Big_Trip6677@reddit
What about Jackrong/Qwopus3.5-27B-v3
https://huggingface.co/Jackrong/Qwopus3.5-27B-v3
grumd@reddit (OP)
It's already in the post, 23/25
noctrex@reddit
Time to update it already, gemma 4 dropped. :)
moahmo88@reddit
Could you test "HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive", please? What's the best model for an Nvidia 5070 Ti with 16GB VRAM and 32GB RAM?
grumd@reddit (OP)
This is just an uncensored model, so its results shouldn't differ much from the original 35B-A3B. Not gonna download it, you can easily run the benchmark yourself if you have it.
For 16+32 I'd say either use the Q6_K_XL of 35B-A3B or one of the Q3 quants of 27B, depending on how much context you need
moahmo88@reddit
Thank you for your kind reply!
mapsbymax@reddit
The MoE shift for 16GB cards is honestly wild. A year ago on 16GB you were stuck with 7-13B dense models and that was it. Now a 122B-parameter model tops the chart on the same hardware, just because the active parameter count stays small.
The bottleneck basically moved from VRAM to system RAM bandwidth. DDR5 makes this viable; on DDR4 I'd expect those offloaded expert speeds to be rough.
Also really interesting that the distillation only clearly helps at 9B. The Claude Opus distill of 27B scoring the same as vanilla 27B suggests the base model already knows the patterns; distillation mostly helps when the model is small enough that it genuinely lacks the capability. That's a useful heuristic for picking finetunes: the smaller the base, the more a good distill can move the needle.
twack3r@reddit
Why? Why run a social media account to have Claude write your comments?
grumd@reddit (OP)
Important to mention - the 27B distill is "Claude-4.6-Opus-Reasoning-Distilled", and the same-name distill exists for 9B, where it also doesn't improve anything and still fails all tests.
The 9B distills called "HighIQ" and "Qwopus" are the only ones that really improved the tool calling of this model.
I might have to go look for a 27B HighIQ model, but a 27B Qwopus v3 doesn't exist yet
AvidCyclist250@reddit
How exactly is this done? I'm trying to do this in LM Studio but can't find a way to force it. There's just GPU layers, CPU layers, and "Experts to load" (8). 64GB RAM and 16GB VRAM
grumd@reddit (OP)
I'm using llama.cpp, compiling it myself too. There's a ton of flexible options.
In LM Studio there's a checkbox "Force Model Expert Weights onto CPU" or smth similar. Keep all layers on the GPU and only move the experts onto CPU.
AvidCyclist250@reddit
I'm on Linux and spent about 8 hours setting up full llama.cpp and Chat Box, CUDA, etc. Managed to set up everything there that I had in LM Studio. Your launch option idea was too good to pass up. I couldn't emulate it precisely in LM Studio, so I had to do this. Thanks for sharing your idea.
grumd@reddit (OP)
mmap enabled keeps the experts on disk and doesn't load them into RAM for me - terrible inference speed, so you can go without it (--no-mmap).
To fit the model: either use --n-cpu-moe to decide how many layers of experts to offload to the CPU, or use --override-tensor with a regexp to fully control the offloading - for example, offloading all experts can be done with -ot ".*_exps\.weight=CPU". But you'd have to figure out the precise context length you want to use, which takes time (example commands below).
Do not use q4 for the KV cache, the quality of responses will be VERY low. Either use the default ctv/ctk, or use q8_0 and make sure llama.cpp is newer than 0.9.11 or b8626 - they added KV cache quant vector rotation, improving quality - q8_0 is almost lossless vs f16 now.
Qwen 3.5 is quite sensitive to KV cache quantization, you'll destroy quality and the RAM saving is not that big.
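To make the two offloading options concrete, something like either of these should work (the model path and layer count are just examples, tune them to your VRAM):
# option 1: keep the expert weights of the first N layers on the CPU
llama-server -m ./some-moe-model.gguf -ngl 99 --n-cpu-moe 30
# option 2: full manual control, push every expert tensor to CPU with a regexp
llama-server -m ./some-moe-model.gguf -ngl 99 -ot ".*_exps\.weight=CPU"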
Tormeister@reddit
Interesting how you got Qwen3.5 27B IQ4_XS to 23/25, I will try it later.
I tried 27B Q5_K_S = 20/25, and also 27B Q6_K = 22/25
grumd@reddit (OP)
Yes, F16 KV most likely, and that particular Bartowski quant is pretty good. I think I also used --seed 3407. Also, when you get 22/25 for example, are any of the tests timing out at 120s? Due to how slow the model is on my machine (e.g. 10-15 t/s), I increased the timeout to let it finish generating and actually see if it gets the results correct. Without increasing the timeout I'd probably fail more tests
moahmo88@reddit
Amazing work! Thanks!
rm-rf-rm@reddit
Great work! Is this single pass, double pass, or best of 3?
grumd@reddit (OP)
Single pass for all of them, too long to do multiple 🥲
PracticlySpeaking@reddit
This also clearly illustrates the 'meh' difference between Qwen3.5-35B and 122B
sine120@reddit
If you have a lot of VRAM and not a lot of RAM, 27B is awesome. If you have a lot of RAM and not a lot of VRAM, 122B is awesome. If you have a "meh" amount of either, 35B is awesome.
coyo-teh@reddit
If you have 128GB of RAM and 96GB of VRAM, what would you recommend?
sine120@reddit
I'd just run the 122B completely in VRAM, or if you don't care about speed, minimax.
grumd@reddit (OP)
If you have such expensive hardware, I'd say you should do your own research and benchmarks, but maybe try a low quant of Qwen3.5-397B-A17B
grumd@reddit (OP)
35B is Q6 while 122B is Q3-4 so that's not entirely fair to say
PracticlySpeaking@reddit
Flip that around... For 'easy' coding, there is only a small difference.
grumd@reddit (OP)
You've edited your comments though, your first comment said "35B vs 122B" at first.
What I'm saying is: 27B and 122B are very close, but 35B is noticeably worse than both of them, so I wouldn't call it a "meh" difference. This benchmark is too easy to show the actual difference between them.
Norwood_Reaper_@reddit
Did you by chance test the jackrong 27b qwopus?
grumd@reddit (OP)
It's top 5 in my post
Norwood_Reaper_@reddit
Sorry I missed it! Yeah I'm on v2 now, I really like it a lot. I've got it hooked into Claude code from the leak yesterday and it is fairly capable if you prompt it well.
Yassfive1@reddit
In practice, what would it allow you to do locally vs using the cloud? Sorry to be pessimistic, but when you compare local model performance vs the fraction of the cost of Gemini, I'm like, what's the point?
grumd@reddit (OP)
With local models you can choose what you use, you can use abliterated uncensored models if you want, you can tune their parameters depending on what you need. They are also completely private and Google doesn't read your messages (can be important for security if you're using it for important work).
tmvr@reddit
Tried it as well yesterday with a bunch of models and had the same issue with Qwen3.5 9B at Q8 - a bunch of errors due to tool calling issues.
grumd@reddit (OP)
Try the Qwopus model! It's a finetuned Qwen3.5-9B but it had no issues with tools, it's the best 9B model in this benchmark that I tried
Big_Mix_4044@reddit
Can confirm, same two fails for the Qwen3.5-27B Q4_K_M by Bartowski. BUT, with q8 KV and the new quant rotation in llama.cpp. So it's worth something at least. It's interesting to bench models on something aside from PPL and KLD.
grumd@reddit (OP)
Yep, and I love this test because it's quick, relatively real-world, and uses tools
hyrulia@reddit
Can you test Qwopus please?
grumd@reddit (OP)
Pretty impressive, 17/25, better than Qwen3.5-9B-Claude-4.6-HighIQ-THINKING
hyrulia@reddit
Thank you!
grumd@reddit (OP)
Yep and thank you for the pointer, I'll keep this Qwopus 9B in my cache just in case I wanna play with it later with real tasks :)
sine120@reddit
Any chance you recorded tg and pp speeds while going through these? Curious how they performed and how long the tests took.
grumd@reddit (OP)
Yeah I should have recorded the time it took for each, but it heavily depends on how much context you need. These tests don't need more than like 15k of context, so I could allocate that and offload fewer layers to CPU, drastically improving speed. But that doesn't mean I will actually daily drive the models like that, I need 80k-120k minimum for real tasks, preferably 150k. So real speeds will be different. So I only used these tests as a quick quality comparison.
For pp/tg:
122B Q4 in real usage is like 1500/15-19.
122B Q3 can do 1500-2000/25-30.
27B Q4 has to be partially offloaded and will do around 1000/15 depending on how many layers you offload.
27B Q3 I think can be fully on GPU and is around 30-40 tg maybe (I don't usually use it so I don't remember).
35B is very fast at 2000+/70.
9B is around 100 tg.
I don't use 27B Q3 because quality-wise it's worse than 35B Q6, and the latter is 60+tg anyway. That I confirmed by running a full suite of Aider benchmarks, not just this SQL quick test.
sine120@reddit
I've only got 64GB of RAM, 16GB of VRAM. As much as I want to run the 27B and 122B models, I can't fit enough context to get the 27B to be more useful beyond chatting. 122B works in IQ3, so I might have to give that more attention, but I've been hoping the 35B with CPU offloading would be better than Qwen3-Coder-Next. What was your command for the Qwen3.5-122B-A10B-GGUF:UD-IQ3_XXS run?
grumd@reddit (OP)
My day-to-day IQ3_XXS command:
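Roughly this shape - the model path is a placeholder, the fit setting mentioned below isn't shown, and the exact flag spellings are approximate, so treat it as a sketch rather than something to copy-paste:
# sketch: 122B MoE with all experts pushed to system RAM, 131k context
llama-server \
  -m ./Qwen3.5-122B-A10B-UD-IQ3_XXS.gguf \
  -ngl 99 \
  -ot ".*_exps\.weight=CPU" \
  --no-mmap \
  --no-mmproj \
  -ub 2048 \
  -c 131072 \
  --cache-ram 0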
no-mmap is important to actually force the model into RAM
no-mmproj removes vision capabilities but saves VRAM
ub 2048 gives really good prefill speed - lower ub means lower prefill
131k context seems enough for most tasks for me
cache-ram 0 is important because an extra 8GB of RAM just for the cache isn't fun when you barely have any free RAM
fit is at 256 because I'm using my integrated GPU for desktop rendering - my monitor is connected to the motherboard - funnily enough it doesn't affect my performance in games at all
You can also try Q3_K_S for 122B - not as much free RAM left, but it can also be doable. It has much better quant quality than IQ3_XXS for many tensors - you can check the details on Hugging Face when you click on a quant and scroll down to the weight quant information
the__storm@reddit
Is this on DDR5?
grumd@reddit (OP)
Yeah
GroundbreakingMall54@reddit
The fact that a 122B MoE fits on 16GB and still tops the chart is insane. I remember when running anything over 13B felt like a flex. Also interesting that the Claude Opus distill of Qwen 27B scores the same as vanilla - I would have expected the reasoning distillation to help more with SQL logic
grumd@reddit (OP)
Tbh both the 27B and 35B distills are no better (or even worse) than vanilla. The specific 9B distills I have there are the only ones where you can see a clear improvement
AdamDhahabi@reddit
Great work!