Benchmarked 18 models that I can run on my RTX 5080 16GB using Nick Lothian's SQL benchmark

Posted by grumd@reddit | LocalLLaMA | 82 comments

2 days ago there was a very cool post by u/nickl:

https://reddit.com/r/LocalLLaMA/comments/1s7r9wu/comment/odc9xj8/

Highly recommend checking it out!

I've run this benchmark on a bunch of local models that can fit on my RTX 5080, some of them partially offloaded to RAM (I have 96GB, but most will fit if you have 64GB).

Results:

24: unsloth/Qwen3.5-122B-A10B-GGUF:UD-Q4_K_XL
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩πŸŸ₯🟩 🟩🟩🟩🟩🟩
23: bartowski/Qwen_Qwen3.5-27B-GGUF:IQ4_XS
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩πŸŸ₯🟩 πŸŸ₯🟩🟩🟩🟩
23: unsloth/Qwen3.5-122B-A10B-GGUF:UD-IQ3_XXS
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩πŸŸ₯🟩 πŸŸ₯🟩🟩🟩🟩
22: unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q6_K_XL
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩πŸŸ₯🟩🟩 🟩🟩🟩πŸŸ₯🟩 πŸŸ₯🟩🟩🟩🟩
22: mradermacher/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-i1-GGUF:Q3_K_M
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩πŸŸ₯🟩πŸŸ₯🟩 πŸŸ₯🟩🟩🟩🟩
21: unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF:UD-Q4_K_S
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟨πŸŸ₯ πŸŸ₯🟨🟩🟩🟩
20: unsloth/Qwen3-Coder-Next-GGUF:UD-Q5_K_XL
🟩🟩🟩🟩🟨 🟩🟩🟩🟩🟩 🟩🟩🟨🟩🟩 🟩🟩🟩πŸŸ₯🟨 πŸŸ₯🟩🟩🟩🟩
20: mradermacher/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-i1-GGUF:Q6_K
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩πŸŸ₯🟩🟩 πŸŸ₯🟩🟩πŸŸ₯🟩 πŸŸ₯πŸŸ₯🟩🟩🟩
19: unsloth/GLM-4.7-Flash-GGUF:UD-Q6_K_XL
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩πŸŸ₯🟩🟩 🟩🟩🟩πŸŸ₯🟨 πŸŸ₯🟨🟩πŸŸ₯🟩
18: unsloth/GLM-4.5-Air-GGUF:Q5_K_M
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩πŸŸ₯🟩🟩 πŸŸ₯🟩🟩πŸŸ₯🟩 🟨🟨πŸŸ₯🟩🟨
18: bartowski/nvidia_Nemotron-Cascade-2-30B-A3B-GGUF:Q6_K_L
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟨🟩🟩 🟩🟩🟩πŸŸ₯🟩 🟨🟨πŸŸ₯🟨🟨
16: unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_XL
🟩🟩🟩🟩🟨 🟩🟩🟩🟩🟩 🟩🟩🟨🟩🟩 πŸŸ₯🟨🟩πŸŸ₯🟨 πŸŸ₯🟨🟩🟨🟩
16: byteshape/Devstral-Small-2-24B-Instruct-2512-GGUF:IQ3_S
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 πŸŸ₯🟩🟨🟩🟩 🟩🟩🟨πŸŸ₯🟨 🟨🟨πŸŸ₯🟨🟩
16: mradermacher/Qwen3.5-9B-Claude-4.6-HighIQ-THINKING-i1-GGUF:Q6_K
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟨πŸŸ₯🟩 πŸŸ₯🟩πŸŸ₯πŸŸ₯🟨 πŸŸ₯🟩πŸŸ₯🟩🟨
14: mradermacher/Qwen3.5-9B-Claude-4.6-HighIQ-INSTRUCT-i1-GGUF:Q6_K
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 πŸŸ₯🟩πŸŸ₯🟩🟩 🟩🟨πŸŸ₯πŸŸ₯🟨 🟨🟨πŸŸ₯🟨🟨
14: unsloth/GLM-4.6V-GGUF:Q3_K_S
🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 πŸŸ₯🟩🟨🟨🟩 πŸŸ₯🟩🟩🟨🟨 🟨🟨🟨🟨🟨
5: bartowski/Tesslate_OmniCoder-9B-GGUF:Q6_K_L
🟨🟨🟨🟨🟨 🟨🟨🟨🟩🟩 🟩🟨🟨🟩🟨 🟨🟨🟩🟨🟨 🟨🟨🟨🟨🟨
5: unsloth/Qwen3.5-9B-GGUF:UD-Q6_K_XL
🟨🟨🟨🟨🟨 🟨🟨🟨🟩🟩 🟨🟩🟨🟨🟩 🟨🟩🟨🟨🟨 🟨🟨🟨🟨🟨

The biggest surprise, to be honest, is Qwen3.5-9B-Claude-4.6-HighIQ-THINKING: it goes from 5 green tests with the base Qwen3.5-9B to 16. Most of Qwen3.5-9B's errors boiled down to being unable to call the tools with correct formatting. For how small it is, it's a very reliable finetune.

Qwen3.5-122B-A10B is still king on a 16GB GPU because I can offload the experts to RAM. Speed isn't perfect, but the quality is great and I can still fit a sizable context into VRAM. Q4_K_XL uses around 68GB of system RAM and IQ3_XXS around 33GB, so the smaller quant works on a 64GB machine.
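For anyone who hasn't done MoE expert offloading before, here is a rough sketch of what that setup looks like with llama.cpp's `llama-server`. The model filename and context size are placeholders, not the exact values used above, and `--n-cpu-moe` requires a reasonably recent llama.cpp build (older builds use the `--override-tensor` regex shown in the comment instead):

```shell
# Offload every layer to VRAM (-ngl 99), then push the per-expert FFN
# tensors back to system RAM (--n-cpu-moe). Attention and shared weights
# stay on the GPU, which is what keeps generation speed tolerable.
# On older llama.cpp builds, replace --n-cpu-moe 99 with:
#   -ot ".ffn_.*_exps.=CPU"
llama-server \
  -m Qwen3.5-122B-A10B-UD-Q4_K_XL.gguf \
  -ngl 99 \
  --n-cpu-moe 99 \
  -c 32768
```

The reason this works well for A10B-style MoE models is that only a small fraction of the expert weights are active per token, so the CPU-side reads are much cheaper than the total model size suggests.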

Note though - these tests mostly exercise a fairly isolated SQL call. It's a nice quick benchmark for comparing two models, including their tool calling, but it's not representative of understanding a larger codebase context, where bigger models will pull ahead.