What's the consensus on the best local models for code generation? Is my setup competitive?

Posted by warpanomaly@reddit | LocalLLaMA

I'm trying as hard as I can to get a local setup somewhere in the ballpark of proprietary LLMs for code generation. My machine runs an Intel(R) Core(TM) Ultra 7 265K (3.90 GHz) with 128 GB of DDR5 RAM and an Nvidia GeForce RTX 5090 with 32 GB of GDDR7 video memory. Even with this high-end enthusiast hardware, I can't get my local LLMs anywhere close to Claude Code or ChatGPT Codex. I know I'll never match the major industry players running gigantic, power-grid-altering data centers, but it seems like I should be able to get better results than I'm getting.

My first attempt was deepseek-coder-v2:236b. Long story short, I couldn't get it working. As soon as I started talking about my failed attempts with DeepSeek, lots of people told me to switch to GLM-4.7-Flash-GGUF:Q6_K_XL or MiniMax-M2.1-GGUF:Q4_K_XL. I started using GLM-4.7-Flash-GGUF:Q6_K_XL with pretty good results; it was actually generating usable code.
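In hindsight, the 236B failure is predictable from back-of-envelope memory math. Here's a rough sketch (the bits-per-weight figures are approximate averages for each quant type; real GGUF file sizes vary by quant mix, and this ignores KV cache and runtime overhead entirely):

```python
# Back-of-envelope GGUF weight footprint (weights only; KV cache,
# context buffers, and runtime overhead come on top of this).
# Bits-per-weight values are rough averages for each quant type.
def weight_gb(params_b: float, bits_per_weight: float) -> float:
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

for label, params_b, bpw in [
    ("deepseek-coder-v2 236B @ Q4_K_M (~4.8 bpw)", 236, 4.8),
    ("deepseek-coder-v2 236B @ Q6_K   (~6.6 bpw)", 236, 6.6),
]:
    print(f"{label}: ~{weight_gb(params_b, bpw):.0f} GB")

# -> roughly 142 GB at Q4_K_M and 195 GB at Q6_K,
#    versus 32 GB VRAM + 128 GB system RAM on this box.
```

Even at Q4, the weights alone roughly fill the combined 160 GB of VRAM plus RAM before any context buffers, so a 236B dense-download just doesn't leave room on this hardware, which is presumably why people steered me toward smaller models.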

This was a few months ago. I know it hasn't been that long, but it seems like AI is really exploding lately. I've been seeing people get crazy results for art via tools like ComfyUI and Automatic1111, and I think DeepSeek just unveiled a new model. Idk if it's available to the public yet, but I have to ask: is there a better model for local code generation than GLM-4.7-Flash-GGUF:Q6_K_XL? Is running it from the command line with `.\llama-server.exe -hf unsloth/GLM-4.7-Flash-GGUF:Q6_K_XL --alias "GLM-4.7-Flash" --host 127.0.0.1 --port 10000 --ctx-size 32000 --n-gpu-layers 99` and then connecting it to VSCodium with Continue still the best way to do what I'm trying to do?
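For what it's worth, one way to sanity-check that setup independently of the editor plugin is to hit the server directly. llama-server exposes an OpenAI-compatible API, so a minimal sketch looks like this (assuming the command above is already running; the model name must match the `--alias` flag, and the API key can be any placeholder):

```python
# Minimal smoke test against a local llama-server instance.
# llama-server serves an OpenAI-compatible /v1 API, so the standard
# openai client works with base_url pointed at localhost.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:10000/v1", api_key="sk-local")

resp = client.chat.completions.create(
    model="GLM-4.7-Flash",  # matches the --alias passed to llama-server
    messages=[
        {"role": "user",
         "content": "Write a Python function that reverses a linked list."}
    ],
    temperature=0.2,
    max_tokens=512,
)
print(resp.choices[0].message.content)
```

If that returns sensible code, any remaining quality gap is coming from the model itself rather than the Continue/VSCodium wiring.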

P.S. I bought my Nvidia RTX 5090 thinking it was the best piece of equipment for running AI locally. Should I get one of those Nvidia DGX Sparks or one of its competitors instead?