GPU advice for running local coding LLMs

Posted by mak3rdad@reddit | LocalLLaMA | 13 comments

I’ve got a Threadripper 3995WX (64c/128t), 256GB RAM, and plenty of NVMe, but no GPU. I want to run big open-source coding models like CodeLlama, Qwen-Coder, and StarCoder2 locally, something close to Claude Code, if possible ;)

Budget is around $6K. I’ve seen the RTX 6000 Ada (48GB) suggested as the easiest single-card choice, but I also hear dual 4090s or even older 3090s could be better value. I’m fine with quantized models if the code quality is still pretty good.
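
Here's my back-of-the-envelope VRAM math so far, just a rough sketch assuming ~4-bit quantization and a flat KV-cache allowance (the numbers are ballpark assumptions, not measurements):

```python
# Rough VRAM estimate: weights at ~4 bits/param plus a fixed KV-cache allowance.
# All figures are approximations, not benchmarks.

def vram_gb(params_billions, bits=4, kv_cache_gb=4.0):
    """Approximate VRAM in GB for a quantized model plus KV cache."""
    weight_gb = params_billions * bits / 8  # e.g. 32B at 4-bit ~= 16 GB of weights
    return weight_gb + kv_cache_gb

for name, size in [("Qwen2.5-Coder-32B", 32), ("CodeLlama-34B", 34), ("70B-class model", 70)]:
    print(f"{name}: ~{vram_gb(size):.0f} GB at 4-bit")
# ~20-21 GB fits a single 24 GB card (tight once context grows);
# ~39 GB wants a 48 GB card or two 24 GB cards split across GPUs.
```

If that math holds, a 32B-34B coder squeezes onto one 24GB card at 4-bit, while 70B-class models need the 48GB card or a dual-GPU split. Happy to be corrected if the overhead in practice is bigger.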

Anyone here running repo-wide coding assistants locally? What GPUs and software stacks are you using (Ollama, vLLM, TGI, Aider, Continue, etc.)? Is it realistic to get something close to Claude Code performance on large codebases with current open models?
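
To make the question concrete, the workflow I'm imagining is just pointing an OpenAI-compatible client at whatever local server I end up running (vLLM and Ollama both expose one). The model name and port below are placeholders, not a recommendation:

```python
# Minimal sketch: query a locally served coding model through an
# OpenAI-compatible endpoint (vLLM usually on :8000, Ollama on :11434).
# Model name and port are placeholders for whatever I end up running.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    messages=[
        {"role": "system", "content": "You are a senior engineer reviewing a repo."},
        {"role": "user", "content": "Refactor this function to remove the N+1 query: ..."},
    ],
    temperature=0.2,
)
print(resp.choices[0].message.content)
```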

Thanks for any pointers before I spend the money on the GPU!