Local LLM setup for coding (pair programming style) - GPU vs MacBook Pro?
Posted by bajis12870@reddit | LocalLLaMA | 18 comments
Hey everyone,
I'm a programmer and I'd love to use local LLMs as a kind of "superpower" to move faster in my day-to-day work.
Typical use case: I'm working on a codebase (Rust, Python, Go, or TypeScript with React/Vue), and I want the model to understand the existing project and implement new features on top of it — ideally writing code directly in my IDE, like a pair programming partner.
Right now I've tried cloud models like Claude, Qwen, ChatGPT, and GLM. Results are honestly great (especially Claude), but cost and privacy are starting to bother me — hence the interest in going local.
My current setup:
- Ryzen 9 9950X
- 96 GB DDR5 RAM
- GPU still to choose
I'm considering a few options and I'm not sure what makes the most sense:
- Option A: Add a GPU: Nvidia 5090 (~€3500) or AMD R9700 32 GB (~€1300)
- Option B: Go all-in on a MacBook Pro M5 Max (128 GB RAM, ~€7000)
My main questions:
1. Are there local LLMs that actually get close to Claude-level performance for coding tasks?
2. Are there solid benchmarks specifically for coding + codebase-aware edits?
3. Which local models are currently best for this kind of workflow?
4. How much VRAM / unified memory do you realistically need for this use case?
5. Dense vs MoE models - what works better locally?
6. Does generation speed really matter that much? (e.g. 45 tok/s vs 100+ tok/s in real usage)
7. What tools are people using for this? (IDE plugins, local agents, etc.)
8. How can I test these setups before dropping thousands on hardware?
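For the VRAM question, a rough back-of-envelope helps: weight memory scales with parameter count times bits per weight, plus a KV cache that grows with context length. A minimal sketch, with all model figures (a ~32B dense model at a ~5-bit quant, a hypothetical GQA config of 64 layers and 8 KV heads of dim 128) being illustrative assumptions, not specs of any particular model:

```python
# Hedged back-of-envelope for local-LLM memory needs.
# All model numbers below are hypothetical; real usage adds activation
# buffers and runtime overhead on top.

def weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB for params_b billion parameters."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in GB (factor 2 for K and V, fp16 default)."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem / 1e9

# Hypothetical ~32B dense model at a ~5-bit quant:
weights = weight_gb(32, 5.0)          # ~20.0 GB
# Hypothetical GQA config: 64 layers, 8 KV heads of dim 128, 32k context:
kv = kv_cache_gb(64, 8, 128, 32_768)  # ~8.6 GB
print(f"weights ~{weights:.1f} GB + KV ~{kv:.1f} GB = ~{weights + kv:.1f} GB")
```

Under those assumptions a ~32B model at Q5 with a 32k context lands just under 32 GB, which is why cards in that VRAM class keep coming up in these threads; shrinking the context or the quant is the usual lever when it doesn't fit.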
Curious to hear from people who are actually running local setups for real dev work (not just demos). What's your experience like?
qubridInc@reddit
Skip the Mac; get a strong NVIDIA GPU (5090-class if budget allows), run Qwen 3.6 or its coder variants via vLLM + Aider/OpenCode, and you’ll get the closest practical “Claude-like” local pair-programming setup today.
FederalAnalysis420@reddit
honestly i'd just rent a gpu on runpod or vast for an afternoon and actually test models before spending your own money. that would answer most of your questions faster than any benchmark will.
if you still want to buy, the 5090 should run the smaller dense models fast enough that agent loops actually feel responsive, and the mac lets you run bigger moe models but the speed drop is real. for pure coding work i'd probably lean 5090.
privacy's a good reason to go local. on pure cost though, claude api tends to come out cheaper than people expect once you actually do the math.
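The "do the math" point above can be sketched as a break-even calculation: hardware cost divided by the monthly API spend you'd avoid, minus local running costs. All the euro figures here are hypothetical placeholders, not real quotes:

```python
# Hedged sketch: months until a GPU purchase pays for itself vs. an API bill.
# Every number below is a made-up placeholder for illustration.

def breakeven_months(hardware_eur: float, api_eur_per_month: float,
                     power_eur_per_month: float = 0.0) -> float:
    """Months to recover the hardware cost, net of local electricity."""
    monthly_saving = api_eur_per_month - power_eur_per_month
    if monthly_saving <= 0:
        return float("inf")  # local never pays off on cost alone
    return hardware_eur / monthly_saving

# Example: a €3500 5090 vs a hypothetical €100/month API bill,
# assuming ~€25/month extra electricity for the GPU box:
print(f"{breakeven_months(3500, 100, 25):.1f} months")  # ~46.7 months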
erizon@reddit
I assume many people switching to local are future-proofing. There are two polar-opposite scenarios, both plausible soonish (let's say in 0.5-2 years):
1. The AI bubble bursts, and the market gets flooded with cheap GPUs that were previously denied to it.
2. API costs increase by an order of magnitude, and demand for local compute explodes, making it very expensive unless you already purchased the hardware earlier.
FullOf_Bad_Ideas@reddit
I think neither of them will happen. Some models will get more expensive on API, some won't, and GPUs will still be obtainable, just at high prices. Essentially, the same thing that has been happening for the last 3+ years.
Erdnalexa@reddit
I bought a 5090 FE from Nvidia last October at about €2k (France). Is this not an option anymore? (That’s an actual question)
Important_Coach9717@reddit
We found the Joker
Erdnalexa@reddit
What?
yes_i_tried_google@reddit
You could sell it and make a 50% profit with today’s prices
Erdnalexa@reddit
How do I play then? 4k@120fps on AAA is hard to run, even with the 5090 (I hate how framegen looks and its latency).
thrownawaymane@reddit
You wait for Nvidia to bother making a 6090 and pay €3.5k for it :)
melspec_synth_42@reddit
if you can swing a 5090, the 32GB of vram is a game changer for running 35B models at decent quants. mac is better as a daily driver, but for raw inference throughput nothing beats nvidia right now
Pretend_Engineer5951@reddit
Own_Mix_3755@reddit
Just a question - what were you missing in Roo that you switched?
Pretend_Engineer5951@reddit
I was looking for a feature to override the system prompt. As I found out, it was available at some point in the past, but then the developers ripped it out. Kilo hasn't done that yet.
No-Anchovies@reddit
Coming from an "unlimited resources" place of work, it has been a very humbling and grounding learning experience to compartmentalise personal projects just small enough that I can actually throw some AI at them to patch or refactor. Personally I believe it's hard to beat the convenience of running Linux & Nvidia. Full plug-and-play on popOS has been a very relaxing experience.
HugeEntertainment820@reddit
I’ve been using qwen 3.6 for the last day and I’m impressed. But asking whether it can do professional work is a much bigger question than simply whether there's a model on the level of Claude Code. Is your app for 1,000 people, 30k, or more? What's your tech stack, etc.?
iamapizza@reddit
The answer to 1 is no, and also: it depends on what you're doing. For some people it is good enough and for some it's not, forcing them to adapt.
Your best bet is to add to what you currently have. If you can get a decent gpu, you can get started pretty quickly with a local setup and see which one works for you. With your cpu and a 5090 you'll get some really good speeds.
On the other hand if this is for your job maybe still consider a third party. If not Claude then maybe GitHub copilot.
alexwh68@reddit
I can’t answer all your questions, but here are some answers.
No, reset your expectations.
I use either qwen coder or the new 3.6 version. I am using a Q6 locally.
I have a 96gb ram MBP and it works well.
Llama-server with opencode
Key thing here is I am using Cursor for the more thinking-heavy tasks and local for the more boilerplate, repetitive tasks. Local is slower for sure, but my workflow has changed a bit; I am a freelancer working from home. I asked qwen to build all the code for 4 new tables based on the existing project. Had breakfast, came back, all done: repositories, interfaces, services, DTOs and basic Blazor pages. That is roughly 4 hours' work by hand, or roughly 2 hours of copy-and-pasting. So minimum saving today: 2 hours.
My goal is to cut down on api usage where sensible.