Local LLM setup for coding (pair programming style) - GPU vs MacBook Pro?

Posted by bajis12870@reddit | LocalLLaMA | View on Reddit | 18 comments

Hey everyone,

I'm a programmer and I'd love to use local LLMs as a kind of "superpower" to move faster in my day-to-day work.

Typical use case: I'm working on a codebase (Rust, Python, Go, or TypeScript with React/Vue), and I want the model to understand the existing project and implement new features on top of it — ideally writing code directly in my IDE, like a pair programming partner.

Right now I've tried cloud models like Claude, Qwen, ChatGPT, and GLM. Results are honestly great (especially Claude), but cost and privacy are starting to bother me — hence the interest in going local.

My current setup:

- Ryzen 9 9950X
- 96 GB DDR5 RAM
- GPU still to choose

I'm considering a few options and I'm not sure what makes the most sense:

Option A: Add a dedicated GPU:

- Nvidia RTX 5090 (~€3,500)
- AMD Radeon AI PRO R9700 32 GB (~€1,300)

Option B: Go all-in on a MacBook Pro M5 Max (128 GB unified memory, ~€7,000)

My main questions:

1. Are there local LLMs that actually get close to Claude-level performance for coding tasks?
2. Are there solid benchmarks specifically for coding + codebase-aware edits?
3. Which local models are currently best for this kind of workflow?
4. How much VRAM / unified memory do you realistically need for this use case?
5. Dense vs. MoE models: which works better locally?
6. Does generation speed really matter that much (e.g. 45 tok/s vs. 100+ tok/s in real usage)?
7. What tools are people using for this (IDE plugins, local agents, etc.)?
8. How can I test these setups before dropping thousands on hardware?

Curious to hear from people who are actually running local setups for real dev work (not just demos). What's your experience like?