Realistic local LLM rig under $6500? Dev with heavy RAM needs
Posted by TeachTall3390@reddit | LocalLLaMA | 27 comments
Hey everyone,
I'm a developer looking for practical hardware recommendations under $6500 for local LLM work. My usage breaks down like this:
- 60% local inference
- 30% LoRA training
- 10% light fine-tuning on smaller models
Anything heavy I just rent GPU clusters or use work resources.
I usually run 40-50 services at once, so I need a ton of RAM. Video editing would be a nice bonus but not required. Linux or macOS is fine.
What builds are actually worth it right now? Thanks!
Electronic-Space-736@reddit
this one is launching currently https://hilbert-agentic-computer.kckb.me/b06cccc2
SiXke@reddit
any thoughts on this? I tried to create a post about this but it got deleted (no clue why)
Electronic-Space-736@reddit
I am going for it, it suits my use case. I want access to full weights (or close to it), this will be serving just me, not tens or hundreds of people, and I am happy with the tradeoff: no tensor-core speed and slightly slower unified memory, in exchange for access to a larger model than I could otherwise afford right now.
Turbulent_Pin7635@reddit
M5 Max 128GB
ExcellentTip9926@reddit
DGX Spark at $4,699 is probably the best fit: 128GB unified memory, full CUDA stack, runs up to 200B-param models at FP4 locally, Mac Mini-sized. Memory bandwidth is only 273 GB/s so inference is slower than a Mac Studio, and thermal throttling on long training runs is real, but for a 60/30/10 inference/LoRA/finetune split it's almost perfectly designed. Other options:
- Mac Studio M3 Ultra 256GB at $5,600: more RAM and faster large-model inference, but you lose CUDA for LoRA.
- 4090 build with 128GB DDR5 around $5,500: best CUDA speed per dollar, but 24GB VRAM limits local model size.
- 5090 build at $6,500+: stretches the budget hard with current memory-shortage pricing.

For 40-50 services plus LoRA plus inference, I'd go DGX Spark.
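As a rough sanity check on the bandwidth point: decode speed is approximately memory bandwidth divided by the bytes read per generated token. A back-of-envelope sketch (assumes a dense model read in full every token; ignores MoE sparsity, KV-cache traffic, and other overheads):

```python
# Back-of-envelope decode ceiling: tokens/s ~= bandwidth / bytes per token.
# Assumes a dense model whose weights are all read once per token; MoE models
# only touch their active experts, so they run faster than this estimate.

def est_decode_tps(bandwidth_gbs: float, params_b: float, bytes_per_param: float) -> float:
    """Upper bound on tokens/sec from memory bandwidth and model footprint."""
    model_gb = params_b * bytes_per_param  # GB read per generated token
    return bandwidth_gbs / model_gb

for name, bw in [("DGX Spark", 273), ("Strix Halo", 256), ("M3 Ultra", 800)]:
    # illustrative case: a 70B dense model at ~Q4 (0.5 bytes/param)
    print(f"{name}: ~{est_decode_tps(bw, 70, 0.5):.0f} tok/s ceiling on 70B Q4")
```

Real-world numbers land below these ceilings, but the ratios between machines hold up.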
_millsy@reddit
How would you see that comparing against a strix halo or similar?
ExcellentTip9926@reddit
Strix Halo is genuinely competitive. Framework Desktop with Ryzen AI Max+ 395 at $2,599-$2,999 gets 128GB unified memory with 256 GB/s bandwidth vs Spark’s 273. On apples-to-apples LLM inference (gpt-oss-120b) it’s within 5-10% of Spark while saving $1,700+. Spark wins on prompt processing (3-5x faster), long context (23% faster at 32K), and image gen (2.5x faster on FLUX). So if you’re doing heavy prefill, long-context RAG, or diffusion models, Spark earns its price. If it’s mostly LLM chat + LoRA, Strix Halo is the better buy now.
FreshBowler32@reddit
Adding a quick table to this. Also, realistically, after allocating RAM to the OS, usable VRAM would be pretty much equal to a quad 3090 setup (4 × 24GB = 96GB).

| | DGX Spark | Strix Halo (Framework) |
|---|---|---|
| Price | $4,699 | $2,599-$2,999 |
| Unified memory | 128GB | 128GB |
| Bandwidth | 273 GB/s | 256 GB/s |
| CUDA | Full stack | No (ROCm) |
Snoo_81913@reddit
There are rumors/leaks that the next Ryzen AI Max will have a theoretical max bandwidth of 460 GB/s using LPDDR6-14400... sounds... EXPENSIVE. Meanwhile the M3 Ultra has 800+ GB/s bandwidth. Two generations ago.
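That 460 GB/s figure checks out arithmetically if you assume the rumored part keeps a 256-bit-equivalent bus like today's Strix Halo (an assumption; LPDDR6 reorganizes channel widths):

```python
# Peak bandwidth = transfer rate (MT/s) x bus width (bytes per transfer).
def bw_gbs(mt_per_s: int, bus_bits: int) -> float:
    return mt_per_s * bus_bits / 8 / 1000

print(bw_gbs(8000, 256))   # 256.0 GB/s: current Strix Halo (LPDDR5X-8000)
print(bw_gbs(14400, 256))  # 460.8 GB/s: rumored LPDDR6-14400, same bus width
```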
veinamond@reddit
Well, that would all be nice if DDR6 and LPDDR6 were actually coming anytime soon. From what I read online, the soonest they hit the consumer market is probably 2030, maybe later.
Enturbulated_One@reddit
Shhhh, don't talk about it! Every time I've mentioned it, or looked, DDR6 release date estimate has been pushed back further!
Snoo_81913@reddit
What this guy said (it's almost like I copied him lol), but those are definitely the two machines in your bracket that will do the job. I would also lean toward the Spark.
ranting80@reddit
Training leans toward the Spark.
HopePupal@reddit
if you're going the GB10 route, the Asus version is a lot cheaper than the Spark
Snoo_81913@reddit
So many factors to consider here.
I'll go with the OP's weighted split for now (a toy scoring sketch follows the two picks).
60% inference: Apple Mac Studio M3 Ultra, 512GB RAM, 800+ GB/s bandwidth. Loads large LLMs and all your services easily. About $4k used on eBay. Cons: MLX/Core ML isn't cutting edge, it's gonna blow you up with fans, and it's used. New units are $7,500 and up, and you can only get 256GB in new models.
LoRAs and fine-tuning: Nvidia DGX Spark. 1 petaflop of pure raw power (at FP4). It will consume input like a starving animal: just cram it all in and it will shred it. Cutting-edge CUDA architecture, will rip through fine-tuning and LoRAs like they don't exist. Scalable, with 200Gbps connections for clusters. Cons: 273 GB/s bandwidth, 128GB RAM, slower token generation, no video gen, a custom DGX OS, and it maxes out your budget at $5,590-$6,400.
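To make the 60/30/10 weighting concrete, a toy scoring sketch; the per-category scores are my own subjective 1-10 guesses based on the tradeoffs above, not benchmarks:

```python
# Toy weighted scoring for the OP's 60/30/10 split. Per-category scores are
# subjective 1-10 guesses, purely illustrative.
WEIGHTS = {"inference": 0.6, "lora": 0.3, "finetune": 0.1}

CANDIDATES = {
    # huge RAM + 800 GB/s -> great inference; no CUDA -> weak training
    "Mac Studio M3 Ultra 512GB": {"inference": 9, "lora": 4, "finetune": 4},
    # CUDA + FP4 compute -> great training; 273 GB/s -> slower decode
    "DGX Spark":                 {"inference": 6, "lora": 9, "finetune": 8},
}

for name, scores in CANDIDATES.items():
    total = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    print(f"{name}: {total:.1f}/10")  # Mac ~7.0, Spark ~7.1: genuinely close
```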
bionicdna@reddit
Where are you finding a used M3 Ultra Mac Studio on eBay with 512GB RAM under $10k? The rest seem to be scams.
Snoo_81913@reddit
You could be right there. I saw a couple of them here and there, and I've been keeping an eye out, but it's getting harder and harder to find anything like that. Good catch.
Powerful_Ad8150@reddit
Single or dual DGX Spark cluster. Single: q3.5 122 @ 50 tps on vLLM / m2.7 at a poor man's Q4 quant @ 22 tps on llama.cpp. I have the Asus GX10, the only difference being that the Spark has a power button on the front (though that's not a deal-breaker: mine booted once two months ago and I never turned it off again xD). It's an amazing machine, although there are some compatibility issues with some solutions because it's ARM, not x86.
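For anyone new to vLLM, a minimal sketch of its offline API (the model ID is just an example, not the models above; a two-Spark cluster over the 200GbE link needs vLLM's multi-node Ray setup, which is more involved than this):

```python
# Minimal vLLM offline-inference sketch. tensor_parallel_size splits a model
# across GPUs in one box; multi-node clustering is a separate (Ray) setup.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct",  # example model ID
          tensor_parallel_size=1)            # bump up with more GPUs
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```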
cleversmoke@reddit
Recently I bought a mini PC plus an RTX 3090 eGPU over oculink (specs in my reply below).
Running Qwen3.6-35B-A3B Q4 with 262k context. Prompt processing at 2800 tk/s and token generation at 130 tk/s. Simple --fit:on configuration.
I plan on buying another RTX 3090 + Aoostar AG01 so I can utilize the Q8 version.
That would bring my total to be around $3500. I can probably add another RTX 3090 if a Qwen3.6-120B+ model comes out.
Unsure if it can handle 40-50 services though unless I do a lot of throttling.
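For anyone sizing this: the KV cache is what eats memory at 262k context. A rough calculator (the architecture numbers below are placeholder guesses, not the real Qwen3.6-35B-A3B config; pull the real values from the model's config.json):

```python
# Rough KV-cache sizing. Architecture numbers are placeholders, NOT the real
# Qwen3.6-35B-A3B config; substitute the values from the model's config.json.

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx: int, bytes_per_elem: float) -> float:
    """K + V caches across all layers, in GB."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem / 1e9

# hypothetical GQA config at 262k context with q8_0 KV (~1 byte/element)
print(f"{kv_cache_gb(48, 8, 128, 262_144, 1.0):.1f} GB")  # ~25.8 GB
```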
Snoo_81913@reddit
Man, I have pored over setups like this. Which mini did you go with?
cleversmoke@reddit
I went with the Reatan X7 255 with oculink, Radeon 780M, 64GB DDR5 RAM, and 2TB NVMe.
I bought it when it was $800 and it has jumped to $1200 since (US Amazon), but even at $1200, I'd buy it again. I got the Reatan specifically for the oculink since I didn't want a tower.
If my second eGPU works on it, I'll be thrilled! The eGPU comes in next week.
No_Mango7658@reddit
Seeing as Qwen3.6 Q4_K_M with 256k context basically fits in a 5090, that would be my target.
Charming-Author4877@reddit
I gave Qwen 3.6 and Gemma-4 a quite extensive test run today (on a 5090) and the results were really impressive, much better than I expected.
https://www.reddit.com/r/GithubCopilot/comments/1ss583x/i_am_not_switching_yet_but_i_tested_gemma4_and
whodoneit1@reddit
Today I tested using Kimi2.6 for planning and Qwen 3.6 Plus for implementation, and the results were really good. I was running non-locally, as I wanted to test this workflow first.
Charming-Author4877@reddit
I personally go with this:
- 2x 3090 or 1x5090 +1x3090
- 128GB DDR5 RAM (or 192GB if you can find an affordable kit)
- Large Samsung 9100 PRO SSD, or 2 striped previous-generation SSDs (adds up to the same speed)
I use Windows + WSL
For speech/music I run Demodokos Foundry; I put it into on-demand mode or bind it to my 2nd GPU.
That gives SOTA inference without taking any VRAM when not used.
For LLMs you can run Qwen 3.6 35B at 260K context and still have plenty of primary VRAM available.
The dense models (Gemma 31B or Qwen 28B) also run well with a bit of KV quantization (sketch below).
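A minimal sketch of that KV quantization via llama-cpp-python (the model path and context size are placeholders; type_k/type_v set the cache quantization, and a quantized V cache requires flash attention):

```python
# Hedged sketch: long context with a quantized KV cache in llama-cpp-python.
# Paths and sizes are placeholders; adjust to your model and VRAM.
import llama_cpp

llm = llama_cpp.Llama(
    model_path="models/example-q4_k_m.gguf",  # hypothetical path
    n_ctx=131_072,                    # long context; KV cache sized to match
    n_gpu_layers=-1,                  # offload all layers to the GPU
    flash_attn=True,                  # required for a quantized V cache
    type_k=llama_cpp.GGML_TYPE_Q8_0,  # K cache at ~1 byte/element
    type_v=llama_cpp.GGML_TYPE_Q8_0,  # V cache at ~1 byte/element
)
print(llm("Hello", max_tokens=16)["choices"][0]["text"])
```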
For light fine-tuning or LoRA training you can use either one card in the background, or both. A hedged PEFT sketch of the one-card setup follows.
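(Minimal sketch with Hugging Face PEFT, pinned to the second GPU so the first stays free for inference; the model ID is just an example.)

```python
# Hedged LoRA sketch with Hugging Face PEFT, pinned to one GPU.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # expose only the second GPU

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",   # example model; pick whatever fits VRAM
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",          # cuda:0 is the masked-in physical GPU 1
)
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the small adapter weights train
```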
I have a second PC like this available on the network for long-running tasks.
MacBooks offer great value, but at the same time they are exotic hardware in the AI world; it's improving a lot but is still a burden. I absolutely hate the Apple development environment.
A Mac is great for running large models that won't fit in the setup I described, but prefill speed is gruesome.
DGX Spark and similar ARM unified-memory boxes are glorified mini computers, significantly slower than the MacBook, and prefill is a total showstopper.
Same with AMD GPUs, they are not impressive in compute.
So my choice landed on a conservative CUDA solution; local AI is hard enough when it's mutating and changing faster than anyone can easily follow.
Magnus919@reddit
As much MacBook Pro as you can stomach paying for.
Or a Mac Studio if you never leave home.
Excellent_Koala769@reddit
MacBook Pro M5 Max 128 GB