What's the best machine I can get for $10k?
Posted by TWUC@reddit | LocalLLaMA | View on Reddit | 58 comments
I'm looking to buy a machine I can use to explore LLM development. My short-list of use cases is: 1) custom model training, 2) running local inference, 3) testing, analyzing, and comparing various models for efficacy/efficiency/performance. My budget is $10k. Ideally, I want something turn-key (not looking to spend too much time building it). I need to be able to run massive full models such as the full DeepSeek 671B.
Firm-Fix-5946@reddit
$10k is woefully insufficient to tread into this world. This reads like saying you're ready to buy a car that's both comfortable daily and good for a track day, and so you've saved up $400
present_absence@reddit
Why are you jumping in the deep end to "explore" this topic? Seems kind of absurd, no? Unless you're an extremely wealthy hobbyist.
doradus_novae@reddit
As others have stated, unfortunately it's not gonna happen.
$10k will get you one A6000 Pro and a non-server workstation that won't be capable of upgrading beyond 2 GPUs if you are lucky.
If your end goal is anything serious at home, you need workstation-class hardware:
Motherboard: $1,200 minimum
CPU: $1,800 minimum
RAM: lol, do you think desktop RAM is expensive? I paid, I think, $2,000 for 256GB 8 months ago...
Throw in an extra $500 x 2 for the power supplies you need.
Oh, you think you can run this on a normal circuit? Might want an electrician to come and install 2 dedicated 15A circuits for $1,000.
And you can then run one or two small models at nerfed context windows. Nothing near 671B params on consumer hardware will be possible unless it's quantized and nerfed to hell.
Apple/GB10 memory is too slow. Not a viable option for serious work.
It is a fun hobby but not for everyone yet.
mister_conflicted@reddit
Have you tried renting a lambda instance and trying larger models to see if they accomplish what you want, and then deciding on hardware?
allenasm@reddit
Mac M3 Ultra 512GB and it's not even close. Model precision is so much more important than inference speed.
Fabix84@reddit
> I need to be able to run massive full model
> such as full deepseek 671B.
Sorry to burst your bubble, but even with $10k you’re nowhere near running a model like DeepSeek 671B. I’m not even close with a $35k setup, and I wouldn’t be, even with $50k worth of hardware.
So before anything else, try to get a realistic sense of what $10k actually represents in this space. For the average person it sounds like a huge amount of money, but in this field it’s basically pocket change.
S4M22@reddit
I'm curious: what's your $35k setup and what can you run with it?
No_Conversation9561@reddit
wait for M5 max/ M5 ultra
don’t get M3 ultra.. trust me, I have two of them
chaosmikey@reddit
What’s your issue with the M3 Ultra? I’m curious. I only like them because of the 512GB RAM.
No_Conversation9561@reddit
Too slow for agentic coding unless you use a smaller model like Qwen 30b a3b.
At first you think you're gonna use something like GLM 4.5/4.6 since you have so much RAM.
https://i.redd.it/eid4ko6y544g1.gif
blbd@reddit
GLM is definitely a Charlie Murphy to your GPU and Unified RAM Rick James.
cdevr@reddit
Unityyyy!
pmttyji@reddit
What's the performance with 100B models like GPT-OSS-120B, GLM-4.5-Air, Ling/Ring/LLaDA Flash, Llama-4-Scout, and MiniMax-M2 (Q4), Qwen3-235B (Q4), etc.? Please share. Thanks
Consistent_Wash_276@reddit
These are my go-tos, but MiniMax on my 256GB M3 Ultra
pmttyji@reddit
No wonder u/No_Conversation9561 insisted on not getting the M3 Ultra.
Thought it would run MiniMax-M2 (Q4) and Qwen3-235B (Q4), since those quants come in around 120-140GB.
At least, are you able to run Q8 of 100B models? I see that GLM-4.5-Air's Q8 is 120GB.
So what t/s are you getting for 100B models, approximately?
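As a back-of-the-envelope check on those quant sizes (a rough sketch assuming weights dominate and an average of ~4.5 bits/weight for a 4-bit quant; real files come in somewhat larger):

```python
# Rough size of a quantized model: params * bits-per-weight / 8.
# Ignores KV cache, higher-precision embedding layers, and file metadata.
def quant_size_gb(params_billions: float, bits_per_weight: float) -> float:
    # 1e9 params * (bits / 8) bytes each == params_billions * bits / 8 GB
    return params_billions * bits_per_weight / 8

# Qwen3-235B at an assumed ~4.5 bits/weight average
print(round(quant_size_gb(235, 4.5)))  # ~132 GB, in line with the 120-140GB figure
# DeepSeek 671B at the same quant
print(round(quant_size_gb(671, 4.5)))  # ~377 GB
```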
Consistent_Wash_276@reddit
I've run all these models. MiniMax Q4 and, I believe, MiniMax Q6 came in around 18 t/s. Haven't run Q8; just not in my current wheelhouse.
I believe the majority are between 50 to 75 t/s.
pmttyji@reddit
18 t/s is usable, and it's possible to increase that with all the available optimizations.
Also 50-75 t/s .... cool!
Thanks for the stats.
Consistent_Wash_276@reddit
In the end I have my go-to's:
- gpt-oss:120b
- qwen3-coder:30b fp16
- DeepSeek-r1:70b
- glm-4.5:air q4
I could use larger models but I have multiple LLMs running in parallel while working on other tasks
Turbulent_Pin7635@reddit
I refuse to touch the OSS ones. GLM 4.6 (15 t/s), Qwen3-235B (not quantized) (30-40 t/s)
pmttyji@reddit
Just noticed multiple variants there for that Mac. Yours 256 or 512 GB?
Turbulent_Pin7635@reddit
512gb
The Mac is a beast, but keep in mind that its memory bandwidth is 819 GB/s, very close to a 3090's; the KV cache and the lack of CUDA also hurt the Mac's performance. But things are getting better, with more and more effort being put into improving the Mac Studio's ability to handle LLMs.
It is very fun; it already runs models with better answers than ChatGPT and Gemini, at least compared with the non-Pro versions. Also, you don't need to care about drivers, noise, energy consumption, heat, or reselling a Frankenstein... It works.
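That bandwidth figure sets a hard ceiling on decode speed: single-stream token generation is roughly memory-bandwidth-bound, since every token must stream all active weights from memory once. A rough sketch with assumed figures (~819 GB/s for the M3 Ultra, ~37B active parameters for DeepSeek's MoE at Q4):

```python
# Upper bound on decode speed when memory-bandwidth-bound:
# t/s ceiling = bandwidth / bytes read per generated token.
def decode_tps_ceiling(bandwidth_gb_s: float, active_params_b: float,
                       bytes_per_weight: float) -> float:
    return bandwidth_gb_s / (active_params_b * bytes_per_weight)

# M3 Ultra ~819 GB/s; DeepSeek MoE ~37B active params at Q4 (~0.5 bytes/weight)
print(round(decode_tps_ceiling(819, 37, 0.5)))  # ~44 t/s ceiling
# Same machine, a dense 70B at Q4
print(round(decode_tps_ceiling(819, 70, 0.5)))  # ~23 t/s ceiling
```

Real-world numbers land well below these ceilings once KV-cache reads and prompt processing are factored in, which matches the 15-40 t/s range reported above.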
pmttyji@reddit
Text generation, OK. Really curious to know how good it is with image & video models. I don't see benchmarks on this here often.
MoffKalast@reddit
Lmao, why even bother?
pmttyji@reddit
:D It just came out of my head when I was thinking about 100B models. I've rarely seen people still using that one.
hyouko@reddit
Some options in that price range:
- The Mac is the most turn-key and may be able to run the really big models, but it probably won't be good for custom model training and won't do anything with CUDA if you need that.
- An RTX Pro 6000 can do some light model training and will run smaller models fast, but won't fit the really big models.
- The old Epyc server route is probably similar to the Mac situation, but potentially expandable with GPUs down the line; it's also gonna be noisy and suck down electricity like a mofo.
$10K would buy a lot of server time on various hosted services that are out there, so consider that as an alternative that would let you try out various configurations.
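For scale, a quick sketch of what $10k buys in rented GPU-hours (the hourly rates here are illustrative assumptions, not current quotes; check provider pricing):

```python
# GPU-hours a $10k budget buys at a few assumed hourly rates.
budget = 10_000
for gpu, rate in [("H100, assumed $2.50/hr", 2.50),
                  ("A100 80GB, assumed $1.50/hr", 1.50)]:
    hours = budget / rate
    print(f"{gpu}: {hours:,.0f} GPU-hours (~{hours / 24:.0f} days of continuous use)")
```

Even at the pricier rate, that's months of continuous single-GPU time, which is plenty to find out whether the use case justifies owning hardware.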
Consistent_Wash_276@reddit
Second the M3 Ultra Mac Studio
Turbulent_Pin7635@reddit
Third the M3 Ultra Mac Studio. For text inference, it's the best one in that price range.
Consistent_Wash_276@reddit
🤝
Own_Attention_3392@reddit
You can run gpt oss 120b on much cheaper hardware -- I have used it with reasonable speed on a 5090 paired with 64 GB system RAM.
Dersonje@reddit
I second the old epyc cpus. With ddr4 ram since ddr5 is price prohibitive right now. Then you’ll also have enough PCIE lanes to add GPUs as needed
chaosmikey@reddit
Mac Studio is the only thing that comes to mind. A 2TB model with 512GB of RAM is about $9,900 USD. You can lower internal storage and use an external SSD over Thunderbolt 5. This is the route I would go. You can chain them with Exo and share compute power.
iMrParker@reddit
He mentioned training. So Macs are out the window
General-Yak5264@reddit
And yet 3/4's of the comments...
Kqyxzoj@reddit
Depending on the ridiculous piles of cash you are rolling around in, I'd say maybe rent a couple of configs first. That would allow you to dial in on what makes sense for your use case. And this is coming from someone who firmly believes in the Own All The Shit You Depend On [tm] methodology. Oh wait, not too much time building it. Mmmh, tinybox?
LoaderD@reddit
Yup. The fact op doesn’t differentiate between inference and training means they shouldn’t be buying anything before their use-case is better figured out.
No_Afternoon_4260@reddit
That's the answer
Original-Tree-7358@reddit
Brilliant suggestion
YearZero@reddit
RTX PRO 6000 workstation edition + as much RAM as you can afford.
MengerianMango@reddit
6000 + dual-channel DDR5 sucks. Have tried. Do not recommend. Even Qwen3 235B 3-bit quants suck on this setup.
I ended up spending another 8k to build a 12 channel ddr5 system (epyc). Deepseek is sorta slow but acceptable in the new setup.
For a strict 10k budget, OP is going to have to compromise: either smaller models or more work building. If he really has to run deepseek, then probably best to buy a bunch of 3090s and do it the janky way.
Past-Reaction1302@reddit
What was your build that worked? I’m wondering and looking as well
DustinKli@reddit
That will put him well over $10k very quickly.
Turbulent_Pin7635@reddit
M3 Ultra... It runs everything, but suffers to produce videos. MLX models > 300GB will run at 15-40 t/s; anything smaller, 25-80 t/s.
I'm getting better answers with GLM 4.6 than I get with the GPT and Gemini paid versions.
abnormal_human@reddit
You’re missing a zero from your budget if you want to run that overparameterized pig of a model in any meaningful, usable way on a turn key system.
6x RTX 6000 MaxQ on a base system that costs your whole budget would do it though.
philmarcracken@reddit
cries in 8gig of vram..
HyperWinX@reddit
Mac Studio M3 Ultra with 512GB of RAM. It will be so damn fast
Narrow-Belt-5030@reddit
With only $10K and a dream to run full DeepSeek 671B... I would suggest API calls to a provider and/or renting hardware as needed.
chibop1@reddit
Mac might be ok for inference with popular LLMs, but if you need to do dev work with PyTorch, you may encounter errors such as "NotImplementedError: Could not run xxx from the MPS backend." PyTorch can also produce inferior results compared to Cuda even when running the same model. Overall, MPS support in PyTorch still lags behind Cuda.
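A quick way to probe this before committing to Mac hardware, using the standard `torch.backends.mps` API:

```python
import torch

# Pick the MPS (Metal) backend when it's actually usable, else fall back to CPU.
if torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    # is_built() distinguishes "this wheel lacks MPS" from "macOS/hardware too old".
    print("MPS built into this wheel:", torch.backends.mps.is_built())
    device = torch.device("cpu")

# Smoke-test an op on the chosen device.
x = torch.randn(4, 4, device=device)
print(device, x.sum().item())
```

Setting the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` before launch makes ops that MPS doesn't implement fall back to CPU (slowly) instead of raising that `NotImplementedError`.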
960be6dde311@reddit
NVIDIA RTX PRO 6000 + Intel Core Ultra 9 285K or Ryzen 9 9950X.
Denelix@reddit
usually would be a server CPU and like 1TB of ram + a crazy GPU buttt...... market lookin pretty bad rn
phido3000@reddit
Old dual Xeon 6200-series/Epyc, 768GB of RAM
Or just buy $10k of machine time.
takuarc@reddit
A maxed out Mac Studio is your best bet, especially that 512gb ram will come in really handy.
false79@reddit
Honestly, if you are just exploring, I wouldn't go off the deep end. You would have all this tech under your fingertips and may not use it to its fullest potential because of a lack of prior experience.
There are so many cheaper options to dive into before throwing cash at the unknown.
ZodiacKiller20@reddit
Better off spending 5K on a RTX 5090 machine and then use the leftover 5k for runpod.
That way you can train large models on runpod while still keeping your 5090 machine free.
Western-Source710@reddit
RTX 6000 Pro with the 96gb vRAM, room to expand to 2-4 GPUs, good processor, probably Ultra 9 285K or Ryzen 9 9950X3D if you aren't going server mobo, bunch of good ram, fast SSD. If you expand later on, add more RTX 6000 Pro with 96gb vRAM each. Four of them would be a nice 384gb of vRAM. :)
juggarjew@reddit
OP would want a threadripper rig at that point, quad channel memory and all the PCIe lanes you could ask for.
kc858@reddit
You can't run 671b at any usable speed for 10k lmao
giant3@reddit
It is better to build your own rather than buy a prebuilt one.
Sooner or later you will encounter issues that you will have to troubleshoot, and it's better to get your hands dirty from the start.
Also, warranties are 3 or 5 years for components, but only a year for most prebuilt systems.
_matterny_@reddit
The NVIDIA Spark is an interesting option. But I don't think anything can run the full 671B model sub-$10k in a reasonable timeframe.
I could probably run it as a CPU model with a couple of Xeon processors for $10k, but the response time is going to be so slow as to be meaningless.