Is it worth getting a 5090 for my needs?
Posted by BitGreen1270@reddit | LocalLLaMA | View on Reddit | 149 comments
I'm considering biting the bullet and getting a pc with the following specs:
- 5090
- Amd 9950x3d
- X870 motherboard
- 32GB RAM (2x16GB) CL32
Obviously costs a bomb. But I'm hoping it will become cost effective over time (10 years probably) as I intend to use it to learn as much as I can about LLMs and ideate and work on use cases for them. I also feel the future is going to be LLMs in some form or other and it's better late than never to try and keep up.
My questions
- How does it perform with dense models like qwen3.6-27B and gemma4-31B? These are most likely the models I'll be trying to build applications around.
- The alternative is using ad hoc compute resources on vast.ai, or maybe spending more for Google Cloud or something. But that also gets expensive fast. I can keep costs down by keeping it ad hoc, but that increases friction.
- My only application is LLMs. I don't play games or anything else that needs a gpu like this one.
_Divot_@reddit
I have the cyber power PC with the founders edition 5090 and I do not play games. I bought it for the capability. I’m building a digital twin essentially
Mordimer86@reddit
It's a bad time to buy a PC, so if you're spending that much, be sure it will make you real money.
Using the cloud can still be more cost-effective for a while. Just don't pick flagship models for easy tasks, and check out Deepseek for example. It is really decent in many cases and a lot cheaper than Claude.
BitGreen1270@reddit (OP)
I know it's a bad time to buy a PC - but I don't think it's going to get better in the next 2-3 years (speculation of course). I don't think there's any contest between the cloud hosted frontier models and whatever I can run at home. But I do believe there's a lot of scope for identifying use cases that work reliably and dependably on a locally running LLM. I don't expect to make any money from this right now, but who knows, maybe I can hope to earn something via some consultancy gigs.
ovrlrd1377@reddit
You can still buy a PC for 20% of that price and spend the rest on good models, it will probably be a better deal overall
BitGreen1270@reddit (OP)
No doubt about that at all. Can't compare a local LLM with a frontier model. But I can't get into the nuts and bolts of a frontier LLM like I can with a local one. The learning is the biggest motivator.
Orolol@reddit
Then I would just rent GPUs to experiment. You can rent a 5090 for $1 per hour, which means you can use it for nearly 2 years, 8 hours per day, 7 days per week, for $6000, and that's with no electricity charges, no configuration headaches, no hardware that breaks, etc...
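Rough version of that math (the $1/hour rate is an assumption; spot prices move around):

```python
# Back-of-the-envelope rental math. Assumptions: a flat $1/hour 5090
# spot price and 8 hours of use every single day.
hourly_rate = 1.00
hours_per_day = 8
days = 365 * 2                      # two years

total = hourly_rate * hours_per_day * days
print(f"Two years of rental: ${total:,.0f}")   # ~$5,840
```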
It won't last 10 years, let's be honest. First, we don't know what shape LLMs will take in 6 months, let alone 10 years. Two years ago, 32GB of VRAM was a weird size, mostly useless for any serious work; you wanted to at least be able to run a 70B.
In one year, maybe the 5090 will be too small for the most popular LLMs, or maybe everybody will use CPUs instead, or maybe it will still be the perfect fit; we don't know. Trying to rationalize a purchase that expensive with the hope that it will be cost-effective is a mirage.
If you have the money lying around and spending it is an option, then go for it, but consider this money gone, not invested.
I have a 5090 and I'm VERY happy with it, but I don't use it only for LLMs or data science, and most importantly, I had the money to spare and can still pay for cloud services, GPU rentals, etc...
xienze@reddit
And zero resale value. Whereas I think you'll probably be able to sell a 5090 for at least $2500-$3000 in two years.
Orolol@reddit
First, you're comparing prices from a period when prices were much more reasonable. Today, with heavily inflated prices, you have absolutely zero guarantee of an easy resale in the future.
Second, for the GPU, yes, maybe. But it's only half the price here. RAM, motherboard, CPU: all of those will lose value quite quickly.
Third, you have to pay for electricity, which with a 575W GPU can quickly get expensive depending on where you live, and when renting you also don't have to pay for any maintenance.
Finally, my calculations were for 8 hours of usage a day, every day. When you're renting, you usually only spin up the server when you need it.
As I said, I have a 5090 and I'm very happy with it, but it isn't a good investment; it's spending on a hobby.
sn2006gy@reddit
The supply chain is sold out through 2027 and inflation is up; we haven't seen prices come down yet. Memory is seeing some Chinese suppliers come online, but tariffs don't help that.
Orolol@reddit
If you're 100% sure, I guess you're hoarding GPUs and RAM?
sn2006gy@reddit
No, I just work in the industry and have to know the supply chains are screwed, because my customers are screwed. It's in the news as well. Nvidia is skipping out on consumer GPUs this year; they reduced consumer GPU manufacturing by 50% (more, I presume). Micron stopped selling RAM to consumer markets and announced its entire inventory is sold out. Samsung is sold out of RAM so far into the future that it can't make RAM for Samsung phones. NVMe companies that used to sell to consumers pulled out of the market; if you look on Amazon, drives that used to cost 60-90 bucks are now several hundred, and they're from no-name Chinese brands. My 7900xtx that I paid 699 for in 2020 is now worth 1300 bucks. My customers are moving to the cloud because servers cost too much: what used to cost 10-15k is now well over 50-60k for the same hardware, just because the RAM and storage on it are so expensive. Are people really just not paying attention?
Orolol@reddit
Then why don't you hoard RAM and GPUs if this is 100% sure? It would be the safest and most sound investment in the history of humanity.
sn2006gy@reddit
because i'm not an asshole
Orolol@reddit
Then why don't traders do it? They're assholes.
sn2006gy@reddit
yeah, fuck traders.
WTF are you going on about?
Orolol@reddit
My point is that you're saying you can predict future prices. If you really can do that, you should invest.
But the reality is that you can't; you just have a feeling about future prices based on past events.
sn2006gy@reddit
I'm not predicting the future, I just read the news, where it's already reported that the supply chain is sold out. It's why Apple is only doing a 96GB M5 Max Pro instead of the 128, 256, and 512GB options like they did before.
The data is already out there, the future is already sold out, and no one is building factories to fill the gap.
f5alcon@reddit
Yeah, my electricity cost for a 5070ti and a 5060ti 16GB is 12 cents an hour while doing inference. It does around 90 t/s on qwen 3.6 35a3b q4. That's 324k tokens per hour, so a little under a million tokens for 36 cents. Deepseek v4 flash is 28 cents for a million output tokens and is a better model.
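Worked out explicitly (same numbers as above):

```python
# Electricity cost per million tokens of local inference.
cost_per_hour = 0.12                     # USD of electricity under load
tokens_per_hour = 90 * 3600              # 90 t/s -> 324,000 tokens/hour

cost_per_million = cost_per_hour * 1_000_000 / tokens_per_hour
print(f"${cost_per_million:.2f} per 1M tokens")   # ~$0.37
```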
xienze@reddit
Prices are continuing to go up. You buy a GPU or RAM today and six months later they're worth more. In my example I was listing a card that went up in price, inexplicably, over the last year. You're banking hard on computer technology to follow the same trends it used to. But the environment has changed dramatically if that hasn't become obvious. There is no guarantee that the consumer market will ever go back to what it used to.
Look, I'm not saying buying hardware is a way to make a return on investment, just that you can't think of it the way we used to, where in two years' time whatever you bought will be worth pennies. When there are perpetual shortages and manufacturers are drip-feeding hardware that barely improves on the previous generation at ever-increasing prices, the market gets weird. And so for my money, if there's a choice between a system that can run a model acceptably and two years of Claude, I'm going with the option that lets me recoup a decent chunk of the money when the time comes.
Orolol@reddit
And there's no guarantee it will continue to go up.
I'm not saying this. I'm saying that betting on being able to resell your hardware at a good price in 3 years is very risky. You literally don't know the future. If tomorrow OpenAI goes bankrupt, or a new inference technology makes VRAM obsolete, or a new LLM architecture comes out that doesn't use matmul, etc...
2 years of Claude is far cheaper than what I suggested. $6k is more like 5 years of Claude Max. And Claude gives you access to far better models.
viewofthelake@reddit
Where I work, our purchasing guy is saying that he doesn't expect computers / computer part prices to come down until 2030.
He thinks CPU prices will be the next thing to spike.
Mordimer86@reddit
China is ramping up memory production and other manufacturers will also likely increase production in 1-2 years. It just requires time to do that. So the prices should improve.
brianoh@reddit
Compute will always be needed. It hasn’t hit a plateau and imo possibly never will.
Jonathan_Rivera@reddit
I'm using 43GB of ram for windows + processing on a 5090.
BitGreen1270@reddit (OP)
I assume you meant 32GB. How is the performance on it?
Jonathan_Rivera@reddit
My system has 64gb of ram. I think 32gb would choke a bit or you would have to aggressively cut processes. With 64 it runs well. VRAM will never be enough. You’ll always want more lol
youngbitcoino@reddit
Yo! I built a PC with similar specs for gaming and local AI: RTX 5090, R7 9850X3D, 128GB DDR5-6000, 1TB + 4TB NVMe SSDs. The whole setup cost me 7500€.
Performance is fantastic. With llamacpp I can run Qwen3 Coder Next 80B A3B Q4 at 600+ t/s pp and 60 t/s tg, and bloody Qwen3.5 122B A10B Q4 at 500 t/s pp and 25 t/s tg. Dense models are a lot slower, obviously. I ran a quant of Qwen3.6 27B at 25 t/s tg but I can't remember *which* quant.
Point is... I did it because I had lots of money to burn and wanted to gift something to myself. It's definitely not a good use of your money compared to, say, 10 years of a decent cloud model subscription at 30€/month.
Clearly you're at the mercy of the cloud providers, who can decide to aggressively quantize your model or rate limit you or what have you. But you can always switch to the next one.
BitGreen1270@reddit (OP)
Thanks for sharing - that's an amazing build you got. Don't have any doubts that cloud will be cheaper, but I do want the flexibility and convenience to learn. Pretty sure I can't escape cloud models, will need to be a mix of both.
youngbitcoino@reddit
Happy to share. Very satisfied with my build. 😁
You do get a lot of flexibility and of course privacy as well. A prosumer build like mine nets you insane speed: SDXL finetunes (image gen) at mere seconds per image, Wan 2.2 FP8 (video gen) at roughly 2 minutes per second of 30fps output, Flux 2 Q4 and Qwen Image Edit Q8 (image editing) at less than a minute per edit, ACE Step 1.5 (song gen) at 1 minute per 3-minute song, coding LLMs wrapped in agentic harnesses like in my previous comment, and even deep research tools like DeerFlow.
A colleague, conversely, used Claude 4.6 Opus to do a complete refactor of a large piece of our backend and it was near-flawless. We laughed like little kids as we inspected the result. I think we can all agree on the fact cloud models are the go-to for architectural design choices and large refactors, while anything below that complexity is easily done by local models.
billy_booboo@reddit
Consider two AMD cards and more CPU RAM instead.
Silver-Champion-4846@reddit
Can AMD GPUs be used as flexibly as Nvidia ones, i.e. for training all kinds of neural nets, not just LLMs? Or is the software support just for LLMs?
billy_booboo@reddit
They certainly can! The challenge is when someone releases their cutting-edge research code in CUDA; then you can't use it right away, so you just go port it with a 27B or Kimi K2 or whatever.
Silver-Champion-4846@reddit
Ugh. Nvidia got the crown so they make the rules. Nasty baggenses
billy_booboo@reddit
Meh, these days thanks to vibe coding it's not nearly as big of a deal as it was.
Silver-Champion-4846@reddit
Vibecoding doesn't reach meticulous human engineering grade.
billy_booboo@reddit
It does if you apply meticulous engineering logic. The thing is, with vibe coding people who understand the problem can iterate quickly without understanding the exact quirks of the APIs. It basically erases all the bullshit overhead of a porting project like that.
Silver-Champion-4846@reddit
I guess you're right. But will those people get down from their perch and actually help?
BitGreen1270@reddit (OP)
It sounds like a lot more work. 64GB of VRAM would be like a dream come true, but I do want to stick with Nvidia so I can also explore the transformers API more.
billy_booboo@reddit
Idk, maybe it will be more work but I think the gaps are closing and that it will be worthwhile
ImportancePitiful795@reddit
Given the price of the 5090 today you have 2 options.
Either use 2x R9700s for more VRAM to run larger models, or get an RTX 6000 96GB.
There is no middle ground here and 5090 ain't worth the price tag it has today.
BitGreen1270@reddit (OP)
Rtx pro 6000 is much more expensive. Even the rtx 6000 is way more than my entire build. I'm pretty sure it won't last 10 years at its current capacity; I just think it might be usable in many forms for 10 years.
ImportancePitiful795@reddit
They are selling you the RTX5090 for $4500+, considering how much the rest of the parts cost on the street.
Imho it's not good value.
So the whole point is what you want to get out of your system. And no, do not look for future value.
You can get away with 2x R9700s (around $2600 for both), getting 64GB of VRAM and running twice-as-big models as on a 5090, at half the price.
You can get a DGX Spark, and still have money left, if you only want the machine for inference.
You can also get a Strix Halo mini PC with 128GB RAM and hook an R9700 to it externally, if you want something that works as a normal x86 desktop/workstation and has 128GB of unified RAM.
And the worst part: 128GB of DDR5 today costs north of $2000, for which you can basically buy a whole mini PC with much faster RAM. The 32GB you have selected is too low; you need 64GB bare minimum. But given RAM prices, it's ridiculous.
tecneeq@reddit
I have a Strix Halo 128GB and a 5090. The 5090 is fast, but lacks precision, you have to use quants. The Strix Halo has all the precision, but lacks speed.
I would say start with two Intel B70s, and get a board that has enough PCIe and NVMe slots so you can add 2 more with time. CPU and RAM aren't critical if you have your stuff running 100% inside the GPUs, and with two B70s you can. Should be way faster.
Use Vulkan and llama.cpp.
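A minimal sketch of that route via the Python bindings (assumes llama-cpp-python built with Vulkan; the GGUF path and context size are placeholders):

```python
# Sketch of the llama.cpp route through its Python bindings. Assumes a
# Vulkan build, e.g.:
#   CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="gemma4-31b-q4_k_m.gguf",  # hypothetical local file
    n_gpu_layers=-1,   # offload all layers; llama.cpp splits them across GPUs
    n_ctx=32768,
)
out = llm("Summarize the KV cache in one sentence:", max_tokens=64)
print(out["choices"][0]["text"])
```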
billy_booboo@reddit
I'd argue that RAM is critical for MoE models. You can run MoE models with the expert weights offloaded to system RAM, and this is a superpower the unified systems don't have.
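For example, with llama.cpp's server you can pin expert weights to system RAM (a sketch; assumes a build recent enough to have --n-cpu-moe, and the GGUF name is a placeholder):

```python
import subprocess

# Launch llama-server in the background (Popen returns immediately).
server = subprocess.Popen([
    "llama-server",
    "-m", "qwen3.6-35b-a3b-q4_k_m.gguf",  # hypothetical MoE GGUF
    "-ngl", "99",            # put every layer on the GPU...
    "--n-cpu-moe", "20",     # ...except the expert weights of 20 layers,
                             # which stay in system RAM
    "-c", "32768",
])
```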
u23043@reddit
The Strix Halo devices don't need to offload for MoE though, because they have more than enough memory for any model that can run at decent speeds.
billy_booboo@reddit
My point is that system RAM does still matter for a system with discrete GPU.
tecneeq@reddit
There is no need to balance; the operating system does that. Assign max memory to VRAM, in my case 126GB VRAM and 2GB RAM. If you don't use the 126GB of VRAM but only, say, half, the rest is available as RAM.
billy_booboo@reddit
That's what I mean: you need to tweak your operating system if you actually want to pull this off, e.g. using vllm with generous batch parallelism and multiple models at once, with enough room left over for running rustc etc. in the background.
DigitalguyCH@reddit
How about using both together? I currently have a 20GB AMD card and I just bought a Strix Halo mini pc 128GB for $2300 (not shipped yet). Wouldn't it be faster to use both via thunderbolt? (I have an egpu I am currently using with an AMD laptop, 8840u and 64GB RAM)
BitGreen1270@reddit (OP)
On the spectrum of PC building as an interest, I consider myself in the middle. I'm okay with maintaining a standard PC, but getting into multi-GPU setups and then figuring out the power requirements and non-standard enclosures for them starts getting too tedious for me. While it would definitely perform amazingly well, I'm pretty sure I won't have the ability to maintain it.
sn2006gy@reddit
The 9950x3d is wasted if you're not playing games. 32GB of RAM is probably not enough: you usually need about 1.2x your VRAM just to run models efficiently, and since 48GB is such a weird size, I'd suggest 64GB.
The real question is what type of work? That system should chew up and spit out tokens on qwen3.6-27b like nobody's business, but if you're coding you may not be happy with some of the quants.
Silver-Champion-4846@reddit
Wouldn't the 9950x3d futureproof them in case ternary models get better?
sn2006gy@reddit
Not really future-proofing, unless they're really into data processing on something like 0.2B or 0.8B models. If you're doing something like high-throughput vectorization where the 3D cache can hold the model, then sure, but for LLMs as chat/coder/agentic workloads I'd just go for core count and memory speed.
Silver-Champion-4846@reddit
So there are no architectures prioritizing 3D cache as their working environment, eh?
sn2006gy@reddit
Nope, the bottleneck is always the memory bus. Again, if you're doing vector encoders at 0.2B it can scream pretty quick with 3D cache. Now, if you have Genoa-X with 1.1GB of 3D cache, that's a different story.
Silver-Champion-4846@reddit
Are AMD server CPUs comparable to Intel Xeon?
sn2006gy@reddit
EPYC yup
Silver-Champion-4846@reddit
Would probably eat as much electricity as it takes to run my sanity
sn2006gy@reddit
yup, that's the crazy part... even if prices collapse on hardware, the electricity cost is more than what we'd pay for subscribing to services.
Silver-Champion-4846@reddit
I wouldn't be surprised if I bring one of those high-tdp desktops and the local grid self-destructs
BitGreen1270@reddit (OP)
Would you recommend a downgrade to the 7800X3D instead? There are some savings to be made there, which could be used for 64GB of RAM. I do plan to do coding among other things. I don't have a fixed use case other than experimenting and trying to identify opportunities.
sn2006gy@reddit
Sure, that's a fast CPU. I'd probably go for more cores on the 9000 series vs X3D, but if you're not gaming, definitely take more RAM over X3D if that's the tradeoff.
Narrow-Belt-5030@reddit
I have an AMD 9950X3D, 192GB of RAM, and a 5090.
It can easily run those models - vLLM / NVFP4 models work like a charm.
Your proposed system will work fine - the RAM (32Gb) is a bit low, but it depends on what other models you might like to experiment with.
BitGreen1270@reddit (OP)
Yea - I think folks here have convinced me to downgrade the CPU and upgrade the RAM. I don't know of any other models I want to work with right now; these two sound good.
Narrow-Belt-5030@reddit
That's probably a smart move. Unfortunately, RAM prices are insane: mine (a 4x48GB kit) was 2,700 AED ($735) and now it's 10,000 AED ($2,700)... nuts, almost 4x...
BitGreen1270@reddit (OP)
Yea I was thinking I could just get 4x16 GB which is cheaper than 2x32GB 😞.
youngbitcoino@reddit
Unless you get a server mobo, 4 RAM sticks will be unstable. Consumer mobos are dual-channel even when they have four slots, so get 2 sticks instead.
BitGreen1270@reddit (OP)
Oh that sucks, thanks for letting me know 😕
youngbitcoino@reddit
Yeah, that's why I went for 2x64GB. 😁
u23043@reddit
4x16 (2DPC, two DIMMs per channel) will be slower, maybe much slower, than 2x32 (1DPC).
relmny@reddit
How are you able to run NVFP4 in 32GB, especially since some layers are 8-bit?
I was actually trying to run them today (qwen3.6-27b/35b, gemma-4, etc.) in sglang with an RTX 5000 Ada, and there was no way. The only model I could run was qwen3.5-9b, and with only about 16k context.
Narrow-Belt-5030@reddit
The actual model I use currently is nvidia/Gemma-4-26B-A4B-NVFP4 with 128K context.
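FWIW the vLLM side is just a few lines (a sketch; NVFP4 quantization is usually read from the checkpoint config, and max_model_len / memory fraction here are illustrative, so tune them to your VRAM):

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="nvidia/Gemma-4-26B-A4B-NVFP4",
    max_model_len=131072,          # trim this if you hit OOM
    gpu_memory_utilization=0.92,
)
outs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outs[0].outputs[0].text)
```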
relmny@reddit
yeah, I tried that one (and some redhatai ones, another nvidia one, and a few more). I don't remember why this one failed, but on that GPU I wasn't able to run any of those, even when the folder size looked like it should fit...
I haven't tried vllm (only sglang), I might try it, or just stick to llama.cpp/ik_llama....
Narrow-Belt-5030@reddit
FWIW I don't run a monitor off my 5090 (no display) so that helps re vram.
relmny@reddit
yeah, me too; the nvtop graph is always flat at the bottom when a model isn't loaded, and no processes show up there... anyway, I'll give vllm a try, but good to know that someone can run it.
Btw, how does NVFP4 feel? Like a q4 quant or higher?
KFSys@reddit
The 5090's 32GB VRAM handles both of those model sizes without issue — qwen3-27B and gemma-31B fit comfortably at reasonable quants, and you'll get solid throughput on that card.
The buy vs. rent math really comes down to usage frequency. If you're running inference for hours every day, owning hardware pays off over a few years. But 'I want to learn and ideate on use cases' sounds more like sporadic workloads, and for that, cloud GPUs are honestly cheaper once you account for the upfront cost. I've used DigitalOcean's GPU Droplets for heavier stuff I don't need running constantly: spin up, run the job, shut it down. No idle hardware, no depreciation.
If you're still figuring out your actual workflow, I'd start cheap on cloud for a few months before dropping $5500+ on hardware. Once you know exactly what you're building and how often you need it, the ownership argument gets a lot stronger.
ea_man@reddit
Buy a used PC with 2 used GPUs; if in a year it's not enough, you sell it for almost the same price and get current gen. I could do something decent with ~$1000 for starting out.
nakedspirax@reddit
For that price, just go for a Strix Halo with 128GB of (unified) VRAM.
billy_booboo@reddit
Don't do it, you'll be disappointed.
nakedspirax@reddit
I'm not disappointed. Kinda love it now. Low power, big inference.
sn2006gy@reddit
They're OK for chat, but for coding the prompt processing is so slow that with a 128k context you can wait a minute between turns.
nakedspirax@reddit
You are right about speed, no arguments there. And it's more like more than a minute: a full session prompt can take almost 2 hours lol.
I bought the Strix to get quality over speed. I use it for coding and can run big-ass models, Minimax 2.7 and all. Currently running qwen3.6 35B Q8(!!) on the Strix with 2 parallel slots at a 250k context window each, so 500k of context in total.
I use it to code. I set the tasks and walk away.
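For reference, the slots setup is just this (a sketch; the GGUF name is a placeholder, and llama-server divides the total context pool evenly across slots):

```python
import subprocess

# Launch llama-server in the background (Popen returns immediately).
server = subprocess.Popen([
    "llama-server",
    "-m", "qwen3.6-35b-q8_0.gguf",  # hypothetical GGUF
    "-c", "500000",                 # total context pool
    "-np", "2",                     # 2 parallel slots -> ~250k context each
])
```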
BitGreen1270@reddit (OP)
Really want to stick to nvidia to be able to learn more.
lukistellar@reddit
What exactly? If it's only about optimizing and building local inference systems, you might be totally OK avoiding Nvidia. Everything I know about local AI and the ecosystem, I learnt on an old RX 580 8G in combination with Qwen 3.6 35B-A3B. I'll try cheap used RDNA2 GPUs via pipeline parallelism next, to hit higher speeds and try my hand at dense models.
nakedspirax@reddit
NVIDIA is great for image and video generation. Other than that, other chip manufacturers are catching up.
HolidayPsycho@reddit
To be honest with you, if you need to come to reddit to ask this question, you don't know enough LLM to make the purchase useful. Just use cloud services.
JohnBooty@reddit
If you're doing it to save money vs. using commercial AI providers, HELL NO... not worth it. You'll be restricted to 27B-ish models running rather slowly compared to far more powerful 400B+ models running on cloud providers.
However...
Local AI does guarantee you independence. Which is important to many, including me.
tecneeq@reddit
GPT 5.5 High costs $12 per 1 million tokens. I burn 30 million a weekend. Do the math.
BitGreen1270@reddit (OP)
Definitely will need to shop around for cloud models as well. With Cloudflare actually laying off employees as redundant because of AI, I'm fairly certain token costs will be much higher next year.
MRDR1NL@reddit
Local will only get better and cheaper. Cloud will only get more expensive. It's always the same with subscription models, and long-term calculations always forget that.
JohnBooty@reddit
Let's stay grounded.
GPT 5.5 is allegedly a 9.7 trillion parameter model, so that is not a reasonable comparison at all. You're not serving up anything like that in your home lab.
This is not a knock on local AI. I'm in the planning stages for my own right now. It sounds like whatever you're doing meets your needs, and that's badass.
BitGreen1270@reddit (OP)
Haha - absolutely not to save money. Honestly, I'd probably save more money just renting GPUs on vast.ai on demand. And you can't really compare with gemini, chatgpt, deepseek, claude. But I'm so fascinated with how usable qwen3.6 and gemma4 are today, and it can only get better. I want to start building things on top of those.
fluffywuffie90210@reddit
As someone with 3 5090s (bought before prices went crazy): go for 2x 5070 Ti if you're spending that much money. You still get the VRAM to run gemma 4/qwen, and it's not too much slower if you get a mobo with 2 PCIe slots.
bebackground471@reddit
Within this price range, not being into gaming, and with the goal of exploring LLMs, maybe consider an Nvidia Spark or an Asus Ascent GX10. Yes, they are slower, but they have a whopping 128GB of shared RAM, so you can use MUCH bigger models.
txoixoegosi@reddit
Stick to vast.ai or other providers if you want to experiment. Meanwhile, speed up your workflows by leveraging claude/codex.
Once you know what you want, what you do NOT want, and the resources required, you can move on to buying what you NEED.
BitGreen1270@reddit (OP)
I have been using vast.ai for the past 2 weeks. It adds a lot of friction around startup, reliability, choosing an instance, etc. Maybe I should look for other cloud providers, or stick to only datacenter instances and see if that helps.
txoixoegosi@reddit
runpod is a bit more expensive but has less friction, from what I have read
AdamDhahabi@reddit
Qwen3.6 27B at Q8 quant, with full context, vision enabled, and MTP enabled, needs ~52GB of VRAM. So 32GB of VRAM only makes sense if you're willing to go down to a Q4 quant. Not good for coding.
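For intuition, roughly where a number like that comes from (the shapes below are assumptions for illustration, not the real config, and vision/MTP add a few GB on top):

```python
# Rough VRAM budget for a dense model at Q8.
params_b = 27
weights_gb = params_b * 1.07     # ~1 byte/weight at Q8_0 plus scale overhead

layers, kv_heads, head_dim, ctx = 48, 8, 128, 131_072   # assumed config
# KV cache: 2 tensors (K and V) x layers x kv_heads x head_dim x fp16 bytes x tokens
kv_gb = 2 * layers * kv_heads * head_dim * 2 * ctx / 1e9

print(f"weights ~{weights_gb:.0f} GB + KV ~{kv_gb:.0f} GB = ~{weights_gb + kv_gb:.0f} GB")
```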
cmndr_spanky@reddit
Honestly, it might not be worth it at all if your use case is experimenting with wrapper software that uses LLMs for inference or coding, and especially if you're just trying to teach yourself stuff. No, I would not blow $6k on a rig; you will absolutely save money and have an easier time using cloud models or cloud infra here and there. You can also use Google Colab for free within certain limits.
That said, you didn't say anything about your actual use cases other than, vaguely, "learning".
BitGreen1270@reddit (OP)
That's a fair question. Things I want to learn more about are fine-tuning, bulk tokenization, and personal automation (not openclaw). I would also like to learn more about transformers and start implementing from the ground up. I've been coding for fun and work for 25 years, so there's a lot I feel I could explore. That's about as concrete as I can get right now.
cmndr_spanky@reddit
Fine-tuning and LLM training change my perspective a bit. Here's my advice: buy a much cheaper PC with a cheaper 12 or 16GB GPU; even a Mac mini with 32GB of RAM is a good option.
When you write scripts for fine-tuning and even raw LLM training, you can use your local hardware for a lot of the experimentation and trial and error, and just keep the params small enough to fit in 16GB or 32GB. Once you have a working concept and you want to see how your model trains at a larger scale, pay once for a big cloud GPU to try out your training script at higher params.
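Something like this skeleton works at both scales (the model ID and hyperparameters are placeholders, not recommendations; swap in a bigger model_id when you move to the rented GPU):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen2.5-0.5B"   # small stand-in while iterating locally
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# LoRA keeps the trainable footprint tiny, so the same script fits a 16GB card.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # sanity check: well under 1% trainable
```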
But that said, if dropping $6-8k on a 5090 rig is nothing to you… go for it.
nacholunchable@reddit
Honestly, if you want raw speed, there's nothing wrong with going that route. But just know that after a few months there's a good chance you'll be hungry for memory. If you are sure you just want it for LLMs, or even AI in general, there's nothing wrong with going with a Strix Halo box or a DGX Spark. You will not get the same speed as a 5090 due to memory-bus-speed limitations, but having 128 gigs of unified memory opens a lot of doors that 32 gigs of VRAM keeps shut. It'd really chug with dense models though, but there are some really nice MoE options in that size range if you're willing to come to the dark side, and for like half the price. Upgradability is mid (though Sparks can cluster), but it's foolish to future-proof in the current market anyway.
BitGreen1270@reddit (OP)
I haven't looked into the 128GB Strix Halo options. Honestly, my goal is mostly learning, and that also means I want to try and get into fine-tuning models as well. I just prefer sticking with Nvidia because it's got the most support and is the easiest to try things out with.
FinalCap2680@reddit
If you want to learn and are just starting, by the time you'd use the full capacity of a 5090 it will be quite obsolete. You can start learning on a second-hand workstation with cheaper DDR4 and a single or dual 3060 12GB cards. It will be slow, but a fast rig won't make learning fast or easy. If you go second-hand, buy local, so you can check what you are getting.
Anyway, I think your decision to go with Nvidia is correct, but get more RAM and larger modules, so you can upgrade later.
relmny@reddit
As others say, and I fully agree: after VRAM, if you're considering offloading (and you might be, if you need more context), RAM is next. Period. And 32GB doesn't leave you any room for offloading.
Btw, are you going to build it yourself? Because sometimes pre-built PCs are way cheaper (unbranded, of course).
ConsortFromTOS@reddit
You can do 4B models on a cheaper graphics card. Try that.
These smaller models will only get better, so there might not even be a need to run larger ones in the future.
I used my 5070ti to automate reddit comments (proof in my comment history) as just a test run, and as you can see, it did well.
JumpyAbies@reddit
I was in the same position as you, unsure whether to build the PC or not. I'm developing some AI projects and studying model training in more depth. My main goal is for this PC to help me make money, but at the very least it will be very rewarding to be able to delve deeper into model architecture. If I manage to make money with this PC, it will be doubly good.
I still need to buy the memory kit, which is the most difficult part, and I don't have the option to buy it in my region (it costs three times as much), so I'll have to import it.
This is my spec:
AMD Ryzen 9 9950X3D
Gigabyte RTX 5090 AORUS MASTER 32G
ASUS ROG CROSSHAIR X870E DARK HERO
DDR5 96GB (48GBx2) 6400MHz CL32 1.35V AMD EXPO SK
Samsung 9100 pro 4TB PCIe 5.0 M.2
PSU: 1600W Cooler Master V Platinum V2, ATX 3.1
Water cooler: Corsair iCUE Link Titan 360 RX LCD, RGB, 360mm, Black, CW-9061023-WW
Due_Duck_8472@reddit
I would suggest 2x 5090 32GB; that's a growth multiplier. If you made 50,000 USD/mo on one GPU, you have the potential to make 250,000 USD/mo on 2 GPUs; the ROI is insane. I'm currently juggling 25 apps on the App Store, bringing in close to that each month. Just be prepared to put in the long hours; it took me more than a quarter to go from 1,000 USD to 50,000 USD/mo.
MelodicRecognition7@reddit
what about spending these 6k USD on a better GPU like Pro 5000 48GB + external harness? Check /r/eGPU/
BitGreen1270@reddit (OP)
I checked - my laptop doesn't support thunderbolt or oculink 😞
Dany0@reddit
That changes things. Sell your laptop; you can buy a cheaper laptop for remoting into your desktop and for browsing/travel. Put that money toward your workstation.
m31317015@reddit
USB4 works.
MaruluVR@reddit
If it has an NVMe slot you can get an OcuLink adapter for 10 USD.
MomSausageandPeppers@reddit
Given your edits, I would not optimize this build around the X3D CPU. For LLM-only work, the order of spend is usually VRAM first, then system RAM, then fast NVMe / cooling / PSU stability.
A 5090 makes sense if the goal is low-friction local iteration and you accept that it is not going to beat frontier cloud models on raw capability. It makes much less sense if the goal is saving money versus APIs or rented GPUs.
I would do 64GB RAM minimum, and 128GB if you expect long-context RAG, multiple services, or CPU/offload experiments. A cheaper non-X3D CPU is fine unless you also game or do CPU-heavy workloads. The 32GB VRAM is the real value here: for 27B/31B models it buys you better quants and more context headroom than a 4090-class setup.
Before spending 5500-6000 USD, I would rent a similar 32GB GPU for a week and run the exact workflows you care about. If the friction of renting breaks your habit loop, buying starts to make sense. If you only use it occasionally, cloud/rental wins.
BitGreen1270@reddit (OP)
This is a very good suggestion. Yes, the biggest issue I have is with the friction: having used vast.ai, the friction is quite high, with high latency and random downtimes. I think I've had about 1 out of 3 rentals be reliable on every session. Yeah, occasional vs frequent use does make sense, and spending money for a week to test this out also makes sense. Maybe I shouldn't jump the gun, and should try other options a bit more. Thanks for taking the time.
Dany0@reddit
There is a pricier competitor which I know a lot of ML people use (please bear in mind all the ML people I've known IRL were university people who were doing it long before deep learning/transformers/gpt)
From what I've heard they're very reliable
I will not name it because I don't want to advertise something which I personally cannot vouch for, but take it as a hint at where to go research
10 years is going to make the 5090 utterly obsolete. You'd be better off upgrading every release cycle while selling the old card/parts to recoup part of the cost
Renting in the cloud has some obvious downsides but if cost is a factor there's a reason most researchers rent. There are nicer options if you have a large budget, you can rent data centre space and colocate a rig of your own. Oh btw years ago some people were using rent to own programs but I imagine those are probably all long gone lmao
If you want a real workstation, even the 5090 (which I have, so I'm speaking from experience) is really only going to be good for making SLMs, which are still plenty good for actual hardcore first-principles research but no good for pursuing a general LLM and the like.
What you really want is the RTX 6000 Pro. Invest in VRAM first, like the others said. Still, it's better to start small.
If you don't care about electricity cost or DIY-ness, there are other good options. You can get a crapton of P60s or MI50s and build an insane rig. It would probably only be good for inference though, and expect a lot of problems. A better option would be a 6x 5060 Ti build; that will get you more of everything. But multi-GPU is a headache for research, especially without NVLink. Don't ask how I know, but it's hell. You always, always want to experiment small, then scale up, GRADUALLY.
m31317015@reddit
IMO:
For 5090, you have to consider:
- the case (heat management & the 16 pin bullshit connector routing issue)
- the shits (some cards are definitely flawed, do your research)
- the gain (afaik it's 50-80% faster than my 3090 on Q4_K_M with gemma 4 31B with 100k ctx)
- the VRAM trap (if you don't care about speed, and power bill ain't that big of a problem to you, quad 3090 + threadripper / epyc might be better but depends on your workflow)
- the alternatives (saw you want to stay in CUDA in another comment, but if budget is your concern maybe try R9700)
And lastly, no, it will not become cost-effective over time. The fact is, if you need to keep up with the game (w/ speed and ctx) in the near future, now is the most terrible time to build a PC for that, albeit prices have dropped slightly in the past month or so. It will stay like that for at least a year or two, so prepare for the ice age: either build the igloo yourself with enough resources, or save up and regularly pay protection fees to the owners of shelters, the giants.
m31317015@reddit
It may be a good idea to buy a 90-degree adapter depending on your PSU and case clearance; mind the rotation of the plug on your specific card.
Consider a WireView Pro II / Ampinel if you're concerned about uneven voltage.
Creative-Type9411@reddit
Whatever you get, if you want it as future-proof as possible and you're going to be doing any offloading to CPU/RAM, make sure it's current gen and the highest speed, because with any offloading, CPU/RAM speed is the bottleneck and will dictate how many tokens you get out.
BitGreen1270@reddit (OP)
Yea, I am looking into that. I don't think I'll be able to super-optimize to that level; I'm already pushing my budget way, way ahead. I'll probably try to get 64GB of RAM and see what I can get in my location.
Creative-Type9411@reddit
My point is to get the fastest CPU you can possibly get that has the highest memory bandwidth, and then add the highest-speed RAM that you can slot. If you go with a lower speed, you'll have to replace everything to upgrade. So stay within your budget, but get the right pieces, so that as you build out in a few years you can still add parts to it and it won't be a complete waste of money.
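A rough way to see that ceiling (all numbers here are illustrative assumptions, not measurements): decode on offloaded weights is bandwidth-bound, so an upper bound on token rate is bandwidth divided by bytes read per token.

```python
bandwidth_gb_s = 96          # ~dual-channel DDR5-6000, assumed
active_params_b = 3          # assumed MoE with ~3B active params per token
bytes_per_param = 0.5        # ~Q4 quantization

gb_per_token = active_params_b * bytes_per_param
print(f"<= {bandwidth_gb_s / gb_per_token:.0f} t/s if all weights lived in RAM")
```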
Master_Studio_6106@reddit
If you're not gaming, then the X3D CPU is a waste of money. Also, the RTX 4500 Pro Blackwell (also 32GB VRAM) is cheaper than the 5090 in some areas.
GestureArtist@reddit
Backstory: My current Windows PC (Main PC) is an AMD 9950X3D with 256GB of RAM and an RTX 5090 FE. I built it to replace my previous PC, a 13900K with 96GB of RAM and an RTX 4090. The 13900K had degraded, so I contacted Intel for an RMA. The RMA went smoothly, but I decided to just upgrade to the 9950X3D since it had just come out at the time.
I ended up with a 13900K system lying around, so I thought I should run Linux on it and experiment with AI, since I knew nothing about how AI worked at the time. I'm an old computer user from the DOS days, and I always like exploring and understanding new things in the computer field, which became my profession. AI was one of those things I had kind of ignored as I focused on other things, but here I had this PC. I figured it was time to set off on a new adventure and learn all about it.
So I bought a second RTX 5090 FE for the 13900k. I sold the rtx 4090 on ebay for the price of the 5090. Yay free upgrade.
Fast forward a bit... I replaced the 5090 FE in the linux pc (AI PC) with an RTX PRO 6000 Blackwell Workstation card. I've also since replaced the 13900K but that's not as important to the story. If you're curious where it is, it's now in my file server which had really old aging hardware.
Ok that's the backstory.
Let me tell you about the 5090 FE and answer your question. I'm sure others have already chimed in about this, but VRAM is the problem. The RTX 5090 FE is a FANTASTIC GPU. Powerful, fast. It'll handle all your AI needs... that is, until you fill up 32GB of VRAM, and you will fill it up fast.
There is a reason why AI hardware is in such demand. You too will experience that "thirst" for better hardware rather rapidly if you catch the "bug" and want to take your AI journey further.
Do not think of the 5090 as a 10-year investment. For AI it's honestly a dead end. For gaming and even content creation it'll last a long time, but for AI it's kind of crippled due to its VRAM limit.
Consider this: most 27B LLM models are "ok". They'll impress you, but they won't be as good as the larger models or the frontier models. You'll fit a 27B into VRAM and have a little VRAM left for context... but that's about it.
You see, the problem is that the 5090 will give you a taste of what is possible, but it won't really do much more than that. It can do a lot, especially if you're generating images or even video, but you will soon realize that while a 5090 is about as fast as an RTX Pro 6000 Blackwell GPU in terms of processing... the big problem is the 32GB of VRAM that you're limited to.
I would not expect it to be worth it for AI over 10 years. I'd plan to sell it when the next flagship RTX gets released and upgrade to that, because for AI things are moving so fast... and you're starting at the bottom, already far behind what is actually possible on a local machine. There are people running 8 RTX PRO GPUs locally in workstations to do far more complex things, and they still don't even approach what the giant servers are capable of.
So yes you COULD get a 5090 and learn today. You will learn a lot quickly but you will just as quickly learn that you're limited by the card's VRAM.
Now, an RTX PRO 6000 may be out of your budget, so the 5090 could be worth it. After all, it's better to start learning now rather than later. However, it will only take you so far. Consider that waiting a little longer may be the best move if you're serious about local AI, because newer GPUs are coming and they will likely have more VRAM.
If you want to learn local AI, it's going to be expensive. The 5090 is already expensive... but it's not even close to the price of an RTX PRO, which is better suited for local AI and workstation tasks.
So perhaps save your money and subscribe to one or two of the best AI services out there. I subscribe to ChatGPT, and I also have Google Gemini.
They will do far more than your local AI ever will, but there is value in learning how to set up AI at home, how it works, etc. There is a freedom in it that none of the AI services will allow you.
So if you want to learn how AI works, how to set up AI systems, and how to train them locally, local AI is a great adventure that's worth paying for, but it requires AT LEAST a 5090, a DGX Spark, or a similar device. It starts there... the sky is the limit. You will want more powerful, capable hardware; it's unavoidable. Again, the 5090 is a great start, but 32GB of VRAM is a limitation that you will find yourself confined by.
Liringlass@reddit
There are two possibilities people mention.
One is that the AI bubble cracks and hardware becomes cheaper. Could be that an AI company crashes, or that electricity becomes the bottleneck rather than the chips.
Another is that it doesn't end anytime soon, and so now wouldn't be worse than in a year or two.
I can't see the future but I hope this helps you
farkinga@reddit
I am experimenting with 2x 5060 Ti. Tbh it's running 31B and 27B dense models at satisfying speeds (1000 t/s pp, 30+ t/s tg).
It's a compromise on performance and it's harder to use 2 GPUs than 1. But you get 32gb for a fraction of the cost of a 5090.
No question the 5090 is the better value - but it sounds like you're on the fence about a huge cash outlay. Well, this is one way to test it out first. And if you want a 5090 in the future, the 5060s will have good resale value.
Theverybest92@reddit
Use Claude 4.6 Sonnet or Opus. Those local models are as good as a jr dev at best. If you're not a jr dev, then stop coding and do some learning first, instead of having models code for you and debug and code and debug indefinitely.
MaruluVR@reddit
I have one, and for image/video gen and training models in general it makes a big difference, but if you are only going to use it for inference I don't think it's worth it. There are other ways to get 32GB of VRAM for cheaper, and the speed advantage, while nice, isn't that big for inference.
TinyFluffyRabbit@reddit
The 9950x3d is overkill if you’re primarily interested in using this for AI. You’re generally bottlenecked on memory bandwidth, not CPU compute. Also the x3d cache doesn’t help much for AI inference, unless this is also your gaming PC.
NeytotheNey@reddit
For performance: with my 5090 in llama.cpp with the MTP branch, I'm getting about 90 t/s at 150k context using Qwen3.6-27B Heretic MTP Q6_K. Without MTP you'd get half the tokens per second. I saw someone mention the AMD R9700; I actually have that one too, and I get about 40 t/s and squeezed in 175k context with the same model using MTP. I'm happy with both, and if you want to save some money I don't think the R9700 is bad, especially with MTP. Just know that there is currently a bug with vision that causes llama.cpp to time out if you go the MTP route. At least that's the issue I'm having right now.
disgruntledempanada@reddit
I've had that system for a few months...
Tried a lot of local LLM stuff on it and it's just too limited. Qwen 3.6 is a fun toy but once you use Codex it isn't even fun anymore.
Maybe I'm just working on bigger projects and the light models would be sufficient for a lot of people, but I find the VRAM limitation to be huge, and anytime anything has to be pushed to the 9950X3D it's just a joke. Not enough memory bandwidth.
I have found it to be great with Codex, the model does the thinking and the PC does the rendering and processing work. Anytime I try local stuff on it I just get frustrated.
tenebreoscure@reddit
A 9900X will give you the same performance memory-wise if you plan to offload MoE layers to RAM; it has the same memory bandwidth because of the two CCDs. An Intel Core Ultra 270K would give you even better memory bandwidth, at the cost of E/P-core complexity and a dead platform, since Nova Lake will require a new socket. With the money saved you can maybe afford 32GB more; 32GB of RAM is barely enough to work with any AI application, and 64GB would be way better.
As others have written, cloud services are way cheaper, and you can experiment with them just as easily, even with small models via openrouter. Local AI's use cases are privacy and reliability of service, i.e. no models disappearing after an upgrade, no sneaky quantization of weights or KV cache.
BitGreen1270@reddit (OP)
Yea the 9900X does save a decent amount of cash. I can save that for RAM. Definitely on the cloud services, I'm going to use them as well. Wanted to have both options available.
Herr_Drosselmeyer@reddit
10 years? This space moves so fast that I feel predicting even two years ahead is foolish.
BitGreen1270@reddit (OP)
Well, the 3090 is in great demand right now. Anyway, the idea behind saying 10 years is that I probably won't invest in another PC for 10 years. LLMs will also have to get better and faster on existing hardware, so I'm hoping it will last.
Herr_Drosselmeyer@reddit
Could be. It's just very uncertain, is what I'm saying. It won't be like the past 10 years in gaming, where a 1080 actually did last you 10 years.
BitGreen1270@reddit (OP)
100%. I know it's a large amount of money that I'm risking. Hence the hesitation.
tecneeq@reddit
I don't think a 5090 will be obsolete in the next 5 or so years. Look at the 3090.
jojotdfb@reddit
So, Qwen3.6-27B is cool and all, but Qwen3.6-35B runs like a champ on a 5060 Ti 16GB. Good enough for basic dev work. You can always upgrade later when prices come down or something better comes out.
BitGreen1270@reddit (OP)
That's what I'm running on my laptop right now. With MTP I'm getting about 25 tps, which is quite decent. Half my brain is telling me not to spend the money, hence the question here.
Azibo98@reddit
I bought an Omen laptop last year specifically to get into AI properly and it was genuinely one of the best investments I made. Not at 5090 level spend but enough to run things locally and experiment.
What changed for me wasn't the hardware though, it was actually starting to use AI tools seriously, building with them, not just chatting. I ended up building Socrate (usesocrate.com) which is a live AI product, just me, no coding background. That came from curiosity and a decent machine, not a $3k GPU.
For your use case, running 27B+ models locally does need serious VRAM so the 5090 makes sense if that's the specific goal. But if you're still in the learning phase, a cloud API + a solid laptop gets you surprisingly far before you need that level of hardware.
BitGreen1270@reddit (OP)
That's such a cool idea, thanks for sharing. I agree I don't really *need* a system. I can just plug in the tools with the online cloud models and get a pretty good experience straight out. But it is primarily for learning and I feel I'm pretty bottlenecked on my current laptop.
Btw, are you paying for the token usage on usesocrate.com yourself? Must be costing a bit with the free tier?
Azibo98@reddit
Yeah I cover it myself. There's already a paid tier live, just focusing on getting real users first before pushing it. Had my first email campaign go out today and just launched the Instagram page too, starting to pick up momentum.
shokuninstudio@reddit
It's a bad time to build, but prices also aren't going to fall for over a year. There's no weakness in demand, and even the 3090/4090 have gone up in price a lot. So it depends on whether you're happy to pay current prices.
BitGreen1270@reddit (OP)
Yea, I figure I'm paying something like 1,000 USD or more as a premium versus this time last year. That can't be helped. I considered getting a used GPU, but I'm not confident that I won't get ripped off. Obviously there's a lot of hesitation before spending this much money, hence asking here.
shokuninstudio@reddit
Even used hardware is priced very high. If you go that route, choose a place that gives a warranty. In the UK we have CeX, which has 5-year warranties on used hardware.
BitGreen1270@reddit (OP)
No warranty here. I got ripped off on a $500 used laptop.
samoxis@reddit
Running qwen2.5:32b and gemma4-31B daily on a 4090 (using ~20.8GB of VRAM). Both fit fine at Q4_K_M with 8k context. A 5090 with 32GB VRAM would be a big upgrade: you'd fit larger quants and longer context without compromise. One thing: 32GB of system RAM is tight, bump it to 64GB. And the 9950x3d is overkill if it's LLM-only; save that money for RAM or storage.
BitGreen1270@reddit (OP)
Oh that's good to hear. Yea based on the feedback I'm thinking of going for a cheaper CPU, maybe non x3d.