Save and invest your money for future rigs
Posted by segmond@reddit | LocalLLaMA | View on Reddit | 79 comments
I have long had the itch to build out. The plan was a 1TB Genoa system this year, and what would have been a $6,000 affair is now a $30,000 affair. So I held out for the M5 Mac Studio Ultra, and now it looks like Apple is gonna get hit with shortages: they've pushed from Q1 to Q2 and now Q3. Meanwhile, RTX Pro 6000 Blackwell prices keep rising. What are we to do?
Then I saw some good news: more Chinese RAM manufacturers are coming online - https://wccftech.com/another-chinese-dram-maker-breaks-into-ddr5-memory-mass-producing-64gb-rdimms/
We now have 9200 MT/s DDR5 - https://wccftech.com/micron-doubles-down-on-ai-memory-256-gb-ddr5-rdimms-hitting-9200-mtps/
Imagine 9200 MT/s memory on a 12-channel Genoa. So what does that mean? In a few years we are going to have 12,000 MT/s memory on 16-channel systems. A system like that would crush a 5090 in token gen; just throw in some cheap GPU for prefill. So save your money, be patient with what you have, and in about 2-3 years you can take your money and returns and get a badass system.
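A rough back-of-envelope on what that would mean (a sketch, not a benchmark: DDR5 moves 8 bytes per channel per transfer, the 37B-active-parameter model is just an assumed example, and sustained real-world bandwidth lands well below these theoretical ceilings):

```python
# Back-of-envelope: theoretical memory bandwidth and the decode-speed ceiling
# it implies. Batch-1 token generation has to read every active weight once
# per token, so tokens/s is bounded by bandwidth / model bytes.

def bandwidth_gbs(channels: int, mts: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s for a DDR5 system."""
    return channels * mts * bytes_per_transfer / 1000

def decode_ceiling_tps(bw_gbs: float, active_params_b: float, bytes_per_param: float) -> float:
    """Upper bound on decode tokens/s."""
    return bw_gbs / (active_params_b * bytes_per_param)

systems = [
    ("Genoa, 12ch DDR5-4800 (today)", 12, 4800),
    ("hypothetical 12ch @ 9200 MT/s", 12, 9200),
    ("hypothetical 16ch @ 12000 MT/s", 16, 12000),
]
for name, ch, mts in systems:
    bw = bandwidth_gbs(ch, mts)
    # assumed example: a MoE model with ~37B active params at 8-bit
    tps = decode_ceiling_tps(bw, active_params_b=37, bytes_per_param=1.0)
    print(f"{name}: {bw:.0f} GB/s -> <= {tps:.0f} tok/s decode ceiling")
```

For comparison, an RTX 5090's VRAM sits around 1.8 TB/s, so the big-RAM box's real advantage is less raw speed than capacity: it can hold models no consumer GPU can.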
If demand stays at this level, supply will show up to fill the needed capacity; if demand falls off, there will be a surplus. Either way, there's reason to be very optimistic looking out about 3 years from now.
FullOf_Bad_Ideas@reddit
what's your electricity price?
I'd rather buy 6000 Pros now and rent them out while rental prices are high to offset some of the purchase cost, and then keep them once rental prices are no longer attractive.
akram200272002@reddit
Considering the amount of cash in the field, and the likelihood that manufacturers predict cloud providers will shrink if they don't deliver a return on investment, someone will make local inference more reasonably priced. If not the big players, then someone has to be trying to get an ASIC-type card working as fast as possible.
FullOf_Bad_Ideas@reddit
Isn't tiiny kind of that? I don't see any hype for it here.
lithium_bromide@reddit
Amazon and Google are already running ASICs. But the big 3 still have a lot of training demand in this arms race, so I doubt investing in inference-only hardware is something they care about right now.
There's also at least one company making model-specific ASICs, which are MASSIVELY efficient and 100x faster. But of course, if you train a new model you have to start the silicon process from the beginning again. I think that's what Elon is after: he wants to tighten the loop to iterate faster. Right now it's 1-2 years between a new design and packaged silicon. If you can get that down to weeks or months, you can start to unlock that tech, even if you still have to pay TSMC.
I work in edge AI (not with LLMs), and I think that's the most exciting next step in automation. If I can have that inference at 1 kHz at least, that's when we really have to start worrying about AI taking our jobs. But I think that's also what's necessary to prevent humanity from becoming slaves to LLMs that don't have hands and feet. Even for that, though, I still need GPU servers to train.
tecneeq@reddit
Get a 3090 and a 5070 Ti for 48GB of fast VRAM and you should be golden for years, for less than $2k.
suicidaleggroll@reddit
> In 3 years there will be a whole new set of hardware slated for release 3 years later that will make it look pitiful. And 3 years later the same thing.

Not in 3 years. It takes a lot longer than that to plan, get funding, build, test, and start producing wafers from a high-end fab, closer to 5-7 years.
Far-Low-4705@reddit
Honestly, I think you're better off just buying old used hardware now.
It can't depreciate much more than it already has, and it's cheap. If you really need more speed, you can upgrade in the future.
natermer@reddit
Right now we are in a memory crunch.
Not every year is the same as every other year.
ProfessionalSpend589@reddit
Memory, SSD and HDD, and (warnings of) a CPU crunch.
RogerRamjet999@reddit
...but, but, but, AI!
Far-Low-4705@reddit
As always, you'll get better tech by waiting.
But if you're always waiting, you'll never buy anything; there will always be the next best thing just around the corner, no matter where tech currently is.
It may be true that CPU + RAM will destroy a 5090 in token gen, but what about a 6090 or a 7090? What would have been the point in waiting? Might as well wait another 2-3 years for CPU + RAM to destroy the 6090 as well?
sine120@reddit
I've fully got my tin foil hat on. The trend at the moment seems to be to get tech out of consumers' hands and into service providers'. Everything from RAM, to drives, to networking equipment seems to be disappearing. Maybe in 3 years' time China might have a tech utopia happening with better hardware than we have access to, but I'd 100% believe the US will regulate its way out of importing it, to give its billionaire buddies the chance to rent it to you for a monthly fee forever. I'm concerned that what we have today is just what we get.
darktotheknight@reddit
The reason everyone says "memory bandwidth is king" is that on a GPU, the bottleneck is 100% the memory, because even the lowest-end entry-level GPUs can crunch through matrices at full memory speed. But that's different on a CPU. If you look at e.g. the EPYC 9135 with 12-channel DDR5-6400 memory, you're getting a theoretical bandwidth of 614.4 GB/s, whereas e.g. the RTX 5060 Ti offers 448 GB/s. Though the CPU offers higher bandwidth, why does the 5060 Ti end up faster (on models which fit entirely into its VRAM)?
Because compute matters, too. GPUs are highly optimized for high-throughput matrix operations, whereas e.g. EPYC 9005 only offers AVX-512. And even though Intel offers a much more capable extension with AMX (and AMD's Zen 6 will soon offer an equivalent), they still lag behind what GPUs offer in terms of compute power. The theoretical bandwidth numbers, e.g. 614.4 GB/s for EPYC 9005, are for the simplest memory operations, requiring low CPU overhead. Inference tasks on CPUs expose bottlenecks other than memory bandwidth, so the theoretical number won't translate 1:1 into performance the way it usually does on GPUs.
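A roofline-style sketch of that point (the peak compute and bandwidth figures below are rough assumed numbers for illustration, not vendor specs; the 2-FLOPs-per-parameter count for a dense matvec is standard):

```python
# Roofline sketch: batch-1 decode has tiny arithmetic intensity (~2 FLOPs per
# weight byte at 8-bit), so it pins against the bandwidth roof; prefill reuses
# each weight across many tokens, so it pins against the compute roof.

def attainable_tflops(intensity_flops_per_byte: float, peak_tflops: float, bw_gbs: float) -> float:
    """Roofline model: min(compute roof, bandwidth * arithmetic intensity)."""
    return min(peak_tflops, bw_gbs * intensity_flops_per_byte / 1000)

# (peak TFLOPS, bandwidth GB/s) - rough assumed figures for illustration
hardware = {
    "EPYC 9005-class CPU (AVX-512)": (10, 614),
    "RTX 5060 Ti":                   (180, 448),
}
for name, (tflops, bw) in hardware.items():
    decode = attainable_tflops(2, tflops, bw)         # batch-1 token generation
    prefill = attainable_tflops(2 * 512, tflops, bw)  # 512-token prefill chunk
    print(f"{name}: decode {decode:.1f} TFLOPS, prefill {prefill:.1f} TFLOPS")
```

Note that on paper the CPU's decode roof is actually higher than the 5060 Ti's, yet the GPU wins in practice, which is exactly the point above: CPUs can't sustain their theoretical bandwidth during inference, while GPUs get much closer to theirs. The prefill gap (10 vs. 180 TFLOPS in this sketch) is also why "cheap GPU for prefill" keeps coming up in this thread.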
This is also why APUs (e.g. Strix Halo) are so interesting: they use unified RAM but at the same time offer GPU crunching power. However, these implementations have been held back by memory bandwidth, since the platforms usually offer the equivalent of quad-channel RAM (instead of e.g. EPYC 9005's 12 channels or the upcoming EPYC Venice's 16 channels). There are hybrid implementations like AMD Instinct MI300A (CPU + GPU on the same package), but these are still a bit different from what you would expect, e.g. no dedicated RAM modules, so a maximum of 128GB HBM3 per APU (search for Supermicro H13QSH if you're interested).
What I'm trying to say is: we're not there yet. I'm 100% sure inference will be very affordable and extremely capable in a few years. The hardware we have today wasn't even designed with these types of workloads in mind (engineering-to-mass-product usually takes about ~7 years). So yes, just waiting is a good strategy. But also keep opportunity costs in mind: having the ability to run inference while it's not affordable for everyone gives you an edge over other people. Once everyone can do it (like streaming games over the internet, or content creators/influencers on YouTube), it will lose its magic.
MDSExpro@reddit
No, on a GPU the bottleneck is not 100% memory bandwidth. Memory bandwidth dictates token generation speed, but an even more important metric, prefill processing, is compute-bound, not memory-bound.
Blues520@reddit
Great reply. We'll probably end up with some form of ASIC.
iVXsz@reddit
I really went deep planning an AI build (that I will probably never afford), but I came to the same conclusion: wait, or put your money into (decent-value) APIs like GLM/DeepSeek/Kimi; they are near-SOTA for not much money.
I know it's a bit annoying and unsatisfying, but within 1.5-2 years your hardware will lose 80% of its value. It's mostly not worth it, ESPECIALLY for EPYC-based builds: they are the cheapest way to get decent-speed inference, but they will seriously lose their value very fast.
Cards generally hold their value a lot better, and honestly they're the only route I'd recommend unless you are quite rich: a card has multiple uses in a server, is tangibly useful in many workloads (you could game with it when you aren't AI'ing), and can easily be sold for a good chunk of its value.
With that, you'd have to accept that you won't be able to run models bigger than 50B or so at decent speeds; qwen3.6 35B a3b should cover your local needs anyhow (IMO any bigger and you might be using AI wrong), and you can buy a couple of cards if you really need a harness or something.
Macs are a good in-between, but they too seem like they'll die fast and become locked-down e-waste even faster.
Think about it: using GLM 5.1 via API 24/7, every day, non-stop, in a multi-agent harness, for 5 years is much less money than the $30k build, and that's before electricity and other bills (e.g., ~400W of draw over that span is easily thousands of dollars in the USA after years of operation). And you can actually hop between models and quickly try stuff out, with high TPS, etc.
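Sketching just the electricity piece of that comparison (the wattage and price are assumptions; plug in your own numbers):

```python
# Electricity cost of running a rig continuously. All inputs are assumptions.
watts = 400          # assumed average draw of a multi-GPU/EPYC rig
usd_per_kwh = 0.17   # assumed US-ish residential rate
years = 5

kwh = watts / 1000 * years * 365 * 24
cost = kwh * usd_per_kwh
print(f"{watts}W for {years} years: {kwh:,.0f} kWh -> ${cost:,.0f}")
# -> roughly $3k, before the $30k of hardware, which depreciates the whole time
```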
I say this as someone who really dug deep into the market and spreadsheets over a similar AI-build plan.
Substantial-Ebb-584@reddit
Yeah, I'm not buying anything else until LPDDR6. I'll wait till then.
dataexception@reddit
3 years from now is definitely going to be different. We'll be able to pick up all of those data center H100s for pennies on the dollar for home use, I'm guessing. At least the A100s. And given the recent and revolutionary technology shift from transistor-based to photonic GPUs, I won't be surprised if it leads to faster advancement in other areas, like memory.
bilalba@reddit
You're saying new technology will outdate the old. But the newer technology will be more efficient and bring the cost to rent down too.
dataexception@reddit
I would say eventually it will. New technology always comes at a premium, so I would guess they're expensive AF right now. Plus, the current tech is only a coprocessor that manages matrix multiplications and neural network calculations. With them working with Nvidia, however, the advancement of the tech will very likely accelerate, now that it's proven ridiculously effective and reliable.
AnomalyNexus@reddit
As much as I love toying with local models I don’t see myself buying a mega rig at all.
Financially it just doesn't make sense, and I suspect it never will. I don't have the utilization density of an API provider, industrial power pricing, or favourable wholesale hardware pricing.
I can definitely see myself using more of a mix. A ton of stuff just doesn't need Opus-level intelligence, or can happen overnight, etc., so speed is irrelevant.
FearFactory2904@reddit
TL;DR: "I waited for something and it became unobtainable. So I waited for the next thing, and it too became unobtainable. Now I am preaching that instead of getting what is currently obtainable, you should wait like me and see if this next thing pans out."
jacek2023@reddit
I don’t feel confident enough to predict what will happen in a few years, such as what kind of computers will be used or which countries will exist.
SkyFeistyLlama8@reddit
Just to hedge against a future gone full The Postman, I would keep a bunch of computers around, with power-generation equipment for electricity. And lots of dried food, canned food, and plenty of can openers.
Global pandemics, climate breakdown, nuclear war.... 2020s making the 1980s look like a fun day at the playground.
BoxWoodVoid@reddit
In these kinds of scenarios, computers are a waste of space, resources, and time. You'd be better off with essential paper books, or at most some kind of Kobo/Kindle loaded with books: it can be charged from a small solar panel.
SkyFeistyLlama8@reddit
Paper books and LLMs, why not have both?
BoxWoodVoid@reddit
Because you're wasting precious resources (electricity, petrol, etc.) on a hallucination machine that won't provide any value at all.
Instead you should be relying on your common sense and the available knowledge in the form of books in these scenarios.
Don't you realize how much energy an LLM requires? If I had the power available, I'd rather run a freezer than an LLM!
These scenarios require a scarcity mindset: use the minimum resources to sustain life for as long as possible.
bespoke_tech_partner@reddit
Agree that in the apocalyptic phase, you simply need to focus on survival. The LLM has value when society is being re-established as you will control the only means of knowledge dissemination.
You can look at it as dead weight or as an investment in being the emperor of the future.
yes2matt@reddit
Salt. Alcohol. Ammo. Fuel. Water.
CorpusculantCortex@reddit
Well to be fair, your home server/model hosting is highly unlikely to impact geopolitics in the next few years.
Demo233@reddit
Far better to just buy what you need now and compound that into productive capacity, IMO.
Chris266@reddit
Wait, you guys are productive??
Dany0@reddit
... a little bit? Not thanks to AI though lmao
Demo233@reddit
productive at producing images of my AI girlfriend
Dany0@reddit
Props for being honest
Borkato@reddit
Agreed
Tagedieb@reddit
Accidental Peter Zeihan
linkillion@reddit
I'm pretty confident that things will change enough for us to be unable to imagine what the future holds, such as nuclear winter or space jesus.
I'm pretty certain it won't be what we have now lol
OrbMan99@reddit
Unexpected ending. A+
deleted-account69420@reddit
Am I going to be alive in a few years?
YetAnotherAnonymoose@reddit
I'm not sure hardware will ever get cheaper, tbh. Compute was getting better and better, with a ceiling on general practical usefulness, for a long time → hardware got cheaper and cheaper. Now we have a use case for compute where we want as much as possible, plus a shortage of fabrication → extreme upward price pressure. Even with new vendors, buyers will just buy up MORE compute instead of it becoming cheaper.
DigitalguyCH@reddit
Good take. While hardware could get somewhat cheaper at some point, thinking that we will get back to the levels of 1 year ago within a couple of years is delusional. This is a massive shift, and even if one of the big AI companies goes bankrupt, others will take the market, and consumers are contributing to demand too.
toptier4093@reddit
Something the Chinese are exceptionally good at is working with a tight budget and limited resources. Their mindset is to get as much out of as little as they can, because they have always had to compete by cutting costs and increasing production capacity.
We may be starving for vram currently, and probably for a while longer, but I'm confident that we'll be pleasantly surprised by what we can run locally in a year or two.
Fresh-Letterhead986@reddit
I think it's funny that you think Chinese-made RAM is going to be cheap. They're going to price it as high as they can. haha :-)
Lorian0x7@reddit
It's not just about hardware. Just reflect on the fact that we can now run last year's SOTA-level models on a 24GB VRAM GPU.
Q2/Q3 quantizations are now totally usable; two years ago this was far from true.
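To put rough numbers on that (a hypothetical sketch; the bits-per-weight figures are approximate, and real GGUF files carry extra scale/metadata overhead):

```python
# Approximate weight footprint of a 70B model at different quantization levels.
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * bits_per_weight / 8

for quant, bpw in [("FP16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.8), ("Q2_K", 2.6)]:
    print(f"70B @ {quant}: ~{weights_gb(70, bpw):.0f} GB")
# FP16 (~140 GB) needs multiple GPUs; Q2/Q3 puts a 70B within reach of a
# single 24 GB card plus some offloading, which is the shift described above.
```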
NNN_Throwaway2@reddit
Nah. Consumer hardware is going away. People are still oblivious/in denial, but the writing is on the wall.
datbackup@reddit
I’m not 100% in agreement but I do think there’s a strong argument to be made that the economic incentives lean that way
There is this question of what is meant by “consumer hardware”
Does it just mean low price?
Or does it mean built for small installations?
I think the logical thing we'll see happen is that manufacturing aimed at scale deployment (i.e. non-consumer) becomes the default, and the whole consumer market becomes based on adapters/retrofits that make data center stuff comfortable to use at home.
xienze@reddit
In the context of this sub, it pretty much means "stuff I can buy for home use that is relatively affordable." And again, "relatively affordable" has different meanings for different people, but I agree with the parent, the days of that are numbered. The days of just running down to Microcenter and picking up a DDR5 system with a good CPU and a mid-range GPU (read: a modern and decent system) for $1000 or whatever are gone. There will always be stuff that is wildly out of date that can be picked up for cheap, but the performance is going to suck. And it's all going to be less cheap than before since you won't be the only person trying to scoop that e-waste up.
Fast-Satisfaction482@reddit
There is absolutely a market for consumer hardware that will not go away. Manufacturers have just pivoted away and towards data center hardware because the amount of silicon they can ship is currently the bottleneck, while they can demand much higher prices for the same silicon from data center customers.
But once demand from data centers is saturated, manufacturers will be able to increase their profits by pivoting back to consumer hardware, so that's what they will do.
NNN_Throwaway2@reddit
Nvidia's Huang has said that scarcity is good for them, and likely other companies are thinking the same thing. Bottlenecks will be the norm from here on out (as they have been with Nvidia for some time).
The demand from data centers will never be saturated. There is literally hardware just sitting around unused, waiting to be installed as we speak, and yet prices and demand are still sky high.
The landscape of consumer technology is changing forever. I'm not sure how else to make you understand this. Making baseless assertions like "there is absolutely a market for X" is not thinking critically. It's like saying the market for the horse collar will never go away. It's missing the point.
Fast-Satisfaction482@reddit
Yes it's changing forever, but not on the business side. Companies investing billions into hardware that doesn't earn money will not go on forever. Hardware that was already sold to companies that intend to deploy it does not affect pricing at all.
If NVIDIA for some reason abandons consumers (which they have not done and are also unlikely to do, they are still gamers at heart), there are half a dozen alternative processor manufacturers that would be interested in expanding into the consumer market.
As much as I hate the lack of high VRAM consumer cards, NVIDIA does this in order to separate the gaming market from the AI market. But why do they even want to? Every chip that they make into an entry-level gaming card or APU could have been a high-end data center GPU with much higher margin.
The answer is that they do not want to abandon the consumer market for two reasons. One is that their roots are gaming and as much as they like money (and oh boy do they), they still want to keep gaming around for themselves. The second is strategic planning.
Yes, AI is the biggest business they have ever seen, and I do believe that it is here to stay. But the economy always has cycles. There will be an AI lull sooner or later, possibly even a dotcom-scale crash of AI. And in that situation, the gaming market will help NVIDIA stay afloat.
Financially, AI is now their core business, but they need to keep some business diversity in place to be able to survive future crises.
That's why I believe consumer cards will not go away, but will also keep painfully low VRAM sizes.
NNN_Throwaway2@reddit
Nvidia has literally been repeatedly criticized for abandoning gamers over the past several years. This has been THE character of the discourse surrounding their consumer attitudes since basically the 20 series.
Fast-Satisfaction482@reddit
They have been criticized for it, but that doesn't make it true.
NNN_Throwaway2@reddit
Well, it is. Dunno what else to tell you. You obviously have your own private version of events so I'll let you enjoy that in peace.
Fast-Satisfaction482@reddit
If you had made this claim about Micron, you would have been correct, with Crucial being terminated and a full shift to the cloud. But while NVIDIA is not offering the kinds of cards we wish we had for cheap, they still produce and sell cards for a wide spectrum of private customers.
If you need a gaming card for below $500, NVIDIA does have solid offerings for you.
You can call that abandoning, but then that's YOUR private version of events.
NNN_Throwaway2@reddit
What do you mean "for me"?
SkyFeistyLlama8@reddit
Edge inference is absolutely not going away, not with the likes of Qualcomm and Nvidia still making NPUs and GPUs for consumers.
anzzax@reddit
Can you monetize your investment today? If not, I would wait.
Sofakingwetoddead@reddit
It's much broader than just RAM: TPUs and GPUs as well, with Intel and AMD in the mix. Prices are starting to fall already. Now is def not the time to spend unless you have a cost-benefit reason for doing so.
Public_Umpire_1099@reddit
I would gladly bet on this not being the case. Literally everything I have seen is the complete opposite. Prices since Jan have skyrocketed even further. You used to be able to get a 3080 for $700-800; that's now at least $1k for a beat-up card.
DeltaSqueezer@reddit
I should have invested my money in chip stocks, then I would have enough to actually buy their products!
Synor@reddit
Capitalism hates saving
CodeDominator@reddit
The way things are going, in the future we will likely be in the middle of a fucking WW3 - hardly the best time to worry about "rigs".
lemondrops9@reddit
It's going to get worse, IMO. Even HDDs are shooting up in price, with shortages, which is crazy to me.
downunderjames@reddit
You invest today so you can purchase in 3 years rather than being out of business. If you are betting on future cheaper hardware, that means you are not generating returns to justify your investment at all.
Turbulent_Pin7635@reddit
All it needs is a war. A war eclipses everything, and this will have been the peak of technology for several years.
deathcom65@reddit
I did a temporary build for now and it's good enough to keep me satisfied.... For now.
SuperWallabies@reddit
I know, I know... but honestly, I just can’t wait. 😮💨
ptear@reddit
3 years is a long time to wait
Miserable-Dare5090@reddit
Cheap GPU for prefill… that's literally the most important part. Anyone can stand not seeing a wall of text instantly, but waiting 10 minutes for an answer is just not possible.
ea_man@reddit
I agree. While there will always be a race to the top, right now being SOTA is required to "get things done" in this early-day tech. In a couple of years, with optimizations, distillation, caching, and better hardware, it will end up like usual: most people will find out they can get most things done with the cheaper models, and more stuff will be done locally.
This is like the worst timeline for the stupid prices on even 2-gen-old used hardware; what is overpriced this year may well be completely obsolete in a year or two.
EmPips@reddit
Tech has always been a depreciating asset if you zoom out just a little.
Buy it when you want to use it.
mr_tolkien@reddit
Yeah, the hardware needs of local AI are pretty different from what was used so far: dGPUs were mostly a luxury good, and RAM was really not that needed since most apps ran well with a few GBs.
So it's normal that production will catch up to the new needs at some point and prices will go down. Nvidia was alone in the field simply because the demand was not that high until recently.
ttkciar@reddit
Yep, not buying any more hardware for a while.
If it takes years, I'll wait years, but if the hardware market recovers faster than that, of course that would be better. We'll just see what happens.
larrytheevilbunnie@reddit
From my understanding, we're guaranteed to be fucked until EOY, so if you want something now, just get it. However, if you're okay with a wait, please do actually wait, cuz we're gonna see some monster hardware coming out in 2027-2028, assuming Xi doesn't invade Taiwan.
a_beautiful_rhind@reddit
Here's hoping... if something else doesn't get us first.
rayc25@reddit
If demand stays high, there will still be shortages 3 years from now. If it falls off, it means AI didn't take off like we thought, local AI stays not much more than a hobby, and some other tech takes over. If you see anything you want and can afford, buy it now.
BitGreen1270@reddit
Thanks - I needed to hear that. I keep browsing for prices and closing the tab because it's ridiculous. I don't make money with LLMs (nor do I even have a business plan). Can't justify spending rent money on a PC 😞.