honest question: how are people actually getting reliable RTX 5090 access for inference without paying hyperscaler prices
Posted by Exact_Football9061@reddit | LocalLLaMA | 5 comments
been trying to sort out GPU access for a side project running 70B class models and the gap between “available on pricing page” and “actually available when I need it” has been frustrating
not asking about training runs where you can plan ahead and reserve capacity. specifically inference, where the demand is variable and committing to reserved capacity months out doesn’t make sense at this stage
what I keep running into: marketplace options have the price, but node quality and availability during busy periods are inconsistent. managed single-provider options are more predictable, but when their inventory for a specific SKU is gone you just wait
curious what setups people are actually running in production for this use case, not what the pricing pages say
LocalLLaMA-ModTeam@reddit
Rule 3 - Minimal value post. slop
OSlukeo@reddit
the variable demand point is the part that makes reserved capacity a bad fit for a lot of inference workloads. works great if you can predict load. almost nobody can at early stages. and then you’re either over-committed or scrambling
MK_L@reddit
Which model are you running that makes you prefer the 5090, or are you training something?
HopePupal@reddit
70B models, last paragraph starts with "curious"… bot
sagiroth@reddit
Second-hand market?