honest question: how are people actually getting reliable RTX 5090 access for inference without paying hyperscaler prices?

Posted by Exact_Football9061@reddit | LocalLLaMA | 5 comments

been trying to sort out GPU access for a side project running 70B-class models, and the gap between “available on the pricing page” and “actually available when I need it” has been frustrating

not asking about training runs, where you can plan ahead and reserve capacity. specifically inference, where demand is variable and committing to reserved capacity months out doesn’t make sense at this stage

what I keep running into: marketplace options hit the right price, but node quality and availability during busy periods are inconsistent. managed single-provider options are more predictable, but when their inventory for a specific SKU is gone you just wait
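for context on what I mean by “inconsistent availability”: the workaround I keep seeing described is a thin fallback layer that checks capacity across several providers and takes the cheapest one that can actually deliver. a minimal sketch of that idea below — the provider names, prices, and availability numbers are entirely made up, and a real version would call each provider’s API instead of reading a static field:

```python
# hypothetical sketch of multi-provider fallback for GPU capacity.
# all names and numbers here are invented for illustration; in practice
# available_gpus would come from each provider's availability API.
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    hourly_price: float
    available_gpus: int  # stand-in for a live availability check

def pick_provider(providers, gpus_needed):
    """Return the cheapest provider with enough capacity, or None."""
    candidates = [p for p in providers if p.available_gpus >= gpus_needed]
    return min(candidates, key=lambda p: p.hourly_price, default=None)

providers = [
    Provider("marketplace-a", 0.69, 0),   # cheapest, but sold out right now
    Provider("marketplace-b", 0.84, 2),
    Provider("managed-x", 1.20, 8),       # pricier, reliable inventory
]

choice = pick_provider(providers, gpus_needed=2)
print(choice.name if choice else "no capacity anywhere")
```

the obvious catch is that this only papers over the problem: it doesn’t fix node quality on the marketplaces, and during genuinely busy periods every provider in the list can return zero at once.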

curious what setups people are actually running in production for this use case, not what the pricing pages claim