Cheapest $/vRAM GPU right now? Is it a good time?
Posted by Roy3838@reddit | LocalLLaMA | View on Reddit | 79 comments
I have an RTX 2080, which only has 8GB of VRAM, and I was thinking of upgrading to an affordable GPU with a good $/GB ratio. I don't have $8k to drop on an RTX PRO 6000 like was suggested here a few days ago; I was thinking more in the <$1k range.
Here are some options I've seen, from most expensive to cheapest:
$1,546 RTX PRO 4000 Blackwell 24 GB GDDR7, $64/GB
~$900 wait for the 5070 Ti Super? $37/GB
$800 RTX Titan, $33/GB
$600-800 used 3090, $25-33/GB
2x $300 Mac mini M1 16GB cluster using exolabs? (I've used a Mac mini cluster before, but it's limited in what it can run) $18/GB
Is it a good time to buy a GPU? What are your setups like, and what can you run in this price range?
I'm worried that the uptrend of RAM prices means GPUs are going to become more expensive in the coming months.
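For reference, the ratios above come from a simple price-over-capacity division; here is a quick sanity-check sketch of that math (prices and capacities are the ballpark figures listed above, not quotes):

```python
# Quick $/GB sanity check for the options above.
# Prices and capacities are the ballpark figures from this post, not quotes.
options = {
    "RTX PRO 4000 Blackwell 24GB": (1546, 24),
    "5070 Ti Super 24GB (if it launches)": (900, 24),
    "RTX Titan 24GB": (800, 24),
    "Used RTX 3090 24GB": (700, 24),              # midpoint of $600-800
    "2x Mac mini M1 16GB (exolabs cluster)": (600, 32),
}

for name, (price_usd, vram_gb) in options.items():
    print(f"{name:38s} ${price_usd / vram_gb:5.1f}/GB")
```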
Dr_Superfluid@reddit
I think your best bet is M2 Ultra Mac Studios. You can find 192GB ones for around 3.5k.
By clustering just two of them you have almost 400GB, which fits almost everything, and you don't have to deal with a big cluster, just two computers that are easy to connect via a Thunderbolt bridge.
ConnectBodybuilder36@reddit
RX 470/580, 8GB version
Pure_Design_4906@reddit
You're kinda forgetting a player in this: Intel has some cards that could do the job. I'm not really sure, but here in Spain on pccomponentes.com there is a Sparkle ROC OC Edition Intel Arc A770 with 16 GB of GDDR6 memory for 350 euros, give or take. If you can spend 1k more or less, and your motherboard allows it, use two of those cards and get 32GB of VRAM at GDDR6 speeds. Not the fastest, but fine.
Russ_Dill@reddit
You can get dual Radeon RX 6800's (32GB total) for about $540 or $17/GB.
lostborion@reddit
I'm in the same situation and I decided to try to get a used 3090; they can be found in my country for around 3000 zł, approx. $700. It took me two failed attempts: the first was a scammer, and the second was a Zotac that would throttle as soon as I loaded nvidia-smi in Linux. Finally I was rewarded with a mint-condition 3090 FE, stable at 90 degrees hotspot. Now what I don't know is which model I should try first.
truci@reddit
With that card I would first go have some fun with Stable Diffusion and image and video generation. The noob-friendly place to start would be SwarmUI. Download it, install it, and have fun playing with all the image models.
lostborion@reddit
Thank you for the recommendation, I didn't know about it. Installing rn.
eloquentemu@reddit
$/GB isn't really a good metric since it hides how fast that memory is, and that's an extremely important part of the spec (if it didn't need to be fast, a CPU would be fine). Also, one large card is better than two smaller cards, unless you really want to tune execution, and then you're probably using more power, etc.
Some thoughts:
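To make the bandwidth point concrete, a rough back-of-envelope sketch (it assumes token generation is purely memory-bound and that all active weights are read once per token; the bandwidth figures are approximate published specs, and the ~13 GB model size is just an illustrative Q4-ish quant):

```python
# Crude ceiling on decode speed when generation is memory-bandwidth-bound:
# every generated token has to read (roughly) all active weights once.
def max_decode_tps(bandwidth_gb_s: float, weights_read_gb: float) -> float:
    return bandwidth_gb_s / weights_read_gb

MODEL_GB = 13  # illustrative: roughly a ~24B dense model at a Q4-ish quant

for card, bw in [("RTX 3090  (~936 GB/s)", 936),
                 ("RTX 4090  (~1008 GB/s)", 1008),
                 ("RTX 5090  (~1792 GB/s)", 1792),
                 ("Arc A770  (~560 GB/s)", 560)]:
    print(f"{card:24s} ceiling ~{max_decode_tps(bw, MODEL_GB):.0f} tok/s")
```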
NeverEnPassant@reddit
I actually think the 3090 is highly overrated considering it's ~$700 used. That means you take on a lot of risk, and the remaining lifetime of the card and its resale value may be significantly diminished.
For $2000, a 5090 gets you 8GB more memory, 2x the memory bandwidth, PCIe 5, more efficient power usage, MUCH more compute, and native 4-bit support.
eloquentemu@reddit
While true, I imagine the 3090 has plenty more years in it. Enough, at least, that it'll probably end up being cheaper to get a 3090 now and another GPU in a couple years (used 5090?) when (if) it dies.
I'll also say that the 5090 (well, I tested the 6000 PRO) doesn't really live up to its bandwidth in a lot of cases and I find the 4090 is pretty competitive, especially when doing CPU+GPU MoE. Of course, the 4090 has 2x the compute of the 3090 and you can definitely feel that. But regardless, the 3090 is still very solid.
NeverEnPassant@reddit
But then again, the 5090 resale will be even better. No strong opinion here.
See my numbers for CPU+GPU MoE on a 5090 here: https://old.reddit.com/r/LocalLLaMA/comments/1oonomc/why_the_strix_halo_is_a_poor_purchase_for_most/
It's not possible to get close to those pp numbers without pcie5.
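As a rough illustration of why the PCIe generation can matter for CPU+GPU MoE prefill (an assumption-laden sketch, not numbers from the linked post: it supposes the RAM-resident expert weights get streamed over the x16 link during prompt processing, and uses approximate theoretical link rates):

```python
# Illustrative only: if prefill streams the RAM-resident expert weights to the
# GPU over the x16 link, the link's throughput caps how fast that can happen.
# Approximate theoretical per-direction rates for PCIe 3.0/4.0/5.0 x16.
LINK_GB_S = {"PCIe 3.0 x16": 16, "PCIe 4.0 x16": 32, "PCIe 5.0 x16": 64}

OFFLOADED_EXPERTS_GB = 80  # hypothetical amount of expert weights kept in system RAM

for gen, bw in LINK_GB_S.items():
    seconds = OFFLOADED_EXPERTS_GB / bw
    print(f"{gen}: ~{seconds:.1f} s to stream the offloaded experts once")
```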
CrunkedJunk@reddit
Rtx 5090? Where’d you see a 5090 that cheap?
NeverEnPassant@reddit
nvidia.com gets restocks every couple weeks
I bought mine from centralcomputers for $2k, it was in stock for >2 weeks when I pulled the trigger.
Roy3838@reddit (OP)
thanks for your reply! that's really helpful!
Noxusequal@reddit
Also, a side note: buy now. Prices for GPUs and RAM will most likely be rising over the next half year; you can already see it with DDR5. OpenAI bought up 40% of global DRAM capacity, which will start affecting GPU prices over the next 1-2 months at the latest.
vtkayaker@reddit
The other thing that hurts is that multi-GPU configurations often require higher-tier motherboards, CPUs and power setups. Which is where even RTX 6000s start looking vaguely reasonable.
starkruzr@reddit
yeah, came to basically post this, although it looks like the prices of 3090s are ticking back up towards $800 which starts to make the twin (or more) 5060Ti option look better and better again. there are a few good guides for getting parallel inference running smoothly on them.
LA_rent_Aficionado@reddit
Exactly, not all VRAM is created equal, and most of these options except for the 3090 are either hypothetical or not worth it. I'd rather have X GB of fast VRAM than 2X GB of snail-paced VRAM, even more so if you want to train at all.
gratman@reddit
I got a 5080 for 999 new from Newegg
Terminator857@reddit
96 GB of GPU: https://www.bosgamepc.com/products/bosgame-m5-ai-mini-desktop-ryzen-ai-max-395
bladezor@reddit
Runs like dookie though no?
Terminator857@reddit
People have reported 70 tokens per second for qwen3 coder. What is your cup of tea?
Boricua-vet@reddit
Yea, that's not very fast for the money it costs.
This is the performance I get from my 20GB VRAM, $70 investment.
Well, now it's more like $110... but still.
redoubt515@reddit
What GPU are you referring to?
Boricua-vet@reddit
https://www.ebay.com/itm/156284588757
Icy_Gas8807@reddit
There are methods to unleash the full 128 GB; I've been doing it. But dense model performance is not very satisfactory, which is fine and acceptable to me.
noiserr@reddit
On Linux you can use most of the RAM for the iGPU, like 110GB if I'm not mistaken.
Roy3838@reddit (OP)
I mean that's $1800
Terminator857@reddit
$1,800 / 96 GB = $19 per gigabyte
PhantomWolf83@reddit
I'm in your situation and I think my choice will come down to either a used 3090 or dual 5060 Ti 16GBs. I'd love to have dual 3090s or dual 5070 Tis, but the cost, space, and power requirements are prohibitive.
A single 3090 is of course much faster but I think I would feel the limits of 24GB much sooner than a combined 32GB, especially when running large models with long context windows. If I'm using LLMs for roleplaying, I would rather be able to have the model remember more over having fast token generation if I have to choose.
T-VIRUS999@reddit
If you purely want $/GB of VRAM, old compute cards are your best bet (without needing like 10-20 cards for useful amounts of VRAM)
Own-Lemon8708@reddit
RTX 8000 48GB for ~$1800 has been working great for me for a while. Get two and have 96GB of VRAM for less than most other options. 220 watts each and 10.5 inches long in a dual-slot form factor means they're very easy to accommodate too.
dunnolawl@reddit
Currently the best VRAM per dollar would be:
NVIDIA P100 16GB (HBM2 with 732.2 GB/s) that have started appearing for ~$80 on alibaba. $5/GB.
AMD MI50 32GB (HBM2 with 1.02 TB/s) was the best deal when it could be had for ~$120-170, but the price has now gone up to ~$320-400. (was ~$5/GB) now $13/GB.
AMD MI250X 128GB (HBM2e with 3.28 TB/s) can be found on the used market for around ~$2000. $16/GB.
All of these cards have their own quirks and issues: the P100 and MI50 lack features and are EOL with community support only, and the MI250X needs a ~$2,000 server with OAM, but these are the types of tradeoffs that make them cheap.
If you're looking a bit into the future, then the cards to look out for would be: V100 32GB (2018), MI100 32GB (2020), A40 48GB (2020), A100 40GB (2020), MI210 64GB (2021). Using the P100 (2016) as a benchmark, we might start to see reasonably priced V100 cards next year and the A40 or A100 in 2028.
evillarreal86@reddit
I got the last cheap MI50. Incredible how expensive they are now.
ROCm 7.0 works with them without issues.
GamarsTCG@reddit
How did you run ROCm 7 with them? Thought they were only good up to 6.3.
dunnolawl@reddit
You can either compile the experimental build of ROCm (TheRock), which still builds and passes with gfx906.
Or you can copy the missing files over. Even the most recent ROCm (7.1.0) works with this method.
AMD is not actively developing or supporting gfx906 anymore, so it's just a matter of time until ROCm stops working, but for now it works. There was even a performance boost for the MI50 on one of the ROCm versions that doesn't support it officially and needs the above trick to make it work.
GamarsTCG@reddit
So, what's the compatibility of this with vLLM for multi-GPU? Just like native ROCm, or still using the vLLM fork for gfx906?
dunnolawl@reddit
You need to use the vLLM fork for gfx906. It's not amazing, but it does even work with some MoE models these days. The performance I've gotten with 8x MI50 32GB (each gets x8 PCIe 3.0) is:
GLM-4.6-GPTQ: 7.2 tokens/s --- ~10k tokens in 70s => 142t/s
Llama-3.1-70B-AWQ: 23.4 tokens/s --- 12333 tokens in 55s => 224t/s
Llama-3.1-70B-BF16: 16.9 tokens/s --- ~12k tokens in 45s => 266t/s
Mistral-Large-Instruct-2411-W4A16: 15.7 tokens/s --- ~15k tokens in 95s => 157t/s
Mistral-Large-Instruct-2411-BF16: 5.8 tokens/s --- ~10k tokens in 60s => 166t/s
The power draw while using vLLM can get absolutely bonkers though. After a bit of tweaking I got it down to 1610W from 2453W. That's not at the wall, that's what the software reports.
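For anyone curious what a multi-GPU vLLM run like this looks like, here is a minimal sketch using the upstream Python API; it assumes the gfx906 fork keeps the same interface, and the model name is only a placeholder:

```python
# Minimal multi-GPU vLLM sketch using the upstream Python API.
# Assumes the gfx906 fork keeps this interface; the model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="your-org/Llama-3.1-70B-AWQ",  # placeholder AWQ checkpoint
    tensor_parallel_size=8,              # one shard per MI50
    gpu_memory_utilization=0.90,
)

outputs = llm.generate(
    ["Summarize the tradeoffs of used datacenter GPUs in one paragraph."],
    SamplingParams(max_tokens=256),
)
print(outputs[0].outputs[0].text)
```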
evillarreal86@reddit
I'm using llama.cpp atm with 2 MI50s; tomorrow I will test 4 with llama.cpp.
GamarsTCG@reddit
Oh, I also have 8x MI50; my server is coming in soon. Do you have the performance numbers for Qwen3-VL 235B AWQ?
dunnolawl@reddit
I haven't used it. The only MoE I've tried was GLM 4.6, which had worse performance with vLLM than with llama.cpp for a single user. Based on that I'd guess the performance would be similar with Qwen3VL 235B.
waiting_for_zban@reddit
Where is that market ...
dunnolawl@reddit
A few resellers have listed it on their websites. It's the HP part number "HP P41933-001", and it's also on eBay.
These are still a long way from finding their way into recyclers, but they are being sold now as "Refurbished" with differing warranties.
noiserr@reddit
Problem is those are all OAM boards, so you can't just plug them into a regular PCIe slot. And good luck finding a cheap OAM server. They are mostly 8-way.
There are OAM to PCIE conversion boards but I haven't seen any that support the mi250x.
waiting_for_zban@reddit
I went down that rabbit hole (OAM to PCIe); apparently a few years ago a redditor tried it and quickly regretted it.
Aside from that, from what I read it's quite challenging to get working, as it usually comes soldered onto the server and AMD does not sell it as an "individual" unit. So most likely, if it ever runs, it will be unoptimized.
llama-impersonator@reddit
keep in mind the V100 and older are stuck on CUDA 12 or lower, that's gonna be a pain in the ass at some point.
grimjim@reddit
The Super series may cost more next year due to DRAM scarcity. Don't expect it earlier than Q3 2026, in my estimation.
noiserr@reddit
I don't think there will be a Super series. Pretty sure they are canceled due to DRAM situation.
grimjim@reddit
GDDR7 4GB memory modules are on the roadmap around a year out. They'll occupy the high end and free up the 3GB modules that the Super series would need. Delay too long, and there's still the issue of what VRAM the Rubin series of RTX 60x0 GPUs would have. Buyers are already avoiding 8GB GPUs on the desktop, based on 5060/5060ti sales. Awkward situation.
calivision@reddit
My 3060 12GB runs Ollama locally, I got it for $160 used.
Thrumpwart@reddit
7900XTX is still best bang for buck.
iamn0@reddit
The RTX 3090 is still the best option (relatively high VRAM with relatively high bandwidth). The prices for used cards are fairly stable, no idea how the market will develop in the next 1-2 years.
Roy3838@reddit (OP)
I didn't consider memory bandwidth because I just want to run bigger models, even if the tokens/second is not as good. But thank you for your chart! I'm discarding the RTX Titan option due to the price/bandwidth comparison.
TechnicalGeologist99@reddit
Bigger models will need more bandwidth; tokens per second is very sensitive to the bandwidth.
noiserr@reddit
Depends on the architecture. MoE models only activate a portion of the model, saving on memory bandwidth (or running faster, depending on how you look at it).
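A rough sketch of that saving (hypothetical model sizes and quantization, ignoring attention/KV-cache and shared-layer traffic):

```python
# How much memory traffic per decoded token an MoE saves versus a dense model
# of the same total size (hypothetical sizes; ignores attention/KV and shared layers).
def gb_read_per_token(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8  # GB of weights read per token

TOTAL_B, ACTIVE_B, BITS = 120.0, 13.0, 4.5  # made-up MoE at a ~Q4 quant

dense_traffic = gb_read_per_token(TOTAL_B, BITS)   # dense: read everything
moe_traffic = gb_read_per_token(ACTIVE_B, BITS)    # MoE: only the active experts

print(f"dense 120B: ~{dense_traffic:.0f} GB/token")
print(f"MoE 120B (13B active): ~{moe_traffic:.1f} GB/token")
print(f"-> roughly {dense_traffic / moe_traffic:.0f}x less bandwidth needed per token")
```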
StardockEngineer@reddit
FYI Exo seems to be a dead product. So don’t buy hoping to use that.
No-Refrigerator-1672@reddit
Cheapest VRAM right now is on the AMD MI50: 32GB for $150-$200 depending on whom you're purchasing from. But beware: you can only rely on the MI50 in llama.cpp; any other use case is not for that card.
The cheapest Nvidia that's actually usable has to be sourced from China. They are modifying cards to double their capacity. At this moment, their offers are a 2080 Ti 22GB for roughly $300, a 3080 20GB for roughly $400, and a 4090D 48GB for roughly $2700, which is not cheap, but probably the cheapest 48GB card on the market. All prices are listed without import taxes. Whether to buy those cards depends heavily on your local market: if you can get a 3090 for $500-600, by all means go get it, it's a better deal than the Chinese ones; but if your best price is $700-$800, then the Chinese cards take the lead.
Macs should be avoided. Right now there will be at least three people who will jump in and say that Macs are great for LLMs; but the reality is that even with the M3 Ultra, the fastest chip LLM-wise that's available, your prompt processing (PP) speed is very low, and basically a Mac is usable only for chats. The moment you realise you want more sophisticated workflows and tools, you'll find every task taking too long to complete. There might be a debate about Mac vs PC for a 100B MoE model; but for 16GB of memory, just don't touch them and get a 16GB GPU.
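To put a number on the PP complaint, a small illustrative sketch (the throughput figures are made-up ballpark values, not benchmarks of any specific machine):

```python
# Time-to-first-token grows linearly with prompt length, so slow prompt
# processing (PP) hurts long-context and agentic workflows the most.
def time_to_first_token_s(prompt_tokens: int, pp_tokens_per_s: float) -> float:
    return prompt_tokens / pp_tokens_per_s

PROMPT_TOKENS = 20_000  # e.g. a tool-heavy agent turn or a big codebase chunk

for setup, pp in [("Mac-class PP (illustrative ~100 t/s)", 100),
                  ("dGPU-class PP (illustrative ~2000 t/s)", 2000)]:
    wait = time_to_first_token_s(PROMPT_TOKENS, pp)
    print(f"{setup}: ~{wait:.0f} s before the first output token")
```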
Roy3838@reddit (OP)
I would worry about the stability/support of Chinese-modded GPUs, but I'll check them out. Do you have a post where people talk about their experience?
No-Refrigerator-1672@reddit
I would suggest reading mine. Information about long-term stability is very sparse and I've discussed it in the last paragraph. Otherwise, I would dare to say that this is the most information-rich post on Reddit.
huzbum@reddit
RTX 3060 12GB is like $250, so $20/GB.
CMP 100-210 16GB is like $150, so $10/GB.
These are great for small models that fit, but if you have to use multiple GPUs, they are only PCIe 1x, so they are slow to load models and can't do tensor parallel.
Ssjultrainstnict@reddit
I think if you want warranty, long-term support, out-of-the-box use, and a good amount of VRAM on a single slot, the AMD R9700 is the only viable option at $1,299.
Dontdoitagain69@reddit
Look at decommissioned racks on eBay, don't pay these crazy prices.
wakalakabamram@reddit
Would love to see an example of a suggested rack linked if you get the time.
Dontdoitagain69@reddit
https://www.ebay.com/itm/127317604189
I see these below $3k sometimes; just keep looking and offering low prices.
CertainlyBright@reddit
48GB 4090 - $3,400
Mountain-Hedgehog128@reddit
I wouldn't go the mac route. I'd do a cuda compatible GPU.
ThisGonBHard@reddit
The 5070 Ti Super is unlikely to ever launch from this point on, because of the general memory issues until 2027.
The placeholder date was Q3 2026, and that is VERY far away, with them likely being canceled. Everything I found on the global RAM situation says that things are very fucked till at least Dec 2026, if not later.
LA_rent_Aficionado@reddit
Used 3090 is your best bet on that list.
hp1337@reddit
2x Arc B580.
Same performance as a 5080, with 24 gigs of VRAM.
BoeJonDaker@reddit
Well, Amazon just announced it's spending another $50B on data center capacity, and Meta is in talks to buy a bunch of TPUs from Google, so I don't think prices are going to get better any time soon. Now's probably the time to buy.
Depending on where you are, the 5060 Ti 16GB is selling for less than MSRP on PCPartPicker right now.
Roy3838@reddit (OP)
You're right, I didn't consider the 5060 Ti because I was looking for 24GB of VRAM, but it's a super good deal rn.
It's $25/GB on the ratio. Maybe buying two is a good idea.
BoeJonDaker@reddit
My mistake. I didn't realize the cards you listed were all 24GB or higher.
If you can handle a (physically) big card go for it. I buy small because I have a bunch of hard drives in my case.
runsleeprepeat@reddit
32GB V100 "OEM" on Alibaba. Roughly USD 500-550.
Roy3838@reddit (OP)
I'm a bit skeptical about alibaba but that's a good option!
runsleeprepeat@reddit
I can understand your worries. I gave it a shot (but I bought 5x 3080 20GB) and it worked out smoothly. Other sellers may be better or worse.
Late-Assignment8482@reddit
The Blackwell generation (most of the 5xxx cards and also the Blackwell Pro line) has some very useful features like native support for MXFP4 quantization, which is about the size of Q4 with precision closer to Q8.
New-Yogurtcloset1984@reddit
2x 4060 Ti 16GB
£800 = £25/GB.
Roy3838@reddit (OP)
that's a good idea!
grabber4321@reddit
5060/3090