Cheapest $/vRAM GPU right now? Is it a good time?
Posted by Roy3838@reddit | LocalLLaMA | View on Reddit | 79 comments
I have an RTX 2080, which only has 8GB of VRAM, and I was thinking of upgrading to an affordable GPU with a good $/GB ratio. I don't have $8k to drop on an RTX PRO 6000 like was suggested here a few days ago; I was thinking more in the <$1k range.
Here are some options I've seen, from most expensive to cheapest:
$1,546 RTX PRO 4000 Blackwell 24 GB GDDR7, $64/GB
~$900 wait for the 5070 Ti Super? $37/GB
$800 RTX Titan, $33/GB
$600-800 used 3090, $25-33/GB
2x $300 Mac mini M1 16GB cluster using exolabs? (I've used a Mac mini cluster before, but it's limited in what it can run) $18/GB
Is it a good time to buy a GPU? What are your setups like, and what can you run in this price range?
I'm worried that the uptrend of RAM prices means GPUs are going to become more expensive in the coming months.
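For reference, the ratios above come from a simple price-over-capacity division; here is a quick sanity-check sketch of that math (prices and capacities are the ballpark figures listed above, not quotes):

```python
# Quick $/GB sanity check for the options above.
# Prices and capacities are the ballpark figures from this post, not quotes.
options = {
    "RTX PRO 4000 Blackwell 24GB": (1546, 24),
    "5070 Ti Super 24GB (if it launches)": (900, 24),
    "RTX Titan 24GB": (800, 24),
    "Used RTX 3090 24GB": (700, 24),              # midpoint of $600-800
    "2x Mac mini M1 16GB (exolabs cluster)": (600, 32),
}

for name, (price_usd, vram_gb) in options.items():
    print(f"{name:38s} ${price_usd / vram_gb:5.1f}/GB")
```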
Dr_Superfluid@reddit
I think your best bet is M2 Ultra Mac Studios. You can find 192GB ones for around 3.5k.
By clustering just two of them you have almost 400GB, which fits almost everything, and you don't have to deal with a big cluster, just two computers that are easy to connect via a Thunderbolt bridge.
ConnectBodybuilder36@reddit
RX 470/580, 8GB version
Pure_Design_4906@reddit
You're kinda forgetting a player in this: Intel has some cards that could do the job. I'm not really sure, but here in Spain on pccomponentes.com there is a Sparkle ROC OC Edition Intel Arc A770 with 16 GB of GDDR6 memory for 350 euros, give or take. If you can spend 1k more or less, and your motherboard allows it, use two of those cards and get 32GB of VRAM at GDDR6 speeds. Not the fastest, but fine.
Russ_Dill@reddit
You can get dual Radeon RX 6800's (32GB total) for about $540 or $17/GB.
lostborion@reddit
I'm in the same situation and I decided to try to get a used 3090; they can be found in my country for around 3000 zł, approx. $700. It took me two failed attempts: the first was a scammer, and the second was a Zotac that would throttle as soon as I loaded nvidia-smi in Linux. Finally I was rewarded with a mint-condition 3090 FE, stable at 90 degrees hotspot. Now what I don't know is which model I should try first.
truci@reddit
With that card I would first go have some fun with Stable Diffusion and image and video generation. The noob-friendly place to start would be SwarmUI. Download it, install it, and have fun playing with all the image models.
lostborion@reddit
Thank you for the recommendation, I didn't know about it. Installing rn.
eloquentemu@reddit
$/GB isn't really a good metric since it hides how fast that memory is, and that's an extremely important part of the spec (if it didn't need to be fast, a CPU would be fine). Also, one large card is better than two smaller cards, unless you really want to tune execution, and then you're probably using more power, etc.
Some thoughts:
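To make the bandwidth point concrete, a rough back-of-envelope sketch (it assumes token generation is purely memory-bound and that all active weights are read once per token; the bandwidth figures are approximate published specs, and the ~13 GB model size is just an illustrative Q4-ish quant):

```python
# Crude ceiling on decode speed when generation is memory-bandwidth-bound:
# every generated token has to read (roughly) all active weights once.
def max_decode_tps(bandwidth_gb_s: float, weights_read_gb: float) -> float:
    return bandwidth_gb_s / weights_read_gb

MODEL_GB = 13  # illustrative: roughly a ~24B dense model at a Q4-ish quant

for card, bw in [("RTX 3090  (~936 GB/s)", 936),
                 ("RTX 4090  (~1008 GB/s)", 1008),
                 ("RTX 5090  (~1792 GB/s)", 1792),
                 ("Arc A770  (~560 GB/s)", 560)]:
    print(f"{card:24s} ceiling ~{max_decode_tps(bw, MODEL_GB):.0f} tok/s")
```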
NeverEnPassant@reddit
I actually think the 3090 is highly overrated considering it's ~$700 used. That means you take on a lot of risk, and the remaining lifetime of the card and its resale value may be significantly diminished.
For $2000, a 5090 gets you 8GB more memory, 2x the memory bandwidth, PCIe 5, more efficient power usage, MUCH more compute, and native 4-bit support.
eloquentemu@reddit
While true, I imagine the 3090 has plenty more years in it. Enough, at least, that it'll probably end up being cheaper to get a 3090 now and another GPU in a couple years (used 5090?) when (if) it dies.
I'll also say that the 5090 (well, I tested the 6000 PRO) doesn't really live up to its bandwidth in a lot of cases and I find the 4090 is pretty competitive, especially when doing CPU+GPU MoE. Of course, the 4090 has 2x the compute of the 3090 and you can definitely feel that. But regardless, the 3090 is still very solid.
NeverEnPassant@reddit
But then again, the 5090 resale will be even better. No strong opinion here.
See my numbers for CPU+GPU MoE on a 5090 here: https://old.reddit.com/r/LocalLLaMA/comments/1oonomc/why_the_strix_halo_is_a_poor_purchase_for_most/
It's not possible to get close to those pp numbers without pcie5.
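As a rough illustration of why the PCIe generation can matter for CPU+GPU MoE prefill (an assumption-laden sketch, not numbers from the linked post: it supposes the RAM-resident expert weights get streamed over the x16 link during prompt processing, and uses approximate theoretical link rates):

```python
# Illustrative only: if prefill streams the RAM-resident expert weights to the
# GPU over the x16 link, the link's throughput caps how fast that can happen.
# Approximate theoretical per-direction rates for PCIe 3.0/4.0/5.0 x16.
LINK_GB_S = {"PCIe 3.0 x16": 16, "PCIe 4.0 x16": 32, "PCIe 5.0 x16": 64}

OFFLOADED_EXPERTS_GB = 80  # hypothetical amount of expert weights kept in system RAM

for gen, bw in LINK_GB_S.items():
    seconds = OFFLOADED_EXPERTS_GB / bw
    print(f"{gen}: ~{seconds:.1f} s to stream the offloaded experts once")
```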
CrunkedJunk@reddit
Rtx 5090? Where’d you see a 5090 that cheap?
NeverEnPassant@reddit
nvidia.com gets restocks every couple weeks
I bought mine from centralcomputers for $2k, it was in stock for >2 weeks when I pulled the trigger.
Roy3838@reddit (OP)
thanks for your reply! that's really helpful!
Noxusequal@reddit
Also, a side note: buy now. Prices for GPUs and RAM will most likely be rising over the next half year; you can already see it with DDR5. OpenAI bought up 40% of global DRAM capacity, which will start affecting GPU prices over the next 1-2 months at the latest.
vtkayaker@reddit
The other thing that hurts is that multi-GPU configurations often require higher-tier motherboards, CPUs and power setups. Which is where even RTX 6000s start looking vaguely reasonable.
starkruzr@reddit
yeah, came to basically post this, although it looks like the prices of 3090s are ticking back up towards $800 which starts to make the twin (or more) 5060Ti option look better and better again. there are a few good guides for getting parallel inference running smoothly on them.
LA_rent_Aficionado@reddit
Exactly, not all VRAM is created equal, and most of these options except for the 3090 are either hypothetical or not worth it. I'd rather have X GB of fast VRAM than 2X GB of snail-paced VRAM, even more so if you want to train at all.
gratman@reddit
I got a 5080 for 999 new from Newegg
Terminator857@reddit
96 GB of GPU: https://www.bosgamepc.com/products/bosgame-m5-ai-mini-desktop-ryzen-ai-max-395
bladezor@reddit
Runs like dookie though no?
Terminator857@reddit
People have reported 70 tokens per second for qwen3 coder. What is your cup of tea?
Boricua-vet@reddit
Yea, that's not very fast for the money it costs.
This is the performance I get from my 20GB VRAM, $70 investment.
Well, now it's more like $110... but still.
redoubt515@reddit
What GPU are you referring to?
Boricua-vet@reddit
https://www.ebay.com/itm/156284588757
Icy_Gas8807@reddit
There are methods to unleash the full 128 GB; I've been doing it. But dense model performance is not very satisfactory, which is fine and acceptable to me.
noiserr@reddit
On Linux you can use most of the RAM for the iGPU, like 110GB if I'm not mistaken.
Roy3838@reddit (OP)
I mean that's $1800
Terminator857@reddit
$1,800 / 96 GB = $19 per gigabyte
PhantomWolf83@reddit
I'm in your situation and I think my choice will come down to either a used 3090 or dual 5060 Ti 16GBs. I'd love to have dual 3090s or dual 5070 Tis, but the cost, space, and power requirements are prohibitive.
A single 3090 is of course much faster but I think I would feel the limits of 24GB much sooner than a combined 32GB, especially when running large models with long context windows. If I'm using LLMs for roleplaying, I would rather be able to have the model remember more over having fast token generation if I have to choose.
T-VIRUS999@reddit
If you purely want $/GB of VRAM, old compute cards are your best bet (without needing like 10-20 cards for useful amounts of VRAM)
Own-Lemon8708@reddit
RTX 8000 48GB for ~$1800 has been working great for me for a while. Get two and have 96GB of VRAM for less than most other options. 220 watts each and 10.5 inches long in a dual-slot form factor means they're very easy to accommodate too.
dunnolawl@reddit
Currently the best VRAM per dollar would be:
NVIDIA P100 16GB (HBM2 with 732.2 GB/s) that have started appearing for ~$80 on alibaba. $5/GB.
AMD MI50 32GB (HBM2 with 1.02 TB/s) was the best deal when it could be had for ~$120-170, but the price has now gone up to ~$320-400. (was ~$5/GB) now $13/GB.
AMD MI250X 128GB (HBM2e with 3.28 TB/s) can be found on the used market for around ~$2000. $16/GB.
All of these cards have their own quirks and issues: the P100 and MI50 lack features and are EOL with community support only, and the MI250X needs a ~$2,000 server with OAM, but these are the types of tradeoffs that make them cheap.
If you're looking a bit into the future, then the cards to look out for would be: V100 32GB (2018), MI100 32GB (2020), A40 48GB (2020), A100 40GB (2020), MI210 64GB (2021). Using the P100 (2016) as a benchmark, we might start to see reasonably priced V100 cards next year and the A40 or A100 in 2028.
evillarreal86@reddit
I got the last cheap MI50. Incredible how expensive they are now.
ROCm 7.0 works with them without issues.
GamarsTCG@reddit
How did you run ROCm 7 with them? Thought they were only good up to 6.3.
dunnolawl@reddit
You can either compile the experimental build of ROCm (TheRock), which still builds and passes with gfx906.
Or you can copy the missing files over. Even the most recent ROCm (7.1.0) works with this method.
AMD is not actively developing or supporting gfx906 anymore, so it's just a matter of time until ROCm stops working, but for now it works. There was even a performance boost for the MI50 on one of the ROCm versions that doesn't support it officially and needs the above trick to make it work.
GamarsTCG@reddit
So, what's the compatibility of this with vLLM for multi-GPU? Just like native ROCm, or still using the vLLM fork for gfx906?
dunnolawl@reddit
You need to use the vLLM fork for gfx906. It's not amazing, but it does even work with some MoE models these days. The performance I've gotten with 8x MI50 32GB (each gets x8 PCIe 3.0) is:
GLM-4.6-GPTQ: 7.2 tokens/s --- ~10k tokens in 70s => 142t/s
Llama-3.1-70B-AWQ: 23.4 tokens/s --- 12333 tokens in 55s => 224t/s
Llama-3.1-70B-BF16: 16.9 tokens/s --- ~12k tokens in 45s => 266t/s
Mistral-Large-Instruct-2411-W4A16: 15.7 tokens/s --- ~15k tokens in 95s => 157t/s
Mistral-Large-Instruct-2411-BF16: 5.8 tokens/s --- ~10k tokens in 60s => 166t/s
The power draw while using vLLM can get absolutely bonkers though. After a bit of tweaking I got it down to 1610W from 2453W. That's not at the wall, that's what the software reports.
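For anyone curious what a multi-GPU vLLM run like this looks like, here is a minimal sketch using the upstream Python API; it assumes the gfx906 fork keeps the same interface, and the model name is only a placeholder:

```python
# Minimal multi-GPU vLLM sketch using the upstream Python API.
# Assumes the gfx906 fork keeps this interface; the model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="your-org/Llama-3.1-70B-AWQ",  # placeholder AWQ checkpoint
    tensor_parallel_size=8,              # one shard per MI50
    gpu_memory_utilization=0.90,
)

outputs = llm.generate(
    ["Summarize the tradeoffs of used datacenter GPUs in one paragraph."],
    SamplingParams(max_tokens=256),
)
print(outputs[0].outputs[0].text)
```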
evillarreal86@reddit
I'm using llama.cpp atm with 2 MI50s; tomorrow I will test 4 with llama.cpp.
GamarsTCG@reddit
Oh, I also have 8x MI50; my server is coming in soon. Do you have the performance numbers for Qwen3-VL 235B AWQ?
dunnolawl@reddit
I haven't used it. The only MoE I've tried was GLM 4.6, which had worse performance with vLLM than with llama.cpp for a single user. Based on that I'd guess the performance would be similar with Qwen3VL 235B.
waiting_for_zban@reddit
Where is that market ...
dunnolawl@reddit
A few resellers have listed it on their websites. It's the HP part number "HP P41933-001", and it's also on eBay.
These are still a long way from finding their way into recyclers, but they are being sold now as "Refurbished" with differing warranties.
noiserr@reddit
Problem is those are all OAM boards, so you can't just plug them into a regular PCIe slot. And good luck finding a cheap OAM server. They are mostly 8-way.
There are OAM to PCIE conversion boards but I haven't seen any that support the mi250x.
waiting_for_zban@reddit
I went down that rabbit hole (OAM to PCIe); apparently a few years ago a redditor tried it and quickly regretted it.
Aside from that, from what I read it's quite challenging to get working, as it usually comes soldered onto the server and AMD does not sell it as an "individual" unit. So most likely, if it ever runs, it will be unoptimized.
llama-impersonator@reddit
keep in mind the V100 and older are stuck on CUDA 12 or lower, that's gonna be a pain in the ass at some point.
grimjim@reddit
The Super series may cost more next year due to DRAM scarcity. Don't expect it earlier than Q3 2026, in my estimation.
noiserr@reddit
I don't think there will be a Super series. Pretty sure they are canceled due to DRAM situation.
grimjim@reddit
GDDR7 4GB memory modules are on the roadmap around a year out. They'll occupy the high end and free up the 3GB modules that the Super series would need. Delay too long, and there's still the issue of what VRAM the Rubin series of RTX 60x0 GPUs would have. Buyers are already avoiding 8GB GPUs on the desktop, based on 5060/5060ti sales. Awkward situation.
calivision@reddit
My 3060 12GB runs Ollama locally, I got it for $160 used.
Thrumpwart@reddit
7900XTX is still best bang for buck.
iamn0@reddit
The RTX 3090 is still the best option (relatively high VRAM with relatively high bandwidth). The prices for used cards are fairly stable, no idea how the market will develop in the next 1-2 years.
Roy3838@reddit (OP)
I didn't consider memory bandwidth because I just want to run bigger models, even if the tokens/second is not as good. But thank you for your chart! I'm discarding the RTX Titan option due to the price/bandwidth comparison.
TechnicalGeologist99@reddit
Bigger models will need more bandwidth; tokens per second is very sensitive to the bandwidth.
noiserr@reddit
Depends on the architecture. MoE models only activate a portion of the model, saving on memory bandwidth (or running faster, depending on how you look at it).
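A rough sketch of that saving (hypothetical model sizes and quantization, ignoring attention/KV-cache and shared-layer traffic):

```python
# How much memory traffic per decoded token an MoE saves versus a dense model
# of the same total size (hypothetical sizes; ignores attention/KV and shared layers).
def gb_read_per_token(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8  # GB of weights read per token

TOTAL_B, ACTIVE_B, BITS = 120.0, 13.0, 4.5  # made-up MoE at a ~Q4 quant

dense_traffic = gb_read_per_token(TOTAL_B, BITS)   # dense: read everything
moe_traffic = gb_read_per_token(ACTIVE_B, BITS)    # MoE: only the active experts

print(f"dense 120B: ~{dense_traffic:.0f} GB/token")
print(f"MoE 120B (13B active): ~{moe_traffic:.1f} GB/token")
print(f"-> roughly {dense_traffic / moe_traffic:.0f}x less bandwidth needed per token")
```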
StardockEngineer@reddit
FYI Exo seems to be a dead product. So don’t buy hoping to use that.
No-Refrigerator-1672@reddit
Cheapest VRAM right now is on the AMD MI50: 32GB for $150-$200 depending on whom you're purchasing from. But beware: you can only rely on the MI50 in llama.cpp; any other use case is not for that card.
The cheapest Nvidia that's actually usable has to be sourced from China. They are modifying cards to double their capacity. At this moment, their offers are a 2080 Ti 22GB for roughly $300, a 3080 20GB for roughly $400, and a 4090D 48GB for roughly $2700, which is not cheap, but probably the cheapest 48GB card on the market. All prices are listed without import taxes. Whether to buy those cards depends heavily on your local market: if you can get a 3090 for $500-600, by all means go get it, it's a better deal than the Chinese ones; but if your best price is $700-$800, then the Chinese cards take the lead.
Macs should be avoided. Right now there will be at least three people who will jump in and say that Macs are great for LLMs; but the reality is that even with the M3 Ultra, the fastest chip LLM-wise that's available, your prompt processing (PP) speed is very low, and basically a Mac is usable only for chats. The moment you realise you want more sophisticated workflows and tools, you'll find every task taking too long to complete. There might be a debate about Mac vs PC for a 100B MoE model; but for 16GB of memory, just don't touch them and get a 16GB GPU.
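To put a number on the PP complaint, a small illustrative sketch (the throughput figures are made-up ballpark values, not benchmarks of any specific machine):

```python
# Time-to-first-token grows linearly with prompt length, so slow prompt
# processing (PP) hurts long-context and agentic workflows the most.
def time_to_first_token_s(prompt_tokens: int, pp_tokens_per_s: float) -> float:
    return prompt_tokens / pp_tokens_per_s

PROMPT_TOKENS = 20_000  # e.g. a tool-heavy agent turn or a big codebase chunk

for setup, pp in [("Mac-class PP (illustrative ~100 t/s)", 100),
                  ("dGPU-class PP (illustrative ~2000 t/s)", 2000)]:
    wait = time_to_first_token_s(PROMPT_TOKENS, pp)
    print(f"{setup}: ~{wait:.0f} s before the first output token")
```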
Roy3838@reddit (OP)
I would worry about the stability/support of Chinese-modded GPUs, but I'll check them out. Do you have a post where people talk about their experience?
No-Refrigerator-1672@reddit
I would suggest reading mine. Information about long-term stability is very sparse and I've discussed it in the last paragraph. Otherwise, I would dare to say that this is the most information-rich post on Reddit.
huzbum@reddit
RTX 3060 12GB is like $250, so $20/GB.
CMP 100-210 16GB is like $150, so $10/GB.
These are great for small models that fit, but if you have to use multiple GPUs, they are only PCIe 1x, so they are slow to load models and can't do tensor parallel.
Ssjultrainstnict@reddit
I think if you want warranty, long-term support, out-of-the-box use, and a good amount of VRAM on a single slot, the AMD R9700 is the only viable option at $1,299.
Dontdoitagain69@reddit
Look at decommissioned racks on eBay, don't pay these crazy prices.
wakalakabamram@reddit
Would love to see an example of a suggested rack linked if you get the time.
Dontdoitagain69@reddit
https://www.ebay.com/itm/127317604189
I see these below $3k sometimes; just keep looking and offering low prices.
CertainlyBright@reddit
48GB 4090 - $3,400
Mountain-Hedgehog128@reddit
I wouldn't go the mac route. I'd do a cuda compatible GPU.
ThisGonBHard@reddit
The 5070 Ti Super is unlikely to ever launch from this point on, because of the general memory issues until 2027.
The placeholder date was Q3 2026, and that is VERY far away, with them likely being canceled. Everything I found on the global RAM situation says that things are very fucked till at least Dec 2026, if not later.
LA_rent_Aficionado@reddit
Used 3090 is your best bet on that list.
hp1337@reddit
2x Arc B580.
Same performance as a 5080, with 24 gigs of VRAM.
BoeJonDaker@reddit
Well, Amazon just announced it's spending another $50B on data center capacity, and Meta is in talks to buy a bunch of TPUs from Google, so I don't think prices are going to get better any time soon. Now's probably the time to buy.
Depending on where you are, the 5060 Ti 16GB is selling for less than MSRP on PCPartPicker right now.
Roy3838@reddit (OP)
You're right, I didn't consider the 5060 Ti because I was looking for 24GB of VRAM, but it's a super good deal rn.
It's $25/GB on the ratio. Maybe buying two is a good idea.
BoeJonDaker@reddit
My mistake. I didn't realize the cards you listed were all 24GB or higher.
If you can handle a (physically) big card go for it. I buy small because I have a bunch of hard drives in my case.
runsleeprepeat@reddit
32GB V100 "OEM" on Alibaba. Roughly USD 500-550.
Roy3838@reddit (OP)
I'm a bit skeptical about alibaba but that's a good option!
runsleeprepeat@reddit
I can understand your worries. I gave it a shot (but I bought 5x 3080 20GB) and it worked out smoothly. Other sellers may be better or worse.
Late-Assignment8482@reddit
The Blackwell generation (most of the 5xxx cards and also the Blackwell Pro line) has some very useful features like native support for MXFP4 quantization, which is about the size of Q4 with precision closer to Q8.
New-Yogurtcloset1984@reddit
2x 4060 Ti 16GB
£800 = £25/GB.
Roy3838@reddit (OP)
that's a good idea!
grabber4321@reddit
5060/3090