Should I sell my RTX3090s?
Posted by daviden1013@reddit | LocalLLaMA | View on Reddit | 81 comments
I have a GPU server (4 × RTX3090s) that I've been using for research and PoC for the past 2 years, mostly running vLLM for Qwen, GPT-OSS, and Gemma. My workflow is to test code on it, where I have sudo permission, then deploy the code and models on my office's servers with professional cards.
GPU prices went up a lot this year. A used RTX3090 on eBay is around $1100. Plus, with FP8 and FP4 becoming more popular and the RTX3090 not supporting them, I wonder if it's time to sell.
My optimistic plan is to sell all 4 of them for ~$3500 and use cloud APIs for a while. Later, if I have extra funds, upgrade to an RTX PRO 6000.
Any thoughts and comments are appreciated.
My specs are as below:
```
CPU:
AMD EPYC 7F32
GPU:
(×4) RTX3090
Motherboard:
SUPERMICRO MBD-H12SSL-I-O ATX Server Motherboard
RAM:
(×4) Samsung 16GB 2Rx4 PC4-2400 RDIMM DDR4-19200 ECC REG Registered Server Memory RAM
SSD:
Samsung 990 PRO 2TB
Others:
Corsair 1200w PSU
Corsair RM1000x
XE02-SP3 SilverStone cpu cooler
```
Karyo_Ten@reddit
Cloud API prices are rising tremendously (GLM, Claude). It's clear the era of subsidized subscriptions is over, with even Alibaba / Qwen shelving their coding plan.
This means owning hardware is getting more and more attractive.
In your plan there is no bound on "a while", meaning if you can't gather the extra money you might end up paying more in cloud APIs/subscriptions.
For me: sell when you have the money to get the RTX Pro 6000 right away.
And to be honest, Nvidia dropped the ball on SM120 / NVFP4 support.
vLLM / SGLang only care about H100 and B200.
Historical-Crazy1831@reddit
IMO the biggest concern for local deployment is not the rising cost of cloud APIs, but the withering of mid-size models (say ~70b-120b). These models can get really close to the real flagship models and can comfortably fit on consumer devices. Small models like qwen3.6 27b score high on benchmarks, but IMO they are still toys and I cannot use them in really serious scenarios.
Cloud API rates were a concern for me until Deepseek V4 was released. It is as cheap as air. Since it runs on Huawei chips rather than Nvidia chips, I don't worry that their rates will go up much. They also promised to drop the rate.
boutell@reddit
As someone who is seriously looking at buying a card to run 27b on, I'm curious about the scenarios where it did not meet your needs, if you have tested that...
Historical-Crazy1831@reddit
I run qwen3.6 27b GPTQ 4bit on dual 3090s, using qwen code cli as the harness. My main scenarios are (a) reading a code base, helping me understand the details and revise code; (b) scientific writing.
When the code base is big, it sometimes cannot even type the directory name correctly in its tool calls. Usually I need to chat with it back and forth to get things done right, and I end up switching to a bigger model like GLM5.1 and solving the problem in a single shot.
For writing, I tried different models with the same prompt. Usually I send my whole article plus some comments, and ask it to provide revision suggestions. It is very clear that bigger models like GLM5.1 and Deepseek v4 pro possess a superior global perspective when handling long texts; they are capable of deeper reasoning, and the recommendations they provide appear more sophisticated. Qwen3.6 27b is also good at that, but if you do the comparison you can see the difference.
Since my work is not top secret, I end up using deepseek these days for serious tasks, and give qwen3.6 27b simple tasks such as news fetching, fact checking, etc. I'm really looking forward to qwen3.6 122b a10b replacing the 27b. The 27b dense model is indeed smart, but larger world knowledge is another kind of smart that may influence the final output quality.
Karyo_Ten@reddit
The Qwen3.5-122B-A10B and Qwen3.5-27B were a wash in terms of perf. Assuming this is the same for 3.6, the extra knowledge from the 122B might help for scientific writing or specialized domain codebases, but the 27B with more active params might be more nuanced.
boutell@reddit
Gotcha. It sounds like the smart money is still holding out for a 122b a10b MoE, at which point buying a single 24GB card would look like a mistake?
Historical-Crazy1831@reddit
I bought my 3090s one by one and I do not regret it for now. 3090s are great for the compute capability, memory size, and price (the price is a bit high now). It is fun to upgrade step by step and try everything that fits in your VRAM.
jakeman8888@reddit
I use Qwen 27b for production customer-facing products on an RTX 6000 at full FP8 and it's a beast. Idk. It's good and can deliver if you know what you're doing.
boutell@reddit
48GB isn't cheap but it makes a difference
daviden1013@reddit (OP)
I run Qwen-3.6-27B on 4× RTX3090 with vLLM for Claude Code (Claude Code Router config). Performance is pretty good; for simple frontend and bash tasks it replaces Sonnet. I also use Qwen3.5/3.6 for OCR, where performance is better than proprietary models.
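For anyone wanting to replicate a setup like this, here's a minimal client-side sketch, assuming vLLM is serving its OpenAI-compatible endpoint on the default localhost:8000 and that the model ID below matches whatever name the server was launched with (both are assumptions, not the OP's exact config):
```python
# Minimal sketch: querying a local vLLM OpenAI-compatible server, the same kind of
# endpoint a router like Claude Code Router points Claude Code at.
# Assumed: vLLM launched on localhost:8000; the model ID is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="Qwen3.6-27B",  # placeholder; use the name your vLLM server exposes
    messages=[{"role": "user", "content": "Write a bash one-liner to count .py files."}],
    temperature=0.2,
    max_tokens=256,
)
print(response.choices[0].message.content)
```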
Karyo_Ten@reddit
I have replaced Qwen3.5-397B-A17B with swarms of Qwen3.6-27B in certain scenarios.
And I'm excited about PFlash/DFlash.
It's really good and at one point, being able to iterate, test and validate faster trumps cleverness. "No plan survives first contact with reality". And a fast implementer can quickly find understanding gaps.
Historical-Crazy1831@reddit
Interesting! Do you mind if I ask in what kind of scenarios you are using these swarms of Qwen3.6-27B? And how do you achieve the fast iterate, test, and validate loop? I mostly still chat with the AI on the CLI and manually save the useful output or let it write down notes, but it doesn't seem very efficient.
Karyo_Ten@reddit
The key thing is that unconstrained LLMs will produce slop faster than you can verify if you don't rein them in. So you need a good review process.
What I do is use multiple reviewers in parallel, each with a "personality" / focus (bugs, architecture, documentation, edge cases, domain specialist, performance, ...) and an orchestrator to try to do a 360º review.
Once you have that, you can have a way to create a plan, say the /grill-me skill.
And then you ask an orchestrator to do the impl, identifying isolated tasks that can be done concurrently, and at the end of the "sprint" you launch your parallel reviewers, pass the report back for refinement, rinse and repeat.
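A minimal sketch of that parallel-reviewer pattern (not this commenter's actual code), assuming a local OpenAI-compatible endpoint such as vLLM and a placeholder model name; each reviewer gets a different focus and the orchestrator merges the reports:
```python
# Illustrative sketch of "parallel reviewers + orchestrator", not the commenter's setup.
# Assumed: an OpenAI-compatible server (e.g. vLLM) on localhost:8000; MODEL is a placeholder.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="local")
MODEL = "Qwen3.6-27B"  # placeholder model ID

FOCUSES = ["bugs", "architecture", "documentation", "edge cases", "performance"]

async def review(diff: str, focus: str) -> str:
    # One reviewer "personality": same diff, different focus.
    resp = await client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": f"You are a strict code reviewer focused only on {focus}."},
            {"role": "user", "content": f"Review this diff and list concrete issues:\n\n{diff}"},
        ],
        temperature=0.3,
    )
    return f"## {focus}\n{resp.choices[0].message.content}"

async def orchestrate(diff: str) -> str:
    # Launch all reviewers concurrently, then merge their reports into one.
    reports = await asyncio.gather(*(review(diff, f) for f in FOCUSES))
    merged = await client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": "Merge these reviews into one prioritized report:\n\n" + "\n\n".join(reports)}],
    )
    return merged.choices[0].message.content

if __name__ == "__main__":
    print(asyncio.run(orchestrate("def add(a, b):\n    return a - b  # oops")))
```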
Historical-Crazy1831@reddit
Thanks! Do you vibe code it or use some existing framework? It sounds like a multi-agent system. Qwen code cli has a multi-agent system, but it is a built-in feature and is auto-triggered. I'm trying to read the docs of qwen code to see if it can be customized.
Karyo_Ten@reddit
I created a SKILL.md that just explained the review process: "1st step ask what to review, 2nd step launch multiple agents, 3rd step format a report"
ormandj@reddit
As far as I can tell, SM120 will support/supports all of this, it's just taking a little longer for optimized implementations. It's missing tcgen05 but there are other ways to solve the problem that are nearly as performant. Are you referring to something else?
Karyo_Ten@reddit
RTX 5090 was released almost a year and a half ago.
The RTX Pro 6000 was touted as an enterprise solution for serious workloads.
I totally understand if open-source devs with a full-time job on the side take even months to implement a vision tower, MLA, DSA, lightning indexer, Gated Delta Nets, Mamba, tensor parallelism, DFlash, MTP, ...
I have higher expectations of the most valuable company in the world, which can literally put thousands of GPUs in an autoresearch / reinforcement learning loop to optimize their kernels when they sell an almost $10k piece of hardware.
Huge-Safety-1061@reddit
This point right here lands
ieatdownvotes4food@reddit
It's ok, you can still benchmark-tune models in vLLM for the 6000 Pro nicely. No complaints about not using the H100/H200 defaults.
SocietyTomorrow@reddit
I am so very glad I prepaid for a year of GLM coding plan Max. A month later and the prices are almost 4x higher
shuozhe@reddit
How's Qwen's plan? Still 200 RMB/month and it looks promising. Need something post-Deepseek promotion.
horrorpages@reddit
Hot take, 3090s probably carry the highest resale-drop risk over the next 12-18 months. Their value is held up by cheap 24 GB VRAM and not because Ampere is aging well.
Once newer 24-32 GB cards hit the market through 2027, the 3090 becomes undesirable. Compared to those, 3090s will be power hungry, substantially weaker, and stuck on older inference features. Their premium disappears.
I would sell 2 and get $2k while I can. Keep the other 2 for local model development (more than enough) and use frontier cloud models for heavier reasoning. Do this until you have enough net cash for the upgrade.
tecneeq@reddit
Except that there are no such cards in the cards. If you drift my catch.
mon_key_house@reddit
Which cards are you expecting next year?
jirka642@reddit
I hope they become undesirable. The current used prices are almost double what I paid for one 1-2 years ago.
SnooPaintings8639@reddit
I remember when Ethereum was going to drop PoW mining and all the GPUs were going to crash in price, back in 2019. But it got delayed again and again. After that, other projects started using them. And then AI came along, where old and worn-out RTX 3090s found new demand, three or so years ago. It is mid 2026 and people still think that we're a year or so from the selloff... but the price is only increasing.
Bootes-sphere@reddit
Honestly, the calculus has shifted hard in favor of cloud inference for most workflows. You're already splitting your pipeline (local testing → cloud deployment), so selling could make sense if you're not maxing out those GPUs regularly. The per-token costs on Qwen/Llama are now <$0.01/1M tokens via providers like DeepInfra or Together. Your electricity and cooling costs on 4×3090s probably exceed that for most research workloads. If you do keep them for occasional heavy lifting, just make sure you're stress-testing your deployment code on real hardware before pushing to prod.
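As a rough illustration of that electricity-vs-API trade-off (the power draw, electricity rate, and token price below are assumptions for illustration, not measured numbers):
```python
# Back-of-envelope: hourly electricity cost of a 4x3090 box vs. paying per token.
# All figures are assumptions: ~350 W per card under load, $0.15/kWh electricity,
# and an API price of $0.10 per 1M tokens (adjust to your provider's actual rate).
gpus, watts_per_gpu = 4, 350
kwh_price = 0.15            # $/kWh
api_price_per_mtok = 0.10   # $ per 1M tokens

hourly_power_cost = gpus * watts_per_gpu / 1000 * kwh_price
break_even_tokens_per_hour = hourly_power_cost / api_price_per_mtok * 1_000_000

print(f"~${hourly_power_cost:.2f}/hour in electricity alone")
print(f"local power cost matches the API only above ~{break_even_tokens_per_hour:,.0f} tokens/hour")
```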
a_beautiful_rhind@reddit
Sales tax and fees on ebay will eat your profit. Barely half way to a pro6k anyway.
If you don't want to do local anymore, sell them. With inflation and the value of electronic goods, I think you might be making a mistake. Then again I should have sold my P40s when they were worth a lot because now they're back down again (but still more than I paid).
How bad do you need the money?
daviden1013@reddit (OP)
Ebay charges a crazy 13% fee. My calculation is sale f
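For context, a quick back-of-envelope with the round figures already mentioned in the thread (assumed numbers, not the OP's actual listing math):
```python
# Rough resale math: ~$1,100 per used 3090 and ~13% in eBay fees,
# before shipping, sales tax, and any 1099 paperwork.
cards, price_each, fee_rate = 4, 1100, 0.13
gross = cards * price_each
net = gross * (1 - fee_rate)
print(f"gross ${gross:,}  ->  ~${net:,.0f} after fees")  # ~$3,828, close to the OP's ~$3,500 after shipping
```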
bigdoghat32@reddit
also, depending on what state you're in, you might get a form 1099 from ebay. and that doesn't mean you're income taxed on the sale, but it does mean you need to *answer for the 1099* and explain WHY it wasn't $1000 in income, for instance.
Interesting-Gap6070@reddit
Sell them through reddit on r/hardwareswap. No fees except PayPal. I may be interested btw
MarcusAurelius68@reddit
I might be as well.
StrikeOner@reddit
can't you buy like 8 more 3090's with the funds you have there? that would make a whopping 12x3090's = 288gb vram vs 96gb.
daviden1013@reddit (OP)
My motherboard only has 5 PCIe 4.0 ×16 slots. Though I could split them to ×8 to support up to 10 GPUs, power and cooling would be problems.
StrikeOner@reddit
yeah that most probably will be a little too much energy consumption on top of it as well, but if you ask me I would stick with a couple 3090's instead. You can run like imagegen on one gpu, embeddings on another, a couple other ones for text inference / agentic work etc.. maybe add whisper as well.. a lot of tiny workhorses with their environments are way more fun than one big chunker for me.
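A minimal sketch of that one-service-per-GPU idea, pinning each process to a single card via CUDA_VISIBLE_DEVICES (the module names and ports are placeholders, not a recommended stack):
```python
# Sketch of the "many small workhorses" layout: each service only sees one GPU.
# The vllm CLI is real; the embedding/whisper server modules are hypothetical placeholders.
import os
import subprocess

services = {
    "0": ["vllm", "serve", "some-org/some-27b-model", "--port", "8001"],  # text inference
    "1": ["python", "-m", "my_embedding_server", "--port", "8002"],        # embeddings (placeholder)
    "2": ["python", "-m", "my_whisper_server", "--port", "8003"],          # speech-to-text (placeholder)
}

procs = []
for gpu, cmd in services.items():
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": gpu}  # this child process sees only one GPU
    procs.append(subprocess.Popen(cmd, env=env))

for p in procs:
    p.wait()
```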
Conscious_Cut_6144@reddit
Honestly my 3090's are more likely to successfully run an NVFP4 model than my pro 6000.
NVFP4 has been a pretty big disappointment on the pro 6000.
daviden1013@reddit (OP)
Wow, I would like to learn more. I thought the PRO 6000 was good except for the price.
ImportancePitiful795@reddit
Do not sell until you have a replacement, or unless you want to downscale, in which case that means R9700s.
4 of these will set you back less than $5000 for 128GB VRAM, support FP4/FP8, etc.
However, if you depend on some obscure CUDA libraries you'll need to use workarounds.
quickreactor@reddit
You have gold, you must hold!
RE20ne@reddit
sell them all. $850 is still a reasonable local sell price
Farmadupe@reddit
I feel like it should depend on what value you're getting from them now compared to what you could get from APIs during that period? If the answer is nothing, then by all means convert them into cash and take stock of the 6000s in a couple of years' time!
Fwiw vllm will happily chew through fp8 at acceptable speed without having hw support. And afaik the 4 bit ecosystem is still flooding hugging face with mostly unusable models.
So in a practical sense I don't feel like 3090s are going to suddenly become deadweight.
Plus with 4 3090s, 96G of vram will fit plenty of models at 16-bit anyway.
daviden1013@reddit (OP)
There's no policy stopping me from using the office's GPUs. But the wait time and K8s setup make it no fun. That's the motivation for my personal server. I agree the RTX 3090 won't suddenly become trash, but the resale value will drop.
codehamr@reddit
Plan makes sense. FP8/FP4 is where the stack is heading and Ampere will keep falling behind. I made the same jump to a Pro 6000 this year, prefill on 30B+ with real context was the big win.
Only catch: $3500 is a long way from a Pro 6000, and going zero-GPU stalls offline work. If you can't stomach the gap, sell 2 and keep 2 as a bridge.
daviden1013@reddit (OP)
True, $3500 buys less than half a PRO 6000. But maybe the price will go down when the 6000 series comes out? I wouldn't go with 2 GPUs. 27-35B models take 50 to 70+ GB. I'd either keep the 4 GPUs or use cloud APIs.
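For anyone wondering where the 50-70+ GB figure comes from, a weights-only estimate already gets you most of the way there (KV cache and engine overhead come on top; the numbers below are illustrative):
```python
# Rough weights-only VRAM estimate for a dense model; KV cache, activations,
# and engine overhead add more on top of this.
def weight_gb(params_billion: float, bits_per_param: int) -> float:
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

for params in (27, 35):
    for bits, name in ((16, "bf16"), (8, "fp8/int8"), (4, "int4")):
        print(f"{params}B @ {name}: ~{weight_gb(params, bits):.0f} GB of weights")
```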
diablozzq@reddit
This is terrible advice - NVFP4 only helps with prompt processing speeds. Generation is bottlenecked by memory bandwidth.
It's sad, but with a 5090 I don't even use NVFP4 despite Nvidia's marketing:
1) Few good quants use it
2) The ones that do often don't do it for vision
3) It doesn't make a difference on token generation and sometimes even hurts generation performance due to memory bandwidth
4) Frameworks such as Llama.cpp have been slow to implement NVFP4 because it just wasn't worth it
The 4x 3090 build is still very, very good. A single Blackwell RTX 6000 Pro is all that I would upgrade to; otherwise the 96GB of VRAM you have is a beast.
horrorpages@reddit
I came to the same conclusion. Sell 2, keep 2.
Last_Mastod0n@reddit
I would sell them, as my 3080 died after 2 years of straight Ethereum mining. I believe it was the memory controller that died.
Also for anyone saying of course it died you were stressing it, I had it waterblocked with vram and hotspot temps never going over 70c.
The 4000 series is a lot more power efficient and resistant to wear in my experience.
Kholtien@reddit
sell them to me for cheap
jedsk@reddit
well what kinda workload do you need to run in the next 12 months or so?
AbbreviationsSad5582@reddit
NVIDIA is focusing on datacenter cards and has literally abandoned the prosumer market. No NVLink on consumer cards, no ECC, and the architectural divide keeps getting wider.
That said, I'd push back on FP8/FP4 as a reason to sell right now.
FP4 requires SM100+ for real hardware acceleration. Your 3090s (SM86) fall back to weight-only dequantization, so you get memory savings but zero throughput gains. You're locked out of the FP4 path, yes, but so is almost everyone else. NVFP4 on consumer Blackwell is still experimental. MoE correctness issues on SM120 are actively being patched. Production-ready FP4 on consumer cards is probably 3-6 months away minimum.
Meanwhile AWQ Marlin on SM86 is mature and works great today. I'm running Gemma-4-31B-AWQ TP=4 on 4x 3090s at 70-79 tok/s on vLLM in production. The 3090 isn't a dead card, it's just not getting faster.
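Not necessarily this commenter's exact config, but a minimal vLLM sketch of that kind of setup: an AWQ checkpoint split across 4 GPUs with tensor parallelism (the model repo name is a placeholder; the other knobs are standard vLLM parameters to tune for your hardware):
```python
# Minimal sketch: serving an AWQ-quantized model on 4 GPUs with vLLM tensor parallelism.
# The repo name is a placeholder, not a specific real checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(
    model="some-org/Gemma-4-31B-AWQ",  # placeholder HF repo; point at a real AWQ checkpoint
    quantization="awq_marlin",          # Marlin AWQ kernels, mature on SM86 (Ampere)
    tensor_parallel_size=4,             # split across the 4x 3090s
    gpu_memory_utilization=0.90,
    max_model_len=32768,
)

out = llm.generate(
    ["Explain tensor parallelism in two sentences."],
    SamplingParams(temperature=0.2, max_tokens=128),
)
print(out[0].outputs[0].text)
```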
daviden1013@reddit (OP)
This is very helpful. The professional answer I expected on localllama. Thanks.
FlyingDogCatcher@reddit
lol, I'll take em. DM me if you want
Bulky-Priority6824@reddit
run em til they're paperweights
datbackup@reddit
yes 🙌 we need more local ai diehards on this local ai sub
Ok-Measurement-1575@reddit
I suspect now is the worst possible time to sell.
Evildude42@reddit
I’ll give you five cents each, since by your own words, you’ve been running them into the ground for the past two years.
rorykoehler@reddit
$3.5k is a week's worth of cloud spend for heavy work
marscarsrars@reddit
No, but if you do, do think of me.
Long_comment_san@reddit
Not worth it yet. You have clean 96gb VRAM. Drop it to 48 and you get what, 4 bit support? Lower power bill? Naah. I don't see those gains. If you were doing like 3x the speed then sure but I doubt that.
whodoneit1@reddit
I expect cloud prices will increase 2x-4x from where we are now by the end of this year even. It's why a lot of people are also now looking to do local builds. It also hits at a perfect time, as open source models such as Qwen3.6 have hit a level where they are easily capable of running locally.
The throw everything at Opus for cheap days are over, the future is going to be using different models for different tasks and using them wisely.
Lissanro@reddit
I have four 3090s in my rig too. I have no plans of selling them, and plan to use them for at least a few more years. Even if I buy an RTX PRO 6000, having four 3090s would still be useful - even if they are slower, they are still faster than offloading to RAM.
Even if you do not need more than 96 GB VRAM, my suggestion would be to sell when you actually have the extra money to buy the RTX PRO 6000 right away. Otherwise, you may end up paying more and more for cloud APIs (which may add up after a year or two), not to mention the loss of privacy that comes with giving up local hardware.
Of course, it is up to you. If your tasks do not require privacy and you do not use them actively, and really need the money now, you can sell. This is your hardware, so only you can decide.
ea_man@reddit
If I had to guess, I would say that prices of older GPUs for AI will go up in the coming months as cloud providers are heavily raising prices and decreasing limits. You may actually get more money later on.
Makers7886@reddit
I've had to make my own int8s recently for mistral medium and qwen3.6 27b since they went to the GEMM FP8. I would only consider getting out of my 3090s if electricity was a larger factor/concern where you are.
munkiemagik@reddit
If you can actually do the work in cloud then it makes little financial sense to have all that money tied up in hardware.
I had that very thought just last night, though bear in mind I don't do anything meaningfully useful with my gear, I just faff about. With Qwen3.6 27b/35b the LLM machine barely gets powered on and I tend to run LLMs more on the single 5090 in my gaming machine and in the cloud, so maybe I should just dump everything LLM-server related. I don't need the money for anything else, and having the money tied up isn't stopping me from spending on anything else I want to, so that wouldn't be my motivation for selling it.
I contemplate future purchases as 'unified memory' platforms kick up a notch and memory bandwidth increases. It would be great in maybe a year or two to be able to buy an AMD box that has more than 250GB/s memory bandwidth (Medusa Point?). But I imagine prices are going to jump a bit. Is there sense in offloading the 3090s now that they've also spiked, to offset the next jump I will have to make? I have a revenue generation schedule this year and predict good upside; if it pans out as I predict, I could justify to myself splurging on the RTX 6000 Pro just to play about. But qwen3.6 has actually put a bit of a check on my greed for VRAM.
However, I still don't like to see the 3090's sitting idle as much as they are. What does stop me from selling right now is the whole 'what if' between now and when I might buy something new.
We can try and predict, but we just don't know what's coming up ahead. What if later I find that some amazing new models come out that the 5090 just isn't going to handle well by itself, and I'll be pissed that I no longer have the 3090's. So I'm still holding on to it all.
Farmadupe@reddit
I feel you... my single 3090 isn't quite enough to run qwen3.6-27b just how I'd like (it's annoyingly close but only if I choose one of: no context, no speed, or no output quality), and I could drop £850 on a second card to make fp8 usable on vllm. And I absolutely know I could kid myself into not factoring in the cost of the inevitable case, mobo and PSU upgrades.
But at the same time, a single A40 on runpod is basically a 48GB 3090, and I can rent one for $0.44 per hour. So I have to ask myself if I'm really going to get £850 / $0.44 ≈ 2600 hours of continuous use out of that second 3090 and... I wish it was actually a hard question, because I want to play with toys!
munkiemagik@reddit
That there is the crux of the issue for many of us: never mind the fact that it's your own container/environment you are faffing with to your heart's content on runpod/vast, so you can do whatever you want with it, it's just not your own hardware that you enjoy poking and prodding 🤣
TokenRingAI@reddit
I am interested in the whole rig
UnifiedFlow@reddit
Selling them is an obviously bad idea.
hp1337@reddit
I'm keeping my 3090s til the day I die. Power limited, they are still the best bang for the buck.
Every-Arachnid-1133@reddit
I will buy 2 from you plsss
Dry_Yam_4597@reddit
I can sell you 3 if you are in the UK :P
Every-Arachnid-1133@reddit
I wish I was in the UK rn 😔
False_Ad_5372@reddit
And I’ll buy the other two
Orlandocollins@reddit
I'd only sell when you will directly use the funds to purchase the 6000. The way the market is going, you want to have hardware.
ortegaalfredo@reddit
The 3090 does support FP8 and some formats of FP4 via emulation, and it's surprisingly fast. But the emulation needs to be done in software, and inference engines like vLLM are starting to cut support for older archs.
SettingAgile9080@reddit
How much did you buy them for? How often do you regularly utilize the capacity of all 4 where speed matters? Do you have to sell them all? If it were me, I'd consider selling 1-3 of them so my remaining hardware was "free", that way you're well hedged against all possible outcomes.
Pitpeaches@reddit
Are fp8 or fp4 better quants? As others have said, cloud / frontier models are going up in price and are really unreliable in quality. Maybe having a tool that is always consistent might be better.
PS. Weird that other replies aren't understanding your question. Guess they could be bots.
Prudent-Ad4509@reddit
You will get nowhere near comparable value for just $3500 with newer gpus, even taking into account all the newer features.
king_of_jupyter@reddit
Buy 2 RTX Pro 4000 Blackwells. After accounting for full fp4 support you are actually ahead in compute and equal in parameters per unit VRAM.
jacek2023@reddit
Do you need our permission?
One-Replacement-37@reddit
Do whatever you want?