Best bang for the buck GPU
Posted by Ok-Cucumber-7217@reddit | LocalLLaMA | View on Reddit | 95 comments
I know this question is asked quite often, but going back to old posts makes me want to cry. I was naive enough to think that if I waited for the new generation of GPUs to come out, the older models would drop in price.
I'm curious about the best GPU for Local LLMs right now. How is AMD's support looking so far? I have 3 PCI slots (2 from CPU, 1 from chipset). What's the best bang for your buck?
I see the RTX 3060 12GB priced around $250, while the RTX 3090 24GB is around $850 or more, which leaves me unsure whether I should buy one RTX 3090 and leave some room for future upgrades, or just buy three RTX 3060s for roughly the same price.
I had also considered the NVIDIA P40 with 24GB a while back, but it's currently priced at over $400, which is crazy expensive for what it was a year ago.
Also, I've seen mentions of risers, splitters, and bifurcation, but how viable are these methods specifically for LLM inference? Will cutting down to x4 or x1 lanes per GPU actually tank performance?
I mainly want to run 32B models (like Qwen2.5-Coder), but being able to run some 70B models like Llama 3.1 would be cool.
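For sizing this decision, a rough rule of thumb helps: VRAM ≈ parameters × bytes per weight, plus headroom for KV cache and activations. A quick sketch (the ~4.5 bits/weight approximates Q4_K_M; the flat 2 GB overhead is a loose assumption and grows with context length):

```python
def vram_gb(params_b, bits_per_weight, overhead_gb=2.0):
    """Very rough VRAM estimate: weight storage plus a flat
    allowance for KV cache and activations."""
    weights_gb = params_b * bits_per_weight / 8  # billions of params -> GB
    return weights_gb + overhead_gb

# 32B at ~4.5 bits/weight (roughly Q4_K_M) fits a 24GB 3090 but not one 3060:
print(round(vram_gb(32, 4.5), 1))  # 20.0
# 70B at the same quant wants ~48GB total, i.e. 2x 3090 or 4x 3060:
print(round(vram_gb(70, 4.5), 1))  # 41.4
```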
troughtspace@reddit
Chinese MI50 HBM2 32GB models; they work with the Radeon Pro VII BIOS. Around €150-200 each. I got 10 pcs of Radeon Pro VII/MI50 plus a Gigabyte server with 10 PCIe slots, heh, €400 for all of them, lucky me :) 3x 1600W PSUs. https://www.gigabyte.com/Enterprise/GPU-Server/G431-MM0-rev-100
I read that all the GPUs get a full PCIe 4.0 x16 link....
bunny_go@reddit
Neither: rent the hardware online if needed, or pay for the services. Owning hardware you use 1% of the time is extremely expensive and dumb.
GeroldM972@reddit
You lose money when you rent (hardware and/or services), and you lose money when you buy.
Except when you buy, you still have the hardware, which you can still sell as a whole or even for parts, so in the end you'll spend less.
Renting wins on ease of use, buying less so. But that doesn't make buying dumb.
techantics@reddit
If you can't get two 3090s right away, I would get one and wait for a good deal on the other. 48GB of VRAM and the overall performance of this card will make it really worth it in the end.
fmlitscometothis@reddit
It depends on what you value: start with your target model/requirements and work backwards. If you don't know what your target model is, a single 3090 gets you in the game with an upgrade path to 2x 3090 if you need more VRAM, as long as you have the power for them (will you need to upgrade the PSU as well?).
A 4090 will give you faster inference on the same model, and IMO it means you dodge the 3090 minefield of getting a bad card (one that a miner has run 24/7/365, etc.).
However, the price of 4090s last I looked was around £1.8k (vs £700 for a 3090). That's a lot more expensive, and only £500-800 less than a 5090. And once 5090s are in good supply, your 4090 might drop a lot more in resale value. This is how I talked myself into a 5090 🤣. If my expected drop in 4090 value is (for example) £400, then selling and upgrading to a 5090 later would cost me that £400 loss plus the 5090-4090 price difference (£2300 - £1800 = £500). So my upgrade path costs £900, and if I buy a scalped 5090 now for less than £900 above RRP, I actually save money and I get the 5090 now 🤣🧠🤸♂️.
And don't get me started on RTX Pro 6000s! When people say "you can buy tokens, so rent, don't own", they don't seem to factor in the (apparent lack of) depreciation. I could be wildly wrong, but I can't see a 96GB Blackwell going out of fashion any time soon: the first "big card" for under 10k, 300W, and no faffing with power, risers, heat, etc. Invest in one now, get the street cred and loads of juicy VRAM, and if you need to, flog it in a year or two. You might drop a couple of grand in the process, but paying £100/month to own it for 2 years doesn't seem like the worst deal to me 🧠🤸♂️🤪
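The upgrade-path arithmetic above, written out (all figures are the commenter's GBP estimates, not market data):

```python
# All figures are the commenter's GBP estimates, not market data.
price_4090 = 1800
price_5090 = 2300
expected_4090_loss = 400  # assumed extra depreciation once 5090 supply improves

# Cost of "buy a 4090 now, sell it and buy a 5090 later":
upgrade_path = expected_4090_loss + (price_5090 - price_4090)
print(upgrade_path)  # 900 -> a scalped 5090 under £900 over RRP "saves" money
```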
Get a 3090. Or an RTX Pro 6000 💀
inteblio@reddit
For specific tasks, ASICs might suddenly wipe the floor with nvidia. So maybe a big-bucks card might not be such a future-proof option. That said, gpus are a mature (and incredible) product, so they might remain untouchable for years.
Lanky-Question2636@reddit
This is the best answer to that question https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/
prompt_seeker@reddit
It's a great article really.
prompt_seeker@reddit
2x 3060 would give similar performance to 1x 3090 with tensor parallelism.
But it would be slower on llama.cpp, and you'd also have difficulty with training or quantization.
So I recommend the 3090, or two if you can afford it.
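A rough way to see why 2x 3060 with tensor parallelism can approach a single 3090: token generation is largely memory-bandwidth-bound, and tensor parallelism lets both cards stream their half of the weights concurrently. A back-of-envelope ceiling, using published spec bandwidths and an assumed ~18 GB for a 32B Q4 model (real numbers will be lower due to sync overhead):

```python
BW = {"RTX 3060": 360, "RTX 3090": 936}  # GB/s, published spec bandwidth

def decode_ceiling_tps(bandwidth_gbs, model_gb):
    """Upper bound on tokens/s: each generated token streams all weights once."""
    return bandwidth_gbs / model_gb

model_gb = 18  # assumed size of a ~32B model at Q4
print(decode_ceiling_tps(2 * BW["RTX 3060"], model_gb))  # 2x3060 TP ceiling: 40.0
print(decode_ceiling_tps(BW["RTX 3090"], model_gb))      # 1x3090 ceiling:    52.0
```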
Arkonias@reddit
3090 is the answer. Best GPU with 24gb of Vram that works out of the box with everything.
Dundell@reddit
I'm a personal sucker for RTX 3060 12GBs. I bought a total of 4: 2 at $180 (OEM models) and the other 2 at $225. I run QwQ-32B 6.0bpw with 64K Q8 cache and an 8.0bpw 0.5B draft model at around 30~8 t/s depending on how full the context is. All limited to 100W, so 400W altogether, plus 125W for the X99 open desktop build. As for 3x RTX 3060s, I don't know if that's a good idea; I don't know if 3 cards can be tensor-paralleled in exl2.
I run them at PCIe 3.0 x8 each, but for inference I don't think they'd be hindered even at PCIe 3.0 x4 each. So you could get a motherboard that supports bifurcation and a simple x4-lane bifurcation card, and just split it 4 ways to simplify your options.
I also have a P40 from back when it was $200, and it can run something like a Q3 QwQ-32B with 24K Q6 context plus a 0.5B Q8 draft, which pushes it to the limits of its VRAM; while it's fine... it's around 9~3 t/s. I'm thinking of attempting something else with the P40, maybe seeing if there's a good vision model it can support.
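The power math for a capped rig like this is worth sketching, since power-limited cards change the running-cost economics (the $0.15/kWh rate is an assumption, and 24/7 full load is a worst case):

```python
# Power budget for a 4x 3060 rig with each card capped at 100W, plus the
# X99 platform draw quoted above. The $0.15/kWh rate is an assumption.
gpu_count, gpu_cap_w, platform_w = 4, 100, 125
total_w = gpu_count * gpu_cap_w + platform_w
kwh_per_month = total_w / 1000 * 24 * 30  # worst case: pegged 24/7
monthly_cost = kwh_per_month * 0.15
print(total_w, round(kwh_per_month), round(monthly_cost, 2))  # 525 378 56.7
```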
ikaraas@reddit
I am building a similar setup with 4x 3060; used 3090s are $800-900 here. Right now I have 32GB RAM. Do I need more RAM for running 32B models? Is it practical to run a quantized 70B on 48GB VRAM for document QA and summarizing? Can you please give some idea?
Dundell@reddit
For document summarization, I kind of do this with a current workflow I'm building around QwQ-32B.
QwQ-32B does a very good job of this, but you'll want to keep it to 64K context most of the time for max comprehension. You'll want to set up the server and API, and just think about what you want. Don't just paste the info into a chat; start off by asking QwQ-32B, or something like Gemini 2.5 Pro, "this is what I want to summarize, help me come up with a plan to automate this process using the API as follows", etc.
ikaraas@reddit
Thank you for the detailed suggestions. I will look more into it by setting up the server. Currently I am just testing using Ollama; I cannot even fit 4 GPUs on the mobo. I have ordered a mining rig case so it can accommodate all 4 GPUs using risers.
Dundell@reddit
I have a saved parts list of my current build, or close to my current build. The dual PSU is probably not needed. It's also a mining rig setup in the corner of my office:
Type|Item|Price
:----|:----|:----
**CPU** | [Intel Xeon E5-2690 V4 2.6 GHz 14-Core Processor](https://pcpartpicker.com/product/fNcMnQ/intel-cpu-bx80660e52690v4) | Purchased For $35.00
**Memory** | [Corsair Vengeance LPX 64 GB (8 x 8 GB) DDR4-2400 CL15 Memory](https://pcpartpicker.com/product/9ycMnQ/corsair-memory-cmk64gx4m8a2400c14) | Purchased For $60.00
**Storage** | [TEAMGROUP AX2 1 TB 2.5" Solid State Drive](https://pcpartpicker.com/product/FPFbt6/team-ax2-1-tb-25-solid-state-drive-t253a3001t0c101) | Purchased For $55.00
**Video Card** | [Zotac GAMING AMP GeForce RTX 3060 12GB 12 GB Video Card](https://pcpartpicker.com/product/DXH7YJ/zotac-geforce-rtx-3060-12-gb-gaming-amp-video-card-zt-a30600f-10p) | Purchased For $180.00
**Video Card** | [Zotac GAMING AMP GeForce RTX 3060 12GB 12 GB Video Card](https://pcpartpicker.com/product/DXH7YJ/zotac-geforce-rtx-3060-12-gb-gaming-amp-video-card-zt-a30600f-10p) | Purchased For $180.00
**Video Card** | [Zotac GAMING AMP GeForce RTX 3060 12GB 12 GB Video Card](https://pcpartpicker.com/product/DXH7YJ/zotac-geforce-rtx-3060-12-gb-gaming-amp-video-card-zt-a30600f-10p) | Purchased For $225.00
**Video Card** | [Gigabyte WINDFORCE OC Rev 2.0 GeForce RTX 3060 12GB 12 GB Video Card](https://pcpartpicker.com/product/rvNYcf/gigabyte-windforce-oc-rev-20-geforce-rtx-3060-12gb-12-gb-video-card-gv-n3060wf2oc-12gd-rev20) | Purchased For $250.00
**Power Supply** | [FSP Group Hydro G PRO ATX3.0(PCIe5.0) 1000 W 80+ Gold Certified Fully Modular ATX Power Supply](https://pcpartpicker.com/product/nmCZxr/fsp-group-hydro-g-pro-atx30pcie50-1000-w-80-gold-certified-fully-modular-atx-power-supply-hg2-1000gen5) | Purchased For $100.00
**Custom** | [Amangny GPU PCI-e 8 Pin Female to Dual 8(6+2) Pin Male PCI Express Braided Sleeved Splitter Power Cable 9 inch (3 Pack)](https://pcpartpicker.com/product/c9D7YJ/placeholder) | Purchased For $12.00
**Custom** | [Qaoquda Dual PSU Power Supply 24 Pin Adapter Cable for ATX Motherboard 18AWG - 1FT](https://pcpartpicker.com/product/dK9wrH/placeholder) | Purchased For $8.00
**Custom**| CORSAIR RM750i 750W Desktop PSU| Purchased For $52.00
**Custom**| Sluice V2 12GPU Stackable Open Frame Mining Rig Frame Chassis| Purchased For $52.00
**Custom**| Machinist X99 MR9S| Purchased For $70.00
**Custom**| Xeon CPU Cooler| Purchased For $17.44
**Custom**| x6 x16 PCIE to x16 PCIE 3.0 cables| Purchased For $120.00
**Custom**| PCIE Bifurcation x8x8 adapter| Purchased For $35.00
**Custom**| PCIE Bifurcation x8x8 adapter| Purchased For $35.00
| *Prices include shipping, taxes, rebates, and discounts* |
| **Total** | **$1486.44**
ikaraas@reddit
Cool, thank you for sharing the build! I was lucky to get an EVGA 1kW PSU for $70 and an ASUS X99-E WS mobo for $100. I cheaped out on the frame and got one on Amazon for $36. I grabbed those GPUs from different brands, lol, the 4th one is a Gigabyte. It's hard to get a 3060 here under $200. Waiting for the risers now; currently I put 3 GPUs directly on the mobo. They're close together, but I'll fix that once I get the risers. Hopefully with these 4 I'll be able to fool around and use them for quick summaries or grammar fixing.
Dundell@reddit
I used to run 4.0bpw Qwen2.5 72B Instruct plus a Q8 0.5B draft model with 30K context at around 22~15~7 t/s depending on the context. It worked well, but QwQ-32B just seems better in my testing.
The upcoming 72B models might end up getting better and change my mind to swap back to 72Bs.
Overall, right now, QwQ-32B with all the recommended settings for coding/performance and 64K context puts me at 38GB VRAM. The 72B model with more limited context could probably be pushed to around 45K with Q4 cache... it maxes out the VRAM of the four RTX 3060s.
Glittering_Mouse_883@reddit
I started with 2x 3060. One I had bought new a few years ago; the second one I got on eBay for like $180 last summer. Then I added 2x 3090s a little while later. I pulled the trigger on the 3090s back when they were about $700, and a few people on this forum warned me prices were not going down.
What I learned is that VRAM is king at the end of the day. I honestly think I would have been fine buying like 6 more 3060s if I had the PCIe slots for it.
By the way, in terms of power consumption, I run the 3060s at 100W and the 3090s at 300W without a noticeable performance drop. So you really can get 3 for 1 on these, not just in terms of $ but also in terms of power consumption, which I guess is also $ at the end of the day.
Just my two cents.
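The "3 for 1" claim can be sanity-checked per gigabyte of VRAM, using the street prices and power caps mentioned in this thread (both are rough assumptions, not quotes):

```python
# Street prices and power caps taken from this thread; treat both as rough.
cards = {
    "RTX 3060 12GB": {"vram_gb": 12, "price": 250, "capped_w": 100},
    "RTX 3090 24GB": {"vram_gb": 24, "price": 850, "capped_w": 300},
}
for name, c in cards.items():
    dollars_per_gb = c["price"] / c["vram_gb"]
    watts_per_gb = c["capped_w"] / c["vram_gb"]
    print(f"{name}: {dollars_per_gb:.1f} $/GB, {watts_per_gb:.1f} W/GB")
```

At these prices the 3060 wins on both $/GB and W/GB; what it gives up is per-card bandwidth and the simplicity of a single slot.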
bfrd9k@reddit
I went with 2x 24GB 3090s and I have no regrets. I think I paid $1600 for both GPUs at the time, plus $200 for the NVLink. I've been wanting another, so I went looking at hardware again, and damn, I wouldn't be able to afford any other build without knowing I'd have a real cash-dollar ROI.
I can run llama3.3:70b, probably my favorite general-purpose model, at around 19 t/s, which is perfectly fine; smaller models are obviously much faster.
Great platform for cutting your teeth on LLMs.
GreedyAdeptness7133@reddit
Which motherboard?
bfrd9k@reddit
ASUS ROG Crosshair VIII Hero; got it for free, so can't complain.
_hephaestus@reddit
How’d you cool them? I have one 3090 in my rig right now and wish I could put another, but the PCIE slot situation means all the hot air would go directly onto the theoretical second
bfrd9k@reddit
Stock heatsink and fans. Case is rack mounted with good circulation and I don't run them hard enough to really worry me. They're cool until I give them work and some reasoning models like qwq or deepseek-r1 will get the cards super hot if I confuse them with a bad prompt, but it's rare.
caetydid@reddit
Does the NVLink give you any speedups? I have two 3090s, but I read everywhere that NVLink is not useful for LLM inference.
bfrd9k@reddit
I actually built my machine for more than inference so tbh, I don't know. From what I understand the nvlink creates a logical pool of vram that can be used by both GPUs, and the data is accessed over the nvlink vs the pcie bus, but I actually don't know.
Maybe someone else can chime in.
green__1@reddit
Is there any value in looking at anything other than nvidia?
Ok-Cucumber-7217@reddit (OP)
I don't think so for now.
AMD cards are cheaper, so I was wondering if anyone had success utilizing them for local inference
Psychological_Ear393@reddit
What do you mean by success? I run 2xMI50 on Ubuntu and on another machine 1x7900 GRE on Windows and both work without problems.
If your budget is low and you just want VRAM, AMD wins every time because you can at least load a model on a GPU for a lower price.
e.g. 1x W6800 with 32GB VRAM is cheaper than 1x 3090 with 24GB VRAM. If you're willing to fiddle a little, 2x MI50 (Linux only) for 32GB VRAM is 1/3 the price of a 3090. That will be significantly slower, but if your goal is just to be able to load a model, it's a win; e.g. OLMo 2 FP16 gets about 17 tps in Ollama on the MI50s, if that's acceptable.
If you want performance and have the money to spend on it, then Nvidia may be the best option, although ROCm and other technologies are rapidly catching up; a lot of the AMD cards are technically better for the price, but the software lets them down.
Karyo_Ten@reddit
I've successfully used an AMD APU, the 780M iGPU in a 7940HS, with Ollama and the GTT memory patch, so I can load models up to 96GB.
It's slow, 6.55 tokens/s on Phi4-14b Q4_K_M, but it works.
a_postgres_situation@reddit
Hmm... how did you measure that? Mine seems faster... https://i.imgur.com/vwsmP6x.jpeg
Karyo_Ten@reddit
`ollama run --verbose`
Were you using the GPU or the CPU?
a_postgres_situation@reddit
Ah, I get: `dmidecode -t 17 | grep "Configured Memory Speed"` -> Configured Memory Speed: 6400 MT/s
Or it's Vulkan. Or llama.cpp. Anyway, mine is faster, I'm fine... :-)
a_postgres_situation@reddit
Qwen2.5-Coder, running on Zen 4 integrated RDNA 3 (either an 8xxxG CPU or the mobile ones), tested via llama.cpp (`llama-bench -n 32`) with GPU Vulkan acceleration:
Qwen2.5-Coder 32B Q4_K_M... ~4 token/s
Qwen2.5-Coder 32B Q6_K... ~3 token/s
Qwen2.5-Coder 32B Q8_0... ~2.3 token/s
Monad_Maya@reddit
What about the 7900 XT from AMD, 20GB of memory at about $600 ± 50? You can also get the XTX with 24GB of memory.
Does AMD suck that much for this kind of stuff?
Stepfunction@reddit
The answer is almost always going to be a 3090.
To be honest, I would recommend learning to use RunPod and experimenting with the hardware before you buy, to get a feel for what works for you. Setting up a Kobold template takes just a few minutes, and you can be running whatever model you'd like shortly after.
ResponsibleTruck4717@reddit
Don't you think a 4060 Ti 16GB or 5060 Ti 16GB might be better (when going for a multi-card setup)? The 3090s are already quite old, most of them have probably been worked quite hard, and they consume way more power.
Stepfunction@reddit
VRAM is king. You can reduce the power cap on them to cut down the power requirement. Even given their age, they're still fantastic cards for running LLMs.
As far as age goes, both the 3090 and 4090 are in the 8.x CUDA compute capability category, which means that as long as the 4090 is supported, the 3090 will generally be supported as well.
You could get away with a dual 4060 Ti setup, but the dual cards will be cumbersome and slower. A single 3090 will be easier to work with in general.
AppearanceHeavy6724@reddit
For slightly more than the price of a single unobtainium used 4060 Ti, you can get 2x 3060, which is faster and has more memory.
panchovix@reddit
Not OP, but one of the downsides of the 3090 is the lack of native FP8 support. Even so, it will outperform the 4060 Ti running at FP8.
Cyberbird85@reddit
Yeah, quite unfortunate, but I've also just bought a 3090 to replace one of my P40s, and the difference in speed now just makes me want to buy another.
The used ones go for around the equivalent of 650-700 USD here, which is not great, but not exactly terrible either.
(Although, if you take into account that the average net monthly salary here is around 1163 USD (that's the average, not the median, so reality is even worse), you might come to a different conclusion about the price.)
bfrd9k@reddit
I disagree that $700 per 3090 is "not great". Intuitively, sure, a years-old used GPU shouldn't be so expensive, but the reality is they are great cards for this use case and a fraction of the cost of newer cards.
a_beautiful_rhind@reddit
More than 200 for P40 was "not great".
AppearanceHeavy6724@reddit
1) Super low budget: P102, P104
2) Low budget: used 3060
3) Medium-high budget: 3090
fallingdowndizzyvr@reddit
Are you thinking about doing any video gen? If so, get the 3090, since the 12GB on the 3060 will hold you back. Video gen models don't split across multiple GPUs.
Beneficial_Tap_6359@reddit
I found the Quadro RTX 8000 48GB was the best option if you need more than 24GB of VRAM on a single GPU. Next up is the modified 4090 48GB for a fair bit more. Otherwise it comes down to what deals you can find on multi-GPU setups and your budget.
green__1@reddit
Yeah, but it's also $7,000-$10,000 for one card... With the 4070 Super and 5070 Ti both sitting around $1500, I could buy 5 or 6 of them for the same price (prices in my local currency).
Beneficial_Tap_6359@reddit
Fair point. But a 6x GPU setup vs a single new GPU isn't really the same type of setup either. I could sell my Quadros for 5-6k and get a bigger setup too, but that brings a ton of complications with it. I don't think my current use warrants 7-8k+ for a Blackwell though, so that would be more like a few years down the road. I honestly only grabbed the Quadro(s) because they were cheaper than a 4090 or 5090 and give me more VRAM. I kept the 2nd one because why not play with the cool gear while I've got it?
green__1@reddit
I'm not really advocating for a 6x GPU setup, but at the same time I'm trying to point out just how much of a price difference we're talking here.
Absolutely if you can get it ridiculously cheap go for it! I'm just saying that for us "plebs" it's a bit out of reach.
Beneficial_Tap_6359@reddit
A single 48GB Quadro is what I'm actually suggesting; the 2nd one is just because. I got mine for ~$2100, but they're currently going for closer to $2,500-2,800 on eBay. I didn't change anything about my existing desktop to add the GPU, so there weren't any additional costs. If it were a freshly built setup, the discussion changes a little too.
I do certainly agree it's a fair point towards multi-GPU setups. What kind of multi-GPU setup, with all the supporting weird bits, can you get for ~$2,500 nowadays?
YouDontSeemRight@reddit
I thought I looked up the Quadro and there was something about unsupported features that could make it obsolete? Did I read that wrong? Or is it just missing certain technologies like flash attention?
It would be nice if the 48GB 6000s dropped in price. Ideally Nvidia would just release a 5099 Pro 48GB for the masses. That's all we really want, and it puts 96GB configurations within reach needing only two cards. I would run it at restricted wattage... but it would need to be mass-produced in quantities sufficient for demand.
Beneficial_Tap_6359@reddit
I saw that too before buying. They are far from obsolete, but there are some features starting to be lacking. Honestly so far everything has worked right out of the box with no tinkering required. From drivers to LLM, Image, and Video models it all actually just worked. I do think there is a lot of potential left in the software stack if you want to get into the weeds optimizing.
green__1@reddit
Thanks for the pointer to eBay; at $3,000 it's a lot easier to justify, though I see all those cheap ones come straight from China with "passive cooling", which makes me a bit hesitant. Adding a fan seems to add $1,000, and I hear some horror stories about fake cards from China.
Beneficial_Tap_6359@reddit
It did take a while to actually find the regular non-passive ones for a decent price. Sadly, a month before I bought them they were even cheaper. But once the 5090 landed and became impossible to get for MSRP, I decided the Quadro was a "deal". Not a good justification of the price exactly, though, lol.
a_beautiful_rhind@reddit
No flash attention support kind of kills the Turing series. BF16 you can work around, and they are still capable cards.
Yeah yeah... llama.cpp... not the same as the implementation in transformers.
Beneficial_Tap_6359@reddit
Also a fair point. They are starting to age out with some features. Supposedly someone with more skills than me can still make them work though. I half intend to use it as a learning platform to get deeper into the nuts and bolts of making stuff like that work on the older architecture. Realistically everything so far has just worked though so I haven't bothered to dig in further, yet.
a_beautiful_rhind@reddit
You will learn a bit more about cuda kernels than you want to going down that road.
The RTX 8000 appears slower than the 2080 Ti 22GB from the benchmarks people put up here. Not sure why it's still over 2k on eBay.
Beneficial_Tap_6359@reddit
I actually caught your post about it being slower than the 2080 Ti the other day too; I'm not sure I've seen it mentioned elsewhere though. I'll have to dig up some benchmarks to compare with. But really, I think it just comes down to being the most affordable 48GB card with recent-enough features. It still has ~768 GB/s memory bandwidth, which is better than some newer cards on paper, too. That's what it was for me at least. I do think it's a bit overpriced, but the other options are twice as expensive.
a_beautiful_rhind@reddit
If you really really need the form factor it's the cheapest. Best value though, not so sure.
Ok-Cucumber-7217@reddit (OP)
How much better would the Quadro RTX 8000 48GB be compared to something like 2x 3090?
Beneficial_Tap_6359@reddit
Well, "better" really depends. It is a single GPU, runs at about 220 watts max, is only 10.5" long, and uses regular PCIe power connectors. Performance-wise it is slower, but performance is irrelevant if you don't have enough VRAM. The single-GPU performance is slower than newer setups, but it is faster than my 4090 + CPU/RAM overflow for models that don't fit in 48GB. Even when I run 2x of them with NVLink, the total system draw is still barely above 400 watts, compared to my gaming rig, where the 4090 alone can consume 400W. I prefer this solution over trying to fit two big gaming cards, or going down the weird-setup route. These literally just plugged into my "old" X470 board with regular PCIe power connectors (no fire hazard), no space or cooling issues. Very simple setup and I'm happy with it (cost aside). If I could sell them both for a fair bit and get a Blackwell 96GB card for a reasonable price I would, but that is years down the road with the way things are looking.
AmericanNewt8@reddit
I'm actually thinking about making a flowchart for this. The answer is "it depends" and "how brave are you precisely".
Massive_Robot_Cactus@reddit
It's a great time to learn GPU repair!
TheSilverSmith47@reddit
I would love this. It should also be pinned for newcomers
epigen01@reddit
Surprisingly, as long as you have sufficient system RAM (16-32GB), a 30/40/50-series RTX with 8-12GB is enough to run 32B LLMs with pretty fast token/s generation.
I haven't tried the Intel Arcs because they weren't available at the time, but they have made a lot of progress in their support (so that might be the most cost-effective option).
HugoCortell@reddit
Best bang for buck? It's not a GPU, it's a CPU setup that uses RAM. It will be slow, but much cheaper.
s101c@reddit
Or a combination: entry GPU + CPU with lots of RAM.
GPU will accelerate prompt eval speed significantly, even though only a small part of the model will be loaded onto the GPU.
Beneficial_Tap_6359@reddit
Also a great point. I upgraded to 128gb RAM first to dabble with the bigger models before going in on a bigger GPU setup.
celeski@reddit
If you can get a 3060 12GB for $250, that would be the best entry point to run 14B models at good quants like Q4 or even Q6 with some spillover. But if you want 32B models like QwQ, 24GB is very beneficial. There were some posts here with 4x 3060 for 48GB VRAM that ran 70B at Q4 quite well, but a setup with 4 cards will be more difficult in a regular case.
That's where you need to think about whether it's better to just buy your first 3090, which will give you plenty to run and explore. That's what I did, and now I am on my 3rd 3090 with 72GB VRAM, which allows a lot more context and higher-precision 70B models. But I bought most of them over a year ago and got a decent price of around $500-600 each. Now they can go for close to 1k since I got OC EVGA and MSI models.
I would not go for 4090s since they are still super expensive; I can't seem to find any under 2k :( Not worth the extra cost for slightly better inference, since bandwidth and memory capacity stay the same across both generations.
Another thing to note: if you get a 3060 and then later add a 3090, the 3060 will bottleneck the 3090 due to its slower memory (half the bus width, at 192-bit).
I'm interested to know what you end up going with and good luck!
fizzy1242@reddit
I've got three 3090s as well. Have you run any models larger than 70B? 111B Command-A can be run at Q4 too.
Sorry_Sort6059@reddit
I just bought a 2080 Ti 22GB, great cost-performance, $422.
p4s2wd@reddit
Totally agreed, I bought 7x 2080 Ti 22GB ^_^
jrherita@reddit
Found the seller!
Sorry_Sort6059@reddit
What solution did you use to integrate 7 2080ti together?
crapaud_dindon@reddit
422 shipped? Which supplier?
Sorry_Sort6059@reddit
I purchased it in China, JD.com has a two-year warranty.
Karyo_Ten@reddit
So who volunteers as a tribute to test the warranty?
Is shipping included?
Sorry_Sort6059@reddit
Shipping included, but let me be very clear: I am in China and I buy on platforms in China. Even so, there are still significant risks, such as suppliers going out of business. I heard that buying directly in Shenzhen is cheaper, just for reference.
Karyo_Ten@reddit
Ah I see.
Ok-Cucumber-7217@reddit (OP)
Where did you get it from?
It looks good on paper. Did you have any issues with it? Any compatibility issues?
Sorry_Sort6059@reddit
I just installed it, and so far the data looks fine. I'll run the big model and check again
Reader3123@reddit
Are you only going to be doing inference? AMD is not horrible if the budget is tight. I'm running 2x RX 6800 right now: 32GB of VRAM for less than $500.
Secure_Reflection409@reddit
3060 is king unless you've got to wait for 16,000,000,000 thinking tokens.
Qual_@reddit
A used 3090. Found mine for 350€, but most of the prices were between 400 and 650.
MachineZer0@reddit
I acquired and tested a bunch of cards with an 8B model. Best bang for the buck was the P102-100, but then it went out of stock from a couple of volume sellers on eBay. A strong one was the P40, which offered 24GB for $150, but it has since more than doubled. The 3090 is still the all-around king. Still testing the 7900 XTX, which could possibly edge it out.
https://www.reddit.com/r/LocalLLaMA/s/5ABL1EmaLE
LingonberryGreen8881@reddit
"Bang for the buck" while only factoring in a single part is not the way to go, IMO. Factor in the entire cost over the expected lifetime of that PC, including all input costs: the price of the total system, the cost of the games you will play, the cost of the electricity.
The cost per hour to game can be computed, and it will be much higher than you thought. If the total cost over 5 years of "budget" gaming is, say, $5000, spending an extra $300 on a 50% better experience for those 5 years seems more than reasonable.
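A hypothetical worked example of that total-cost-of-ownership argument (every figure below is an illustrative assumption, not real pricing):

```python
# Every figure here is an illustrative assumption, not real pricing.
system_cost = 2000                   # PC including the "budget" GPU
hours = 5 * 365 * 3                  # 3 h/day of use over 5 years
electricity = hours * 0.400 * 0.15   # 400 W draw at an assumed $0.15/kWh
software = 600                       # games/subscriptions over the same period
total = system_cost + electricity + software
print(round(total, 2), round(total / hours, 2))  # total spend and $/hour
```

At these numbers the GPU itself is well under half the 5-year spend, which is the comment's point: a few hundred dollars more on the card barely moves the per-hour cost.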
val_in_tech@reddit
While the 3060 might look OK on paper, it's much slower at inference, plus you need more PCIe slots. The RTX 3090 is the best performance-to-price right now. This 5-year-old card is faster at inference than an Apple M4 Max, and insanely faster for prompt processing. The upcoming desktop AI computers are rumoured to have only 25% of the 3090's memory bandwidth, which is the most important thing for inference performance.
If you need max memory at lower speeds, you can wait for the Digits computer or get an older M1 Max with high RAM.
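Taking the "25% of the 3090's bandwidth" claim at face value, the bandwidth-bound decode ceiling for a ~70B Q4 model (assumed ~39 GB of weights, and assuming the model fits in memory; on 3090s that means multiple cards) works out roughly as:

```python
bw_3090 = 936                 # GB/s, published spec for the 3090
bw_rumored = 0.25 * bw_3090   # the "25% of a 3090" claim above
model_gb = 39                 # assumed ~70B model at Q4
# Bandwidth-bound ceiling: each generated token streams all weights once.
print(bw_3090 / model_gb)     # 24.0 t/s ceiling at 3090-class bandwidth
print(bw_rumored / model_gb)  # 6.0 t/s ceiling on the rumoured boxes
```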
Naiw80@reddit
Intel Arc B580
Naiw80@reddit
I'm not joking, it's extremely capable for its price point; the following video demonstrates it.
https://youtu.be/NupMydGNAv8
You_Wen_AzzHu@reddit
3x 3060 is very affordable, and you can load Mixtral Q4 with 100K context and still get decent speed.
thatkidnamedrocky@reddit
I went with two modded 2080 Tis, each with 22GB of VRAM for a total of 44GB. I'm able to run llama3.3:70b at decent speeds. Each card was around $500 on eBay.
OmarBessa@reddit
3090
ThenExtension9196@reddit
5090
Alauzhen@reddit
I recommend 3090s or workstation GPUs with 24GB or more VRAM if you can find them for cheap.