What would 2x RTX 3060 12GB get me?
Posted by ObjectiveActuator8@reddit | LocalLLaMA | 43 comments
TLDR: I'm considering buying 2x RTX 3060 12GB instead of a single 24GB card, to gain experience, and I need to know what can realistically be accomplished with this setup.
Sorry in advance. I know you guys are probably tired of these kinds of posts, but I wanted to shoot my shot.
Last year I bought an RX 5700 XT 8GB for gaming, and when I tried local AI models I couldn't for the life of me get it to work, so all my inference was CPU-only. I have 32GB of RAM and I'm looking to upgrade that at some point. I know I gotta take care of the rest of the hardware too (RAM, PSU, etc.).
What I'm trying to accomplish is, first of all, agentic coding (I know I shouldn't get my hopes up there, and it definitely won't become my daily driver at this scale, but if it can center a div in under 5 minutes, maybe that's a win). The second goal is to gain experience with workflows: chaining models together in heavy pipelines that could apply to small business tasks. That's also why I want 2 cards instead of one: to get experience running multiple GPUs.
So with this in mind, in your experience, what models can this much VRAM realistically run?
Thanks guys.
its_a_llama_drama@reddit
I would recommend the 3090 or another single 24GB card. There is no gain from having two cards. You are not missing out on learning how to get two cards working: it is one or two extra lines in the env file to set tensor parallelism = 2 and CUDA visible devices = 0,1. Splitting the VRAM is just limiting for no gain.
You will get far more from a 3090 than from 2x 3060. I'm guessing there isn't much price difference on the used market.
For the card you have now, I would recommend using ChatGPT to get your GPU working. ROCm can be a pain, but if you're happy to feed back errors and troubleshoot with it, ChatGPT should be able to get a working stack going for your card. I used it to get everything set up initially. Just tell it what card you have, what you want to do with it, and what is wrong.
There is more 'learning' involved in using non-NVIDIA cards than in using more than one card.
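As a rough illustration of how small the "one or two extra lines" are, here is a sketch that builds the equivalent launch for vLLM. It assumes vLLM's `vllm serve` CLI and its `--tensor-parallel-size` flag; the model name is a hypothetical placeholder, and this is not necessarily the env-file setup the commenter uses. It only builds the command; nothing is executed.

```python
import os
import shlex


def build_vllm_command(model: str, gpus: list[int]) -> tuple[dict[str, str], list[str]]:
    """Return (environment, argv) for a tensor-parallel `vllm serve` launch."""
    env = dict(os.environ)
    # Restrict the process to the chosen GPUs, e.g. "0,1" for both cards.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpus)
    argv = [
        "vllm", "serve", model,
        # One tensor-parallel shard per visible GPU.
        "--tensor-parallel-size", str(len(gpus)),
    ]
    return env, argv


if __name__ == "__main__":
    # "your-org/your-model" is a hypothetical placeholder, not a real checkpoint.
    env, argv = build_vllm_command("your-org/your-model", [0, 1])
    print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], shlex.join(argv))
```

Going from one card to two changes only the device list and the shard count, which is the commenter's point.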
fallingdowndizzyvr@reddit
TP (tensor parallelism) makes it a gain.
Your guess is wrong. A 3090 is like $1000. 2x3060s is like $600.
its_a_llama_drama@reddit
With 12GB per card in TP, you just hit the overhead sooner, as the layers are unlikely to split perfectly into 12GB. So you are wasting VRAM.
If your argument is that TP is faster: yes, since PCIe traffic is unlikely to saturate for plain old inference. But it is still not faster than a single 3090. VRAM capacity is usually worth more to people than speed.
2x 5060 Ti is a good suggestion for the extra VRAM. Two 5060 Tis will be faster than one 3090 when in TP. They use less energy too, and they cost about the same as a 3090 (where I am, anyway).
ObjectiveActuator8@reddit (OP)
The splitting part is very enlightening. Might consider the single 24GB card then
FullOf_Bad_Ideas@reddit
You can rent 2x 3060 12GB on Vast for 0.5 USD/hr and play with it.
Play with a 3090 too and you'll get solid first-hand experience without spending much money on it.
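A quick break-even check makes the rent-first advice concrete. This sketch uses two figures quoted in this thread (about $600 for two used 3060s, and $0.5/hr to rent the pair on Vast). It deliberately ignores electricity, resale value and changes in rental prices.

```python
def break_even_hours(purchase_usd: float, rent_usd_per_hour: float) -> float:
    """Hours of rental that cost as much as buying outright.

    Ignores power costs, resale value and rental price changes.
    """
    if rent_usd_per_hour <= 0:
        raise ValueError("rental price must be positive")
    return purchase_usd / rent_usd_per_hour


if __name__ == "__main__":
    hours = break_even_hours(600.0, 0.5)  # thread's figures: ~$600 used, $0.5/hr on Vast
    print(f"{hours:.0f} rental hours, about {hours / 8:.0f} eight-hour days")
```

A few evenings of testing cost a tiny fraction of the purchase price, which is why renting first is cheap insurance.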
WishfulAgenda@reddit
100% this. Try it and see if it works for you. I did exactly this while working out whether to buy another card or not. In the end I decided to buy, simply because demand meant I couldn't rent what I wanted often enough.
FullOf_Bad_Ideas@reddit
That's an awesome way of finding out you have a real "demand" for a particular hardware.
WishfulAgenda@reddit
Yeah, it's even fun to try things like H200s just for comparison.
I had fun hooking up opencode on a Raspberry Pi with sensors and an H200 and watching it go!
emaiksiaime@reddit
For what it's worth, I run Qwen 3.6 35B at 60 tok/s with 131k context on a single P40 that cost me $350 CAD.
fallingdowndizzyvr@reddit
What quant? MTP?
commanderthot@reddit
I run dual 3060s and 32GB DDR4. I can comfortably run stuff like Gemma 31B dense and smaller on an 80/20 GPU/CPU split. For a budget solution it's very usable, especially when a 3090 (locally, non-US) costs upwards of $700-800 where I am, compared to a little above $450 for two 3060s.
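In llama.cpp, a GPU/CPU split like this is normally controlled by how many layers are offloaded to the GPU(s). The sketch below assumes the 80/20 refers to layers (not bytes) and uses llama.cpp's `llama-server` with its `-m` and `--n-gpu-layers` options. The model filename and the 60-layer count are hypothetical; check your model's metadata for the real layer count.

```python
import shlex


def llama_server_argv(model_path: str, total_layers: int, gpu_percent: int) -> list[str]:
    """llama-server arguments that put roughly `gpu_percent`% of layers on the GPU(s)."""
    if not 0 <= gpu_percent <= 100:
        raise ValueError("gpu_percent must be between 0 and 100")
    # Integer math avoids float rounding surprises; the remaining layers stay in system RAM.
    n_gpu_layers = total_layers * gpu_percent // 100
    return ["llama-server", "-m", model_path, "--n-gpu-layers", str(n_gpu_layers)]


if __name__ == "__main__":
    # Hypothetical file name and layer count, for illustration only.
    print(shlex.join(llama_server_argv("model-Q4_K_M.gguf", 60, 80)))
```

A percentage of layers is only an approximation of a percentage of memory, since the embedding/output tensors and the KV cache do not scale with the layer count.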
fallingdowndizzyvr@reddit
Here in the US, that would be considered cheap.
TinyFluffyRabbit@reddit
If you're considering dual 3060s, you're probably going to be better off just getting a 3090. There is some cost and inconvenience associated with getting a motherboard that splits PCIe lanes (unless you just want to layer split but that's going to be slower) and making sure the GPUs fit.
Force88@reddit
I think you can run Qwen 3.6 27B fully in VRAM, but with a lower quant (Q3 or Q4) and low context.
You can run a 13B model comfortably in VRAM alone, or you can try MoE models like Gemma 26B A4B or Qwen 3.6 35B A3B, but you'll still need system RAM since your VRAM is only 24GB.
YourNightmar31@reddit
With 24GB of VRAM you can run Qwen3.6 27B at Q4_K_M or Q5_K_S with around 128k context.
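Whether a given quant fits mostly comes down to the size of the weights, which you can estimate from parameter count times bits per weight. This sketch covers the weights only, not the KV cache or runtime overhead. The bits-per-weight figures are rough averages and an assumption: Q8_0 is 8.5 by construction, but K-quant mixes like Q4_K_M vary by model.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB (no KV cache or overhead)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Approximate average bits per weight; real GGUF files differ per architecture.
APPROX_BPW = {"Q4_K_M": 4.8, "Q5_K_S": 5.5, "Q6_K": 6.5625, "Q8_0": 8.5}

if __name__ == "__main__":
    for quant, bpw in APPROX_BPW.items():
        print(f"27B at {quant}: ~{weight_gib(27, bpw):.1f} GiB of weights")
```

Whatever is left of the VRAM after the weights is what the context (KV cache) has to fit into, which is why the same model can be comfortable at Q4 and tight at Q5 or Q6.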
pr0d_@reddit
But 2x12GB and 1x24GB aren't exactly the same. Hopefully there's a turnkey setup, but it isn't always as smooth (same caveat: 2x24 isn't the same as 1x48, etc.).
KURD_1_STAN@reddit
Okay, like 22GB then. You can still run it at ~16GB for Q4 with a lot of headroom left for context, no?
fallingdowndizzyvr@reddit
It really depends on the size of the layers. If you are doing layer splitting, then a layer will have to fit. So if you only have 1GB and a layer is 1.1GB, that 1GB will be wasted. But if the layers, rows or tensors do fit then it's not wasted.
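The "a layer has to fit" point can be shown with a simplified model of layer splitting: fill each GPU in order, and when the next whole layer doesn't fit in a card's leftover space, move on to the next card. This is a sketch with hypothetical sizes (1100 MiB layers, 12000 MiB free per card), not llama.cpp's actual allocator, which can weight the split per card and also needs room for KV cache and compute buffers.

```python
def split_layers(layer_mib: list[int], free_mib: list[int]) -> tuple[list[int], list[int]]:
    """Greedy layer split across GPUs.

    Returns (layers placed on each GPU, MiB left over on each GPU).
    Raises ValueError if some layer cannot be placed.
    """
    counts = [0] * len(free_mib)
    left = list(free_mib)
    gpu = 0
    for size in layer_mib:
        # A layer is indivisible here: skip GPUs whose leftover space is too small for it.
        while gpu < len(left) and size > left[gpu]:
            gpu += 1
        if gpu == len(left):
            raise ValueError(
                f"{sum(layer_mib)} MiB of layers does not fit in {sum(free_mib)} MiB free"
            )
        left[gpu] -= size
        counts[gpu] += 1
    return counts, left


if __name__ == "__main__":
    # 20 hypothetical 1100 MiB layers: 10 per card, with 1000 MiB stranded on each.
    print(split_layers([1100] * 20, [12000, 12000]))
    try:
        # 21 layers = 23100 MiB, less than 24000 MiB total, yet the last layer has nowhere to go.
        split_layers([1100] * 21, [12000, 12000])
    except ValueError as err:
        print(err)
```

The leftover 1000 MiB on each card is exactly the kind of wasted VRAM described above: it exists, but no whole layer fits into it.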
pr0d_@reddit
Yes, there may be some overhead, but I'm talking about the flags or configuration needed to make sure it's running as expected across two cards instead of one. It's definitely easier now than it was, but it's still fiddling and tuning.
fallingdowndizzyvr@reddit
Ah... there's been no "fiddling" for a while now. In fact, llama.cpp just uses both cards by default. You have to tell it not to.
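For reference, a sketch of the relevant llama.cpp switches, assuming `llama-server`'s `--split-mode` option (values `none`, `layer`, `row`) and its `--tensor-split` option (comma-separated relative shares per GPU). Passing `layer` explicitly is redundant if, as the commenter says, spreading across both cards is already the default; `none` is one way to opt out (hiding a card with `CUDA_VISIBLE_DEVICES` is another). The model filename is hypothetical.

```python
import shlex

SPLIT_MODES = ("none", "layer", "row")


def llama_split_argv(model_path: str, split_mode: str = "layer",
                     tensor_split: tuple[int, ...] = (1, 1)) -> list[str]:
    """Build llama-server arguments for a chosen multi-GPU split mode."""
    if split_mode not in SPLIT_MODES:
        raise ValueError(f"split_mode must be one of {SPLIT_MODES}")
    argv = ["llama-server", "-m", model_path, "--split-mode", split_mode]
    if split_mode != "none":
        # Relative share of the model per GPU, e.g. (1, 1) for two equal 12GB cards.
        argv += ["--tensor-split", ",".join(str(p) for p in tensor_split)]
    return argv


if __name__ == "__main__":
    print(shlex.join(llama_split_argv("model.gguf")))          # spread across both cards
    print(shlex.join(llama_split_argv("model.gguf", "none")))  # opt out: main GPU only
```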
Endurance_Beast@reddit
Will run Qwen3.5 27B Q4_K_M with 128k context flawlessly at 17 t/s.
Thebandroid@reddit
As you've probably noticed, the best advice on this sub is "be richer, have more money". I'm currently struggling with the same questions about entry-level GPUs. I'm thinking I'll get a 9070 16GB, maybe another one later if I need it.
You can definitely find models that will work on the 8GB of VRAM you have now. It'll be something small, like 4-7 billion parameters, maybe at 8-bit quantisation.
Have a look at appal.com/tools/vram-calculator.
SillyLilBear@reddit
Nothing worth running. If you can fit 27b it’s a good model but will be slow.
suprjami@reddit
I used one, then two, then three 3060 12G cards over the last couple of years. They were good value for the time of Mistral 12B and 24B and early Qwen 2.5 and 3.
Two of them will run 32B Q4 and 24B Q6 at 15 tok/sec with small (<32k) context.
A third card will let you run Qwen 3.6 Unsloth UD-Q6 with large (80k+) context and MTP. 27B at ~20 tok/sec, or 35B at ~90 tok/sec. That is by far the best quality setup you can get for under US$750.
If your goal is reliable agentic coding, IMO you'd be better off buying two large, fast cards like 2x 20GB or 2x 24GB. Qwen 27B finally pushed me into buying a pair of 3080 20G.
You're buying a power supply now so buy 1200W and you won't ever have to think about it again.
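A back-of-the-envelope PSU budget shows where a figure like 1200W comes from. This is a sketch under stated assumptions: 170 W is NVIDIA's rated board power for the RTX 3060, while the CPU and "everything else" figures and the 150% headroom factor (for transient spikes and running the PSU below its limit) are hypothetical rules of thumb, not measurements.

```python
def psu_rating_watts(gpu_watts: list[int], cpu_watts: int, other_watts: int,
                     headroom_percent: int = 150) -> int:
    """Suggested PSU rating: steady load scaled by a headroom factor, rounded up to 50 W."""
    load = sum(gpu_watts) + cpu_watts + other_watts
    scaled = load * headroom_percent        # watts multiplied by percent
    return -(-scaled // (50 * 100)) * 50    # ceiling division to the next 50 W step


if __name__ == "__main__":
    RTX_3060_WATTS = 170  # NVIDIA's rated board power for the 3060 12GB
    CPU_WATTS = 150       # hypothetical desktop CPU under load
    OTHER_WATTS = 75      # hypothetical: motherboard, RAM, drives, fans
    for n in (2, 3):
        watts = psu_rating_watts([RTX_3060_WATTS] * n, CPU_WATTS, OTHER_WATTS)
        print(f"{n}x 3060: ~{watts} W")
```

Under these assumptions a third card pushes the budget past 1100 W, so buying 1200W up front leaves room to grow.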
yes2matt@reddit
Hot tip on the power supply. Thx
niado@reddit
12gb is not worth it, you can’t run any strong models. I made that mistake :)
QuchchenEbrithin2day@reddit
2x 3060s are constrained by the speed of the PCIe bus, since there is no NVLink option between them, so a single 24GB card would be far better.
DeepWisdomGuy@reddit
Just went to look up the prices. Man! It's gotten unhinged! I was going to look into alternatives, but there weren't any. You have found a reasonable solution. Also, if you're interested in diffusion models, know that they don't split well.
MattOnePointO@reddit
Good question.
WishfulAgenda@reddit
Honestly, it's going to end up getting you a Linux desktop, a new PSU, vLLM and potentially a really big hole in your wallet, as you'll 100% always want more VRAM.
Try and plan ahead. I have a dual rig and planning to go to triple and should be able to on a high end consumer rig with minimal problems.
merica420_69@reddit
WishfulAgenda@reddit
Lol 😂
Thepandashirt@reddit
Just get a 3090. If you want to scale to 48GB, it's an easier path to just get a second 3090 rather than going to 4x 3060s. And I think 24GB is not really enough for agentic coding: I find the lower-quant models people are recommending, like Qwen3.6 27B Q4, have serious issues with tool calling compared to larger quants like FP8. So a Q4 quant might run in 24GB, but you won't get the performance or context size you need.
fdrch@reddit
2x 16GB is a more interesting combination (4060 Ti, 5060 Ti). 2x 12GB is not equal to a single 24GB card, because usually you can't split without gaps.
suprjami@reddit
You also lose about half a gig per card to driver internals. The exact size varies with the NVIDIA driver version; the latest 595 driver takes about 400MiB.
So 2x12G is actually ~23G usable. 2x16G is actually ~31G usable.
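The arithmetic above as a small sketch, using the roughly 400 MiB per-card driver reservation quoted for the 595 driver (the real figure varies by driver version, and a card that also drives a display loses more). It also shows that a single 24GB card pays the reservation only once.

```python
def usable_vram_gib(cards: int, gib_per_card: float, driver_mib_per_card: int = 400) -> float:
    """Total VRAM left for models after the per-card driver reservation."""
    return cards * (gib_per_card - driver_mib_per_card / 1024)


if __name__ == "__main__":
    for cards, size in ((2, 12), (2, 16), (1, 24)):
        print(f"{cards}x {size}GB -> ~{usable_vram_gib(cards, size):.1f} GiB usable")
```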
robspassky@reddit
Awe 6
kiwibonga@reddit
Do note that it's going to be on the slower side.
AliExpress has much beefier NVIDIA V100s with a PCIe adapter for a similar price (water cooling is recommended for noise, though).
Comfortable_Ebb7015@reddit
I added one RTX 3060 to my home Ubuntu server. It runs Qwen3.6 35B Q4_K_XL at 40 t/s. I am very happy with the results for a 200€ investment!
dero_name@reddit
The best agentic coding model on dual 3060s will be the Qwen 3.6 35B A3B.
Unsloth UD-IQ4_XS will fit with very usable context.
Dense Qwen models (27B) will not be a good experience, not fast enough for agentic work on 3060s with their memory bandwidth, unless you're very patient.
Source: used two, then later three 3060s.
Extension_Canary3717@reddit
What level of agentic things can it perform?
ambient_temp_xeno@reddit
There's not much to it. If you needed to run multiple cards in future it wouldn't take you long to get it running.
khampol@reddit
I'd go for 2x 4070 Ti Super (~32GB), llama.cpp, and Qwen 3.6 27B Q6 GGUF.
Brilliant-Resort-530@reddit
The bandwidth gap is the real catch: two 3060s give you 24GB, but each card only has 360GB/s. A 4090 does 1008GB/s. MoE models hurt less because fewer params are active.
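A common rule of thumb for why bandwidth matters: generating each token requires reading all of the active weights from VRAM once, so bandwidth divided by bytes per token gives a rough upper bound on tokens/sec. This sketch assumes ~4.8 bits per weight (a rough Q4_K_M average) and ignores KV-cache reads, compute time and multi-GPU overhead, so real speeds land below these ceilings. With a plain layer split the cards take turns on each token, so a single card's bandwidth is roughly what bounds the result.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, active_params_b: float,
                         bits_per_weight: float) -> float:
    """Rough upper bound on generation speed if each token streams all active weights once.

    Ignores KV-cache reads, compute time and multi-GPU overhead.
    """
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    BPW = 4.8  # rough Q4_K_M average, an assumption
    cases = [
        ("3060, dense 27B", 360, 27),
        ("3060, MoE with ~3B active", 360, 3),
        ("4090, dense 27B", 1008, 27),
    ]
    for label, bandwidth, active in cases:
        print(f"{label}: <= ~{decode_ceiling_tok_s(bandwidth, active, BPW):.0f} tok/s")
```

This is why a 35B-A3B MoE feels fast on 3060s while a dense 27B does not: the MoE still has to fit in memory, but each token only touches a few billion parameters.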