Titan RTX vs 3090?
Posted by AssociationAdept4052@reddit | LocalLLaMA | 15 comments
I was able to find Titan RTX cards for ~$460 USD equivalent in Chinese markets, and 3090s for ~$640 USD equivalent. Is the Titan RTX the better value, mainly for LLM inference?
Thanks in advance!
a_beautiful_rhind@reddit
The 2080 Ti 22GB is cheaper there, no? There are also MI50s. Otherwise just buy a 3090 or a 4090 48GB. Flash attention and Ampere+ kernels are a thing.
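Quick way to see what you've got (a minimal PyTorch sketch; FlashAttention 2 wants sm_80+, i.e. Ampere or newer, while Turing cards like the Titan RTX and 2080 Ti are sm_75):

```python
import torch

# FlashAttention 2 and many optimized kernels require Ampere or newer (sm_80+).
# Turing cards (Titan RTX, 2080 Ti) report compute capability 7.5.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    fa2 = "yes" if (major, minor) >= (8, 0) else "no"
    print(f"{torch.cuda.get_device_name(i)}: sm_{major}{minor}, FlashAttention 2: {fa2}")
```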
AssociationAdept4052@reddit (OP)
What about 20GB 3080s? They go for ~$320 here as well.
a_beautiful_rhind@reddit
Yeah, that's worth a try. RAM bandwidth is slower IIRC, but at least it's Ampere. I thought there was some 3080 Ti with even more memory, but I've never seen it for sale.
You'll probably miss the extra 2GB and 4GB of memory, but you can always just add more cards.
No-Comfortable-2284@reddit
Bandwidth is slower, you get less VRAM, and the 3080 doesn't support NVLink. If you're running these cards in tensor parallel, you'll lose inference speed to the PCIe bandwidth limit.
a_beautiful_rhind@reddit
P2P driver should work.
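Easy to verify from PyTorch (a minimal sketch, assuming at least two visible GPUs):

```python
import torch

# Check whether the driver exposes peer-to-peer access between GPU 0 and GPU 1.
# This works over plain PCIe too; no NVLink bridge is required.
if torch.cuda.device_count() >= 2:
    print("peer access 0 -> 1:", torch.cuda.can_device_access_peer(0, 1))
```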
No-Comfortable-2284@reddit
Yeah, it works, but P2P without NVLink is capped at PCIe bandwidth and latency, since traffic goes through the CPU. It's even worse if the server is NUMA.
a_beautiful_rhind@reddit
For inference that's not bad. Plus, unlike NVLink, PCIe can do all-to-all.
No-Comfortable-2284@reddit
I'll give you a real answer, since most people here just like saying 3090 because it's newer.
First, it depends on your use case and environment. If you're putting the card in a server rack, or running more than two in a workstation, the Titan RTX gives you better VRAM-per-slot efficiency: it's only two slots wide, while two-slot 3090s are rare and expensive for exactly that reason. You can stack four Titan RTXs in a normal workstation case.
If your use case is single-user inference, the Titan RTX will save you money and be fast enough for any model that fits in 24GB. You can always run 8-bit, 6-bit, or 4-bit quantized models to save VRAM and increase token speed, so spending extra on a 3090 doesn't make sense.
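For instance, a 4-bit load with transformers + bitsandbytes (a sketch; the model name is just a placeholder, swap in whatever you run):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization: a ~13B model shrinks to roughly 7 GB of weights,
# fitting easily in 24 GB with room left over for KV cache and context.
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",  # placeholder: use whatever model you run
    quantization_config=quant,
    device_map="auto",
)
```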
If your use case is multi-user deployment via vLLM etc., then the 3090 has an advantage: it supports bf16 natively (for vLLM you want native hardware support for the dtypes used for model weights and KV cache). But honestly, the benefit of bf16 over fp16 might not be worth the space inefficiency of the 3090 compared to the Titan RTX (or the extreme premium you'll pay for a two-slot 3090), depending on your setup. A 4090 is worth considering more than the 3090, since at least Ada cards support fp8 for model weights and KV cache, which halves their size.
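Roughly what those knobs look like in vLLM (a sketch; the model name is a placeholder, and exact argument values vary by version):

```python
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-2-13b-hf",  # placeholder model
    dtype="bfloat16",       # native bf16 needs Ampere or newer
    kv_cache_dtype="fp8",   # fp8 KV cache targets Ada (4090) or newer
)
```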
Honestly, the Titan RTX still costs that much for a reason. For single-user inference it's pretty good value: two slots, 24GB VRAM, and a clean look. The only advantages of a 3090 are faster inference (which is kind of irrelevant for the 4-bit/8-bit quantized models you'll be running in 24GB of VRAM) and native bf16 support in vLLM (which isn't worth the premium you'll pay).
martinkou@reddit
No - the Titan RTX is one generation older, and its memory bandwidth is nowhere near the 3090's.
AssociationAdept4052@reddit (OP)
I see... but for performance I don't see much of a difference, at least for fp16 compute, and the 3090 is ~40% more expensive.
Nepherpitu@reddit
But LLM inference is limited by memory bandwidth, not fp16 compute.
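A back-of-envelope illustration (approximate spec-sheet bandwidths; the ~6.5 GB of 4-bit weights is a hypothetical example):

```python
# Each generated token streams all the weights once, so decode speed
# is bounded by memory bandwidth / model size in bytes.
bandwidth_gbs = {"Titan RTX": 672, "RTX 3090": 936}  # approx. GB/s
model_gb = 6.5  # e.g. a 13B model quantized to ~4 bit

for name, bw in bandwidth_gbs.items():
    print(f"{name}: ~{bw / model_gb:.0f} tok/s theoretical ceiling")
```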
AssociationAdept4052@reddit (OP)
True. For that I have V100s too, for the HBM2 memory. I'm just wondering what to try, since my rig is all over the place right now with random GPUs xD. I want to sell a 5060 Ti and a 4070 Ti Super and replace them with something; currently I have a V100 16GB and a 32GB SXM2 in an NVLink config.
No_Efficiency_1144@reddit
Not the Titan, it's older.
AssociationAdept4052@reddit (OP)
Right, but it's also cheaper.
No_Efficiency_1144@reddit
Anything older than the 3090 gets super dicey. You need good driver support.