Two RTX 6000 Pro Blackwells... what's it get you?
Posted by SteveRD1@reddit | LocalLLaMA | 23 comments
What would you all do if you had 192GB of VRAM available to you on Blackwell hardware?
Is there anything it would open up that the 3090 stackers can't currently do?
What could it still not do?
Not thinking just LLMs, but image/video stuff, anything else at all that's AI adjacent.
Exxact_Corporation@reddit
Great question. In video and image workflows, you’ll breeze through 8K+ editing and real-time ray-traced rendering with complex scenes, something that’s often a bottleneck on smaller VRAM cards. This setup also shines in engineering simulations, CAD, and scientific applications like molecular dynamics where memory capacity and compute throughput accelerate huge and complex datasets. While NVIDIA RTX 3090 stacks are great for many tasks, the NVIDIA RTX PRO 6000 Blackwell excels in stability, ECC memory support, and professional-grade drivers that boost reliability and precision on critical workloads.
SteveRD1@reddit (OP)
Wow, this was unexpected. I actually have a system of this sort on order from you guys!
Exxact_Corporation@reddit
That's awesome to hear! Thank you so much for choosing Exxact. We truly appreciate your business and your trust in our team. Wishing you the absolute best with your new system, and can't wait to hear about the achievements you'll accomplish.
opi098514@reddit
It would be faster. And most importantly it wouldn't use 2400 watts during inference.
nero10578@reddit
Yea just 1200W until it melts
Educational_Sun_8813@reddit
There is also a version called "Max-Q" that runs around 300W, a bit slower than the "workstation" edition, which is 600W TDP. Both models cost the same.
opi098514@reddit
I’d just get the 600 watt one and undervolt.
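A power cap via nvidia-smi gets most of the way there (it's a power limit, not a true undervolt); a minimal sketch, and the 450W target is just an example:

```python
# Cap board power on both cards with nvidia-smi (a power limit, not a true undervolt).
# Needs admin/root privileges; the 450 W target is an arbitrary example.
import subprocess

for gpu in ("0", "1"):
    subprocess.run(["nvidia-smi", "-i", gpu, "-pl", "450"], check=True)
```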
No_Afternoon_4260@reddit
600w iirc
Logical_Divide_3595@reddit
Real-time vision models are cool, you can try them if you're interested.
SteveRD1@reddit (OP)
Got a suggestion for where to start there?
Logical_Divide_3595@reddit
These two are both cool and were both released recently; the first is the better one to start with.
https://github.com/ngxson/smolvlm-realtime-webcam?tab=readme-ov-file
https://github.com/apple/ml-fastvlm
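For a feel of how these real-time demos work: a minimal sketch assuming a local llama-server with a vision model (and its mmproj) loaded, exposing the OpenAI-compatible chat endpoint; the URL, prompt, and frame rate here are illustrative:

```python
# Minimal webcam -> local VLM loop (endpoint, prompt, and pacing are assumptions).
# Requires: pip install opencv-python requests
import base64
import time

import cv2
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed llama-server default port

cap = cv2.VideoCapture(0)
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        _, jpg = cv2.imencode(".jpg", frame)
        b64 = base64.b64encode(jpg.tobytes()).decode()
        resp = requests.post(URL, json={
            "max_tokens": 64,
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe what you see in one sentence."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                ],
            }],
        }, timeout=60)
        print(resp.json()["choices"][0]["message"]["content"])
        time.sleep(1)  # ~1 frame per second is plenty for a demo
finally:
    cap.release()
```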
SteveRD1@reddit (OP)
Thanks!
AleksHop@reddit
Qwen3 235B A22B Q4 will fit if you use this: https://www.reddit.com/r/LocalLLaMA/comments/1ki7tg7/dont_offload_gguf_layers_offload_tensors_200_gen/
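The linked trick is llama.cpp's --override-tensor (-ot) flag: keep all layers on GPU but pin selected tensors (typically the MoE expert FFNs) to CPU. A hedged sketch; the path and regex are illustrative, so check --help on your build:

```python
# Hedged sketch: launch llama-server with all layers on GPU, but use
# -ot/--override-tensor to keep MoE expert FFN tensors on CPU.
# Model path and regex are illustrative, not verified.
import subprocess

subprocess.run([
    "llama-server",
    "-m", "/models/Qwen3-235B-A22B-Q4_K_M.gguf",  # hypothetical path
    "-ngl", "99",                                  # offload all layers to GPU...
    "-ot", r"ffn_.*_exps.*=CPU",                   # ...but pin expert tensors to CPU
    "-c", "32768",
], check=True)
```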
Expensive-Paint-9490@reddit
I would train a 1.5B LLM from scratch.
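For a sense of scale, a hedged sketch of what a roughly 1.5B Llama-style config looks like in Hugging Face transformers; the dimensions are illustrative, not a recipe:

```python
# Illustrative ~1.5B-parameter Llama-style config (all dimensions are assumptions).
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=32000,
    hidden_size=2048,
    intermediate_size=5504,
    num_hidden_layers=28,
    num_attention_heads=16,
    max_position_embeddings=4096,
)
model = LlamaForCausalLM(config)  # randomly initialized, ready for pretraining
print(f"{sum(p.numel() for p in model.parameters()) / 1e9:.2f}B parameters")
```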
Ravenpest@reddit
It's not enough... I would probably do a 70B QLoRA and run R1 at a bigger quant, but... yeah. Need more.
streaky81@reddit
Run more and bigger models is the short answer. That's why GPU manufacturers won't give it to us, despite the fact that in pure chip cost terms it would cost them pennies. Enterprise customers buying the stuff they want them to buy isn't helping the situation either.
Conscious_Cut_6144@reddit
16x 3090s enters the chat...
No.
Now if you had 8...
SandboChang@reddit
Qwen3 235B-A22B seems like a good fit at Q5.
FullOf_Bad_Ideas@reddit
You need around 48GB VRAM to do QLoRA FSDP of a 70B model at low ctx like 512.
Assuming you can just scale it up, that would mean QLoRA of a 280B model would be possible on 2x RTX 6000 Pro Blackwell.
Try to finetune Qwen3 235B A22B. Or run it with tensor parallel/pipeline parallel with vLLM/SGLang and see how many concurrent requests it can serve.
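A minimal sketch of the serving side with vLLM tensor parallel across the two cards; the model path is hypothetical and assumes a roughly 4-bit quant that actually fits in 192GB:

```python
# Illustrative vLLM offline batch run split across two GPUs.
# The model path is hypothetical: any ~4-bit quant of Qwen3-235B-A22B that fits in 192 GB.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/models/Qwen3-235B-A22B-AWQ",  # hypothetical local path to a quantized build
    tensor_parallel_size=2,               # split across both RTX 6000 Pros
    gpu_memory_utilization=0.90,
    max_model_len=32768,
)
prompts = ["Explain tensor parallelism in two sentences."] * 8  # small concurrent batch
outputs = llm.generate(prompts, SamplingParams(max_tokens=128, temperature=0.7))
for out in outputs:
    print(out.outputs[0].text)
```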
power97992@reddit
You can run it faster than a 3090 stack.
__JockY__@reddit
Sadly the Q6 of Qwen3 235B A22B doesn't fit into 192GB VRAM very well; you end up with not much more than 8k context at FP16.
The Q5_K_M fits beautifully with full 32k context at FP16.
Q4_K_XL seems to work just as well and runs significantly faster.
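The rough arithmetic behind those fit claims, with hedged bits-per-weight averages (real GGUF files vary, and KV cache plus overhead come on top):

```python
# Back-of-envelope weight footprint for a 235B-parameter GGUF at different quants.
# Bits-per-weight figures are rough averages (assumptions); real files differ because
# not every tensor uses the nominal quant, and KV cache + runtime overhead come on top.
PARAMS_B = 235  # billions of parameters

for name, bpw in [("Q4_K_XL", 4.8), ("Q5_K_M", 5.7), ("Q6_K", 6.6)]:
    weights_gb = PARAMS_B * bpw / 8
    print(f"{name}: ~{weights_gb:.0f} GB of weights out of 192 GB VRAM")
```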
bick_nyers@reddit
Qwen 3 235B at high context and other models (like Flux) loaded at the same time. Also, 14B-16B full finetunes.
mczarnek@reddit
You could do it a little faster, especially as fewer GPUs communicating = more speed, and a 5090 is faster than a 3090, but I'd guess that rarely matters.