Dual RTX Pro 6000 Blackwell Workstation vs Max-Q — open frame build, need to decide in 24 hours
Posted by stainlessblueshield@reddit | LocalLLaMA | View on Reddit | 40 comments
Hoping to get some input from people actually running this class of hardware. I have until Monday to make a call and I’d rather not make the wrong one on cards that cost $9k each.
The decision
I already own one RTX Pro 6000 Blackwell Workstation Edition. A second one is paid for and shipping Monday. The seller told me today he can still swap that order to a Max-Q if I want. I’m planning to add a third very soon either way, possibly a fourth.
Do I stay on the Workstation Edition in an open-air frame, or switch everything to Max-Q?
I can’t stomach losing 6–10% performance on these cards. I know I can power-limit the Workstation to 450W and still beat a 300W Max-Q. But I keep reading that people underestimate what the Workstation cards demand for airflow in a multi-GPU setup. Server Edition is off the table — noise is a different category entirely.
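(For reference, here's a rough sketch of how I'd set that cap programmatically; `nvidia-smi -pl 450` does the same thing from the shell, the 450W figure is just my target, and either way it needs root.)

```python
# Rough sketch: cap each GPU's power limit at 450 W via NVML
# (pip install nvidia-ml-py). Needs root, same as `nvidia-smi -pl 450`.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        pynvml.nvmlDeviceSetPowerManagementLimit(handle, 450_000)  # value in milliwatts
        print(f"GPU {i}: power limit set to 450 W")
finally:
    pynvml.nvmlShutdown()
```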
PCIe routing / frame layout
I ordered two riser cables with one-slot brackets. I was originally hoping to lay everything flat on a single horizontal plane but I don’t think that’s realistic with slot spacing on the WRX90E-SAGE SE. Two-shelf vertical layouts look like the standard approach.
Questions:
∙ How are people routing PCIe 5.0 risers for 3–4 cards without signal integrity issues?
∙ Any slots dropping to 4.0 at length, and does it matter for inference workloads?
∙ Specific off-the-shelf frames people are happy with? I can fabricate but don’t have time to, and would rather buy.
Build so far
∙ ASUS WRX90E-SAGE SE
∙ Threadripper PRO 9965WX
∙ 4×64GB DDR5 ECC (Kingston KSM64R52BD4-64HA) — considering adding another 256GB now while this exact SKU is available
∙ SilverStone HELA 2500W PSU — will likely need a second or a 3000W depending on card count
∙ Water-cooled CPU, stack of Noctua fans
Environment
Dedicated basement space. Main concerns: dust, heat, long-term power draw. I’m an electrician so the wiring side is handled.
Use case
Automating my electrical contracting business (QuickBooks, Notion, field ops) and some hobby/potential AI side ventures. Three-year horizon on Blackwell — when Rubin drops and it’s feasible, I plan to upgrade, which should also cut heat load meaningfully. That’s part of why Workstation Edition resale value matters to me now.
Paths I’m weighing
1. All Workstation Editions, 3–4 cards in an open frame
2. Switch Monday’s card to Max-Q, sell my current Workstation, run all Max-Q
3. Keep current Workstation, buy next two as Max-Q
4. Cap at 3 Workstation cards, jump to Rubin at launch
Thanks in advance for any input on any of it!
Quadrapoole@reddit
This is my setup right now, but I would suggest getting the Workstation cards, testing them, then just putting them on water.
I got the Bykski waterblocks for around 280 CAD. It's really not much, and that'll take care of the heat problem.
Vicar_of_Wibbly@reddit
This is my quad RTX 6000 PRO open frame build: https://blraaz.net Please ask any questions :)
(No ads, no trackers, no cookies, just a vibe-coded photo blog of the rig build).
stainlessblueshield@reddit (OP)
This is pretty unbelievable. I have a ton of questions. First off, I need to look up that motherboard — I don't know anything about it. 12 RAM slots? Wow. I'm hoping 512GB will be enough for 3–4 GPUs. I love this build. It's the best-looking and most logical solution I've seen to this problem so far, and it actually looks attainable.

My first question is how the GPUs are connected back to the motherboard. If switching from my WRX90E-SAGE to your board is what unlocks this layout, it would absolutely be worth the change to me. That board is $1,200, but if it lets me spread the GPUs out like you have without sacrificing performance, that's money well spent. Am I able to do what you did with my motherboard, or is yours doing something mine can't?

On the PSU: I have the SilverStone HELA 2500W, but the consensus seems to be I should add a second or upgrade. There's an ASUS 3000W for around $1,200, and a Corsair 3000W for about $600, which is actually $200 less than I paid for the 2500. Curious what you'd go with.

Facing the cards all out is brilliant. That's the one real downfall of the Workstation Edition — dumping all the hot air back into the case — and you've basically solved it.

On building the frame myself: as an electrician and hobby metalworker, I'm going back and forth on whether to take it on. I definitely wouldn't weld anything for this — extruded aluminum is the move, and I've got metal benders. But in this season of life, a metal project squeezed in between jobs feels like trying to eat an elephant one bite at a time. I'd rather buy something that works.

That's my first round of questions: frame, motherboard, and how the cards physically connect back. That piece changes everything for me.

Next round, assuming I can get my head around the above: What would you do in my position if I plan to upgrade when Rubin is out? I'd rather have two Rubins than four Blackwells. Or one Rubin and two Blackwells. But I'd love to hear what you're actually running day to day. Is it fast? Are you running them in parallel? Do you power-limit? I can probably swing three cards, but I know four is the sweet spot for a lot of workloads. If keeping the 4 Blackwells makes sense, I'll just do that. Do you ever regret not going Max-Q or Server Edition? Is the electric bill a monster? And is your motherboard the clear winner for this kind of build, or are there other boards worth looking at?
Vicar_of_Wibbly@reddit
Your WRX90E-SAGE can do it, no problem. The trick is PCIe -> MCIO -> PCIe.
The Superflower Leadex 2800W 240V works great and has been working for many months without a hitch.
Frame is just 400mm lengths of 4040 extrusion. I tapped the ends for M8 using a hand drill, a pack of taps, a handful of rags, and some WD40.
The corner pieces are what holds it all together. They're actually just something I downloaded off printables.com or thingiverse.com and printed on my Prusa 3D printer, then fixed it all together with M8 bolts through plastic pieces into the tapped holes in the aluminum. Dead easy.
The internal supports, brackets, etc. are all parts I either custom-designed in TinkerCAD or tweaked based on public domain designs, then 3D printed.
Buy Blackwell now, worry about Rubin later.
It's not bad. I run either Qwen3.5 397B A17B NVFP4 or MiniMax-M2.7 FP8 in vLLM. Single-sequence streaming runs around 130 t/s with Qwen, around 115 t/s with MiniMax. The real speed comes with batched concurrency: Qwen will run at 30x concurrency vs MiniMax's 3x in the 384GB of the quad RTX 6000 PROs, so for large teams or for problems that parallelize well, Qwen is generally more suitable and runs into several hundred tokens/sec.
Yes! Tensor parallel (-tp 4) in vLLM. Nope, no power limit: 600W each GPU. Honestly... it really doesn't matter unless you're training.
3x 6000 PRO (288GB) is kinda pointless for vLLM because you only get tensor parallel for 2, 4, 8, or 16 GPUs. And I'm not sure what 288GB would bring that 192GB would not. It's not like you can run significantly better models until you hit 384GB of VRAM (e.g. MiniMax in FP8 or Qwen3.5 397B in NVFP4 or GLM-4.7-FP8). If you're doing multiple 6000 PROs then realistically you're doing 2 or 4.
4x 6000 PRO (384GB) is where things get really great. You get a bunch of big models, they run fast, they run with decent quants, they run with full context lengths, multiple concurrency, no CPU offloading, and they're fast because of tensor parallel. This is where it starts to feel like cloud-in-a-box.
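(If it helps to see it concretely, this is roughly the shape of the vLLM invocation through the offline Python API; the model ID below is a placeholder, not my exact checkpoint.)

```python
# Rough sketch of the quad-GPU vLLM setup via the offline Python API.
# The model ID is a placeholder; substitute your actual checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(
    model="some-org/big-moe-nvfp4",   # placeholder for the real checkpoint
    tensor_parallel_size=4,           # shard across all 4 RTX 6000 PROs (-tp 4)
    gpu_memory_utilization=0.90,      # leave headroom for KV cache growth
)

outputs = llm.generate(
    ["Draft a one-paragraph summary of tensor parallelism."],
    SamplingParams(temperature=0.7, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```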
I thank all the local deities that I went 4x Workstation. Server is just a pain in the ass to cool; that's a non-starter. Max-Q is fine, but power-hobbled, and with my open outward-blowing frame design I can easily deal with heat, so why bother with Max-Qs?
If one day I decide to add more capacity then I'll just stack another aluminum cube of PSU + PCIe 100-lane switch + 4 GPUs underneath the current cube. Borg-on-borg, if you will.
My wife sent me a passive-aggressive link to login to our electrical provider, but I ignored it.
And is your motherboard the clear winner for this kind of build, or are there other boards worth looking at?
I do not know. It was the clear winner for my build where for the workloads I planned to run I wanted: PCIe 5.0, as many channels of DDR5 6400 as I could get on a single CPU (12 on EPYC), 128+ PCIe lanes, and a CPU with 128 cores.
Your SAGE looks pretty sweet, too.
Such_Advantage_6949@reddit
But MCIO is x8, meaning you sacrifice half the bandwidth?
Vicar_of_Wibbly@reddit
That’s why we use 2 of them.
Such_Advantage_6949@reddit
You can combine the MCIO back into one x16? Can you share a link showing what the adapter looks like?
Vicar_of_Wibbly@reddit
They’re literally in my parent comment above.
Such_Advantage_6949@reddit
I did check your comment before commenting, actually. The issue is the mainboard-to-MCIO adapter occupies a whole PCIe slot on the mainboard and only gives out one MCIO port. There is no PCIe to 2x MCIO x8 adapter on the link you shared.
Vicar_of_Wibbly@reddit
This is the good stuff: https://c-payne.com/products/mcio-pcie-gen5-host-adapter-x16-retimer
There are Chinese versions for $60 in the usual places.
Such_Advantage_6949@reddit
Thank you very much
Vicar_of_Wibbly@reddit
Close-up photos in my photo blog: https://blraaz.net
Vicar_of_Wibbly@reddit
And recombining: https://c-payne.com/products/mcio-pcie-gen5-device-adapter-x8-x16
segmond@reddit
Can you please share performance numbers?
Q4 (Qwen3.5-397b, KimiK2.5, GLM5.1, DeepSeekV3.2), MiniMax2.7-Q8
Zealousideal-Mall818@reddit
For 300W and 5–10% less performance I would go Max-Q all day. Plus, when stacking more than 2, the blower system helps. You can keep the WS as the main horse for single-GPU heavy tasks.
Training or LLM chat will run all GPUs at the worst GPU's speed, so adding more Max-Qs scales incrementally, and the power won't require multiple PSUs or a 3000W PSU without blowing something.
300W × 4 GPUs is 1200W versus 600W × 4 at 2400W. For 10% less performance, and without worrying about burning cables? Take that any day, and way less heat.
stainlessblueshield@reddit (OP)
Dude! Thank you for responding! I am super grateful. There are two guys talking about the vertical position being dangerous and unintended for the workstation. Do you think they have a point? I haven’t researched yet but they seem pretty confident.
On one hand, 3 6000s is a stepping stone to 4; the 3rd one can do smaller models, then get the 4th, I suppose. I keep thinking two 6000s and a 5000 with higher VRAM, but then I just come back to 3 6000s.
I'm wondering if using risers to create an elevated plane, with the riser cables shooting up and over to make lots of room in between the GPUs, would be effective and doable enough to avoid the alternative cables. But I really like the flexibility of them.
Annual_Award1260@reddit
Pretty sure you need to run in powers of 2?
stainlessblueshield@reddit (OP)
Yes, you do. Odd numbers are not good for parallelism.

I got to three like this: I want two, plus a nice beefy 3rd, but it doesn't have to be a 6000. I have a 3090 that I can sell for a nice little chunk of the next GPU. So then I think a 5000. There are two 5000s, and I figure I may as well get the one with 72 (I think) gigs of VRAM. Then I think: for more money I could get to 4 Blackwells, which is where my home system comes full circle. A couple thousand more and I have three Blackwells. Then I'm one stop away from the station and I get off the train. At least for this life season.

Once I hit that, I am committed to learning models, software, coding, Linux, and anything else I can invest in to make myself in tune with what's happening. I focus too much on hardware, but tools are my hobby. The best tools. I see these products as tools. And you can create the most revenue with the best tools. That is an equation I am very familiar with.

I am relentless when I set a goal. I'm also a person of faith, and I think that is actually where my strength comes from. But I always get there. It's often quite a journey, and those are the journeys that make us who we are. They make us humble. I get humbled a lot. If I lack something somewhere I compensate, balance out, add, subtract, wiggle, shake, duck, go through hoops, whatever it is. It's what we are here to do.
I went on a rant. It happens. Thanks for responding to the post!
Annual_Award1260@reddit
With the cost of DDR5 RAM these days, it is hard to set up a system with enough PCIe 5.0 lanes to properly support these cards. I'm not sure what you could learn on a high-end system like this that you couldn't learn on a DGX Spark.
Annual_Award1260@reddit
For open frame I think you should go for the regular workstation. You can always limit the power down if need be. I run 2x max-q because venting outside the case deals with my thermal issues. The regular ones actually have a larger heatsink as well.
Max-q is really just for stacking up the cards in a small space.
stainlessblueshield@reddit (OP)
Thank you for that. That is good news.
reto-wyss@reddit
stainlessblueshield@reddit (OP)
Hey, thanks for the reply. Can you tell me how it's going with the two? Are you running one model on one and smaller models on the other? Are you running them in parallel? Which models are you using and liking with the two? Come Wednesday I'll have two 6000s and a 3090. I'll probably switch out the 3090 for another Blackwell.
Then when Rubin comes out, possibly trade the 6000 Blackwells in for a chunk of the Rubin workstation cost. But honestly, I don't know how that's going to unfold. For all we know, demand is so high that we won't be able to get a Rubin workstation for 4 years. Or maybe in 3 years cloud will be so inexpensive it doesn't make sense to have a home rig (although I'm betting all my chips that isn't the case). I don't like needing cloud. I understand I have to use it for the highest-logic tasks, but I would like to keep as much in-house as possible and use openclaw, Hermes, RAG, LoRA, and fine-tuning to continually evolve my systems, retain my databases and model personalities, and plug in newer and newer models as they evolve and become available.
reto-wyss@reddit
I've mostly run Qwen/Qwen3.5-122b-a10b-fp8; it's good at everything, it's fast for interactive coding, and there's still enough VRAM for highly parallel workloads. Other good options are nvidia/MiniMax-2.5-NVFP4 for coding, and Gemma-31b (both BF16 and the nvfp4 (avg 8-bit) checkpoint are good).
I use vLLM with tensor parallel; SGLang is an option. Don't use llama.cpp, it's at least an order of magnitude slower at high concurrency.
I run vllm-omni on the 5090 for image generation and editing. I had another 3x 3090 on another machine for image gen, but they are so expensive now it made sense to sell those. The R9700 and RTX 5090 are better value for that.
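(To make the concurrency point concrete, a minimal sketch of batched offline inference in vLLM; the model ID and prompts are placeholders. The engine schedules all the sequences as one continuous batch, which is exactly where llama.cpp falls behind.)

```python
# Minimal sketch: vLLM batches many prompts in one call, so aggregate
# throughput scales with concurrency. The model ID below is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/some-fp8-checkpoint", tensor_parallel_size=2)  # placeholder

# 64 concurrent sequences processed together as one continuous batch.
prompts = [f"Summarize work order #{i} in one sentence." for i in range(64)]
for out in llm.generate(prompts, SamplingParams(max_tokens=128)):
    print(out.outputs[0].text)
```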
stainlessblueshield@reddit (OP)
That makes sense. That matches what I have found. I think the Q5 with llama.cpp was producing less-than-fantastic results. I really appreciate your feedback.
What are the temps usually looking like when both are at 600 watts?
stainlessblueshield@reddit (OP)
Also, does your RAM get hot? I had the Noctua fan for the CPU, but it just jammed up the airflow, so I got a water cooler.
Ok-Measurement-1575@reddit
You've already set the scene with the first one?
I would have gone 300W if it's for the home.
stainlessblueshield@reddit (OP)
I go back and forth. 10%? That's a nice chunk when you're paying this much for exceptional performance. I'll always be wondering: would this have been faster with WS editions? You can set the Workstation to 400W and still get considerably more performance than the Max-Q. The silicon is more efficient, but that efficiency doesn't become noticeable until you reduce wattage by half; at that point the Workstation would perform 5–10% worse than the Max-Q. But if you want to keep your options open for high performance when you need it, 100 watts more and you get about 10% more than the Max-Q. These are the calculations I have received from multiple frontier models. Maybe they are wrong. I don't know.
Ok-Measurement-1575@reddit
Whatever works for you.
You're gonna be PCIe-constrained regardless, so you may have all the noise and nowhere near the power draw you were expecting anyway.
stainlessblueshield@reddit (OP)
Could you clarify? I’m not sure exactly what you are saying.
Ok-Measurement-1575@reddit
4 cards over PCIe with tensor parallel are probably not going to see full power utilization. Compute throughput is limited by the all-reduce operation, is my understanding.
My 450W cards do 450W @ TP1. They only achieve around 220W @ TP4.
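(If you want to check this on your own cards, a quick NVML polling sketch, assuming the nvidia-ml-py package; run it alongside a TP4 job and watch the per-GPU draw.)

```python
# Quick sketch: poll per-GPU power draw with NVML (pip install nvidia-ml-py)
# while an inference job runs, to see how far below the cap the cards sit.
import time
import pynvml

pynvml.nvmlInit()
count = pynvml.nvmlDeviceGetCount()
try:
    for _ in range(30):  # ~30 seconds of 1 Hz samples
        draws = []
        for i in range(count):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            draws.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)  # mW -> W
        print(" | ".join(f"GPU{i}: {w:6.1f} W" for i, w in enumerate(draws)))
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```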
Such_Advantage_6949@reddit
You can mix and match. For my second RTX 6000 I might go with Max-Q, because Max-Q allows me to go with single-slot liquid cooling. Liquid cooling saves the hassle with risers for them.
kiwimonk@reddit
Killer setup. I have no wisdom to add... Just wish I could build something this wild! Whoever mentioned water-cooling the graphics cards in the other thread seemed to be on to something. I would definitely opt for cards that can stretch their legs in optimal conditions. You can always make tweaks to get more performance if you hit a wall.
bluelobsterai@reddit
What harness are you using to automate?
stainlessblueshield@reddit (OP)
Currently using a mix and seeing what works best for me. As of yesterday: Qwen 3.6, Paperclip, vLLM, with and without openclaw in some parts of the stack.
Before that I was using Nemotron, and before that, Gemma 4.0.
Once my second Blackwell arrives I may go to MiniMax with vLLM tensor parallel.
My friend is helping me and I am learning from him but I am balls to the wall determined to learn and be fluent. I’m on my way and learning more each week.
On my MacBook I'm using Hermes installed locally and SSHing into the home server with Qwen 3 and vLLM. I'm going to use a shared wiki LLM, and I would love to have merged data and memories from 3 different MacBooks.
Miserable-Dare5090@reddit
Have you tried Qwen 397?
stainlessblueshield@reddit (OP)
I was using Qwen 3.5 120B Q5 as my main local engine with openclaw, and it was very unreliable. Sadly, it would respond with some interesting ideas and then just stall out. A for effort, but the talent and capacity weren't there. Now, that could have been some flaw in my system. God knows there are 1000 ways to mess up optimal performance, but my friend who is helping ran extensive tests and found it just wasn't there.
397 has been more than I could handle with the one Blackwell. As of right now I still have one Blackwell and one 3090. This Wednesday I'll get the second Blackwell, then the third Blackwell 2 weeks to 4 months after.
Material-Link9151@reddit
Have you considered custom liquid cooling for the GPUs, for 3–4 GPUs?
stainlessblueshield@reddit (OP)
I have, but I'm not thrilled about taking them apart and installing the whole system. And if I want to sell to get Rubin, that may complicate things. I'm a professional tech, but I'm also pretty conservative with risk when tens of thousands of dollars are at stake. I will do that if it comes to it. If Rubin comes out and I want to sell the Blackwells in 1.5–3 years, I think it will be easier with whole workstation cards.
What do you think?
Rerouter_@reddit
Go with the proper pro, you can always add cooling.