2x RTX 6000 build during an extended bench test
Posted by Signal_Ad657@reddit | LocalLLaMA | 112 comments
CPU at 100% and both GPUs at 535w cap. Recorded for noise and thermals documentation.
This is as loaded as I can get it on the 1600w titanium PSU. Pulling ~1650w at the wall.
People were interested in how the air cooled HX on the CPU would hold up, so I wanted to document it. It cruises at about 95c even with the GPUs going nuts.
Had to cap the GPUs around 535w for the test to stay safe with the PSU ceiling (got within 50w of the PSU hard cap), and I wanted to put maximum emphasis for this test on the CPU cooler under full load at its full 350w draw with GPU exhaust hitting it.
It genuinely holds up like a champ. It feels like my limits with this build at this point are power, not thermals.
mzzmuaa@reddit
I have an RTX 6000 + 5090 on an X870E Extreme with a 9950X3D and 96GB CL26 DDR5 at 6000MHz in a Frame 5000D RS. I plan to get another RTX 6000 tomorrow.
When I run just these two in my mancave, it gets 15F hotter than the other rooms in the house despite blasting the AC. I open the window and bugs/lanternflies threaten my 30k setup lol
Borkato@reddit
Omg I could never. Thank god for my 3090s
Signal_Ad657@reddit (OP)
Haha, it would only take 4x 3090s to get into the same ballpark, be careful
Borkato@reddit
Wait really? 😬 I have one and it’s set to max 250, so with 4 I would be at 1000W! Jesus…
Signal_Ad657@reddit (OP)
Yeah 1600w is ~5,500 BTU/h. You’d need about half a ton of AC to break even with it. It’s a legit thermal load. It’s like a big space heater.
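For anyone who wants to check that math, a quick sketch (standard conversions: 1 W ≈ 3.412 BTU/h; 1 ton of cooling = 12,000 BTU/h):

```python
# Heat-load sanity check for a ~1600W rig: essentially all of the
# electrical draw ends up as heat in the room.
watts = 1600
btu_per_hour = watts * 3.412          # 1 W ~= 3.412 BTU/h
tons_of_ac = btu_per_hour / 12_000    # 1 ton of cooling = 12,000 BTU/h

print(f"{btu_per_hour:,.0f} BTU/h")   # ~5,459 BTU/h
print(f"{tons_of_ac:.2f} tons of AC") # ~0.45 tons, i.e. about half a ton
```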
ironmatrox@reddit
Had to install separate AC for my home office for this, the cost is adding up fast. Lol
AlwaysLateToThaParty@reddit
Great data, thank you. I have an RTX 6000 Pro. It took me a while before I got used to it hovering at 90+C.
Signal_Ad657@reddit (OP)
Yeah, at that temp it's still not at full fans on the GPU; it's just a different class of hardware. I think the fans on the GPUs hit about 75% as you land on 90c.
Hrethric@reddit
I'm a bit curious about the airflow. Did you orient the CPU fan like that to try to pull more air across the GPUs? Are the intake and exhaust fans all running at the same speed? Did you measure the difference in GPU temps between the present CPU fan orientation and with the CPU fan oriented to exhaust toward the rear?
Signal_Ad657@reddit (OP)
Yes, yes, and yes. I wanted to align it so there's one clean path and direction up and out for CPU and GPU heat. All fans in the heat stack with this configuration point in the same direction, ending in the top exhaust, so they reinforce each other. Every fan in the case is the exact same model and runs at the exact same speed, other than the CPU cooler fan, which is 140mm and runs off the CPU fan header. And yes, I got better overall system results with the orientation I settled on.
Kaljuuntuva_Teppo@reddit
Only Q4 quant on two RTX 6000 Blackwell cards? 😭
Signal_Ad657@reddit (OP)
Haha, you don't want to know how slow Qwen3.6-27B is fully unquantized on an RTX PRO 6000 😂
Thrumpwart@reddit
You should really be running the FP8 versions. Your GPUs have hardware acceleration for FP8 models; the Q4 quants you are running may actually be slower.
BillDStrong@reddit
Blackwell has fp4 acceleration, so he should be running nvfp4, the Nvidia-designed format for this GPU.
Thrumpwart@reddit
I know SM100 does (B200 etc.) but I thought RTX Pro 6000 (SM120) lacked support for NVFP4 to date?
Juulk9087@reddit
Just training lacks nvfp4. Inference works fine
Thrumpwart@reddit
Thank you, this is probably what I was referring to.
YouKilledApollo@reddit
It does not lack it; nvfp4 is the best quant you can run on it, out of all I've tested.
BillDStrong@reddit
I am basing it off of this. https://github.com/voipmonitor/rtx6kpro/blob/master/optimization/nvfp4-quantization.md
It claims SM120 has it, and recommends it.
I will defer to those that have been testing and working with them for months.
From the RTX6000 Wiki. https://github.com/voipmonitor/rtx6kpro
ormandj@reddit
How slow is it?
Signal_Ad657@reddit (OP)
~25 tokens per second fully unquantized
ddog661@reddit
Do you have a ballpark for how slow it is? I am curious because I will probably be running gemma4:31b dense at full precision on a similar system with 5-10 users.
Signal_Ad657@reddit (OP)
Like 25 tokens per second fully unquantized
Juulk9087@reddit
If I remember right, it's 30 TPS without speculative decoding. With speculative decoding it's 60 to 70.
SI-LACP@reddit
Holyyyy. Can I ask how much you paid for those 6000s?
Signal_Ad657@reddit (OP)
The pair was probably around 17k
alpacaMyToothbrush@reddit
Ok, so I gotta ask, are you using these professionally for something? Cause ~ 20k is a lot of money to spend on compute for a hobby. Not that there's anything wrong with it but ...damn...
YouKilledApollo@reddit
What's expensive or cheap is all relative, hard to say something is expensive or cheap when you know nothing about their life :)
alpacaMyToothbrush@reddit
I mean, true. Everything is relative, I would just think at that budget you're better off renting compute than owning it
Maximum-Wishbone5616@reddit
$17k ? Where?
ShelZuuz@reddit
Probably more like When.
xienze@reddit
Nah, you can get them for $9K all day long at Central Computers, and if you're not in California, no sales tax...
So $18K, but right there in the ballpark.
SirDaveWolf@reddit
I got one from Proshop for 7800€ like 3-4 months ago
Signal_Ad657@reddit (OP)
eBay like 7 months ago.
arkuw@reddit
I believe your computer is worth more than my two cars. Combined.
SI-LACP@reddit
Nice!! Awesome setup!!
Signal_Ad657@reddit (OP)
Thanks! It’s been really fun building it.
edgedepth@reddit
What are you using this setup for specifically, and how close do you get to Opus performance?
Signal_Ad657@reddit (OP)
Qwen3.5-397B is pretty awesome; multi-instance smaller models like Qwen3.6-27B or 35B would likely be cool too, with lots of bandwidth for parallel tasks.
Pyrenaeda@reddit
Very nice build, man. Very nice.
Enjoy it. Use it hard. Make it earn its paycheck.
Signal_Ad657@reddit (OP)
Will do! Thank you!
try_repeat_succeed@reddit
So whatcha running on there? xD
Signal_Ad657@reddit (OP)
The first big thing I ran was Unsloth's 3-bit Qwen3.5-397B. Ran at ~71 tokens per second, pretty sweet. Still lots of room to optimize; I want to see how fast I can get big models to run.
Blindax@reddit
Close to 85C seems pretty high for the GPUs, in particular if you are only doing an inference test. Have you tried to see what your temps are with an OCCT full 100% test? Maybe you still have margin for optimization. You said the AIO was restricting airflow, but then maybe consider another case, because it seems the CPU is blowing too much into the case. There are also other cases that may yield much better temps. I use the Silverstone Alta D1 with 2 GPUs (3090/5090) and a 9800X3D (I know, much less TDP) and the GPUs don't go over 70c in stress tests (with air cooling). In that case you can mount the AIO sideways so that the 180mm front fans are unrestricted.
Otherwise, great config. Hope you enjoy it a lot :)
Signal_Ad657@reddit (OP)
This was both GPUs cranking and a fully loaded CPU for 30 minutes straight
Objective-Picture-72@reddit
I was thinking of building a similar rig. How loud is it when it runs? I am concerned about having something that sounds like a jet engine in my office.
Signal_Ad657@reddit (OP)
Crazy quiet.
Wildnimal@reddit
I wish to own 2x RTX 6000 Pros someday. What are the rest of the specs?
Signal_Ad657@reddit (OP)
Build List:
Compute
• 2× NVIDIA RTX PRO 6000 Blackwell (96GB GDDR7 ECC each)
• 192GB total VRAM — x16/x16 PCIe 5.0
Case
• Corsair 9000D RGB Airflow (SSI-EEB, no fans included)
Power
• PSU: MSI MEG Ai1600T PCIE5 — 1600W 80+ Titanium — dedicated to GPUs
• Dedicated 20A 120V circuit
Cooling
• CPU: Noctua NH-U14S TR5-SP6
• Front intake: 8× Noctua NF-A12x25 G2 PWM
• Top exhaust: 4× Noctua NF-A12x25 G2 PWM
• Rear exhaust: 2× Noctua NF-A12x25 G2 PWM
Storage
• Samsung 9100 PRO 8TB w/ heatsink — PCIe 5.0 x4, 14,800 MB/s (OS, models, stack)
• 2TB SSD (scratch — Qdrant, datasets, embeddings)
Networking
• Dual 10GbE onboard (Intel X710, connects to 10Gb switch)
ironmatrox@reddit
We have very similar builds, just different brands and speeds on the RAM and CPU. Also opted to run a Hela pulling from a 250V circuit that I had to ask an electrician to install, and almost added a ConnectX-7 NIC to get 200Gb off my network spine.
Is this setup directly pulling from the wall or are you planning to get a ups for graceful shutdown?
Signal_Ad657@reddit (OP)
Currently hooked to a Tripp Lite and pulling from the wall. I have 2x 1500w UPSes but they are now too small for it unless I heavily cap the GPUs. Might be what I do for now.
Due_Duck_8472@reddit
How can I build that on a $1000 budget?
Eupolemos@reddit
It's 10:50 AM here and I've already reached peak internet for today <3
Practical-Concept231@reddit
What is your CPU? What are the specifications of your computer?
ThePixelHunter@reddit
You're pulling 1650W from a 1600W power supply? :O
Signal_Ad657@reddit (OP)
Haha fun fact! 1600w at the wall on a 94% efficient PSU is really like 1,500w delivered at the machine (which is what the PSU is actually rated for). So a maxed out 94% efficient 1600w PSU when fully supplying 1600w is like ~1700w at the wall. Learn something new every day!
IrisColt@reddit
B-but a PSU rated 1600 W is rated for 1600 W output, not wall draw... PSU rating usually means output, not input, r-right?
llitz@reddit
I think a lot of people are not aware of things that used to be common knowledge some time ago.
Short and to the point, upvoted.
r0cketio@reddit
If your system is using 1650W and it's being delivered at 94% efficiency through the PSU.. Doesn't that mean you're actually pulling 1755W from the wall?
r0cketio@reddit
Ah didn't see the part of the video showing 1650W AT the wall. Your math checks out then.
Signal_Ad657@reddit (OP)
Opposite: it means you draw more at the wall than you actually supply to the machine.
Fit-Statistician8636@reddit
So many strange things here 😀:
27B-UD-Q4_K_XL: Why? You can run BF16 on just one GPU, or FP8 if you care about parallelism with large context. Try vLLM, or even better, voipmonitor's SGLang Docker image with b12x kernels.
1600W: How is that even possible? I run an AMD EPYC 9355 (dual socket) with two RTX PRO 6000s (and one RTX PRO 2000) and haven't seen such a large power draw ever, irrespective of which model I tested.
Take this not as a criticism, but as a question… or a tip of what to try next. And don’t fry your beautiful new toy! 👍
Signal_Ad657@reddit (OP)
The model is just for testing the GPUs in this case. The 1600W at the wall is totally doable: 1100w from the GPUs, 350w from the CPU, another 100 between board and fans. That would be 1550w supplied by the PSU, and on a 94% efficient PSU that's ~1650w at the wall.
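Quick sketch of that arithmetic (treating 94% as the efficiency at this load point):

```python
# Wall draw implied by the component budget above, through a ~94% efficient PSU.
gpus_w = 1100    # both GPUs, capped
cpu_w = 350      # CPU at full load
other_w = 100    # board, fans, drives

dc_load_w = gpus_w + cpu_w + other_w   # 1550W supplied by the PSU
efficiency = 0.94                      # assumed efficiency at this load point
wall_w = dc_load_w / efficiency        # conversion losses show up at the wall

print(f"DC load: {dc_load_w}W -> wall: {wall_w:.0f}W")  # 1550W -> ~1649W
```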
Fit-Statistician8636@reddit
Yeah, I understand - theoretically. I bought a 2200W PSU and people told me it would not be enough. Yet I haven't seen it go above 1600W ever; it is always either the GPUs working, or the CPU, or something else - never everything together 😀.
Now there are even 3000W+ consumer PSUs available, not much more expensive than Seasonic's "low-power" 1600/2200. I considered them when my first Seasonic died, but laziness won eventually. Could not imagine reconnecting everything again.
https://pcpartpicker.com/b/p8JMnQ
MelodicRecognition7@reddit
usually GPUs are bottlenecked by something else and they do not run at their full 600 Watts, likely around 200 Watts instead. Check with nvidia-smi during the inference or pic/vid/whatever generation.
AD7GD@reddit
What's the practical perf gain going to 535W/ea vs the other 300W model?
MelodicRecognition7@reddit
double prompt processing speed
JC1DA@reddit
did you try to power limit the gpu?
Signal_Ad657@reddit (OP)
Right now I can run both GPUs at full 600w and it handles the thermals fine with no issues. I have to cap them if I’m going to run the CPU full out just due to the limits of the 1600w PSU. So far I don’t notice much difference when I run capped vs uncapped in token generation etc.
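If anyone wants to replicate the monitoring side: the cap itself is set with nvidia-smi -pl <watts> as root, and a minimal read-only sketch using the nvidia-ml-py bindings (pynvml) would look something like this:

```python
# Poll live power draw and the enforced cap on each GPU (read-only).
# Assumes the nvidia-ml-py package, which provides the pynvml module.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        draw_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0        # NVML reports milliwatts
        cap_w = pynvml.nvmlDeviceGetEnforcedPowerLimit(handle) / 1000.0
        print(f"GPU {i}: {draw_w:.0f}W / {cap_w:.0f}W cap")
finally:
    pynvml.nvmlShutdown()
```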
BillDStrong@reddit
There is a reason Nvidia also released the Max-Q version, which is limited to 300 watts. Going down to 300w puts less wear on the cards, and you lose something like 10% of your performance.
So the small reduction you are making should not cost you any performance at all. You could safely go down to 500w and you might still be at 96-97% depending on your workload.
MelodicRecognition7@reddit
*like 10% of TG and like 50% of PP
MelodicRecognition7@reddit
there is no difference in token generation speed past about 400W; I run mine at 330. Check this: https://old.reddit.com/r/LocalLLaMA/comments/1nkycpq/gpu_power_limiting_measurements_update/
iamapizza@reddit
Nice. 💢
Congrats. 😠
Happy for you. 😡
WyattTheSkid@reddit
Damn and I thought my 4x 3090 build was cool
Signal_Ad657@reddit (OP)
It’s cooler IMO. 3090 builds are punk rock.
ydnar@reddit
in an ai era, local llms are about as punk as it gets. we are essentially the spiritual descendants of the cypherpunks. we still need someone like a satoshi. a person or people who write the thing that makes the philosophy real. that code hasn't been written yet.
WyattTheSkid@reddit
Thanks :)
Maximum-Wishbone5616@reddit
Buy Seasonic 2200W.
What kind of app are you using for that dashboard?
Signal_Ad657@reddit (OP)
Needs 240v for it to supply that wattage, I've looked into it. Dashboard is Dream Server:
https://github.com/Light-Heart-Labs/DreamServer
kosnarf@reddit
Thx for sharing the link!
somerandomperson313@reddit
What temps are the GPUs at when being maxed out like this?
Signal_Ad657@reddit (OP)
81c and 89c
somerandomperson313@reddit
Thanks. That case has insane airflow. Nice setup for sure.
MuzafferMahi@reddit
Most GPU-starved r/LocalLLaMA fanboy
Borkato@reddit
If he’s GPU starved the rest of us are GPU dead from malnutrition
WithoutReason1729@reddit
Your post is getting popular and we just featured it on our Discord! Come check it out!
You've also been given a special flair for your contribution. We appreciate your post!
I am a bot and this action was performed automatically.
squachek@reddit
Riveting
Far-Low-4705@reddit
You should really be using vLLM and an nvfp4 quant.
You can get much faster speeds. There was a post on here that got 80t/s on a single 5090, and you have RTX 6000s.
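A minimal sketch of that suggestion with vLLM's offline API; the checkpoint name is a placeholder, not a real model (vLLM detects the quantization method from an NVFP4/ModelOpt checkpoint's config):

```python
# Sketch: running an NVFP4-quantized checkpoint on a single RTX PRO 6000 with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="some-org/some-model-NVFP4",  # hypothetical NVFP4 checkpoint name
    gpu_memory_utilization=0.90,        # leave headroom for other processes
    max_model_len=32768,                # cap context to bound KV-cache memory
)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize why FP4 inference is fast on Blackwell."], params)
print(outputs[0].outputs[0].text)
```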
Signal_Ad657@reddit (OP)
Yeah I’ll definitely make that swap thank you.
wektor420@reddit
https://github.com/voipmonitor/rtx6kpro
A lot of useful info
juaps@reddit
Holy mother of baby Jesus
fragment_me@reddit
WOULD!!!
HopePupal@reddit
just a note: looks like you've got the RTX PRO 6000 (Blackwell)
Nvidia naming is garbo and they had two previous cards called the RTX 6000, specifically the RTX 6000 Ada Generation (Ada) and the Quadro RTX 6000 (Turing). it's gonna be real fun when the GeForce RTX 6xxx (Rubin) cards hit the market someday
3dom@reddit
Eh, reading posts like this made me happy about buying a $20/month Codex subscription.
Signal_Ad657@reddit (OP)
Enjoy it in good health.
__JockY__@reddit
Sweet!
Going AIO changed everything for my CPU temps, which were a cause of throttling (DRAM temps being another). Now it idles at ~40C and only ever reaches the 60s even when running inference. This is EPYC Turin.
I used a Silverstone AIO, it’s been great. Highly recommended.
Signal_Ad657@reddit (OP)
This build originally started with an AIO liquid cooler for the CPU on the intake. I actually switched it to this air cooled setup after testing. Worth noting this also comes down to priorities.
A front mounted AIO on this build (at least the 360mm T5 Silverstone I used) reduces intake airflow by about 1/3 on the 3 mounted radiator fans due to the resistance, and it actually costs you another 120mm fan on intake because the radiator is not a perfect 360mm; it's bigger with the end bells on it. So you wind up effectively losing 2x intake fans and roughly 125 CFM, dropping intake airflow by 25%, to prioritize the CPU thermals on what is primarily an inference serving machine.
I decided prioritizing the GPUs with increased fresh air intake was my bigger focus on this setup. So far, the CPU never thermally throttles on anything I throw at it, even when pulling 1600w at the wall with both GPUs maxed out.
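Rough numbers behind that estimate, as a sketch; the ~60 CFM per-fan figure is an assumption based on Noctua's published NF-A12x25 rating, not something stated above:

```python
# Intake penalty from front-mounting a 360mm AIO in this layout.
fans_total = 8     # front intake fans in the air-cooled configuration
fans_lost = 2      # ~1/3 restriction across 3 radiator fans + 1 fan of lost mounting space
cfm_per_fan = 60   # assumed free-air rating for an NF-A12x25-class fan

lost_cfm = fans_lost * cfm_per_fan
pct_lost = fans_lost / fans_total * 100
print(f"~{lost_cfm} CFM lost (~{pct_lost:.0f}% of intake)")  # ~120 CFM, ~25%
```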
__JockY__@reddit
Oh that’s rad with the heatsink acting as a trip!
My AIO is top mounted in a 4x GPU rig so more room to deal with airflow.
Happy to help with any questions; I've been running vLLM on multi RTX 6000 setups for a while now.
Signal_Ad657@reddit (OP)
Nice! And yeah I thought it was cool / clever using the CPU HX as a way for the board to see the heat of the GPUs and drive the fans without a dedicated hub and sensor setup. Thank you!!
iamn0@reddit
full spec of this rig?
Signal_Ad657@reddit (OP)
Build List:
Compute
• 2× NVIDIA RTX PRO 6000 Blackwell (96GB GDDR7 ECC each)
• 192GB total VRAM — x16/x16 PCIe 5.0
Case
• Corsair 9000D RGB Airflow (SSI-EEB, no fans included)
Power
• PSU: MSI MEG Ai1600T PCIE5 — 1600W 80+ Titanium — dedicated to GPUs
• Dedicated 20A 120V circuit
Cooling
• CPU: Noctua NH-U14S TR5-SP6
• Front intake: 8× Noctua NF-A12x25 G2 PWM
• Top exhaust: 4× Noctua NF-A12x25 G2 PWM
• Rear exhaust: 2× Noctua NF-A12x25 G2 PWM
Storage
• Samsung 9100 PRO 8TB w/ heatsink — PCIe 5.0 x4, 14,800 MB/s (OS, models, stack)
• 2TB SSD (scratch — Qdrant, datasets, embeddings)
Networking
• Dual 10GbE onboard (Intel X710, connects to 10Gb switch)
arthor@reddit
Is this an Enthoo? I have the same fan setup but it collects dust like crazy.
Signal_Ad657@reddit (OP)
Corsair 9000D
Narrow-Belt-5030@reddit
What will your use case be for this beast of a machine? Curious as well if you can load and run something that approximates the premium models for coding (like Codex / Claude).
Signal_Ad657@reddit (OP)
It’s primarily for inference serving either big models across the setup or smaller ones in parallel across GPUs. So far I’ve hosted Qwen3.5-397B at ~71 tokens per second which felt pretty awesome.
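For the big-model mode, a hedged vLLM sketch of what that looks like (the model name is a placeholder); the smaller-models-in-parallel mode is just independent instances pinned to each GPU with CUDA_VISIBLE_DEVICES:

```python
# Sketch: sharding one large model across both RTX PRO 6000s with tensor parallelism.
from vllm import LLM, SamplingParams

llm = LLM(
    model="some-org/big-model",  # hypothetical large checkpoint
    tensor_parallel_size=2,      # split weights and attention heads across both GPUs
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```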
Narrow-Belt-5030@reddit
Nice !
somesayitssick@reddit
Which dashboard software is that?
Signal_Ad657@reddit (OP)
Dream Server with Nvidia SMI and some other readings I like to track ported into it.
jacek2023@reddit
t/s?
Signal_Ad657@reddit (OP)
I think in this case it was averaging around 68 tokens per second per card using Qwen3.6-27B on each GPU for the test in parallel.
emprahsFury@reddit
the cost of the fans is more than most people here are willing to spend on vram. (Can't wait for them to chime in rofl)
Signal_Ad657@reddit (OP)
I talk more about the thermals and setup here: https://x.com/the_only_signal/status/2047738608115679372?s=46
Was just too long of a video to upload.