For those wondering about the power consumption of a dual 3090 rig while inferencing
Posted by sdfgeoff@reddit | LocalLLaMA | View on Reddit | 16 comments
Mine is ~760W measured at the wall by a smart plug.
Idle is around 90W.
I haven't tweaked the power limit of the cards or done anything fancy.
Sunija_Dev@reddit
I get to 500W with 2x3090+3060 during inference, limiting the 3090s to 220W. It doesn't decrease speed, though I'm running without tensor parallelism because the second 3090 hangs on a 1x pcie. 😅 Roughly 110w in idle.
TacGibs@reddit
At 220W you're losing some performance in a pretty measurable way, but the real killer is the x1 PCIe link.
PCIe 4.0 x4 is enough for TP. Don't you have a free M.2 slot? You can use an M.2-to-OCuLink adapter; that's what I'm using and it works flawlessly :)
An_Original_ID@reddit
Is PCIe 4.0 x4 really enough? I have 2x 3090 and tried TP in llama.cpp and vLLM, and got worse or similar results compared to pipeline parallelism. Operator error?
TacGibs@reddit
There's no true TP in llama.cpp, only tensor split. The closest thing to TP with GGUF is graph mode in ik_llama.cpp.
And over a PCIe x1 link TP isn't beneficial at all, because the card on the faster PCIe slot is constantly waiting for the one on the x1.
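For context, llama.cpp's "tensor split" behavior is controlled by flags like these (a sketch; the model path and split ratios are placeholders, and `--split-mode row` is the closest it gets to TP-style sharding):

```shell
# Layer split: whole layers assigned per GPU (pipeline-style; the usual multi-GPU mode)
./llama-server -m model.gguf -ngl 99 --split-mode layer --tensor-split 1,1
# Row split: tensors sharded across GPUs; needs real PCIe bandwidth to pay off
./llama-server -m model.gguf -ngl 99 --split-mode row
```

Row split is where a slow x1 link hurts most, since the cards have to exchange partial results every layer.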
TacGibs@reddit
Power limit your 3090.
Got 4 of them, and around 260W is the sweet spot (losing around 5% of performance compared to full power, which is 420W on my MSI Suprim X).
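For reference, setting a power cap on Linux is a one-liner with `nvidia-smi` (a sketch; the 260 W value and GPU index come from the comment above, and root access is assumed — adjust for your own cards):

```shell
# Enable persistence mode so the setting sticks until reboot
sudo nvidia-smi -pm 1
# Cap GPU 0 at 260 W; repeat with -i 1, -i 2, ... for the other cards
sudo nvidia-smi -i 0 -pl 260
# Verify the current and default limits
nvidia-smi -q -d POWER | grep -i 'power limit'
```

Note the limit resets on reboot, so people usually wire this into a systemd unit or startup script.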
imp2@reddit
+1 to that. If you're on Linux, you can use LACT; they recently added actual V/F curve editing (instead of the previous workaround for undervolting): https://github.com/ilya-zlobintsev/LACT/pull/957
I run my 2x3090s at 275W each. There's a really minor hit to prompt processing (PP) speed, but text generation (TG) stays pretty much the same.
TacGibs@reddit
Do you know a tool that works purely from the CLI? I'm on Ubuntu Server 24.04, so no GUI for me :)
imp2@reddit
LACT's GUI is basically an editor for its YAML settings file. You can edit that manually and run the daemon headlessly: https://github.com/ilya-zlobintsev/LACT/blob/master/docs/CONFIG.md
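The headless workflow is roughly this (a sketch assuming LACT was installed with its packaged `lactd` systemd unit and the default config path; the actual YAML keys are documented in the linked CONFIG.md, not invented here):

```shell
# Run the LACT daemon without the GUI (assumption: systemd-based install)
sudo systemctl enable --now lactd
# Edit the settings the GUI would otherwise write; key names per CONFIG.md
sudoedit /etc/lact/config.yaml
# Restart the daemon so it applies the new power limit / V/F curve
sudo systemctl restart lactd
```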
m31317015@reddit
This. Without a power limit my dual SUPRIM X ran at 420W each; limiting to 260-280W not only improves efficiency, it also avoids overheating and keeps the cards from dropping off the bus.
TacGibs@reddit
I know, got the highest OctaneBench 2025.2 score for 4x RTX 3090 (2858.84). My PC was drawing around 2kW at that moment and the temperature in my main room was rising pretty fast 😂
At 260W those 4 Suprim X are dead silent 😎
TacGibs@reddit
A very informative page about the 3090's power limit vs performance:
https://benchmarks.andromeda.computer/videos/3090-power-limit
cleversmoke@reddit
I am running at 245W on my RTX 3090 while waiting for new thermal pads and a repaste. No meaningful drop in tok/s either!
Anbeeld@reddit
Undervolt them, too.
MelodicRecognition7@reddit
https://files.catbox.moe/gez0rd.png
This works for almost all cards except artificially power-limited ones like the RTX Pro 6000 Max-Q. PP keeps scaling all the way up to max TDP, while TG stops scaling at about half of max TDP.
Long_comment_san@reddit
You can easily drop a lot of wattage on a 3090. Giving up around 20% of performance saves around 40% of power.
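A back-of-envelope check of that trade-off (the 20%/40% figures are from the comment above, not measured here):

```shell
# If ~20% of throughput costs ~40% of power, tokens-per-watt improves by:
perf_ratio=0.80   # throughput kept after the cap (assumption from the comment)
power_ratio=0.60  # power drawn after the cap (assumption from the comment)
gain=$(awk -v p="$perf_ratio" -v w="$power_ratio" \
  'BEGIN { printf "%.0f", (p / w - 1) * 100 }')
echo "tokens-per-watt improvement: ~${gain}%"
```

So even a steep 20% throughput loss still nets roughly a third more tokens per joule.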
ziphnor@reddit
Is that the whole rig measured at the wall? I'd hope that at idle it's not the 3090s that dominate?