For those wondering about the power consumption of a dual 3090 rig while inferencing
Posted by sdfgeoff@reddit | LocalLLaMA | View on Reddit | 16 comments
Mine is ~760W measured at the wall by a smart plug.
Idle is around 90W.
I haven't tweaked the power limit of the cards or done anything fancy.
Sunija_Dev@reddit
I get to 500W with 2x3090+3060 during inference, limiting the 3090s to 220W. It doesn't decrease speed, though I'm running without tensor parallelism because the second 3090 hangs on a 1x pcie. 😅 Roughly 110w in idle.
TacGibs@reddit
At 220W you're losing some performance in a pretty measurable way, but the real killer is the x1 PCIe link.
PCIe 4.0 x4 is enough for TP. Don't you have a free M.2 slot? You can use an M.2-to-OCuLink adapter; that's what I'm using and it works flawlessly :)
An_Original_ID@reddit
Is PCIe 4.0 x4 really enough? I have 2x 3090 and tried TP in llama.cpp and vLLM, and got worse or similar results compared to pipeline parallelism. Operator error?
TacGibs@reddit
There's no true TP in llama.cpp, only tensor split. The closest thing to TP with GGUF is graph mode in ik_llama.cpp.
And over a PCIe x1 link TP isn't beneficial at all, because the card on the faster PCIe slot is constantly waiting for the one on the x1.
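For context, llama.cpp's "tensor split" behavior is controlled by flags like these (a sketch; the model path and split ratios are placeholders, and `--split-mode row` is the closest it gets to TP-style sharding):

```shell
# Layer split: whole layers assigned per GPU (pipeline-style; the usual multi-GPU mode)
./llama-server -m model.gguf -ngl 99 --split-mode layer --tensor-split 1,1
# Row split: tensors sharded across GPUs; needs real PCIe bandwidth to pay off
./llama-server -m model.gguf -ngl 99 --split-mode row
```

Row split is where a slow x1 link hurts most, since the cards have to exchange partial results every layer.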
TacGibs@reddit
Power limit your 3090.
Got 4 of them, and around 260W is the sweet spot (losing around 5% of performance compared to full power, which is 420W on my MSI Suprim X).
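For reference, setting a power cap on Linux is a one-liner with `nvidia-smi` (a sketch; the 260 W value and GPU index come from the comment above, and root access is assumed — adjust for your own cards):

```shell
# Enable persistence mode so the setting sticks until reboot
sudo nvidia-smi -pm 1
# Cap GPU 0 at 260 W; repeat with -i 1, -i 2, ... for the other cards
sudo nvidia-smi -i 0 -pl 260
# Verify the current and default limits
nvidia-smi -q -d POWER | grep -i 'power limit'
```

Note the limit resets on reboot, so people usually wire this into a systemd unit or startup script.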
imp2@reddit
+1 to that. If you're on Linux, you can use LACT; they recently added actual V/F curve editing (instead of the previous workaround for undervolting): https://github.com/ilya-zlobintsev/LACT/pull/957
I run my 2x3090s at 275W each. There's a really minor hit to prompt processing (PP) speed, but text generation (TG) stays pretty much the same.
TacGibs@reddit
Do you know a tool that works purely from the CLI? I'm on Ubuntu Server 24.04, so no GUI for me :)
imp2@reddit
LACT's GUI is basically an editor for its YAML settings file. You can edit that manually and run the daemon headlessly: https://github.com/ilya-zlobintsev/LACT/blob/master/docs/CONFIG.md
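The headless workflow is roughly this (a sketch assuming LACT was installed with its packaged `lactd` systemd unit and the default config path; the actual YAML keys are documented in the linked CONFIG.md, not invented here):

```shell
# Run the LACT daemon without the GUI (assumption: systemd-based install)
sudo systemctl enable --now lactd
# Edit the settings the GUI would otherwise write; key names per CONFIG.md
sudoedit /etc/lact/config.yaml
# Restart the daemon so it applies the new power limit / V/F curve
sudo systemctl restart lactd
```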
m31317015@reddit
This. Without a power limit my dual SUPRIM X ran at 420W each; limiting to 260-280W not only improves efficiency, it also avoids overheating and keeps the cards from dropping off the bus.
TacGibs@reddit
I know, got the highest OctaneBench 2025.2 score for 4x RTX 3090 (2858.84). My PC was drawing around 2kW at that moment and the temperature in my main room was rising pretty fast 😂
At 260W those 4 Suprim X are dead silent 😎
TacGibs@reddit
A very informative page about the 3090's power limit vs performance:
https://benchmarks.andromeda.computer/videos/3090-power-limit
cleversmoke@reddit
I am running at 245W on my RTX 3090 while waiting for new thermal pads and a repaste. No meaningful drop in tok/s either!
Anbeeld@reddit
Undervolt them, too.
MelodicRecognition7@reddit
https://files.catbox.moe/gez0rd.png
This works for almost all cards except artificially power-limited ones like the RTX Pro 6000 Max-Q. PP keeps scaling all the way up to max TDP, while TG stops scaling at about half of max TDP.
Long_comment_san@reddit
You can easily drop a lot of wattage on a 3090. Giving up around 20% of performance saves around 40% of power.
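A back-of-envelope check of that trade-off (the 20%/40% figures are from the comment above, not measured here):

```shell
# If ~20% of throughput costs ~40% of power, tokens-per-watt improves by:
perf_ratio=0.80   # throughput kept after the cap (assumption from the comment)
power_ratio=0.60  # power drawn after the cap (assumption from the comment)
gain=$(awk -v p="$perf_ratio" -v w="$power_ratio" \
  'BEGIN { printf "%.0f", (p / w - 1) * 100 }')
echo "tokens-per-watt improvement: ~${gain}%"
```

So even a steep 20% throughput loss still nets roughly a third more tokens per joule.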
ziphnor@reddit
Is that the whole rig measured at the wall? I'd hope that at idle it's not the 3090s that dominate?