Anyone running HUANANZHI H12D-8D + BMC with 4x RTX 3090 for LLM inference?

Posted by awfulalexey@reddit | LocalLLaMA | 15 comments

Hi everyone,

I'm considering building a home LLM inference rig around:

- HUANANZHI H12D-8D + BMC

- AMD EPYC 7002/7003

- 4x RTX 3090 24GB

- DDR4 ECC RDIMM, 8-channel

- Linux + vLLM / SGLang / llama.cpp

- Open frame, PCIe 4.0 x16 risers

The board looks very attractive for the price: EPYC SP3, 8-channel memory, BMC/IPMI, 4x PCIe 4.0 x16 physical slots, 3x M.2, etc. But documentation and real-world reports are a bit scattered, so I’d love to hear from actual owners.

Questions:

  1. Do all 4 PCIe slots run electrically at x16, or is one of them limited to x8?

Could you share lspci -vv / nvidia-smi link width output if possible?
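For reference, something like this is what I'd hope to compare (just a sketch, assuming a standard NVIDIA driver install; the exact query fields are simply what I'd use):

```bash
# Negotiated vs. maximum PCIe generation and link width, per GPU
# (note: idle cards may downclock to Gen1, so readings under load are most telling)
nvidia-smi --query-gpu=index,name,pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current,pcie.link.width.max --format=csv

# Cross-check against what the kernel reports for the NVIDIA devices (10de = NVIDIA vendor ID)
sudo lspci -vv -d 10de: | grep -E 'LnkCap:|LnkSta:'
```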

  2. Does Above 4G Decoding work properly with 3-4 GPUs?

  3. Does Resizable BAR work after the newer BIOS update?

I saw that HUANANZHI has a BIOS note mentioning Resizable BAR / PCIe split optimization.
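From what I understand, BAR1 size is a quick way to tell whether Above 4G / Resizable BAR actually took effect, so output along these lines would help (a sketch using the stock tools):

```bash
# BAR1 around 256 MiB usually means Resizable BAR is off;
# tens of GiB on a 3090 suggests it's active (also needs the ReBAR vBIOS update)
nvidia-smi -q | grep -A 3 'BAR1 Memory Usage'

# What the kernel sees for the Resizable BAR capability
sudo lspci -vv -d 10de: | grep -i -A 4 'Resizable BAR'
```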

  4. Any issues booting with RTX 3090 specifically?

I’ve seen some reports about GPU compatibility quirks on this board.

  5. How stable is the BMC/IPMI module?

Does remote KVM work reliably? Any fan control or sensor weirdness?
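If anyone can paste basic sensor output from the BMC, that alone would tell me a lot (a sketch, assuming the module speaks standard IPMI over LAN; host and credentials below are placeholders):

```bash
# Fan and general sensor readings over the network
ipmitool -I lanplus -H <bmc-ip> -U <user> -P <password> sdr type Fan
ipmitool -I lanplus -H <bmc-ip> -U <user> -P <password> sensor
```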

  6. Any RAM/channel issues with 8 DIMMs?

Did all 8 memory channels work out of the box?
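Output like the following would answer this for me (a sketch; dmidecode and numactl are just my assumed tools of choice here):

```bash
# Populated DIMM slots and negotiated speeds as reported by SMBIOS
sudo dmidecode -t memory | grep -E 'Locator:|Size:|Speed:'

# NUMA / per-socket memory layout
numactl --hardware
```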

  7. How long is POST/boot time in your setup?

  8. Any problems with PCIe 4.0 risers?

Did you have to force Gen3/Gen4 manually?

  9. If you run vLLM/SGLang/llama.cpp on this board, how has stability been under long inference workloads?

  10. Would you buy this board again, or would you rather go with Supermicro H12SSL-i / ASRock Rack ROMED8-2T / TYAN S8030?

My main concern is not peak CPU performance, but stable 4-GPU operation for LLM inference. Even 3x PCIe 4.0 x16 + 1x PCIe 4.0 x8 would probably be acceptable, but I’d like to understand the real limitations before buying.
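For context, the workload I have in mind is roughly this shape (just a sketch; the model is still undecided, so <model-id> is a placeholder):

```bash
# One model sharded across all four 3090s, served continuously
# via vLLM's OpenAI-compatible endpoint
vllm serve <model-id> \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.90
```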

Thanks!