Need advice on building a GPU-based render/AI compute setup: Unsure about hardware direction
Posted by One_Abroad_5937@reddit | LocalLLaMA | 3 comments
Hey everyone,
I'm in the early stages of planning a high-performance GPU compute setup that will primarily be used for heavy rendering and possibly AI workloads. I'm still finalizing the exact business and infrastructure details, but right now I need to make some critical hardware decisions.
I'm trying to figure out what makes the most sense: should I build with multiple high-end consumer GPUs (like 4090s or similar) in custom nodes, or invest in enterprise-grade GPU servers like Supermicro with NVLink or higher-density rack configurations?
If anyone here has experience setting up render farms, AI inference/training clusters, or GPU virtualization environments, I'd really appreciate your insight on things like:
• Hardware reliability and thermals for 24/7 workloads.
• Power efficiency and cooling considerations.
• Whether used/refurb enterprise servers are a good deal.
• Any gotchas when scaling from a few nodes to a full rack.
Thanks in advance for any and all advice, especially from those who are familiar with this stuff and are running similar systems.
Lissanro@reddit
It is best to have all GPUs connected to a single system. As an example, https://www.gigabyte.com/Enterprise/Server-Motherboard/MZ32-AR1-rev-30 - this motherboard has 16 RAM slots, four PCI-E 4.0 x16 slots, one PCI-E 3.0 x16 slot, and one PCI-E 4.0 x8 slot, and supports x8/x8 or x4/x4/x4/x4 bifurcation on each slot (x4/x4 for the x8 slot). That means you can have up to 18 GPUs on PCI-E 4.0 x4 plus 2 GPUs on PCI-E 3.0 x8 (which has about the same bandwidth as PCI-E 4.0 x4), for up to 20 GPUs in total.
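As a sanity check on that 20-GPU figure, here is a rough sketch of the lane math, assuming the slot layout described above (the per-lane rates are the approximate standard PCI-E 3.0/4.0 throughputs, not board-specific numbers):

```python
# Rough lane math for the slot layout described above (a sketch,
# not an official spec sheet): each slot bifurcates down to small links.
GBPS_PER_LANE = {"gen3": 0.985, "gen4": 1.969}  # approx GB/s per lane

slots = [
    # (count, generation, lanes per slot, lanes per GPU after bifurcation)
    (4, "gen4", 16, 4),  # four PCI-E 4.0 x16 slots, split x4/x4/x4/x4
    (1, "gen4", 8, 4),   # one PCI-E 4.0 x8 slot, split x4/x4
    (1, "gen3", 16, 8),  # one PCI-E 3.0 x16 slot, split x8/x8
]

total_gpus = 0
for count, gen, slot_lanes, gpu_lanes in slots:
    gpus = count * (slot_lanes // gpu_lanes)
    bw = gpu_lanes * GBPS_PER_LANE[gen]
    total_gpus += gpus
    print(f"{gpus} GPUs at {gen} x{gpu_lanes} (~{bw:.1f} GB/s each)")

print(f"total: {total_gpus} GPUs")  # 16 + 2 + 2 = 20
```

This also shows why the PCI-E 3.0 x8 links count as equivalent: at ~7.9 GB/s they match the 4.0 x4 links almost exactly.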
This does not have to cost an insane amount - you can use MI50 32GB cards, for example, which are quite cheap compared to the 3090.
As for the 4090, I suggest avoiding it: it costs much more than the 3090 but has the same VRAM, and it lacks the newer architecture features the 5xxx series has. So it may not be worth its price for AI, unless you are sure it suits your specific workload or you found an especially good deal.
If you are looking for cheap used options, then EPYC with DDR4, plus GPUs like the 3090 if you want Nvidia, or the MI50 if you are willing to consider older AMD cards.
If you want a high-end system, then consider whether you plan GPU-only or CPU+GPU inference. If GPU-only, you can still stick with an EPYC DDR4-based system - there is not much point going with a higher-end EPYC with 12-channel DDR5 - but for CPU+GPU inference the extra memory bandwidth can be of great help if you have the budget. Exception: if you want training and plan to buy Blackwell GPUs that support PCI-E 5.0, and you want as much PCI-E bandwidth as you can get, then a high-end DDR5 system may make sense.
You don't need NVLink for inference. If you are looking at higher-end options, then consider RTX PRO 6000 cards: 8 of them can run even the biggest models like Kimi K2 fully in VRAM. Obviously, enterprise-grade GPUs can be even better, but the cost builds up even further - chances are, if you are asking on Reddit, you have a limited budget, but of course you should do your own research.
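To see why 8 cards is enough, here is a back-of-the-envelope check. The assumptions are mine, not official figures: Kimi K2 at roughly 1T total parameters, quantized to ~4.5 bits per weight, with the RTX PRO 6000 at 96 GB per card:

```python
# Back-of-the-envelope VRAM check for 8x RTX PRO 6000 (96 GB each).
# Assumptions (mine, not hard numbers): Kimi K2 ~1T total parameters,
# quantized to ~4.5 bits per weight, leaving headroom for KV cache.
params_b = 1000          # total parameters, in billions (MoE total, not active)
bits_per_weight = 4.5    # a typical mid-size quant
num_gpus = 8
vram_per_gpu_gb = 96

weights_gb = params_b * bits_per_weight / 8   # 1e9 params * (bits/8) bytes = GB
total_vram_gb = num_gpus * vram_per_gpu_gb
headroom_gb = total_vram_gb - weights_gb

print(f"weights: ~{weights_gb:.0f} GB, VRAM: {total_vram_gb} GB, "
      f"headroom for KV cache/activations: ~{headroom_gb:.0f} GB")
# -> weights: ~562 GB, VRAM: 768 GB, headroom: ~206 GB
```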
How many users you will have also matters. If it is just you, then CPU+GPU inference could save you a lot of money while running the same models, with a backend like ik_llama.cpp (it has better performance than mainline llama.cpp for CPU+GPU inference). But if you are going to have many users, or just need high throughput, then you will need GPU-only inference with backends like vLLM.
AssistantEngine@reddit
Have you considered a DGX Spark? Imo not much is going to outperform it.
Lissanro@reddit
Its RAM bandwidth is just 273 GB/s... not much higher than 8-channel DDR4 at 204.8 GB/s, so it is going to be very slow, and there is only 128 GB available. On an EPYC DDR4 platform it is at least possible to use GPUs to help with prompt processing and to extend memory far beyond that, at similar or lower cost. Where the DGX Spark wins is a smaller form factor and higher energy efficiency, but at the cost of performance.
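For intuition on why those bandwidth numbers matter so much: single-stream decode is roughly memory-bandwidth-bound, so an upper bound on speed is bandwidth divided by the bytes streamed per token. A quick sketch, using a hypothetical MoE with ~32B active parameters at a ~4.5-bit quant (my assumptions, for illustration only):

```python
# Crude upper bound on single-stream decode speed: every generated token
# must stream the active weights from memory, so tok/s <= bandwidth / GB_per_token.
def max_tokens_per_sec(bandwidth_gbps, active_params_b, bits_per_weight=4.5):
    gb_per_token = active_params_b * bits_per_weight / 8  # GB read per token
    return bandwidth_gbps / gb_per_token

# Hypothetical MoE with ~32B active parameters per token:
for name, bw in [("DGX Spark", 273), ("8-ch DDR4 EPYC", 204.8)]:
    print(f"{name}: <= {max_tokens_per_sec(bw, 32):.1f} tok/s")
# -> DGX Spark: <= 15.2 tok/s, 8-ch DDR4 EPYC: <= 11.4 tok/s
```

Both are in the same slow ballpark, which is the point: the Spark's bandwidth advantage over a used EPYC is small, while the EPYC platform can add GPUs.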