Sanity check for a Threadripper + Dual RTX 6000 Ada node (Weather Forecasting / Deep Learning)

Posted by Icy_Gas8807@reddit | LocalLLaMA | 7 comments

Hello!!

TL;DR:

I’m in the process of finalizing a spec for a dedicated AI workstation/server node. The primary use case is training deep learning models for weather forecasting (transformer/CFD work), involving parallel processing of wind data. We are aiming for a setup that is powerful now but "horizontally scalable" later (i.e., we plan to network multiple of these nodes together in the future).

Here is the current draft build:

- GPU: 2x NVIDIA RTX 6000 Ada (plan to scale to 4x later)
- CPU: AMD Threadripper PRO 7985WX (64-core)
- Motherboard: ASUS Pro WS WRX90E-SAGE SE
- RAM: 512GB DDR5 ECC (8-channel population)
- Storage: Enterprise U.2 NVMe drives (Micron/Solidigm)
- Chassis: Fractal Meshify 2 XL (with industrial 3000 RPM fans)

My main questions for the community:

1. Motherboard quirks: Has anyone deployed the WRX90E-SAGE SE with 4x double-width cards? I want to make sure the spacing/thermals are manageable on air cooling before we commit.

2. Networking: Since we plan to cluster these later, is 100GbE sufficient, or should we be looking at InfiniBand from the start if we want these nodes to talk efficiently?

3. The "Ada" limitation: We chose the RTX 6000 Ada for the raw compute/VRAM density, fully aware it lacks NVLink. For those doing transformer training, has the PCIe bottleneck been a major issue for you with model parallelism, or is software sharding (DeepSpeed/FSDP) efficient enough?

Any advice or "gotchas" regarding this specific hardware combination would be greatly appreciated. Thanks!
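For context on why I'm worried about the PCIe bottleneck, here's the back-of-envelope math I've been doing. It's a rough sketch: the 1B-parameter model size, fp16 gradients, and the usable-bandwidth figures (~63 GB/s for PCIe 5.0 x16, 12.5 GB/s line rate for 100GbE) are all assumptions on my part, not measurements, and real all-reduce throughput will be lower due to protocol overhead and NCCL behavior.

```python
# Back-of-envelope: time to all-reduce one step's gradients for an assumed
# 1B-parameter transformer, over PCIe vs 100GbE. All sizes and bandwidths
# below are illustrative assumptions, not benchmarks.

def allreduce_seconds(param_count: int, bytes_per_param: int,
                      bandwidth_gb_s: float, n_ranks: int) -> float:
    """Ring all-reduce sends ~2*(n-1)/n of the gradient bytes per rank."""
    grad_bytes = param_count * bytes_per_param
    traffic = 2 * (n_ranks - 1) / n_ranks * grad_bytes
    return traffic / (bandwidth_gb_s * 1e9)

PARAMS = 1_000_000_000   # assumed model size
FP16 = 2                 # bytes per gradient element

PCIE5_X16 = 63.0         # ~GB/s usable on PCIe 5.0 x16 (rough)
ETH_100G = 12.5          # 100GbE line rate in GB/s (before overhead)

print(f"PCIe 5.0 x16, 2 GPUs:  {allreduce_seconds(PARAMS, FP16, PCIE5_X16, 2)*1e3:.1f} ms/step")
print(f"100GbE, 2 nodes:       {allreduce_seconds(PARAMS, FP16, ETH_100G, 2)*1e3:.1f} ms/step")
```

Even under these optimistic assumptions, the inter-node link is ~5x slower than intra-node PCIe, which is what makes me wonder whether 100GbE will hold up once we cluster.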