First-time builder trying to put together a $90K 4-GPU inference server in Dubai -please tell me what I'm missing

Posted by Material-Link9151@reddit | LocalLLaMA | View on Reddit | 93 comments

TL;DR: Got executive approval to build a 4× RTX Pro 6000 Blackwell on-prem inference server for my company. Budget is $90k. I've never built a server in my life. Sourcing in UAE is harder than I expected. Looking for reality-check from people who've actually done this.

Hey everyone. Long-time lurker, first time posting. I work for a trading company in Dubai and I've somehow ended up as the guy in charge of building our first on-prem AI setup after a presentation I gave to the board went well. I'm a data/AI guy by background, not a hardware person. I've built gaming PCs before, that's about it. Now I'm staring at a BOM for something that's an order of magnitude more complex than anything I've touched, and I'm getting nervous. The project is basically my whole neck on the line at this point so I'd really appreciate sanity checks from people who've been here before.

What we're building:

We want to run 70B-class open-weight models in production (starting with Llama 3.3 70B at FP16) and grow toward flagship MoE models (Qwen3-235B-A22B at FP8) as the system proves itself. It'll be the backend for a multi-agent setup that plugs into our ERP, Outlook, and internal trading tools via MCP. 15–20 concurrent users, 24/7 uptime, with LoRA fine-tuning on top.

Current spec:

4× NVIDIA RTX Pro 6000 Blackwell 96GB (Server Edition or Max-Q — whichever I can actually source in the UAE right now)
1× AMD EPYC 9654 (single socket, 96 cores — went Zen 4 over Zen 5 to save budget for storage, figured the CPU isn't the bottleneck anyway on an inference workload, happy to be corrected)
1,152 GB DDR5-4800 ECC RDIMM (12× 96GB, fully populated)
4× Micron 9550 PRO 15.36TB PCIe 5.0 NVMe + 2× mirrored boot
2× Mellanox ConnectX-7 100GbE (bonded)
Eaton 9PX 6000VA online UPS with extended battery
Supermicro 4U chassis, 2× 2000W redundant Titanium PSU
Ubuntu 24.04 + CUDA 12.8 + vLLM

All in I have maximum 90k USD budget.

What I actually need help with:

Is this spec balanced or am I overbuilding / underbuilding somewhere? I know some of you will tell me 1TB+ of RAM is overkill, but the logic was 3× GPU VRAM for MoE CPU offload on Qwen3-235B. Is that still the rule of thumb or am I operating on outdated advice?
Max-Q vs Server Edition vs Workstation Edition — am I thinking about this right? My understanding: Workstation = dual-fan axial, only safe for 2 GPUs max. Max-Q = 300W blower, made for 4-GPU workstations. Server Edition = 600W passive, needs chassis airflow. If I'm going into a 4U Supermicro rackmount with proper fans, Server Edition seems like the "right" answer and not Max-Q. Anyone actually deployed these side-by-side?
Sourcing in Dubai is turning into a real issue. Anyone here done on-prem AI hardware procurement in the GCC region recently? Any vendors I should be looking at that I'm missing, or any I should avoid?
Can a hardware rookie actually assemble this or am I kidding myself? I'm comfortable with Linux, I can rack gear, I know which end of a screwdriver to hold. But I've never done tensor-parallel GPU config, I've never tuned BIOS for a dual-channel EPYC, I've never burned-in a server for 72 hours. Am I going to brick $40K of silicon on day one if I try to assemble this myself, or is it actually doable with good documentation and patience? If it's not doable solo, is the right move paying a local integrator a few thousand USD to handle the physical build?
Thermals in a Dubai office. We're not putting this in a datacenter. It's going into a standard office server room with a regular AC unit. The system draws \~2.1kW steady-state, \~2.4kW under training bursts. Ambient summer temps outside the building hit 45°C+. Anyone operated a 4-GPU box in a non-purpose-built room in a hot climate? What did you wish you'd known?
Gotchas I'm not seeing. This is the one I care about most. You know that thing where people who've actually done this say "oh by the way, make sure you have X" and it's not in any guide? I want those. Fire me all the "wish I'd known" moments you've got.

I know this is a long post. I also know some of you will tell me to just buy a DGX Spark or rent from Lambda, I promise I thought about it, the on-prem requirement is non-negotiable because of data residency. I'm not trying to reinvent anything, just trying not to screw up my first serious AI deployment.

Any help -even a single sentence- is genuinely appreciated. I'll read every reply.

Thanks from Dubai 🙏

[-]

matt-k-wong@reddit

Get the new dell gb300 for roughly the same budget

[-]