128GB VRAM quad R9700 server

Posted by Ulterior-Motive_@reddit | LocalLLaMA

This is a sequel to my previous thread from 2024.

I originally planned to pick up another pair of MI100s and an Infinity Fabric Bridge, and I bought a lot of hardware upgrades over the course of 2025 in preparation: notably faster, double-capacity memory (last February, well before the current price jump), another motherboard, a higher-capacity PSU, etc. But then I saw benchmarks for the R9700, particularly in the llama.cpp ROCm thread, which showed much better prompt processing performance for a small loss in token generation. The MI100 had also gone up in price to about $1000, so factoring in the cost of a bridge, the two options would come to about the same price. So I sold the MI100s, picked up 4 R9700s, and called it a day.

Here are the specs and BOM. Note that the CPU and SSD were taken from the previous build, and the internal fans came bundled with the PSU as part of a deal:

| Component | Description | Number | Unit Price |
|---|---|---|---|
| CPU | AMD Ryzen 7 5700X | 1 | $160.00 |
| RAM | Corsair Vengeance LPX 64GB (2 x 32GB) DDR4 3600MHz C18 | 2 | $105.00 |
| GPU | PowerColor AMD Radeon AI PRO R9700 32GB | 4 | $1,300.00 |
| Motherboard | MSI MEG X570 GODLIKE Motherboard | 1 | $490.00 |
| Storage | Inland Performance 1TB NVMe SSD | 1 | $100.00 |
| PSU | Super Flower Leadex Titanium 1600W 80+ Titanium | 1 | $440.00 |
| Internal Fans | Super Flower MEGACOOL 120mm fan, Triple-Pack | 1 | $0.00 |
| Case Fans | Noctua NF-A14 iPPC-3000 PWM | 6 | $30.00 |
| CPU Heatsink | AMD Wraith Prism aRGB CPU Cooler | 1 | $20.00 |
| Fan Hub | Noctua NA-FH1 | 1 | $45.00 |
| Case | Phanteks Enthoo Pro 2 Server Edition | 1 | $190.00 |
| Total | | | $7,035.00 |

128GB VRAM, 128GB RAM for offloading, all for less than the price of an RTX 6000 Blackwell.
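For anyone who wants to check the arithmetic, here is a quick sketch that recomputes the total and the memory figures from the BOM. The short component labels and the derived numbers (GPU share of cost, cost per GB of VRAM) are my own additions for illustration, not figures from the build notes.

```python
# Bill of materials as (component, quantity, unit price in USD),
# copied from the BOM table. Labels are shortened for readability.
bom = [
    ("CPU", 1, 160.00),
    ("RAM (64GB kit)", 2, 105.00),
    ("GPU (32GB)", 4, 1300.00),
    ("Motherboard", 1, 490.00),
    ("Storage", 1, 100.00),
    ("PSU", 1, 440.00),
    ("Internal fans", 1, 0.00),
    ("Case fans", 6, 30.00),
    ("CPU heatsink", 1, 20.00),
    ("Fan hub", 1, 45.00),
    ("Case", 1, 190.00),
]

# Sum of quantity * unit price over every row.
total = sum(qty * price for _, qty, price in bom)

# Memory totals: 4 GPUs x 32GB each, 2 RAM kits x 64GB each.
vram_gb = 4 * 32
ram_gb = 2 * 64

# Derived figures (illustrative only, not stated in the post).
gpu_cost = 4 * 1300.00
gpu_share = gpu_cost / total
cost_per_vram_gb = total / vram_gb

print(f"Total: ${total:,.2f}")
print(f"VRAM: {vram_gb} GB, RAM: {ram_gb} GB")
print(f"GPU share of build cost: {gpu_share:.1%}")
print(f"Whole-build cost per GB of VRAM: ${cost_per_vram_gb:,.2f}")
```

The total matches the $7,035.00 listed in the table, and the GPUs account for most of it.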

Some benchmarks:

| model | size | params | backend | ngl | n_batch | n_ubatch | fa | test | t/s |
|---|---|---|---|---|---|---|---|---|---|
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm | 99 | 1024 | 1024 | 1 | pp8192 | 6524.91 ± 11.30 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm | 99 | 1024 | 1024 | 1 | tg128 | 90.89 ± 0.41 |
| qwen3moe 30B.A3B Q8_0 | 33.51 GiB | 30.53 B | ROCm | 99 | 1024 | 1024 | 1 | pp8192 | 2113.82 ± 2.88 |
| qwen3moe 30B.A3B Q8_0 | 33.51 GiB | 30.53 B | ROCm | 99 | 1024 | 1024 | 1 | tg128 | 72.51 ± 0.27 |
| qwen3vl 32B Q8_0 | 36.76 GiB | 32.76 B | ROCm | 99 | 1024 | 1024 | 1 | pp8192 | 1725.46 ± 5.93 |
| qwen3vl 32B Q8_0 | 36.76 GiB | 32.76 B | ROCm | 99 | 1024 | 1024 | 1 | tg128 | 14.75 ± 0.01 |
| llama 70B IQ4_XS - 4.25 bpw | 35.29 GiB | 70.55 B | ROCm | 99 | 1024 | 1024 | 1 | pp8192 | 1110.02 ± 3.49 |
| llama 70B IQ4_XS - 4.25 bpw | 35.29 GiB | 70.55 B | ROCm | 99 | 1024 | 1024 | 1 | tg128 | 14.53 ± 0.03 |
| qwen3next 80B.A3B IQ4_XS - 4.25 bpw | 39.71 GiB | 79.67 B | ROCm | 99 | 1024 | 1024 | 1 | pp8192 | 821.10 ± 0.27 |
| qwen3next 80B.A3B IQ4_XS - 4.25 bpw | 39.71 GiB | 79.67 B | ROCm | 99 | 1024 | 1024 | 1 | tg128 | 38.88 ± 0.02 |
| glm4moe ?B IQ4_XS - 4.25 bpw | 54.33 GiB | 106.85 B | ROCm | 99 | 1024 | 1024 | 1 | pp8192 | 1928.45 ± 3.74 |
| glm4moe ?B IQ4_XS - 4.25 bpw | 54.33 GiB | 106.85 B | ROCm | 99 | 1024 | 1024 | 1 | tg128 | 48.09 ± 0.16 |
| minimax-m2 230B.A10B IQ4_XS - 4.25 bpw | 113.52 GiB | 228.69 B | ROCm | 99 | 1024 | 1024 | 1 | pp8192 | 2082.04 ± 4.49 |
| minimax-m2 230B.A10B IQ4_XS - 4.25 bpw | 113.52 GiB | 228.69 B | ROCm | 99 | 1024 | 1024 | 1 | tg128 | 48.78 ± 0.06 |
| minimax-m2 230B.A10B Q8_0 | 226.43 GiB | 228.69 B | ROCm | 30 | 1024 | 1024 | 1 | pp8192 | 42.62 ± 7.96 |
| minimax-m2 230B.A10B Q8_0 | 226.43 GiB | 228.69 B | ROCm | 30 | 1024 | 1024 | 1 | tg128 | 6.58 ± 0.01 |
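To put the last two rows in context, the sketch below checks each model's size against the combined VRAM and compares the two MiniMax-M2 quants. It treats 4 x 32GB as a nominal 128 GiB budget and ignores KV cache and runtime overhead, which is a simplification on my part. The idea that the Q8_0 run is slow because it spills into system RAM is inferred from its size and its `ngl` of 30, not something measured separately.

```python
# Nominal VRAM budget. Assumption: 4 x 32GB treated as 128 GiB,
# with no allowance for KV cache, compute buffers, or driver overhead.
VRAM_GIB = 4 * 32

# (model, size in GiB, ngl, pp8192 t/s, tg128 t/s), means from the table.
runs = [
    ("llama 7B Q4_0", 3.56, 99, 6524.91, 90.89),
    ("qwen3moe 30B.A3B Q8_0", 33.51, 99, 2113.82, 72.51),
    ("qwen3vl 32B Q8_0", 36.76, 99, 1725.46, 14.75),
    ("llama 70B IQ4_XS", 35.29, 99, 1110.02, 14.53),
    ("qwen3next 80B.A3B IQ4_XS", 39.71, 99, 821.10, 38.88),
    ("glm4moe IQ4_XS", 54.33, 99, 1928.45, 48.09),
    ("minimax-m2 230B.A10B IQ4_XS", 113.52, 99, 2082.04, 48.78),
    ("minimax-m2 230B.A10B Q8_0", 226.43, 30, 42.62, 6.58),
]

# Models whose weights alone exceed the nominal VRAM budget.
spills = [name for name, size, *_ in runs if size > VRAM_GIB]

for name, size, ngl, pp, tg in runs:
    status = "fits" if size <= VRAM_GIB else "exceeds VRAM"
    print(f"{name:30s} {size:7.2f} GiB  ngl={ngl:2d}  {status}")

# Compare the two MiniMax-M2 quants: fully offloaded vs. partially offloaded.
by_name = {name: (pp, tg) for name, _, _, pp, tg in runs}
pp_iq4, tg_iq4 = by_name["minimax-m2 230B.A10B IQ4_XS"]
pp_q8, tg_q8 = by_name["minimax-m2 230B.A10B Q8_0"]
pp_ratio = pp_iq4 / pp_q8
tg_ratio = tg_iq4 / tg_q8

print(f"Exceeds VRAM: {spills}")
print(f"MiniMax-M2 IQ4_XS vs Q8_0: pp {pp_ratio:.1f}x, tg {tg_ratio:.1f}x faster")
```

Only the Q8_0 MiniMax-M2 exceeds the budget, and it is also the only run with `ngl` below 99.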

A few final observations: