PP speed on dual RTX 6000 12c EPYC setup

Posted by iVoider@reddit | LocalLLaMA | View on Reddit | 18 comments

I want to run big models like GLM 5.1 or Kimi k2.6.
I could buy a Mac Studio M3 Ultra with 512 GB of RAM, but PP (prompt processing) speed would of course be bad.

Then I researched benchmarks of hybrid setups: a single GPU (RTX 6000 or 5090) paired with an EPYC 9xxx system and 12-channel DDR5-6400 RAM.

On such setups, PP is also abysmal past 96k context, only a little higher than on the M3 Ultra.
Would a second RTX 6000 boost these numbers by tensor-parallelising the dense part of the model, and by how much?
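For anyone wanting to test this themselves, here is a minimal llama.cpp sketch of a two-GPU tensor-parallel run; the model path and context size are placeholders, and the speedup (if any) would have to come from `-sm row`, since the default layer split mostly just adds VRAM rather than PP throughput:

```shell
# Hypothetical llama.cpp invocation; model.gguf is a placeholder path.
# -sm row splits each weight matrix across both GPUs (tensor parallelism),
# which is the mode that can actually speed up PP on the dense layers.
./llama-server \
  -m ./model.gguf \
  -ngl 99 \
  -sm row \
  --tensor-split 1,1 \
  -c 131072
```

Comparing `llama-bench` numbers for `-sm layer` vs `-sm row` at long context would answer the question empirically.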