To what degree do PCIe lanes (x16 vs x4 or x1) matter in a multi-GPU setup for running LLMs?

Posted by fabkosta@reddit | LocalLLaMA

Many mainboards that support multi-GPU setups offer only one primary PCIe slot with full x16 bandwidth, whereas the others run at e.g. x4 or often only x1. Let's assume I have one Nvidia RTX 3090 at x16 and three others at x1: how does this realistically impact the processing speed of an LLM compared with having all four at x16? Does anyone have real-life experience?
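
For a sense of scale, here is a rough back-of-the-envelope sketch of the theoretical link speeds involved. It assumes the commonly quoted per-lane figures of about 0.985 GB/s for PCIe 3.0 and about 1.969 GB/s for PCIe 4.0 (per direction, after encoding overhead), and it uses the 24 GB of VRAM on a 3090 as an example payload. The function names are made up for illustration, and which PCIe generation the x1 slots actually run at depends on the board. It only describes the link itself and says nothing about how much data really crosses the bus during inference, which depends on how the model is split across the cards.

```python
# Approximate usable bandwidth per lane in GB/s (one direction),
# after 128b/130b encoding overhead. Commonly quoted figures.
GBPS_PER_LANE = {3: 0.985, 4: 1.969}


def link_bandwidth_gbps(gen: int, lanes: int) -> float:
    """Theoretical peak bandwidth of a PCIe link in GB/s."""
    return GBPS_PER_LANE[gen] * lanes


def transfer_seconds(size_gb: float, gen: int, lanes: int) -> float:
    """Best-case time to move size_gb gigabytes over the link."""
    return size_gb / link_bandwidth_gbps(gen, lanes)


if __name__ == "__main__":
    vram_gb = 24  # example payload: filling the 24 GB of VRAM on one RTX 3090
    for gen in (3, 4):
        for lanes in (16, 4, 1):
            bw = link_bandwidth_gbps(gen, lanes)
            secs = transfer_seconds(vram_gb, gen, lanes)
            print(f"PCIe {gen}.0 x{lanes:<2}: {bw:6.2f} GB/s, "
                  f"{vram_gb} GB in at least {secs:5.1f} s")
```

These are upper bounds, so real transfers will be slower. The part that holds regardless is the ratio: for the same PCIe generation, an x1 link has one sixteenth of the peak bandwidth of an x16 link.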