Multi-GPU owners here? Cooling question + small experiment

Posted by aospan@reddit | LocalLLaMA | 13 comments

Hey folks, curious how people here test and monitor cooling on multi-GPU rigs.

Especially when cards are stacked close together, do you mostly rely on GPU temp graphs, fan curves, external sensors, or thermal cameras? Or has anyone gone completely overboard and modeled airflow with CFD? :)
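For the "just watch the temps" end of that spectrum, here is a minimal sketch, assuming NVIDIA cards with `nvidia-smi` on the PATH. It polls temperature, utilization, and fan speed per GPU and flags the hottest card, which in a tightly stacked rig is often the one with a blocked intake. The helper names (`parse_gpu_csv`, `read_gpus`, `hottest`, `watch`) are made up for illustration and have nothing to do with how Reefy or gpu-fryer work internally.

```python
import csv
import io
import subprocess
import time

# Query fields passed to nvidia-smi; order here defines the column order.
FIELDS = ["index", "temperature.gpu", "utilization.gpu", "fan.speed"]


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    rows = []
    for rec in csv.reader(io.StringIO(text)):
        if not rec:  # skip blank lines
            continue
        rows.append(dict(zip(FIELDS, (v.strip() for v in rec))))
    return rows


def read_gpus():
    """Run nvidia-smi once and return one dict per GPU (values are strings)."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


def hottest(rows):
    """Return the row with the highest temperature, or None if none is numeric."""
    numeric = [r for r in rows if r["temperature.gpu"].isdigit()]
    return max(numeric, key=lambda r: int(r["temperature.gpu"]), default=None)


def watch(interval_s=5, iterations=12):
    """Print a one-line summary per GPU every `interval_s` seconds."""
    for _ in range(iterations):
        rows = read_gpus()
        for r in rows:
            # fan.speed can come back as "[N/A]" on cards without a readable fan.
            print(f"GPU {r['index']}: {r['temperature.gpu']} C, "
                  f"{r['utilization.gpu']} % util, fan {r['fan.speed']}")
        top = hottest(rows)
        if top is not None:
            print(f"hottest: GPU {top['index']} at {top['temperature.gpu']} C")
        time.sleep(interval_s)
```

Calling `watch()` in a second terminal while a stress test runs gives a rough picture of which slot runs hottest. It only sees the on-die sensor, so it won't replace external probes or a thermal camera.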

Part of why I’m asking: we recently shipped a monitoring feature in Reefy.ai and added a Bench app that runs GPU stress tests using the open-source gpu-fryer project from Hugging Face.

If anyone has a multi-GPU rig and wants to try it: boot Reefy from a USB dongle, install Bench from the app catalog, run the GPU stress test, and share a screenshot of GPU utilization and temps. Monitoring works out of the box, no Grafana or agents to wire up :)

Curious to see how this works across different setups. I'd really appreciate it if anyone can try it and share a screenshot 🙏