Is this setup good enough to run LLaMA 70B at 8-bit quantization?

Posted by matt23458798@reddit | LocalLLaMA | View on Reddit | 6 comments

Hey everyone!

I'm building a budget-friendly AI/ML rig primarily to experiment with running large language models like LLaMA 70B at 8-bit quantization. This is my first time building a rig like this, so I've put together the following components and wanted to get your thoughts on whether this setup is sufficient for the task:

My Current Setup:

  1. GPU: 4× RTX 3090, liquid-cooled with EKWB Vector RTX water blocks.
  2. Motherboard: ASUS ROG Zenith Extreme X399.
  3. CPU: AMD Ryzen Threadripper 1950X (16 cores, 32 threads).
  4. RAM: 32GB DDR4.
  5. Storage: 1TB SSD running Linux.
  6. PSU: 2000W Modular Mining Power Supply (supports up to 6-8 GPUs).
  7. Chassis: Open-air mining rig (supports 6 GPUs, 81mm spacing between slots). See the attached pic for what the frame looks like (ignore the components shown; it's just a stock image).
  8. Cooling: Liquid cooling loop for GPUs, 4 basic fans for airflow.
  9. OS: Planning to run Ubuntu/Linux for compatibility with AI frameworks.

What I Need to Know:

I’m aware that 32GB of RAM might be on the lower side for larger datasets or training, but I was hoping it would suffice for inference, and I can always upgrade to 64GB or more later. Do I need to go up to 64GB now, or can it wait? Is the older PCIe 3.0 architecture of the X399 board a dealbreaker?
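For context, here's the back-of-envelope VRAM math I did for this build (my own rough estimate, not a benchmark): at 8-bit, weights take roughly 1 byte per parameter, and whatever is left over has to cover the KV cache and activations.

```python
# Rough back-of-envelope VRAM check for LLaMA 70B at 8-bit quantization.
# Assumptions: ~1 byte per parameter for int8 weights; KV cache and
# activation overhead must fit in the remaining headroom.

params = 70e9                    # 70B parameters
weights_gb = params * 1.0 / 1e9  # int8 -> ~70 GB just for weights

num_gpus = 4
vram_per_gpu_gb = 24             # RTX 3090 has 24 GB each
total_vram_gb = num_gpus * vram_per_gpu_gb  # 96 GB combined

headroom_gb = total_vram_gb - weights_gb    # left for KV cache, activations
print(f"weights ~= {weights_gb:.0f} GB, total VRAM = {total_vram_gb} GB, "
      f"headroom ~= {headroom_gb:.0f} GB")
```

So on paper the four 3090s fit the model with roughly 26 GB to spare, which is why my main worry is system RAM and PCIe bandwidth rather than raw VRAM.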

Looking forward to your advice! Thanks in advance for helping me optimize this build.