Recommendation for production hardware for inference and fine-tuning
Posted by Whyme-__-@reddit | LocalLLaMA | View on Reddit | 2 comments
Hi guys, I am trying to get a mini AI rig that can run 2-3 20B models fine-tuned on proprietary data, which I would ship to customers as a startup.
There are three goals I need to achieve with this machine:
- Fine-tuning and RL on the machine
- Inference via vLLM on larger workloads using our front-end software, which is Dockerized (see the sketch after this list)
- Ease of deployment: I want to load up my software, connect it to the LLMs on the machine, and ship it to customers to deploy in their environment. Completely private.
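
For reference, here is roughly what I mean by the inference path, sketched with vLLM's offline Python API (the model name is just a placeholder for our fine-tuned checkpoint; for the Dockerized serving case, vLLM also publishes an official vllm/vllm-openai image):

```python
# Minimal sketch of loading a fine-tuned 20B model with vLLM's offline
# Python API. "my-org/finetuned-20b" is a placeholder checkpoint name.
from vllm import LLM, SamplingParams

llm = LLM(
    model="my-org/finetuned-20b",  # hypothetical fine-tuned checkpoint
    quantization="awq",            # assumes an AWQ export; omit for FP16/BF16
    max_model_len=8192,            # cap the context window to fit limited VRAM
)

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["Summarize this contract clause: ..."], params)
print(outputs[0].outputs[0].text)
```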
My options are:
- DGX Spark
- GMKtec AI Mini PC (Ryzen AI Max+)
- Anything else you recommend, but I don't want to build a tower PC and mess around with the form factor
What challenges might I run into with options 1 and 2 when trying to accomplish these goals?
Any help regarding this would be greatly appreciated. Thank you
Conscious_Cut_6144@reddit
Local isn't going to scale any better than cloud.
You mentioned RunPod; the hardware you listed in your options is going to perform more on par with an RTX 4000, not even close to an H100. You might want to retest your workload. There is a reason H100s sell for $20k.
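
For what it's worth, a rough way to retest is to time a completion against a vLLM OpenAI-compatible endpoint; the URL, model name, and prompt below are placeholders:

```python
# Rough throughput probe against a local vLLM server exposing the
# OpenAI-compatible API. URL, model name, and prompt are placeholders.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

start = time.perf_counter()
resp = client.chat.completions.create(
    model="my-org/finetuned-20b",  # whatever model the server is serving
    messages=[{"role": "user", "content": "Write 200 words about GPUs."}],
    max_tokens=256,
)
elapsed = time.perf_counter() - start
tokens = resp.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tok/s")
```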
What model are you talking about, a 20B dense or an MoE? And 4-bit, 8-bit, or 16-bit quantization?
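
For a rough sense of why that matters, here is the weight memory alone for a 20B dense model at each precision (KV cache and activations come on top):

```python
# Back-of-envelope VRAM for just the weights of a 20B-parameter dense
# model at different precisions; KV cache and activations are extra.
params = 20e9

for bits in (16, 8, 4):
    gib = params * bits / 8 / 2**30
    print(f"{bits:>2}-bit: ~{gib:.0f} GiB")
# -> 16-bit: ~37 GiB, 8-bit: ~19 GiB, 4-bit: ~9 GiB
```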
Such_Advantage_6949@reddit
The difference between an H100 and the options you listed is like the speed difference between a car and a bicycle. Are you sure those will meet your needs?