RTX 5090 or Mac Studio?

Posted by Excellent_Koala769 | LocalLLaMA

Hey Guys,

I run a small business where I use many agents to handle sensitive client work. Everything has to stay 100% on-prem for compliance reasons.

Right now I'm running the full Gemma 4 31B dense model (4-bit) on my M5 Max laptop with 128 GB of memory. The main agent does long reasoning tasks and I'm only able to run about 2 agents at the same time. I get around 28 tokens per second when it's just one, but it drops to 22 when two are going. The whole thing feels slow and I'm already hitting the limit.

In the coming months I need to scale up to handle way more agents at once (around 40-80 concurrently).
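To put rough numbers on that, here is a back-of-envelope sketch of the arithmetic. It assumes the 22 tokens per second figure is per agent rather than combined, ignores quantization overhead (scales, zero-points) and KV cache when sizing the weights, and uses a hypothetical per-agent target of 20 tokens per second that is only there for illustration. The function names are made up for this example.

```python
# Back-of-envelope numbers from the post. Assumptions are marked explicitly.

PARAMS = 31e9             # 31B dense model
BITS_PER_WEIGHT = 4       # 4-bit quantization; ignores scale/zero-point overhead

DUAL_TPS_PER_AGENT = 22   # ASSUMPTION: 22 tok/s is per agent, not combined
TARGET_TPS_PER_AGENT = 20 # HYPOTHETICAL target, not a measured number


def weight_memory_gb(params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in GB (no KV cache, no overhead)."""
    return params * bits_per_weight / 8 / 1e9


def aggregate_tps(per_agent_tps: float, agents: int) -> float:
    """Total tokens/s across all agents running at once."""
    return per_agent_tps * agents


def required_aggregate_tps(agents: int, target_per_agent_tps: float) -> float:
    """Total tokens/s the hardware must sustain to hit a per-agent target."""
    return agents * target_per_agent_tps


if __name__ == "__main__":
    print(f"Weights only: ~{weight_memory_gb(PARAMS, BITS_PER_WEIGHT):.1f} GB")
    print(f"Current total with 2 agents: {aggregate_tps(DUAL_TPS_PER_AGENT, 2)} tok/s")
    for n in (40, 80):
        need = required_aggregate_tps(n, TARGET_TPS_PER_AGENT)
        print(f"{n} agents at {TARGET_TPS_PER_AGENT} tok/s each: {need} tok/s total")
```

The weights-only figure matters for the GPU option because whatever VRAM is left after loading the model is what the KV cache for all concurrent agents has to fit into, and long reasoning tasks make that cache large.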

I'm trying to decide between building a simple RTX 5090 desktop node (and using vLLM) or buying a high-RAM Mac Studio. The GPU side seems a lot stronger for running multiple agents, but the Mac would be quieter and simpler.

What would you guys do?