3090 vs mac choice
Posted by CaaKebap@reddit | LocalLLaMA | 31 comments
Planning to run local models between 30B and 120B, mainly for coding (agentic, if viable).
Current model targets are GLM-4.5-Air (110B), Qwen3-Coder-30B-A3B, gpt-oss-120b or 20b, Devstral-Small-2507 (24B) and Mistral-Small-3.2-24B.
Below are the options at my local market.
- RTX 3090 24GB (2nd-hand), Ryzen 5 9600 (arbitrary), 64/128GB DDR5, 1TB SSD — 1350$
- RTX 3060 12GB (2nd-hand), Ryzen 5 5500 (arbitrary), 64/128GB DDR4, 1TB SSD — 900$
- Apple Mac Studio M1 Max — 64GB / 1TB SSD — 1000$ (2nd-hand)
- Mac mini M4 — 32GB / 512GB — 1300$
- MacBook Air M4 (10-core GPU) — 32GB / 512GB — 1800$
- MacBook Pro 14 M4 Pro — 48GB / 512GB — 2700$
- Mac Studio M4 Max — 128GB / 1TB — 4000$
I don't want to spend too much, but if it will make a really big difference, I may consider going over 2000$.
So, considering price/performance (including electricity usage over the years) and also ease of use, which one should I prefer?
fallingdowndizzyvr@reddit
What about a Max+ 395? It's much better value than the Macs.
CaaKebap@reddit (OP)
Max+ 395 PCs are mostly not available where I live, and if you find one it's hella expensive. The performance you listed is very good, thanks for the response.
fallingdowndizzyvr@reddit
They are available worldwide at the same price, since they ship globally from China. Well, mostly globally: they won't ship to war zones with active fighting, for example.
Here's one for $1700.
https://www.bosgamepc.com/products/bosgame-m5-ai-mini-desktop-ryzen-ai-max-395
SE_to_NW@reddit
What about US tariffs?
fallingdowndizzyvr@reddit
That's the gamble. There are a lot of exclusions. Primarily for electronics. At least one person has reported not being charged tariffs on a Max+ 395 machine from China.
I can't find that thread but here's a related thread. People have gotten their machines and I don't think anyone has said they got charged a tariff.
https://www.reddit.com/r/GMKtec/comments/1l1vmib/x2_has_been_stuck_in_shipping_since_may_12th/
But that was then. This is now. And tomorrow is tomorrow. Everything changes with a social media post from the White House these days.
kweglinski@reddit
How much do they cost in your area? Here they are priced similarly to the Max Macs and are slower than them. I'm getting 70 t/s with GGUF on an M2 Max; I don't remember the PP speed.
fallingdowndizzyvr@reddit
There is no M2 Max with 128GB. The cheapest would be a used M3 Max; with 128GB that's $3500 or more. A Max+ 395 with 128GB is $1700. So half the cost for 70% of the performance in TG. For PP it's faster.
crantob@reddit
I'd go 4x32GB DDR5, a Ryzen 9, and 2x 3090. (GLM-4.5-Air is disappointingly weak on mid-context coherence and memory.)
bestofbestofgood@reddit
Sounds like a good question for ChatGPT; it likes broad, incoherent sets of options like this. I mean, you mixed together 900$ and 4k$ options, stationary variants and a MacBook Air, laptops and a standalone discrete card. Make up your mind first about what you need in terms of use cases and money.
CaaKebap@reddit (OP)
I am all for price/performance; the form of the device is not a criterion for me. I am just open to advice, which is why I listed so many options. I could not find a good post about Mac versus PC, so here we are.
Miserable-Dare5090@reddit
So your main goal is to run GLM-4.5-Air. You need either 96GB of VRAM, or a 3090 with a beefy CPU and lots of RAM.
I don't know the numbers for inference without loading the whole model into VRAM. I know it can be done with ik_llama and fast RAM, slotted properly so it runs at its highest frequency. But large language models are tensor computations, and vectors/tensors are the reason GPUs exist.
The M1 Ultra has 800GB/s of unified memory bandwidth, and I believe you will happily run GLM at 40+ tk/s with that.
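A rough back-of-envelope sketch of that claim (my assumptions, not figures from this thread: ~12B active parameters per token for GLM-4.5-Air and a ~4.5-bit quant):

```python
# Decode speed on a bandwidth-bound machine is roughly:
#   memory bandwidth / bytes of active weights read per token.
# Assumed numbers: GLM-4.5-Air activates ~12B params/token, a Q4_K-ish quant
# averages ~4.5 bits/weight. Real speeds sit well below this ceiling once
# attention, KV-cache reads and imperfect bandwidth utilisation are counted.

bandwidth_gb_s = 800      # M1 Ultra unified memory bandwidth
active_params = 12e9      # assumed active parameters per token (MoE)
bits_per_param = 4.5      # assumed effective bits for a Q4_K-style quant

gb_per_token = active_params * bits_per_param / 8 / 1e9
ceiling_tps = bandwidth_gb_s / gb_per_token

print(f"~{gb_per_token:.1f} GB/token -> decode ceiling ~{ceiling_tps:.0f} t/s")
# ~6.8 GB/token -> decode ceiling ~119 t/s, so 40+ tk/s in practice is plausible.
```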
CaaKebap@reddit (OP)
40 tk/s is really good and acceptable. Thank you for the response! I wonder what a 3090 with enough RAM would generate.
Eugr@reddit
Don't know about the 3090, but a 4090 on my i9-14900K with 96GB DDR5-6600 gives me, on llama.cpp:
~24 t/s generation on gpt-oss-120b with 28 MoE layers offloaded to CPU (and 128K context for the K/V cache). Both model and cache are FP16 (well, the model is MXFP4, as that's how OpenAI trained it).
~12 t/s generation on glm4.5-air-Q4_K_XL with 41 MoE layers offloaded to CPU (and 128K context for the K/V cache with q5_1 quant).
These are for the first few kilotokens; as context fills, the speed gets slower. Also, token generation is not great for CPU-offloaded models.
You need 96GB of RAM for this with a 3090 if you want full context, especially for GLM. You can run gpt-oss-120b with 64GB, but it will barely fit. For unified memory systems, 128GB would be the minimum.
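For reference, a minimal sketch of that kind of launch (assuming a recent llama.cpp build; the GGUF filename and the offload count are placeholders, and exact flag names vary between versions):

```python
import subprocess

# Sketch of a llama.cpp server launch that keeps MoE expert layers in system
# RAM while the rest of the model lives on the GPU. Placeholder filename and
# layer count; check your build's --help for the exact flags it supports.
cmd = [
    "llama-server",
    "-m", "GLM-4.5-Air-Q4_K_XL.gguf",  # placeholder model file
    "-ngl", "99",                      # put all non-expert layers on the GPU
    "--n-cpu-moe", "41",               # keep ~41 MoE expert blocks on the CPU
    "-c", "131072",                    # 128K context
    "-fa", "on",                       # flash attention (older builds use just -fa)
    "--cache-type-k", "q5_1",          # quantized K cache
    "--cache-type-v", "q5_1",          # quantized V cache (needs flash attention)
]
subprocess.run(cmd, check=True)
```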
Anyway, out of the options you listed, M1 Ultra is probably optimal.
You can also look into the AMD AI Max 395+ in desktop form factor, like the Framework Desktop and some other alternatives. It only has 250GB/s of memory bandwidth, so token generation on these large MoE models will be slower than on a Mac, but faster than on more "traditional" AMD (or Intel) desktops. Prompt processing should be faster than most of the options you listed for models that don't fit into a 3090, though. The most attractive thing is that you can have it for about $2K new, and the power consumption would be only slightly higher than the Mac options. I'm actually considering getting one of these to act as my 24/7 home inference server.
cornucopea@reddit
OP should have defined what "performance" means. Regardless of model size, less than 20 t/s is intolerable for practical use; 40+ t/s is really the floor unless the purpose is just testing the hardware.
For large models other than gpt-oss-120b, considering context size and inference performance, 4x 3090/4090 would be the minimum, which is a build nightmare. So the AMD AI Max 395 is ideal for running small models, e.g. <30B; despite its larger RAM, the bandwidth would choke any large model. Yet gpt-oss-120b might just be the sweet spot for it, as there is no way I can get gpt-oss-120b running at >20 t/s on 2x 3090, and you can forget about context; it won't even handle a 10-page PDF.
So if OP is OK with 15 t/s for GLM, sure, go ahead with any of these as long as plenty of memory is provided, whether VRAM or DRAM. Again, what "performance" means to OP in this context is yet to be defined.
elchulito89@reddit
The Mac Studio M4 Max is the clear winner here.
CaaKebap@reddit (OP)
Will it offer more than a Mac Studio M1 Ultra with 128GB RAM? The M4 Max costs 1700$ more.
elchulito89@reddit
Memory bandwidth matters as well. The 3090 has the best memory bandwidth, but it can't handle a 120B model unless you download a quantized model and drop the accuracy. The M4 Max has the second best. So for bang for the buck it's the better choice. But I would recommend the Mac Studio M3 Ultra, which goes for 4k if you buy it online. Micro Center had it on sale this week for 3599.99, and it comes with 96GB.
CaaKebap@reddit (OP)
Why would I go for the M4 Max when it has 546GB/s of bandwidth while the M1 Ultra has 800GB/s? OK, the M3 Ultra has 819GB/s, but the M1 Ultra still has 800GB/s. Does the CPU matter that much for local LLMs?
elchulito89@reddit
It's not just memory bandwidth; as I mentioned before, it's one of the requirements. Also, Macs have unified memory. Unlike an independent CPU and GPU, where a model is loaded into RAM first and then transferred to GPU VRAM, it's only one load. These are the other benefits. You can also ask Claude or ChatGPT. (For bigger models I always go Mac because it's great bang for the buck.)
• M4 Max has significantly improved memory controllers and cache hierarchy
• Better memory bandwidth per core efficiency

Neural Engine:
• M4 Max has a much more advanced Neural Engine (16-core vs the M1 Ultra's 32-core, but the M4's cores are far more capable per core)
• Better optimization for transformer architectures used in modern LLMs
• Improved matrix multiplication units specifically designed for AI workloads

CPU Performance:
• M4 Max has faster single-core performance, which matters for sequential parts of LLM inference
• More efficient performance cores with better IPC (instructions per clock)

Software Support:
• Better optimization in frameworks like MLX, Core ML, and PyTorch for the M4 architecture
• More recent compiler optimizations and Metal shaders

Power Efficiency:
• M4 Max delivers better performance per watt, allowing sustained performance without thermal throttling
real-joedoe07@reddit
The Neural Engine plays absolutely no role when it comes to inference or diffusion. What counts for performance is memory bandwidth and GPU cores. Of OP's configurations, the M1 Ultra is far better suited than the M4 Max for AI tasks.
real-joedoe07@reddit
Comprehensive benchmarks for all Apple Silicon Macs: https://github.com/ggml-org/llama.cpp/discussions/4167
fallingdowndizzyvr@reddit
Not quite. A Max+ 395 is less than half the cost with more than half the performance. It's faster in PP and about 70% of the speed in TG.
Ill_Yam_9994@reddit
I think Mac, if you're not interested in the other benefits of Nvidia like gaming and image generation. The Nvidia cards are in an awkward spot right now, in my opinion, where the good open-source models are all huge MoEs that aren't really well suited to even a 24GB card.
NoidoDev@reddit
God. What happened to the dollar? I thought this was quite expensive, but then I looked it up in euros: a Mac M4 Max is 2300 and a bit.
CaaKebap@reddit (OP)
The Mac Studio M4 Max I listed is the 128GB RAM configuration.
QFGTrialByFire@reddit
I can run gpt-oss-20b, Devstral-Small-2507 (24B) and Mistral-Small-3.2-24B on my 3080 Ti pretty fast (~100 tk/s). So if you get a 3090 with double the VRAM, Qwen3-Coder-30B-A3B should be no issue. However, GLM-4.5-Air (110B) or gpt-oss-120b would be pushing it even quantized and with some CPU offloading. In my personal opinion laptops aren't great - they will overheat, especially on training runs or large/long inference; with desktop cards you can take the heat away more easily. I've no experience with the M1/M4, but just looking at them gives me pause, as I don't see great ways of getting heat out. Perhaps others who train models on them can give better advice.
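For a rough sense of what fits on a 24GB card, a quick sketch (assuming ~4.5-bit quants and ignoring KV cache and runtime overhead; parameter counts are just the nominal sizes from the model names):

```python
# Quantized weight size vs. a 24GB card, leaving ~2GB headroom for KV cache
# and runtime overhead. All sizes are rough estimates at ~4.5 bits per weight.
VRAM_GB = 24
models = {
    "gpt-oss-20b":            20e9,
    "Devstral-Small-2507":    24e9,
    "Mistral-Small-3.2-24B":  24e9,
    "Qwen3-Coder-30B-A3B":    30e9,
    "GLM-4.5-Air":           110e9,
    "gpt-oss-120b":          120e9,
}

for name, params in models.items():
    weights_gb = params * 4.5 / 8 / 1e9            # GB of quantized weights
    verdict = "fits" if weights_gb < VRAM_GB - 2 else "needs offload / more memory"
    print(f"{name:>22}: ~{weights_gb:.0f} GB -> {verdict}")
```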
StandardLovers@reddit
What about cooling? Isn't it easier to cool a GPU than a MacBook Air?
tmvr@reddit
You need about 60-80GB of memory to run the largest models on your list, so that eliminates a lot of the options. What's left are the two 128GB Mac Studios and the PCs. If you have the budget for the $4000 M4 Max, then I would suggest getting the first PC with the 3090 for $1350 and the Mac Studio M1 Ultra 128GB for $2300, so you have two machines.
bull_bear25@reddit
Mac M4 Pro
seppe0815@reddit
M2 Ultra Mac Studio
i_am__not_a_robot@reddit
I would go with the 2nd-hand RTX 3090.