Bloomberg: No Mac Studios until at least October
Posted by eclipsegum@reddit | LocalLLaMA | View on Reddit | 52 comments
https://9to5mac.com/2026/04/19/new-mac-studio-may-not-arrive-until-october/
What’s coming first? Deepseek v4 or the Studios that can run it?
eclipsegum@reddit (OP)
Should have bought the Mac Studio M3U 512GB two months ago. Waiting 6 months in LLM time is like Miller’s planet in Interstellar. Feels like 20 years will pass with tons of new models.
flyingbanana1234@reddit
I'm in this exact boat. I foresaw this exact situation, with current Mac Studios unavailable and the M5 Studio delayed,
but thought I'd uncharacteristically wait till June for the M5 Studio.
Might buy a DGX Spark instead at this point.
rpkarma@reddit
I love my Spark (well, an Asus GX10), but I wouldn't suggest buying it for inference. It's a lot better than it was, but SM121 is so unique that we're only just getting to the point where kernels run on it well enough to take advantage of its abilities.
Where it rocks is: learning CUDA and fine tuning bigger models. It’s a great device, I love mine. But just setting expectations!
flyingbanana1234@reddit
thank you for your advice !!! :)
TheRealMasonMac@reddit
On the bright side, you'll get better hardware for the same amount of money when things calm down.
eclipsegum@reddit (OP)
Feels like the zuck meme of peeking in the window at all the M3U 512 owners
fivetoedslothbear@reddit
I was waiting for the M5 to buy a Mac Studio 512GB.
In December 2024, I bought a 128GB M4 Max MacBook Pro on short notice for "reasons" and looking at the current situation, appreciating what I've got. Even the Mac Mini is capped at 64 GB right now.
fragment_me@reddit
Dude you’re going to love qwen 4.0 on that thing !!!
StupidScaredSquirrel@reddit
Just accept that hardware is a nightmare and run whatever sota model fits your current hardware. Best of luck
segmond@reddit
Oh well, the Blackwell Pro 6000 is looking like the option. I was waiting to decide between an M5 Studio Ultra and at least 2x Blackwell Pro 6000. The M5 Ultra is supposed to match a 4090; if that's true and it has at least 512GB, then it will be worth the wait. However, the rumor is that it will now max out at 256GB. I'm going to wait till the end of the year. If it doesn't come out, I go Blackwell Pro; if it comes out and doesn't measure up, Blackwell Pro. For now, I'll manage with my current rigs.
FullstackSensei@reddit
Oh, no... Anyway, I finally installed the fourth 3090 that had been sitting in my parts cabinet for six months into my triple-3090 rig, making it a quad-3090.
I now get a consistent 17-18 t/s TG and ~76-80 t/s PP running Qwen 3.5 397B Q4_K_XL all the way to 180k context. This is using vanilla llama.cpp; ik_llama.cpp would probably be faster if I bothered tuning parameters. Might not sound like much, but with prompt caching PP takes less than 30 seconds per request, and the whole request is done in under a minute for most requests. Power draw is ~600W from the wall during inference.
Even at today's prices, I could build it for half the price of the M3 Ultra 512GB. Doubt I'll use 5k worth of electricity over the lifetime of the machine.
segmond@reddit
You are preaching to newbies who just got into the scene recently or a few months ago. They don't understand...
droptableadventures@reddit
You're buying second hand 3090s because the hardware you want isn't decently priced brand new.
You've also got a long way to go - you'll need 18 more 3090s to hit 512GB.
FullstackSensei@reddit
No. I have 512GB of system memory, so I have a tad over 600GB total. The four 3090s take care of all the attention layers and context, while the octa-channel system memory takes care of the FFN layers. How else do you think I'm running a 400B at Q4_K_XL (245GB) plus 180k context?
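For anyone curious, a split like this can be expressed in vanilla llama.cpp with the `--override-tensor` (`-ot`) flag. This is only a sketch, not OP's actual command; the model filename, context size, and regex are illustrative assumptions:

```shell
# Sketch: offload all layers to the GPUs with -ngl, then pin the FFN
# weight tensors back to CPU/system RAM with an -ot regex override.
# Filenames and the exact pattern are assumptions, not OP's real flags.
./llama-server \
  -m Qwen3.5-397B-Q4_K_XL.gguf \
  -c 180000 \
  -ngl 99 \
  -ot "ffn_.*=CPU"
```

With a pattern like this, the attention weights and KV cache stay in VRAM, where the 3090s' compute helps most, while the large FFN matrices stream from the octa-channel system RAM.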
droptableadventures@reddit
8 channels of RAM is decent (I have a Threadripper Pro machine), but it's about the bandwidth of an Apple Silicon "Max" chip. The Ultra is double that.
FullstackSensei@reddit
Like everyone downvoting me, you're ignoring the compute of the 3090s.
On large models, the Ultra doesn't even get to 30% of its memory bandwidth because it's limited by compute. You can double the memory bandwidth and it won't run much faster, because most of the actual time is spent crunching attention.
The M3 Ultra has the GPU compute of a single 3080, or a single MI50. Even if an M5 Ultra doubles that, it'll still be about the compute of 1.5 3090s.
This sub is full of people reporting benchmarks on the M3 Ultra. Look up how many t/s they get on a 400B model with 150k context.
I understand the appeal of the machine, but it's too darn expensive for what it offers if you know what you're doing.
droptableadventures@reddit
I have a dedicated machine with two 3090s and 8 MI50s.
I now know more about PCIe than I ever thought I would, and I've had to take a soldering iron to some of the hardware to make this setup work. It's loud, heavy and lives in a custom case made from T-slot aluminium extrusion. If I put it in the back of the car and drove it somewhere, it probably wouldn't work on arrival because some connector somewhere would need reseating. And mine's one of the ones you could feasibly move.
I definitely get the appeal of a Mac Studio, even if the performance per dollar isn't as good.
segmond@reddit
Your stuff is loud because of the MI50s. I have a rig with plenty of 3090s that is quiet because they all have triple fans that rarely kick in.
I also have 10x MI50 in a crypto-miner case with massive fans that is quiet; the fans are controlled by temperature and barely hit 40% speed.
CheatCodesOfLife@reddit
I thought you were the 8 x MI50 guy?
FullstackSensei@reddit
Six MI50s, which is my 3rd rig. The 8 are P40s, my first rig, and the 3090s are my 2nd rig. I also tried to make an A770 rig last summer, before MI50s hit the market, but the software experience was an utter failure, so the A770s got resold at purchase price.
CheatCodesOfLife@reddit
Smart move; any of the "performance improvements for SYCL" in llama.cpp seem to target Battlemage or newer only.
FullstackSensei@reddit
It wasn't the performance. I couldn't get them to work at all. I never managed to compile llama.cpp with SYCL; I only got them to work using Intel's own build, which was like 2 months behind. Qwen 3 235B was very unstable: I could get a short prompt to work, but if I threw 10k tokens at it, llama.cpp would just crash. And this was after like 3 or 4 days of trying.
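For reference, the build that wouldn't cooperate is roughly this one from llama.cpp's SYCL backend docs; the oneAPI install path is an assumption (Linux shown):

```shell
# Typical SYCL build of llama.cpp for Intel GPUs, per the project's
# SYCL backend documentation; requires the oneAPI toolkit installed.
source /opt/intel/oneapi/setvars.sh
cmake -B build -DGGML_SYCL=ON \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release
```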
fallingdowndizzyvr@reddit
I've compiled it before. But there's really no point. Vulkan works faster.
marhalt@reddit
I have an M3 Ultra 512GB. I love it. It can run everything I throw at it (except DeepSeek 3.2, which is just too large with a decent context size). I wanted to pick up the M5 Ultra the minute it comes out, but I'm wondering if another M3 Ultra 512 is the way to go, and then pairing them with exo. Unless the M5 comes out with 1TB, I'm not sure where the M5 will be so much better than the M3.
segmond@reddit
Rumor is that the M5 Ultra won't even get 512GB but will be limited to 256GB due to the RAM shortage.
fivetoedslothbear@reddit
You'd have to get one used, because Apple's not selling the M3 Ultra with 512GB at the moment. 256GB is the upper limit.
Kind of disappointing that all the neat clustering demos hit YouTube, and then the hardware got constrained.
nomorebuttsplz@reddit
You should try deepseek again. I can run GLM 5.1 with 100k context
FullOf_Bad_Ideas@reddit
What PP do you get with Qwen 3.5 397b at various context lengths?
michael_p@reddit
I got an M3 Ultra with only 96GB, and for my use case, I couldn't be happier. Probably the best thing I've ever bought.
benevbright@reddit
Which model? And do you use it with coding agent? (Curious about the use case that makes you happy)
michael_p@reddit
Qwen for confidential business analysis for acquisition purposes. I do not use it for coding - just Claude code.
benevbright@reddit
Um… but big dense models don't give you good token speed for agentic use, no? I think the best one would still be a MoE model like MiniMax, even if you have a 512GB RAM Mac, no?
eclipsegum@reddit (OP)
If you have $35K to drop on eBay for one of the last 512s in existence then yes. Must be nice to run GLM5.1 locally
Objective-Picture-72@reddit
It's clear at this point that many had inside information on this development and have been buying up the large M3 Ultra models in advance.
thrownawaymane@reddit
Maybe, but really this happened right after OpenClaw exploded. The whales realized that AI could be used for work locally and bought all of them, IMO. The 512GB unit must have been a specialty SKU for them anyway; who knows how many they normally kept on hand.
redmctrashface@reddit
How do you guys have the cash to buy it?
michael_p@reddit
It’s a rounding error compared to how much it helps save in human labor costs
redmctrashface@reddit
I don't deny that. If I could afford one, I would. Unfortunately, living in Germany is not the best pick regarding salaries.
michael_p@reddit
Yea, I think there's a big disconnect on here: most people are hobbyists on a budget, while some are using it in a business setting where, given the ROI, money is never going to be a problem.
fallingdowndizzyvr@reddit
LOL. It's called a "job".
redmctrashface@reddit
I guess we don't have the same salary then
fallingdowndizzyvr@reddit
Note that just because a Mac Studio might come out in October doesn't mean it'll be an M5 Ultra. It'll probably be an M5 Max, so fundamentally no different than getting a MacBook today.
Ultra tends to lag by longer than that.
flyingbanana1234@reddit
Except for the savings of 2 grand. On top of that, they might offer an M5 Max with more RAM for the same price as the 128GB M5 MacBook.
fallingdowndizzyvr@reddit
Doubtful. Have they ever? Whether it's in a MacBook or in a Mac Studio, it's the same chip. The RAM is built onto the chip package. So it's a question of whether they make an M5 Max chip with that much RAM. If they do, then there's no reason not to offer it on the MacBook too.
pantalooniedoon@reddit
I think he means they could still offer a 256GB version, but it'll be the same price since it isn't a laptop: no screen, no keyboard, etc.
flyingbanana1234@reddit
When have they ever only offered a 128GB Mac Studio,
with no higher RAM configs and no Ultra variant?
Plans change, plus the Ultra chip was seen in the software a while back.
Veearrsix@reddit
The notebook form factor itself is a trade-off. Maybe not worth 6 months, but cooling, and therefore max sustained performance from a thermal perspective, is better on a Studio.
NoFaithlessness951@reddit
Deepseek v4 or GTA 6 first?
Ok_Technology_5962@reddit
Deepseek v4 next week
senrew@reddit
Half-Life 3
LoveMind_AI@reddit
Not getting the Mac Studio when I could have is one of my greatest regrets.
eclipsegum@reddit (OP)
Top 3 regrets for me