Apple stopped selling 512gb URAM mac studios, now the max amount is 256GB!

[-]

No-Introduction-2211@reddit

我这个网站本来还指望着买台mac studio来支持开发呢？

https://www.amazingindex.com

[-]

DeepAd8888@reddit

I was actually planning on buying a 512 version. Oh well will look at other options

[-]

Sabotag3-@reddit

I think they’re redirecting it to the M5 Ultra Mac Studio for April.

[-]

eclipsegum@reddit

They are selling on ebay for $25K. They’re the only legitimate option for running large models on a desktop and in retrospect were a steal

[-]

Icy_Distribution_361@reddit

Nah those large models would still run super slow even if they fit in memory. It’s not really usable. It might become usable with M5 Max

[-]

Something-Ventured@reddit

They run fine and are perfectly usable.

I have the M3 Ultra.

[-]

Something-Ventured@reddit

Im getting local Claude sonnet/opus-like speeds with deepseek, and gpt-oss, etc.

I haven’t benchmarked in a year, so I couldn’t tell you tps. You can google those, but it’s very workable.

[-]

Civil_Response3127@reddit

Yeah, but which deepseek. Large ones that push the 512gb ram do not run at that speed

[-]

Something-Ventured@reddit

There’s a lot of throttling on regular subscription plans now on Claude. So it definitely does get close.

[-]

Civil_Response3127@reddit

You say that as if they're on the same scale. Even with throttling, your M3 isn't even close to the ingest and output of Claude Code, even on Opus 4.6.

In Claude code, when the agent is doing its thing, it regularly has 5 to 10 other subagents running at the same time. All at 40 tok/s approx. When you have another one or two conversations going at the same time, this is especially different. For any model coming close to using up your 512 gig of RAM, your tokens per second is absolutely not even close to the same as a single stream of Claude Opus 4.6, let alone all of them simultaneously.

[-]

Something-Ventured@reddit

https://www.reddit.com/r/technology/comments/1s4w4gm/anthropic_tweaks_claude_usage_limits_to_manage/

Your mileage may vary.

I’ve been getting significantly slower prompt responses, having to retry frequently enough, etc. that it’s about the same.

I had to disable all my cowork tasks because of the new throttling policies.

I dropped down to the $20/m plan after evaluating that I got good enough performance locally for my workflows and github my copilot plan somehow got better Claude performance than my Claude subscriptions.

The slightly slower TPS of local, even with large models, is irrelevant when throttling and having to retry prompts on Claude happens. It’s also way less relevant when you’re actually inspecting the code changes and bounding the prompts.

The “faster” aspects of Claude don’t really matter when you have to frequently stop it from wasting tokens or doing things it shouldn’t to avoid being throttled.

[-]

Civil_Response3127@reddit

No, it isn't a question of your mileage may vary. The tokens per second just aren't even close, even with Claude's throttling that I already acknowledged. Additionally, your link does not reference throttling, that is to do with usage limits.

[-]

Something-Ventured@reddit

TPS is the wrong metric.

Useful token per workday is the metric.

My workflow isn’t generating shit code I haven’t read and accruing technical debt burning through tokens.

My workflow is to review data and create analysis and interactive data tools which can be repeated and verified.

Claude throttles and reduces token limits, both.

I prompt and tab over to my real work only to come back to some retry issue (I’m not out of tokens, the prompt was throttled or canceled). Or it went off reservation and did things I didn’t tell it to do and wasted 5-20 minutes of effort.

It got unreliable enough that local models perform well enough that I cut my $200/m subscription to $20.

TPS is an idiotic metric for functionality and LLM use.

[-]

Civil_Response3127@reddit

Tokens per second is absolutely the correct metric if you use the term throttling. You can't even compare the two, because Claude can output maybe even two orders of magnitude more code per second than your setup, which just inherently means you then become the bottleneck. It sounds like you've got the strangest setup where you don't really output much but you let it take ages to do so, and then you come back to review it all as a batch instead of dynamically through the workday, which is far better for cognitive load.

And stop trying to throw around the term "usage limits" or "throttling", because again, comparing your setup to that is just plain incorrect.

It genuinely just sounds like you're confused by the technology and giving people very out of date and confused advice.

[-]

Something-Ventured@reddit

I said I get good enough performance running large models on the studio. It is good enough, and output is close to opus/sonnet levels as measured by useful work accomplished. I also don’t run out of tokens.

But TPS is still a stupid metric because I frequently find I have to interrupt Claude when it starts doing things it shouldn’t, I get prompt failures during peak time of day from what is obviously throttling of the service by Claude, and there is still human latency time for review and actions that means more time elapses between prompts allowing even a slower local model to be good enough.

I work in actual science fields with embedded hardware / instrumentation setups collecting terabytes of real data. This is a very different experience than some front-end coder where putting vibe coded garbage and unoptimized JavaScript doesn’t matter.

Opus is not an order of magnitude faster than local, as you just said. OSS, Deepseek, GLM, etc. all run well enough locally. At this point opus is so slow and good at incorrect outputs after wasting time I only use Haiku through my GitHub copilot subscription, and use my Claude subscription to research tasks (scouring the net).

Telling others you’re not technical and don’t understand really just makes you sound like a gamer fanboy or web monkey who isn’t technical and cannot comprehend other people’s workloads. I work on scientific computing workloads where no LLM is accurate enough that you can just vibe code your way through to an actual answer. TPS is irrelevant.

[-]

Civil_Response3127@reddit

Im getting local Claude sonnet/opus-like speeds with deepseek, and gpt-oss, etc

No, you said this and I have been saying that is not true.

[-]

Something-Ventured@reddit

TPS is slower. No shit Sherlock.

Useful work is about the same.

As I originally said "it's good enough" TPS is irrelevant as a metric. I get accuracy and useful work per day to be about the same. That's called good enough.

It's also why so many people use smaller models locally for coding.

[-]

Virtamancer@reddit

Gpt oss isn’t a large model, it’s not even remotely close to 512gb. The large models are >512gb and barely fit into 512gb AFTER being quantized—those would presumably run pretty damn slow.

The advantage would be having multiple small models like gpt oss or qwen3.5 in memory without having to load/unload them.

[-]

Something-Ventured@reddit

Yes and I am able to run multiple in memory and switch tasks or run full deepseek at once…

All at decent speeds

[-]

LambdasAndDuctTape@reddit

Cope all you want for buying that expensive piece of hardware and falling for the massive PR stunt but the reality is you could've funded Max for multiple years, gotten much better performance and cutting edge models, and still had money left over.

[-]

Something-Ventured@reddit

lol, dude. I run 2-3 week batch processing jobs that use 400gb of ram and it was a 90% cost reduction per YEAR vs cloud compute to use CUDA.

There's no cope. It was a ridiculous cost savings.

LLM use is just a bonus.

[-]

eclipsegum@reddit

Qwen3.5-397B 35 tok/s and likely faster with TurboQuant

[-]

Hyiazakite@reddit

PP speed 32k context?

[-]

BumbleSlob@reddit

Apple just launched the MacBook neo line which is going to sell like hotcakes. Their CEO is famously the best supply chain guy in the history of the tech world. I think it’s more likely they’re just saving chips for the refreshed M5 Ultra mac studios arriving in a month or three.

[-]

Late-Assignment8482@reddit

I would relax about the "they're never making another 512GB model!!!" theory.

This is most likely that they sold a very few of them (halo build of a halo product line) and are dropping the M5 Ultra sometime this year, so it makes sense to hold supply back for that. Unless they actually put out a press release and say "we're never selling these again" (which they did say about Mac Pros recently) quiet store changes are usually related to an upcoming product of some kind.

Apple likes to set a price when they introduce a product, and hold to it for that products lifespan. They also have long term parts contracts.

This also may be supply conservation.

They take a real hit if they have to release a 30k product because of a price hike that goes away a year later. The bad press doesn't revert. Google searches in 2029 are seeing memes about how Mac Studios start at 28k even though the price went back down in 2027.

If setting that DDR5 aside for the upcoming M5 model and losing maybe a few hundred or thousand sales gets them over a gap in RAM price lock-in, then they get press for "Apple took care of customers during RAM insanity" and the M5 Ultra drops in October, and they come strong in a time when local models are buzzy and their product is dirt cheap.

[-]

PracticlySpeaking@reddit

They take a real hit .. because of a price hike that goes away

If you listened to the earnings call, they talked about "margin pressure" — CEO-speak for "we are going to eat some cost."

[-]

Late-Assignment8482@reddit

Yup. Tim Cook may not be flashy, but the man knows systems and supply chains and manufacturing pipelines.

[-]

PracticlySpeaking@reddit

And Apple have huge negotiating leverage — despite rumors to the contrary — (still) being one of, if not the largest customer for many suppliers.

[-]

Late-Assignment8482@reddit

And they're steady. AI Bubble pops and NVIDIA needs triage to stay in business?

Apple's still going to buy a hundred million iPhones a year.

[-]

PracticlySpeaking@reddit

Try 240M for iPhone 🤯
...along with 25M Macs.

[-]

Late-Assignment8482@reddit

Well, I was in the right order of magnitude at least.

[-]

PracticlySpeaking@reddit

If you were Tim Apple, would you put the 512GB on hand into the next-generation M5 Ultra, or the generation-behind M3 Ultra?

...or 40 iPhone 17 Pro? At 12GB each, that's more like $40,000 in revenue.

[-]

Adrian_Galilea@reddit

Are you sure that you can use that same memory on the m5?

[-]

Georgefakelastname@reddit

Yeah, phone and Mac memory aren’t even the same, to my knowledge.

[-]

PracticlySpeaking@reddit

We are not talking about stacks of inventory sitting on shelves, or DIMMs from Micro Center waiting to go into PCs.

Semiconductor fabs and packaging are massively expensive. Chips move through very quickly. The time to start making M5 is carefully planned, with simultaneous orders for the correct DRAM well in advance.

[-]

PracticlySpeaking@reddit

M4, M5 and their corresponding A-series SoCs all use LPDDR5X.

[-]

No_War_8891@reddit

640k ought to be enough for anybody

[-]

ProfessionalSpend589@reddit

640k tokens context I presume?

[-]

No_War_8891@reddit

sry was memquoting Bill Gates - forgive me I’m old

[-]

ryfromoz@reddit

The joys of config.sys and autoexec.bat

[-]

etaoin314@reddit

yeah you dont need those, go ahead and delete them. /s

[-]

pscoutou@reddit

EMS vs XMS.

[-]

No_War_8891@reddit

I made the school sysadmins life a hell by changing it on all pcs 🙃

[-]

boptom@reddit

Qemm memory unlocked

[-]

_twrecks_@reddit

I recall the original Mac only having 512k with no expansion options, Jobs said something like it would force the programmers to write tighter faster code. Everyone reveres Jobs and demonizes Gates.

[-]

droptableadventures@reddit

The original Mac had 128k because Jobs said it absolutely had to sell for $2499 at most.

The "fat Mac" with 512k actually came a bit later.

[-]

CanineAssBandit@reddit

The difference I see there is that Steve admitted it wasn't actually enough ram but did it anyway because costs and they are a hardware+software company, whereas Bill straight up didn't think it was needed despite being purely a software company (which implies lack of imagination).

[-]

infearia@reddit

Everyone reveres Jobs and demonizes Gates.

Not true! I demonize them both.

[-]

ProfessionalSpend589@reddit

Yeah, I got it. I was trying to create a new joke or something :)

[-]

hellomistershifty@reddit

512gb will be offered as an option for an additional $640,000

[-]

IrisColt@reddit

I understood that reference, sigh...

[-]

dobkeratops@reddit

640gb maybe

640tb in a few decades hopefully.

[-]

pier4r@reddit

tbf a lot of SW is mostly bloated in my view, for this we need a lot.

I am not talking about LLMs though.

[-]

dobkeratops@reddit

agree regular software can be way more efficient , everyone got used to using web frameworks etc

[-]

Maleficent-Ad5999@reddit

We’d still need couple of 640gb devices to run kimi

[-]

droptableadventures@reddit

622GB at UD-Q4_K_XL, so it'd barely fit on one if you didn't have much context.

[-]

some_user_2021@reddit

DEVICE=C:\Windows\HIMEM.SYS

[-]

bernaferrari@reddit

just wait a few months for m5 or m6 ultra, not worth it for m3

[-]

Neighbor_@reddit

m6? I'm waiting for m7

[-]

bernaferrari@reddit

You can, but m7 will be a minor update, m6 is 15% faster on 30% less energy.

[-]

Neighbor_@reddit

But won't it be a year+ for the m6 studio / mini to come out?

I was actually joking on the above because like, 15% / 30% improvements are kinda baked in. That's just Moore's Law.

[-]

bernaferrari@reddit

No one knows. Moore law ended long time ago. This is the first nm reduction in a few years.

[-]

Adrian_Galilea@reddit

Are you certain of that?

[-]

jonydevidson@reddit

TSMC N2

[-]

power97992@reddit (OP)

It will have 256 or 512 gb of ram but probably not more

[-]

Yorn2@reddit

Yup, and they are selling on Ebay for over $20k.

[-]

Neighbor_@reddit

How the hell are these going for 20k? Aren't we just a few months away from an M5 Mac Studio, which would be like 10k with all the upgrades?

[-]

datbackup@reddit

Why do you assume they’d be 10K with all the upgrades? Why not assume Apple knows they can price them at $16K and they’d still sell equally well? Why not assume there will be no 512GB units because demand is so high for local inference that people will be willing to buy two 256GB units which results in higher margin for apple?

[-]

Neighbor_@reddit

Based on previous prices, 10k for all upgrades seems reasonable. If we anticipate a spike in prices, it'll probably be more in the 12k range
Even at 16k, this is still the best hardware you can get, vs some outdated M3 for 20k...

[-]

Yorn2@reddit

Considering the 256GB RAM versions are even selling for a lot on Ebay auctions as well I suspect the M5s are going to be priced a lot higher than people think.

[-]

JacketHistorical2321@reddit

This is weeks old news dude

[-]

power97992@reddit (OP)

Yep, people noticed like 3 weeks Ago

[-]

ElementNumber6@reddit

So why are you posting this as though it was just discovered?

[-]

power97992@reddit (OP)

I discovered it while searching.

[-]

Flimsy_Leadership_81@reddit

goodmorning my baby!

[-]

Specialist_Golf8133@reddit

wait this is actually huge if true. the 512gb configs were basically the only consumer hardware that could run the absolute chonkers locally without completely falling apart. apple quietly killing the top end feels like they're either preparing new silicon or they realized almost nobody was buying them. which means the local llm crowd just lost their best plug-and-play option for running like 200b+ models

[-]

Pleasant-Shallot-707@reddit

Old news. Apple is emptying the pipeline because they’re ramping up production for the refresh coming on June 8th

[-]

_derpiii_@reddit

they’re ramping up production for the refresh coming on June 8th

Is that date confirmed?

[-]

Pleasant-Shallot-707@reddit

Yes

[-]

droptableadventures@reddit

It's never confirmed but that's the first day of WWDC - Apple's developer event, and it has been announced that "major AI advancements" will be part of the theme.

[-]

_derpiii_@reddit

Gotcha. Thank you for the clarification.

[-]

dinerburgeryum@reddit

Eh. M3 was always overhyped given the lack of matmul cores on the GPU. Prefill time was pretty bad. Almost certainly they’re just flushing inventory while building M5 stock. Bummer if you really, really need a new one, but otherwise I’m cool with them focusing on the chips that are actually good at inference.

[-]

Both_Opportunity5327@reddit

Is this why Strix Halo can keep up, when on paper when looking ay the memory bandwidth, the Macs Studios should be able to demolish it.

[-]

Sliouges@reddit

That's an astute observation. Margin is low, get rid of old stock, so they wait for the new ones where they can hype and add the apple 300% tax.

[-]

droptableadventures@reddit

Like the Mac Studio, the XDR Pro Display is actually pretty cheap for a device with the same specifications. Professional displays with a similar contrast range and colour gamut cost about double that, similar to how it'd be a lot more expensive to get that 512GB in GPUs.

Also I know that MSI display. It's not 5k, it's an ultrawide 4K display that they're incorrectly describing as 5K - and it doesn't come close to the specified brightness or colour gamut promised. Viewing angle's also terrible for an IPS display.

[-]

power97992@reddit (OP)

I think eventually the high ram prices will make macs even more expensve and decrease their supply.. Apple is not even tsmc's biggest customer anymore and their node shares are decreasing % wise

[-]

Late-Assignment8482@reddit

They have more padding simply by charging more for RAM and being a big customer. So I don't expect to see a 4k Macbook Air just because a $300 pair of laptop RAM sticks is now selling at $1200 at Best Buy.

More likely that it'll become $550 between each "tick" (32GB-64GB->128GB) rather than $400.

[-]

tiffanytrashcan@reddit

They've moved past the power grid issue.
In truly the most horrific way possible, ignoring any sane regulations and literally just strapping jet engines to generators. Muskrat specifically relying on these to turn the lights on in the new facilities.

No, it's not remotely sustainable in the long term, and with recent world events not even in the short term.

But they keep finding a way to just cover up the next big issue. The bankers would wake up if they walked into the brand new datacenter and the lights weren't on. So they make sure that doesn't happen.

The groundwork has already been laid for the next step already when they can't afford fuel. The recent executive order on AI data centers not impacting local consumer electric rates. Well, how do you (pretend to) do that?
You follow up with a new executive order of the US government handing these companies barrels of fuel. "They no longer rely on or take from the grid!" - and nobody else can afford fuel. But that wasn't his promise. It was electricity prices, which are not that heavily dependent on oil in the U.S. comparatively to coal and natural gas, locally produced sources.

[-]

NNN_Throwaway2@reddit

Yup, their plan is to make the Technate states of America and brute force their way through the issue of power and resources. Venezuela is in the bag, they've already started in Ecuador and Columbia is next. They've given up on Greenland temporarily, probably because they got sidetracked with Iran.

[-]

maxstader@reddit

Inference involves both compute for pre processing and memory bandwidth for token generation. Now with the m3u512 getting rdma the cost to load kv cache has dropped significantly, and honestly its pretty fast loading from disk on precomputed cache. Its incredibly efficient for working with large code bases, speaking from personal experience the system has aged well as MLX tools optimized over time what the m3u studio is good at.

[-]

GoofusMcGhee@reddit

Well that's OK, I can just take out the 256GB modules and put in some 512GB modules I bought and...

Oh. Right. This is 

[-]

tiffanytrashcan@reddit

They're not fast enough to use all that RAM. This is why they're supporting memory access via Thunderbolt (RDMA.) Clustering these machines makes much more sense than increasing the RAM in a single unit. (Exo)

We won't see a huge difference with M5 because part of it is still the memory bandwidth limitation. Even though the chip's faster, it can't read enough RAM quick enough if there's too much to go through. You still need another chip to handle a new 256GB chunk, even if we're moving from the need being the chip capability to the memory lanes and bandwidth.
M5 could have potentially seen a larger bandwidth increase if not for the RAMpocalypse. But the faster you want to run your RAM, the more complicated it is, needing a smallernode, etc, and the more expensive it's going to be. They decided to just take the markets increase in pricing, instead of adding an exponential increase to the cost.

[-]

droptableadventures@reddit

Clustering these machines makes much more sense than increasing the RAM in a single unit. (Exo)

That's not what it's for. RDMA over Thunderbolt is for sharing data between them more quickly than having to use TCP/IP over Ethernet.

[-]

tiffanytrashcan@reddit

Lol what? RDMA is what enabled Exo to even work. It was "day zero support" requiring macOS Tahoe Beta to even run it when first released.

RDMA over Thunderbolt is for directly accessing the RAM of another device (in the cluster.) Thunderbolt is already many more times faster than TCP/IP over (most) Ethernet.

We are sharing data here, but at a much quicker speed than even Thunderbolt traditionally provides, latency wise.

I won't get the exact terminology correct on what's shared in between layers, but EXO intelligently splits everything up, so that the majority of the communication is between the GPU and RAM on each device, and then it's closer to the end of the processing pipeline when data is shared between all of them to come to the final result. The data that needs the most bandwidth is put as close to the chip that's going to use it as possible.

[-]

droptableadventures@reddit

It's not for "sharing" the RAM between both machines i.e. plugging a 256GB machine into a 32GB machine and "borrowing" some RAM.

It's for poking stuff into the other device's memory very quickly - transferring data between both machines.

[-]

tiffanytrashcan@reddit

Exactly...
That's what I keep saying.
"You still need another chip to handle another 256GB chunk."
"EXO intelligently splits everything up, so that the majority of the communication is between the GPU and RAM on each device"
It transmits the intra-layer communication, which is much less data, but still sensitive to latency, after the majority of the heavy computation is done.

[-]

droptableadventures@reddit

It has 8 channels of RAM. You'd need to get 8 sticks in there.

Power usage would increase. Memory timings would need to be loosened, reducing memory bandwidth, due to the much longer traces, and signal integrity issues with sockets. The memory being soldered down, that close to the CPU is why it performs so well.

[-]

tarruda@reddit

They might want to trigger the FOMO psychology so that when they launch m5 ultra 1tb, localllama enthusiasts won't think twice before throwing $20k into it.

[-]

oceanbreakersftw@reddit

Wanted a 256GB m5 Max MBP.. or 512 since I think the chip can maybe handle it.. so if we wait we can maybe get 256 in MBP?

[-]

power97992@reddit (OP)

U Might have to Wait until 2027-2028 dude, new mem fabs wont be ready until 2027 and any new mem capacities will be snatched by hyperscalers and data centers

[-]

WithoutReason1729@reddit

Your post is getting popular and we just featured it on our Discord! Come check it out!

You've also been given a special flair for your contribution. We appreciate your post!

I am a bot and this action was performed automatically.

[-]

dinominant@reddit

Don't worry you can just add more ram by upgrading later to a whole new mac.

[-]

PracticlySpeaking@reddit

This has been much debated over in r/MacStudio over the last several days.

More likely related to the OpenClaw craze overlapping with Apple's transition to Mac Studio M5.

RAM has to be packaged into SoCs at the fab, so lead times are longer than systems with DIMMs. Also note that Apple got burned on the 2025 changeover — there were discontinued M2 Max and M2 Ultra still selling (and heavily discounted) for nearly a year after M3/M4 started shipping.

[-]

power97992@reddit (OP)

I read they cancelled it in the beginning of march

[-]

jacek2023@reddit

Again?