Apple stopped selling 512GB unified-RAM Mac Studios, now the max is 256GB!
Posted by power97992@reddit | LocalLLaMA | View on Reddit | 119 comments
The memory supply crisis is hitting Apple too. It is probably too expensive, and/or there isn't enough supply, for them to sell 512GB RAM M3 Ultras. You can look at https://www.apple.com/shop/buy-mac/mac-studio to see it is no longer available. Maybe that is why the M5 Max only goes up to 128GB; I think they could've added 256GB to it...
No-Introduction-2211@reddit
And here I was counting on buying a Mac Studio to support development of this site of mine:
https://www.amazingindex.com
power97992@reddit (OP)
No idea?
No-Introduction-2211@reddit
I feel like this situation can't last much longer; next year will definitely be a lot better.
DeepAd8888@reddit
I was actually planning on buying a 512GB version. Oh well, I'll look at other options.
Sabotag3-@reddit
I think they’re redirecting it to the M5 Ultra Mac Studio for April.
eclipsegum@reddit
They are selling on eBay for $25K. They're the only legitimate option for running large models on a desktop, and in retrospect they were a steal.
Icy_Distribution_361@reddit
Nah, those large models would still run super slow even if they fit in memory. It's not really usable. It might become usable with the M5 Max.
Something-Ventured@reddit
They run fine and are perfectly usable.
I have the M3 Ultra.
idiotiesystemique@reddit
What model and TPS are you getting?
Something-Ventured@reddit
I'm getting Claude Sonnet/Opus-like speeds locally with DeepSeek, gpt-oss, etc.
I haven’t benchmarked in a year, so I couldn’t tell you tps. You can google those, but it’s very workable.
Civil_Response3127@reddit
Yeah, but which DeepSeek? The large ones that push the 512GB of RAM do not run at that speed.
Something-Ventured@reddit
There’s a lot of throttling on regular subscription plans now on Claude. So it definitely does get close.
Civil_Response3127@reddit
You say that as if they're on the same scale. Even with throttling, your M3 isn't even close to the ingest and output of Claude Code, even on Opus 4.6.
In Claude Code, when the agent is doing its thing, it regularly has 5 to 10 subagents running at the same time, all at approximately 40 tok/s. When you have another one or two conversations going at the same time, the difference is especially stark. For any model that comes close to using up your 512GB of RAM, your tokens per second is absolutely nowhere near a single stream of Claude Opus 4.6, let alone all of them simultaneously.
Something-Ventured@reddit
https://www.reddit.com/r/technology/comments/1s4w4gm/anthropic_tweaks_claude_usage_limits_to_manage/
Your mileage may vary.
I've been getting significantly slower prompt responses, having to retry often enough, etc., that it's about the same.
I had to disable all my cowork tasks because of the new throttling policies.
I dropped down to the $20/m plan after concluding that I get good enough performance locally for my workflows, and my GitHub Copilot plan somehow got better Claude performance than my Claude subscription.
The slightly slower TPS of local, even with large models, is irrelevant when throttling and having to retry prompts on Claude happens. It’s also way less relevant when you’re actually inspecting the code changes and bounding the prompts.
The “faster” aspects of Claude don’t really matter when you have to frequently stop it from wasting tokens or doing things it shouldn’t to avoid being throttled.
Civil_Response3127@reddit
No, it isn't a question of your mileage may vary. The tokens per second just aren't even close, even with Claude's throttling that I already acknowledged. Additionally, your link does not reference throttling, that is to do with usage limits.
Something-Ventured@reddit
TPS is the wrong metric.
Useful token per workday is the metric.
My workflow isn’t generating shit code I haven’t read and accruing technical debt burning through tokens.
My workflow is to review data and create analysis and interactive data tools which can be repeated and verified.
Claude throttles and reduces token limits, both.
I prompt and tab over to my real work, only to come back to some retry issue (I'm not out of tokens; the prompt was throttled or canceled). Or it went off the reservation, did things I didn't tell it to do, and wasted 5-20 minutes of effort.
It got unreliable enough that local models perform well enough that I cut my $200/m subscription to $20.
TPS is an idiotic metric for functionality and LLM use.
Civil_Response3127@reddit
Tokens per second is absolutely the correct metric if you use the term throttling. You can't even compare the two, because Claude can output maybe two orders of magnitude more code per second than your setup, which inherently means you become the bottleneck. It sounds like you've got the strangest setup: you don't output much, you let it take ages to do so, and then you come back to review it all as a batch instead of dynamically through the workday, which is far better for cognitive load.
And stop trying to throw around the term "usage limits" or "throttling", because again, comparing your setup to that is just plain incorrect.
It genuinely just sounds like you're confused by the technology and giving people very out of date and confused advice.
Something-Ventured@reddit
I said I get good enough performance running large models on the studio. It is good enough, and output is close to opus/sonnet levels as measured by useful work accomplished. I also don’t run out of tokens.
But TPS is still a stupid metric, because I frequently have to interrupt Claude when it starts doing things it shouldn't, I get prompt failures during peak hours from what is obviously service-side throttling, and there is still human latency for review and action, which means enough time elapses between prompts that even a slower local model is good enough.
I work in actual science fields, with embedded hardware/instrumentation setups collecting terabytes of real data. This is a very different experience from some front-end coder for whom shipping vibe-coded garbage and unoptimized JavaScript doesn't matter.
Opus is not an order of magnitude faster than local, despite what you just said. OSS, DeepSeek, GLM, etc. all run well enough locally. At this point Opus is so slow, and so good at producing incorrect outputs after wasting time, that I only use Haiku through my GitHub Copilot subscription, and use my Claude subscription for research tasks (scouring the net).
Telling others they're not technical and don't understand really just makes you sound like a gamer fanboy or web monkey who isn't technical and cannot comprehend other people's workloads. I work on scientific computing workloads where no LLM is accurate enough that you can just vibe code your way to an actual answer. TPS is irrelevant.
Civil_Response3127@reddit
No, you said this and I have been saying that is not true.
Something-Ventured@reddit
TPS is slower. No shit Sherlock.
Useful work is about the same.
As I originally said, "it's good enough." TPS is irrelevant as a metric. I get accuracy and useful work per day to be about the same. That's called good enough.
It's also why so many people use smaller models locally for coding.
Virtamancer@reddit
gpt-oss isn't a large model; it's not even remotely close to 512GB. The large models are >512GB and barely fit into 512GB AFTER being quantized; those would presumably run pretty damn slow.
The advantage would be having multiple small models like gpt oss or qwen3.5 in memory without having to load/unload them.
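Rough back-of-envelope for why only ~4-bit quants of the biggest models squeeze into 512GB; the parameter counts and quant widths below are my illustrative assumptions, not official figures:

```python
# Back-of-envelope weight footprint at a given quantization width.
# KV cache, activations, and OS overhead come on top, which is why
# ~4-bit is about the practical floor for a 512GB machine.

def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# Illustrative (assumed) sizes:
for name, params, bits in [
    ("gpt-oss-120b", 120, 4.25),          # MXFP4-style quant
    ("DeepSeek-class ~670B", 670, 4.5),   # ~Q4 GGUF incl. overhead
    ("1T-class model", 1000, 4.5),
]:
    print(f"{name}: ~{weight_footprint_gb(params, bits):.0f} GB")
# gpt-oss-120b: ~64 GB; ~670B class: ~377 GB; 1T class: ~562 GB (> 512 GB)
```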
Something-Ventured@reddit
Yes, and I am able to keep multiple models in memory and switch tasks, or run full DeepSeek at once…
All at decent speeds
LambdasAndDuctTape@reddit
Cope all you want for buying that expensive piece of hardware and falling for the massive PR stunt, but the reality is you could've funded Max for multiple years, gotten much better performance and cutting-edge models, and still had money left over.
Something-Ventured@reddit
lol, dude. I run 2-3 week batch processing jobs that use 400GB of RAM, and it was a 90% cost reduction per YEAR vs. renting CUDA cloud compute.
There's no cope. It was a ridiculous cost savings.
LLM use is just a bonus.
eclipsegum@reddit
Qwen3.5-397B at 35 tok/s, and likely faster with TurboQuant.
Hyiazakite@reddit
What's the PP speed at 32k context?
BumbleSlob@reddit
Apple just launched the MacBook neo line which is going to sell like hotcakes. Their CEO is famously the best supply chain guy in the history of the tech world. I think it’s more likely they’re just saving chips for the refreshed M5 Ultra mac studios arriving in a month or three.
Late-Assignment8482@reddit
I would relax about the "they're never making another 512GB model!!!" theory.
This is most likely that they sold very few of them (a halo build of a halo product line) and are dropping the M5 Ultra sometime this year, so it makes sense to hold supply back for that. Unless they actually put out a press release saying "we're never selling these again" (which they did say about Mac Pros recently), quiet store changes are usually related to an upcoming product of some kind.
Apple likes to set a price when they introduce a product and hold to it for that product's lifespan. They also have long-term parts contracts.
This also may be supply conservation.
They take a real hit if they have to release a $30k product because of a price hike that goes away a year later. The bad press doesn't revert: Google searches in 2029 would still surface memes about Mac Studios starting at $28k, even though the price went back down in 2027.
If setting that LPDDR5X aside for the upcoming M5 model, and losing maybe a few hundred or thousand sales, gets them over a gap in RAM price lock-in, then they get press for "Apple took care of customers during the RAM insanity," the M5 Ultra drops in October, and they come in strong at a time when local models are buzzy and their product is dirt cheap.
PracticlySpeaking@reddit
If you listened to the earnings call, they talked about "margin pressure" — CEO-speak for "we are going to eat some cost."
Late-Assignment8482@reddit
Yup. Tim Cook may not be flashy, but the man knows systems and supply chains and manufacturing pipelines.
PracticlySpeaking@reddit
And Apple have huge negotiating leverage — despite rumors to the contrary — (still) being one of, if not the largest customer for many suppliers.
Late-Assignment8482@reddit
And they're steady. AI Bubble pops and NVIDIA needs triage to stay in business?
Apple's still going to buy a hundred million iPhones a year.
PracticlySpeaking@reddit
Try 240M for iPhone 🤯
...along with 25M Macs.
Late-Assignment8482@reddit
Well, I was in the right order of magnitude at least.
PracticlySpeaking@reddit
If you were Tim Apple, would you put the 512GB on hand into the next-generation M5 Ultra, or the generation-behind M3 Ultra?
...or into 40 iPhone 17 Pros? At 12GB each, that's roughly the same 512GB of RAM, and more like $40,000 in revenue.
Adrian_Galilea@reddit
Are you sure that you can use that same memory on the m5?
Georgefakelastname@reddit
Yeah, phone and Mac memory aren’t even the same, to my knowledge.
PracticlySpeaking@reddit
We are not talking about stacks of inventory sitting on shelves, or DIMMs from Micro Center waiting to go into PCs.
Semiconductor fabs and packaging are massively expensive. Chips move through very quickly. The time to start making M5 is carefully planned, with simultaneous orders for the correct DRAM well in advance.
PracticlySpeaking@reddit
M4, M5 and their corresponding A-series SoCs all use LPDDR5X.
No_War_8891@reddit
640k ought to be enough for anybody
ProfessionalSpend589@reddit
640k tokens context I presume?
No_War_8891@reddit
sry, was meme-quoting Bill Gates - forgive me, I'm old
ryfromoz@reddit
The joys of config.sys and autoexec.bat
etaoin314@reddit
yeah you dont need those, go ahead and delete them. /s
pscoutou@reddit
EMS vs XMS.
No_War_8891@reddit
I made the school sysadmin's life hell by changing it on all the PCs 🙃
boptom@reddit
Qemm memory unlocked
_twrecks_@reddit
I recall the original Mac only having 512k with no expansion options. Jobs said something like it would force programmers to write tighter, faster code. Everyone reveres Jobs and demonizes Gates.
droptableadventures@reddit
The original Mac had 128k because Jobs said it absolutely had to sell for $2499 at most.
The "fat Mac" with 512k actually came a bit later.
CanineAssBandit@reddit
The difference I see there is that Steve admitted it wasn't actually enough RAM but did it anyway because of costs, and they were a hardware+software company, whereas Bill straight up didn't think more was needed despite running a purely software company (which implies a lack of imagination).
infearia@reddit
Not true! I demonize them both.
ProfessionalSpend589@reddit
Yeah, I got it. I was trying to create a new joke or something :)
hellomistershifty@reddit
512GB will be offered as an option for an additional $640,000
IrisColt@reddit
I understood that reference, sigh...
dobkeratops@reddit
640gb maybe
640tb in a few decades hopefully.
pier4r@reddit
tbf a lot of SW is mostly bloat in my view; that's why we need so much.
I am not talking about LLMs though.
dobkeratops@reddit
agreed, regular software could be way more efficient; everyone got used to using web frameworks etc.
Maleficent-Ad5999@reddit
We'd still need a couple of 640GB devices to run Kimi.
droptableadventures@reddit
622GB at UD-Q4_K_XL, so it'd barely fit on one if you didn't have much context.
some_user_2021@reddit
DEVICE=C:\Windows\HIMEM.SYS
bernaferrari@reddit
just wait a few months for the M5 or M6 Ultra, not worth it for the M3
Neighbor_@reddit
m6? I'm waiting for m7
bernaferrari@reddit
You can, but the M7 will be a minor update; the M6 is 15% faster at 30% less energy.
Neighbor_@reddit
But won't it be a year+ before the M6 Studio / Mini comes out?
I was actually joking above, because 15% / 30% improvements are kinda baked in. That's just Moore's Law.
bernaferrari@reddit
No one knows. Moore's Law ended a long time ago. This is the first node shrink in a few years.
Adrian_Galilea@reddit
Are you certain of that?
jonydevidson@reddit
TSMC N2
power97992@reddit (OP)
It will have 256 or 512GB of RAM, but probably not more.
Yorn2@reddit
Yup, and they are selling on eBay for over $20k.
Neighbor_@reddit
How the hell are these going for 20k? Aren't we just a few months away from an M5 Mac Studio, which would be like 10k with all the upgrades?
datbackup@reddit
Why do you assume they’d be 10K with all the upgrades? Why not assume Apple knows they can price them at $16K and they’d still sell equally well? Why not assume there will be no 512GB units because demand is so high for local inference that people will be willing to buy two 256GB units which results in higher margin for apple?
Yorn2@reddit
Considering that even the 256GB RAM versions are selling for a lot in eBay auctions, I suspect the M5s are going to be priced a lot higher than people think.
JacketHistorical2321@reddit
This is weeks-old news, dude.
power97992@reddit (OP)
Yep, people noticed like 3 weeks ago.
ElementNumber6@reddit
So why are you posting this as though it was just discovered?
power97992@reddit (OP)
I discovered it while searching.
Specialist_Golf8133@reddit
wait, this is actually huge if true. the 512GB configs were basically the only consumer hardware that could run the absolute chonkers locally without completely falling apart. apple quietly killing the top end feels like they're either preparing new silicon or they realized almost nobody was buying them. which means the local LLM crowd just lost their best plug-and-play option for running 200B+ models
Pleasant-Shallot-707@reddit
Old news. Apple is emptying the pipeline because they’re ramping up production for the refresh coming on June 8th
_derpiii_@reddit
Is that date confirmed?
Pleasant-Shallot-707@reddit
Yes
droptableadventures@reddit
It's never confirmed but that's the first day of WWDC - Apple's developer event, and it has been announced that "major AI advancements" will be part of the theme.
_derpiii_@reddit
Gotcha. Thank you for the clarification.
dinerburgeryum@reddit
Eh. The M3 was always overhyped given the lack of matmul cores on the GPU; prefill time was pretty bad. Almost certainly they're just flushing inventory while building M5 stock. A bummer if you really, really need a new one, but otherwise I'm cool with them focusing on the chips that are actually good at inference.
Both_Opportunity5327@reddit
Is this why Strix Halo can keep up, when on paper, looking at the memory bandwidth, the Mac Studios should demolish it?
Sliouges@reddit
That's an astute observation. Margins are low, so get rid of old stock and wait for the new ones, where they can build hype and add the Apple 300% tax.
droptableadventures@reddit
Like the Mac Studio, the Pro Display XDR is actually pretty cheap for a device with its specifications. Professional displays with a similar contrast range and colour gamut cost about double, similar to how it'd be a lot more expensive to get that 512GB in GPUs.
Also, I know that MSI display. It's not 5K; it's an ultrawide 4K display that they're incorrectly describing as 5K, and it doesn't come close to the promised brightness or colour gamut. The viewing angle is also terrible for an IPS display.
power97992@reddit (OP)
I think the high RAM prices will eventually make Macs even more expensive and decrease their supply. Apple is not even TSMC's biggest customer anymore, and their share of leading nodes is shrinking percentage-wise.
Late-Assignment8482@reddit
They have more padding, simply by charging more for RAM and being a big customer. So I don't expect to see a $4k MacBook Air just because a $300 pair of laptop RAM sticks now sells for $1200 at Best Buy.
More likely that each "tick" (32GB -> 64GB -> 128GB) will cost $550 rather than $400.
tiffanytrashcan@reddit
They've moved past the power-grid issue.
In truly the most horrific way possible: ignoring any sane regulations and literally just strapping jet engines to generators. Muskrat specifically relies on these to turn the lights on in the new facilities.
No, it's not remotely sustainable in the long term, and with recent world events, not even in the short term.
But they keep finding a way to cover up the next big issue. The bankers would wake up if they walked into the brand-new datacenter and the lights weren't on. So they make sure that doesn't happen.
The groundwork has already been laid for the next step, when they can't afford fuel: the recent executive order on AI data centers not impacting local consumer electric rates. Well, how do you (pretend to) do that?
You follow up with a new executive order of the US government handing these companies barrels of fuel. "They no longer rely on or take from the grid!" - and nobody else can afford fuel. But that wasn't his promise. It was electricity prices, which in the U.S. are comparatively not that dependent on oil, versus coal and natural gas, locally produced sources.
NNN_Throwaway2@reddit
Yup, their plan is to make the Technate States of America and brute-force their way through the power and resource issue. Venezuela is in the bag, they've already started in Ecuador, and Colombia is next. They've given up on Greenland temporarily, probably because they got sidetracked with Iran.
maxstader@reddit
Inference involves both compute for prefill and memory bandwidth for token generation. Now that the M3 Ultra 512GB has RDMA, the cost of loading KV cache has dropped significantly, and honestly it's pretty fast loading precomputed cache from disk. It's incredibly efficient for working with large codebases; speaking from personal experience, the system has aged well as MLX tooling has been optimized over time for what the M3 Ultra Studio is good at.
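To put rough numbers on the two phases, here's a minimal sketch: prefill is roughly compute-bound, decode is roughly memory-bandwidth-bound. The TFLOPS/bandwidth figures and model shape are my assumptions for illustration, not benchmarks:

```python
# Back-of-envelope: why prefill stresses compute while decode stresses
# memory bandwidth. All numbers are rough assumptions, not measurements.

def prefill_seconds(prompt_tokens: int, active_params_b: float, tflops: float) -> float:
    # ~2 FLOPs per active parameter per prompt token processed
    flops = 2 * active_params_b * 1e9 * prompt_tokens
    return flops / (tflops * 1e12)

def decode_tps(active_weight_gb: float, bandwidth_gbs: float) -> float:
    # Each generated token must stream the active weights from memory once
    return bandwidth_gbs / active_weight_gb

# Assumed M3 Ultra-ish figures: ~28 GPU TFLOPS, ~800 GB/s bandwidth.
# Assumed MoE model: ~37B active params, ~20 GB active weights at ~4-bit.
print(f"prefill, 32k ctx: ~{prefill_seconds(32_768, 37, 28):.0f} s")   # ~87 s
print(f"decode ceiling:   ~{decode_tps(20, 800):.0f} tok/s")           # ~40 tok/s
```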
GoofusMcGhee@reddit
Well that's OK, I can just take out the 256GB modules and put in some 512GB modules I bought and...
Oh. Right. This is
tiffanytrashcan@reddit
They're not fast enough to use all that RAM. This is why they're supporting memory access via Thunderbolt (RDMA). Clustering these machines makes much more sense than increasing the RAM in a single unit. (Exo)
We won't see a huge difference with the M5, because part of it is still the memory bandwidth limitation. Even though the chip is faster, it can't read RAM quickly enough if there's too much to go through. You still need another chip to handle each new 256GB chunk, even as the bottleneck moves from chip capability to memory lanes and bandwidth.
The M5 could potentially have seen a larger bandwidth increase if not for the RAMpocalypse. But the faster you want to run your RAM, the more complicated it is, needing a smaller node, etc., and the more expensive it gets. They decided to just pass on the market's price increase instead of adding an exponential increase to the cost.
droptableadventures@reddit
That's not what it's for. RDMA over Thunderbolt is for sharing data between them more quickly than having to use TCP/IP over Ethernet.
tiffanytrashcan@reddit
Lol, what? RDMA is what enabled Exo to even work. It was "day zero support," requiring the macOS Tahoe beta to even run when first released.
RDMA over Thunderbolt is for directly accessing the RAM of another device (in the cluster). Thunderbolt is already many times faster than TCP/IP over (most) Ethernet.
We are sharing data here, but at much lower latency than even Thunderbolt traditionally provides.
I won't get the exact terminology right for what's shared between layers, but Exo intelligently splits everything up so that the majority of the communication is between the GPU and RAM on each device, and data is only shared between all of them near the end of the processing pipeline to produce the final result. The data that needs the most bandwidth is placed as close as possible to the chip that's going to use it.
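Roughly like this, as a minimal sketch of the partitioning idea (my own illustration, not Exo's actual code):

```python
# Minimal pipeline-split sketch (illustrative, not Exo's real algorithm):
# give each device a contiguous run of layers sized to its free memory,
# so heavy weight reads stay local and only small per-token activations
# cross the Thunderbolt/RDMA link between stages.

def split_layers(n_layers: int, device_mem_gb: list[float]) -> list[range]:
    total = sum(device_mem_gb)
    ranges, start = [], 0
    for i, mem in enumerate(device_mem_gb):
        count = round(n_layers * mem / total)
        if i == len(device_mem_gb) - 1:
            count = n_layers - start  # last device absorbs rounding error
        ranges.append(range(start, start + count))
        start += count
    return ranges

# e.g. two 256GB Studios and one 128GB machine sharing a 61-layer model
print(split_layers(61, [256, 256, 128]))
# -> [range(0, 24), range(24, 48), range(48, 61)]
```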
droptableadventures@reddit
It's not for "sharing" the RAM between both machines i.e. plugging a 256GB machine into a 32GB machine and "borrowing" some RAM.
It's for poking stuff into the other device's memory very quickly - transferring data between both machines.
tiffanytrashcan@reddit
Exactly...
That's what I keep saying.
"You still need another chip to handle another 256GB chunk."
"EXO intelligently splits everything up, so that the majority of the communication is between the GPU and RAM on each device"
It transmits the inter-layer communication, which is much less data but still latency-sensitive, after the majority of the heavy computation is done.
droptableadventures@reddit
It has 8 channels of RAM; you'd need to get 8 sticks in there.
Power usage would increase. Memory timings would need to be loosened, reducing memory bandwidth, due to the much longer traces and the signal-integrity issues of sockets. The memory being soldered down that close to the CPU is why it performs so well.
tarruda@reddit
They might want to trigger FOMO psychology so that when they launch the M5 Ultra with 1TB, LocalLLaMA enthusiasts won't think twice before throwing $20k at it.
oceanbreakersftw@reddit
Wanted a 256GB M5 Max MBP... or 512, since I think the chip could maybe handle it... so if we wait, can we maybe get 256GB in an MBP?
power97992@reddit (OP)
You might have to wait until 2027-2028, dude. New memory fabs won't be ready until 2027, and any new memory capacity will be snatched up by hyperscalers and data centers.
dinominant@reddit
Don't worry, you can just add more RAM later by upgrading to a whole new Mac.
PracticlySpeaking@reddit
This has been hotly debated in r/MacStudio over the last several days.
More likely related to the OpenClaw craze overlapping with Apple's transition to Mac Studio M5.
RAM has to be packaged into the SoC at the fab, so lead times are longer than for systems with DIMMs. Also note that Apple got burned on the 2025 changeover: discontinued M2 Max and M2 Ultra units were still selling (heavily discounted) for nearly a year after the M3/M4 started shipping.
Ill-Turnip-6611@reddit
They released the M3 half a year after the M2s, so it was probably kinda expected on their end.
Ruin-Capable@reddit
Not that heavily discounted. I would definitely have snapped up a 192GB M2 Ultra if it had come down to something like $2000.
PracticlySpeaking@reddit
The 192GB was always a BTO option, so it was never in retail channel inventory and never discounted like the regular SKUs.
The 'regular' ones were $899 for a 32GB M2 Max (originally $1999) or $2100 for the 64GB M2 Ultra (originally $4999).
rorowhat@reddit
Lol 😆
LeRobber@reddit
OH NO
CanadianPropagandist@reddit
Welcome to the future where the new model is a more expensive downgrade.
xlltt@reddit
thanks internet explorer
fallingdowndizzyvr@reddit
No, it's because of TurboQuant. With that, you simply don't need 512GB.
ratocx@reddit
I suspect they need the chips for the M5 Ultra and are slowly cutting back supply to the M3.
Technical-Earth-3254@reddit
Didn't they already cancel it like a month ago...
positivitittie@reddit
Yes. This was announced a while back, and you haven't been able to buy the 512GB config for some time.
power97992@reddit (OP)
I read that they cancelled it at the beginning of March.
jacek2023@reddit
Again?