Framework and DIGITS suddenly seem underwhelming compared to the 512GB Unified Memory on the new Mac.
Posted by Common_Ad6166@reddit | LocalLLaMA | View on Reddit | 211 comments
I was holding out on purchasing a Framework Desktop until we could see what kind of performance DIGITS would get when it comes out in May. But now that Apple has announced the new M4 Max / M3 Ultra Macs with 512GB of unified memory, the 128GB options on the other two seem paltry in comparison.
Are we actually going to be locked into the Apple ecosystem for another decade? This can't be true!
Zeddi2892@reddit
I mean, if you really have no idea what you are doing and too much money: Yes.
You will have 512GB of VRAM with ~800 GB/s bandwidth, shared across every core.
So speed will drop off significantly as model size grows.
There is only one use case I can imagine: you have around five 70B models you want to switch between without loading them again.
Common_Ad6166@reddit (OP)
FP16/32 show ~10% improvement across benchmarks compared to the lower quants.
I am just trying to run and fine-tune FP16 70B models, with inference of ~20 t/s at at least 16-64K context length.
Zeddi2892@reddit
Even then you might be faster and cheaper off building a rig of used 3090s. No one knows the specs of Nvidia DIGITS yet, but if it can provide more overall bandwidth, it could still be a better deal.
Apple Silicon's shared RAM is only a good deal if you use up to 128GB of VRAM for running 70B models locally. Anything beyond that isn't a good deal anymore.
OriginalPlayerHater@reddit
All the options suck; rent by the hour for now until there's an expandable VRAM solution.
We don't need 8x 5090s, we need something like 2 of them with 500-1000GB of VRAM.
eleqtriq@reddit
One 5090 with 8x the memory bandwidth and 10x the memory capacity of a normal one would still be limited by compute.
Ansible32@reddit
How many do you actually need, though? The person you're responding to said two 4090s; one 5090 is kind of a non sequitur. Two 4090s is still more compute than a single 5090, so changing units and going smaller doesn't clarify anything.
eleqtriq@reddit
You don't need more memory size or bandwidth than the GPU can actually compute against. That's what I'm trying to say. The guy said he needed a 5090 with 500 gigs of RAM, but that's ridiculous. A 5090's GPU wouldn't be able to make use of it. The GPU would be at crawling speeds at around 100-150GB.
Ansible32@reddit
We're talking about running e.g. 500GB models, and especially for MoE the behavior can be more complicated than that. Yes, one 4090 can't do much with 500GB on its own, but depending on caching behavior, adding more than one may help. The question is: if you're aiming to run, say, DeepSeek R1, how many actual GPUs do you need to run it performantly? Is it worthwhile to invest in DDR5 and rely on a smaller number of GPUs for the heavy lifting? It's a complicated question and there are no easy answers.
eleqtriq@reddit
Yes, there are some easy answers. We can test. Relying on CPU is not the answer unless you have monk levels of patience. I have 32 threads in my 7950 and DDR5 and it’s dog slow compared to my 4090 or A6000s.
Ansible32@reddit
Yes, obviously you need at least one GPU, the question posed is how many? If we're talking a 600GB model, especially a MoE, having 600GB of VRAM is likely overkill. This is an important question given how expensive VRAM/GPUs are.
eleqtriq@reddit
That would depend on you. Even with MoE R1, that would be a lot of swapping of weights; 2-4 experts per run. Worst case, you swap 4 * 37B parameters; best case, you keep the same ones. You'll still need at least 4 experts' worth of GPU memory plus whatever memory the gating network needs. I'm calculating about 100GB of VRAM needed at Q8, just for your partial-CPU scenario.
I wouldn't go for that, personally.
2CatsOnMyKeyboard@reddit
Which will cost how much? Framework's $2,000 option is fine for what is available. The price of a nonexistent 2x 5090 setup with 512GB of VRAM is as unknown as anything else in the world that doesn't exist yet. I can't afford the Mac with 512GB, and at current prices I can't afford a rig of 5090s either.
xsr21@reddit
Mac Studio with M4 Max and 128GB is about 1K more on the education store with double the bandwidth. Not sure if Framework makes sense unless you really need the expandable storage.
Cergorach@reddit
The problem with the Framework solution is that it's available in Q3 2025 at the soonest. The Apple solutions are available this Wednesday...
DerFreudster@reddit
Exactly. And the Legoland look isn't nearly as sexy as the Studio, either.
Bootrear@reddit
The HP Z2 G1a will likely be available much sooner than the Framework Desktop (one of the reasons I haven't ordered one). They've teased an announcement for the 18th.
Common_Ad6166@reddit (OP)
I'm just trying to run 70B models with 64-128K context length at ~20 t/s. Is that too much to ask for?
Zyj@reddit
If you have too much RAM on one GPU it eventually gets slow again with very large models, even with the 1800GB/s of the GDDR7 on the 5090.
henfiber@reddit
Mixture of Experts (MoE) models such as R1 need the whole model in memory, but only the active params (~5%) are accessed per token, therefore you may get around 40 t/sec with 1800 GB/s.
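Rough back-of-the-envelope behind that estimate (assuming ~37B active params per token at Q8 and purely bandwidth-bound decode; real-world overhead brings it down):

```python
# Rough decode-speed ceiling for a bandwidth-bound MoE model.
# Assumptions (not measured): ~37B active params per token, 1 byte/param (Q8),
# and every active weight read once per generated token.
bandwidth_gb_s = 1800        # hypothetical 5090-class memory bandwidth
active_params = 37e9         # DeepSeek R1 active parameters per token
bytes_per_param = 1.0        # Q8 quantization

bytes_per_token = active_params * bytes_per_param          # ~37 GB read per token
tokens_per_s = bandwidth_gb_s * 1e9 / bytes_per_token      # upper bound
print(f"~{tokens_per_s:.0f} tok/s ceiling")                # ~49 tok/s before overhead
```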
m0thercoconut@reddit
Yeah if price is of no concern.
StoneyCalzoney@reddit
I don't think people are really doing the right price comparisons here...
If you were to go with Framework's suggested 4x128GB mainboard cluster, at a minimum you're paying ~$6.9k after getting storage, cooling, power, and an enclosure.
That gets you most of the necessary VRAM, with a large drop in inference performance due to clustering and the lower memory bandwidth. It might be 70% of the price, but you're only getting maybe 35% of the performance assuming the best case scenario where everything is running at full speed, including the links between nodes.
Adding in the edu discount to pricing just makes Apple's offerings more competitive in terms of price/performance.
T-Loy@reddit
Yeah, it's always a trade-off, and devices make sense at certain price points while not at others. At only 128GB, one Framework is preferable over a 128GB Mac Studio. But fully specced, the Mac pulls ahead again over a cluster of Frameworks.
Apple is currently a good option at the low end (a base Mac Mini is a decent PC for the price) and at the top end (the fully specced Mac Studio), because almost no similar configuration comes close in "fast" memory. But in between there are many alternatives, like Framework.
cafedude@reddit
Yeah. I can get away with $2.5K for a Framework, but if I spent $8K for a Mac Studio my wife would kill me or she would insist that we need to spend $8K on a European river cruise so that we'd be even.
BigMagnut@reddit
This is why I'm glad I'm not married.
StoneyCalzoney@reddit
Yes, the price is certainly competitive, and I hope that ROCm does grow as a platform because NVIDIA's practical monopoly needs to be toppled.
However, people should not discount Apple solely because they have historically overpriced their products. Now that they are increasingly gaining vertical control over their products, they are improving price-to-performance in a way that very much justifies the price, especially when considering other benefits like greater efficiency compared to x86-64 platforms.
Xandrmoro@reddit
If only I could use it without, well, mac os
StoneyCalzoney@reddit
I'll take macos/some future version of asahi linux over windows 11.
My Win10 desktop will continue on Win10 + ESU until it dies or MS force upgrades it while I'm sleeping.
MegaBytesMe@reddit
Out of interest what is wrong with Windows 11?
I know I initially missed the old start menu, however everything else has been better! At least on 24H2... Also feels nicer to use with the animations. Mica looks lovely too as a background material compared to acrylic (much easier to use in an app). Easy to disable telemetry and disable automatic updates, if you are "one of those"... Performance has not noticeably changed on any of my devices either.
Only downgrade is the start menu, which sees little usage from me as I always just search for the app I want (Windows key, then type the name of the app and press enter). Also the new Widgets pane is kinda... Meh.
StoneyCalzoney@reddit
For me, it's too many poor UX changes that disrupt my workflow if I need to reconfigure or fix something with devices. The small amount of fragmentation between Win7 and Win10 menus was already enough, but in Win11 they've forced many of those options into the Settings app while hiding the older, more powerful (and useful) menus in one of many buttons or hyperlinks towards the bottom of the relevant Settings app page.
Then of course the whole Recall fiasco, the pre-installed ads, the lack of stability with feature updates, the forced reliance of built-in basic apps on the Microsoft Store (Photos, 3D Paint, Snipping Tool); to me it just feels like Windows 11 is hostile to all its users.
I have daily driven every version from XP to 10, and I have to support 11 because of work. 11 is my least favorite because there was very little gained in terms of performance or usability for this enshittified version of Windows. The only good to come out of this version is the greater support for ARM CPUs.
Mochila-Mochila@reddit
ReactOS can't come out of alpha "soon" enough...
DerFreudster@reddit
I started with, uhh, well, I'm not going to say because I don't want to age myself. I've skipped a few versions before, but 11 really pissed me off. I've built two PCs with it for others, but my big desktop is going to Linux Mint, and I bought a Mac Mini to try out and now I'm looking at the Studio M3 Ultra. Because I don't trust that Nvidia's unicorns will ever come home to roost.
StoneyCalzoney@reddit
Yeah IIRC a lot of people skipped ME, Vista, and 8.
I am somewhat fond of Win8/8.1, I remember installing the beta version and being impressed enough with the performance uplift from 7 that I didn't really mind the radical shift in UI.
DerFreudster@reddit
By "one of those" you mean one of those that has been in the middle of something and had the machine install updates and crash everything? Yeah.
I couldn't figure out how to completely disable telemetry, but I was working with earlier versions (dev edition) and then the builds were for others. All the corporate stuff and the way Edge was set up felt like they were trying to Google us by jamming ads down our throats. Felt very slimy. Of course, Apple has its own "Here's our privacy that is private except for those few things we collect which we promise aren't spying on you but did you know you left the garage door open?" I wish Asahi was ready for prime time...
Mochila-Mochila@reddit
That hit hard 😭
(I learned the hard way to physically unplug my laptop from the mains...)
gnaarw@reddit
I would take any Linux working on an M3/4 chip any time of the day but then I might as well wait for an AMD card with 96GB VRAM...
If your whole ecosystem and scripts are already Linux-based and you don't want the persistent additional config of Homebrew, or Apple's seeming no-reply stance on any support you don't throw cash at (and even then it's bad if it's those useless Apple Store employees)...
Xandrmoro@reddit
Well, I'm staying on Win10 too, and then either they traditionally make Win12 good, or I'll have to move to some kind of Linux. But no way I'm using a Mac; I'd rather move to a debloated Win11.
DorianGre@reddit
On a Mac you just open a terminal window and boom, there are all the Linuxy things. Everything you develop with - Python, Java, Rust, or C - is happy to run on that Mac.
Xandrmoro@reddit
Yeah, and you don't own any of it.
Tsubajashi@reddit
same goes for windows, if we want to play by your rules.
Thebombuknow@reddit
It is worth mentioning that, I think, for the vast majority of people a single 128GB Framework Desktop is probably the best choice. It's looking like the 512GB Mac Studio is going to be the price of a used car, and 4x Framework Desktops isn't a much better price. The 512GB Mac Studio is only really appealing to those who were in the market for an A100 or something.
I would personally never spend $10,000+ on a computer, but I could justify $2000 if it's a really fast computer that has enough VRAM to run larger models when I feel like it. The Framework Desktop is the closest the average consumer has been to being able to afford to run big models.
GradatimRecovery@reddit
Framework 128 costs the same as Mac Mini 128
blebo@reddit
How? Mac Mini tops out at 64 GB
Craigslist_sad@reddit
I assume (without looking into any details) that the Framework would also have significantly worse performance per watt. Watts cost money...
GriLL03@reddit
The lower memory bandwidth argument is 100% valid, and I would personally go with the Mac on the basis of that alone. 2x the price for a lot more memory bandwidth is a good trade, and if you're spending $7k you can likely afford to spend $15k.
Regarding inferencing drops in performance, I just started testing llama with distributed computing. So far adding my 3090s as backend servers for the MI50 node actually increased my t/s by a little bit on llama 70B. I'm in the middle of testing stuff, so more info to come as I discover it.
jarec707@reddit
and resale value for the Mac
StoneyCalzoney@reddit
EXO made a good breakdown for how clustering slows down inference speed for single requests.
The TLDR of it is that you lose some performance in single request scenarios (one chat session) but you reap the benefits of clustering with multi-request scenarios when multiple chat sessions are hitting the system. Clustering allows these multiple requests to be processed in parallel, so you maintain a higher total tps throughput.
GriLL03@reddit
That's a super interesting read! Thanks!
The particular test I was running just now is Llama 70B on 8xMI50 in one server (S1) and 4x3090 in the other (S2).
Running the main host on S1 and the RPC servers from llama.cpp on S2 (one per GPU; if I run just one with all GPUs visible it doesn't allocate memory correctly for some reason), I get more tps than if I just run it on S1 only.
I now want to try using S2 as the main host, and using both S1 and S2 as backends and the main host on my daily driver dev PC (with an extra 2x3090s) and see what happens.
This will also allow me to test how the network impacts stuff as well, since S1 and S2 have 10 Gb fiber links and my PC only has a 1 Gb link (no space for the SFP+ NIC lmao). I don't really expect it to be a bottleneck, though. Running iperf3 at the same time as the inferencing didn't lead to a decrease in t/s at all.
If all goes well, I have some more add-on VRAM I can throw in.
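For anyone wanting to replicate the setup, roughly what it looks like as a script (binary/flag names are from memory of llama.cpp's RPC backend, so double-check against your build; hostnames, ports, and paths below are placeholders):

```python
# Sketch of a llama.cpp RPC topology like the one described above: one main
# host, plus one rpc-server per GPU on the worker box. Assumes llama.cpp was
# built with GGML_RPC=ON; adjust binary names, flags, and addresses as needed.
import os
import subprocess

S2_WORKERS = ["192.168.1.20:50052", "192.168.1.20:50053"]  # placeholder: one port per GPU

def start_worker(port: int, gpu_index: int) -> subprocess.Popen:
    """Start one rpc-server on the worker node, pinned to a single GPU."""
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu_index)}
    return subprocess.Popen(["rpc-server", "-H", "0.0.0.0", "-p", str(port)], env=env)

def run_main_host(model_path: str) -> None:
    """Run the main llama.cpp host, offloading layers to the remote workers."""
    subprocess.run([
        "llama-cli", "-m", model_path,
        "--rpc", ",".join(S2_WORKERS),   # comma-separated list of remote backends
        "-ngl", "99",                    # offload as many layers as possible
        "-p", "Hello",
    ])
```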
Common_Ad6166@reddit (OP)
The only thing cheaper with comparable performance would be to just go with a server rack with an Epyc CPU and a terabyte of RAM. But then you will be stuck with a server rack, and the noise + power consumption that brings, plus you will still need a GPU anyway to get decent prompt processing speed.
animealt46@reddit
No, especially if price is of concern. You aren't getting a 512GB GPU cluster any cheaper, especially once you include cooling and power costs.
Regrets_397@reddit
$8,500 with student discount (Apple never checks this).
Not that expensive in the quantized full model league. I would like to think the people buying this are small businesses who have IP or privacy concerns or those doing research, not hobbyists and social media benchmark posters.
animealt46@reddit
Don't abuse the edu discount people. Not because being kind to Apple matters, but because this is one of the few easy discount programs designed for students left standing. If you want a deal, just wait a few months and stores will be shaving similar amounts off without having to pretend to be a student.
Useful-Skill6241@reddit
Hold on is this the full fat 512gb for this price?
Feisty_Ad_4554@reddit
Useful-Skill6241@reddit
The same spec is £2000 cheaper if I fly to the US, get it myself, and fly back 😂
Useful-Skill6241@reddit
Is that with the student discount?
GradatimRecovery@reddit
Yes
kovnev@reddit
This gets posted daily, and I just can't comprehend the hype.
Yes, if you want to spend $10-15k on running a large LLM really slowly, on the most locked-down ecosystem to ever grace personal devices, I guess it's the dream...
QuantumUtility@reddit
You talking about Apple or Nvidia?
daZK47@reddit
I'm hyped for any and every improvement in this space, so we'll see. Hopefully we'll see another arms race (including Intel) because the shiny leather jacket thing is kind of getting played out.
florinandrei@reddit
Yeah, if you were born with a silver spoon up your ass.
EnthiumZ@reddit
Where did that expression even come from? Like rich people use a silver spoon to scoop shit from their asses instead of just letting it flow normally?
ArgyllAtheist@reddit
It started as the more sensible "born with a silver spoon in their mouth" - so, nepo babies, who never know what it is to not just have everything they want handed to them.. Then people being people, the idiom got mixed up with "a stick up their ass".. So, someone who is both privileged and an uptight arse with it.
florinandrei@reddit
In this case, it was an intentional mixtape.
NegativeCrew6125@reddit
😆
Divniy@reddit
Funny as I remember having a silver spoon in my childhood but I'm from a typical working class family.
LatestLurkingHandle@reddit
Look up poop knife at your own risk
AppearanceHeavy6724@reddit
Imagine a silver poop knife.
DifficultyFit1895@reddit
silverwary
NancyPelosisRedCoat@reddit
No, we pay people to do that for us. You don’t have a spooner?
Common_Ad6166@reddit (OP)
Not born with it. I've just been employed at a decent salary, and I live at home with $0 rent, so I can afford to spend a month's salary on it LOL.
florinandrei@reddit
I suggest you calm down and wait for the reviews and the benchmarks to come out, for ALL the devices you mentioned. And THEN make a decision.
Don't get me wrong, I am also tempted. But I would hate to rush into an impulse buy, only to regret it later.
literum@reddit
Mac is $10k while DIGITS is $3k, so they're not really comparable. There are also GPU options like the 48/96GB Chinese 4090s, the upcoming RTX 6000 PRO with 96GB, or even the MI350 with 288GB if you have the cash. Also, you're forgetting tokens/s. Models that need 512GB also need more compute power; it's not enough to just have the required memory.
The local LLM market is just starting up, have more patience. We had nothing just a year ago. So, definitely not a decade. Give it 2-3 years and there'll be enough competition.
Cergorach@reddit
The Mac Studio M3 Ultra 512GB (80 core GPU) is $9500+ (bandwidth 819.2 GB/s)
The Mac Studio M4 Max 128GB (40 core GPU) is $3500+ (bandwidth 546 GB/s)
The Nvidia DIGITS 128GB is $3000+ (bandwidth 273 GB/s) rumoured
So for 17% more money, you probably get double the output in the inference department (actually running LLMs). In the training department DIGITS might be significantly better, or so I'm told.
We also don't know how much power each solution draws exactly, but experience has told us that Nvidia likes to guzzle power like a habitual drunk. For the Max I can infer 140W-160W when running a large model (depending on whether it's an MLX model or not).
The Mac Studio is also a full computer you could use for other things, with a full desktop OS and a very large software library. DIGITS probably a lot less so, more like a specialized hardware appliance.
AND people were talking about clustering the DIGITS solution, 4 of them to run the DS r1 671b model, which you can do on one 512GB M3 Ultra, faster AND cheaper.
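Rough bandwidth-bound arithmetic behind the "double the output" part (assuming a 70B model at Q8, so ~70GB of weights, and ignoring compute and prompt processing; these are ceilings, not measurements):

```python
# Decode-speed ceiling = memory bandwidth / bytes of weights read per token.
model_gb = 70  # ~70B params at Q8
systems = {
    "M3 Ultra (819 GB/s)": 819,
    "M4 Max (546 GB/s)": 546,
    "DIGITS (273 GB/s, rumoured)": 273,
}
for name, bw_gb_s in systems.items():
    print(f"{name}: ~{bw_gb_s / model_gb:.1f} tok/s ceiling")
# M4 Max vs DIGITS: 546 / 273 = 2x, i.e. roughly double the inference output.
```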
SirStagMcprotein@reddit
Do you remember what the rationale was for why unified memory is worse for training?
jarail@reddit
Training can be done in parallel across many machines, e.g. tens of thousands of GPUs. You just need the most total memory bandwidth. 4x 128GB GPUs would have vastly higher total memory bandwidth than a single 512GB unified-memory system. GPUs are mostly bandwidth limited while CPUs are very latency limited; trying to get memory that does both well is a waste of money for training. You want HBM in enough quantity to hold your model. You'll use high-bandwidth links between GPUs to expand total available memory for larger models, as they do in data centers. After that, you can distribute training over however many systems you have available.
SirStagMcprotein@reddit
Thank you for the explanation. That was very helpful.
Cergorach@reddit
There wasn't. I only know the basics of training LLMs and have no idea where the bottlenecks are for which models using which layer. I was told this in this subreddit, by people who probably know better than me. I wouldn't base a $10k+ buy on that information, I would wait for the benchmarks, but it's good enough to keep in mind that training vs inference might have different requirements for hardware.
Spanky2k@reddit
Another thing that people often forget is that Macs typically have decent resale value. What do you think will sell for more in 3 years time, a second hand Digits 128 or a second hand Mac Studio M4 Max?
animealt46@reddit
Resale value shouldn't be relied on. First off, that's largely for laptops, not desktops. Secondly, Apple has been cranking volume on new Macs and running deep discounts, so the used market is flooded with supply competing against a very low new cost, so the situation is a lot "worse" now. Thirdly, resale value is almost always determined by CPU/SoC generation and then CPU model; extra RAM cost almost always disappears in the used market.
Serprotease@reddit
High bandwidth is good, but don't forget prompt processing time.
An M4 Max 40-core processes a 70B@Q4 at ~80 tk/s, so probably less @Q8, which is the type of model you want to run with 128GB of RAM.
80 tk/s is slow and you will definitely feel it.
I guess we will know soon how well the M3 Ultra handles DeepSeek. But at this kind of price, from my pov it will need to run it fast enough to be actually useful and not just a proof of concept. (Can run a 671B != can use a 671B.)
There is so little we know about DIGITS. We just know the 128GB, one price, and the fact that there's a Blackwell chip somewhere inside.
DIGITS should be "available" in May. TBH, the big advantage of the Mac Studio is that you can actually purchase it day one at the shown price. DIGITS will be a unicorn for months and scalped to hell and back.
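To see why 80 tk/s prompt processing feels slow, a quick wait-time calc (assuming the whole prompt has to be processed, no KV cache reuse):

```python
# Time before the first generated token, given prompt length and pp speed.
def wait_seconds(prompt_tokens: int, pp_tokens_per_s: float) -> float:
    return prompt_tokens / pp_tokens_per_s

for prompt in (4_000, 8_000, 64_000):
    print(f"{prompt} tokens @ 80 tk/s pp -> {wait_seconds(prompt, 80):.0f} s")
# 4k ~= 50 s, 8k ~= 100 s, 64k ~= 800 s (13+ minutes) before the answer even starts.
```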
psilent@reddit
Yeah, "available" is doing a lot of work. Nvidia already indicated they're targeting researchers and select partners (read: we're making like a thousand of these, probably).
LevianMcBirdo@reddit
Well, since you're talking about R1 (I assume, because of 671B), don't forget it's MoE. It has only 32B active parameters, so it should be plenty fast: 20-30 t/s on these machines (probably not running a full Q8, but a Q6 would be possible and give you plenty of context overhead).
Serprotease@reddit
That would be great, but from what I understand, (epyc benchmark) you are more likely to be CPU/GPU bound before reaching the memory bandwidth limit.
And there is still the prompt processing timing to look at.
I'll be waiting for the benchmarks! In any case, it's nice to see potential options aside from 1200+W server-grade solutions.
Cergorach@reddit
True. I suspect that you'll get maybe a 5 t/s output with 671b on a M3 Ultra 512GB 80 core GPU. Is that usable? Depends on your usecase. For me, when I can use 671b for free, faster, for my hobby projects, it isn't a good option.
But If I work for a client that doesn't allow SAAS LLMs, it would be the only realistic option to use 671b for that kind of price...
How badly DIGITS gets scalped depends on how well it compares to the 128GB M4 Max 40-core GPU for inference. The training crowd is far, far smaller than the inference crowd.
Apple is pretty much king in the tech space for supply at day 1.
power97992@reddit
It should be around 17-25 t/s with the M3 Ultra on MLX... A dual M2 Ultra system already gets 17 t/s...
Ok_Share_1288@reddit
R1 is MoE, so it will be faster than 5tps on M3 Ultra.
iwinux@reddit
https://x.com/exolabs/status/1897360590987051041?s=46
Ok_Share_1288@reddit
Where did you get those numbers from? I get faster prompt processing for 70B@Q4 with my Mac Mini.
Serprotease@reddit
M3 Max 40-core 64GB MacBook Pro, GGUF (not the MLX-optimized version).
The M4 is about 25% faster in GPU benchmarks, so I inferred from that.
Not being limited by the MacBook Pro form factor, and with MLX quants, it's probably better.
I did not use the MLX quants in the example as they are not always available.
Spanky2k@reddit
I'm not sure how you could consider 80 tokens/second slow tbh. But yeah, I'm excited for these new Macs but with it being an M3 instead of an M4, I'll wait for actual benchmarks and tests before considering buying. I think it'll perform almost exactly double what an M3 Max can do, no more. It'll be unusably slow for large non MoE models but I'm keen to see how it performs with big MoE models like Deepseek. An M3 Ultra can probably handle a 32b@4bit model at about 30 tokens/second. If a big MoE model that has 32b experts can run at that kind of speed still, it'd be pretty groundbreaking. If it can only do 5 tokens/second then it's not really going to rock the boat.
Serprotease@reddit
I usually have system prompt + prompt at ~4k tokens, sometimes up to 8k.
So about one to two minutes before the system starts to answer. It's fine for experimentation, but it can quickly be a pain when you try multiple settings.
And if you want to summarize bigger documents, it's long.
Tbh, this is still usable for me, but close to the lowest acceptable speed.
I can go down to 60 tk/s pp and 5 tk/s inference; below that it's only really for proof of concept and not for real application.
I am looking for a system to run 70B@Q8 at 200 tk/s pp and 8-10 tk/s inference for less than 1000 watts, so I am really looking forward to the first results of these new systems!
I'll also be curious to see how well the M series handles MoE, as they seem to be more limited by CPU/GPU power/architecture than memory bandwidth.
allegedrc4@reddit
Well, it's a Mac, so I wouldn't necessarily say that's a given. Most user-hostile OS I've ever seen.
lipstickandchicken@reddit
It is. I hate that the hardware is so good. My Macbook "just works" because I've trained myself on how to navigate its weaknesses and I don't ask it to do what it can't do.
Cergorach@reddit
I've been using it as my main OS for ~3 months now (after 35+ years of MS-DOS/Windows). macOS has its own quirks compared to Windows and Linux. macOS integrates incredibly well within its own ecosystem. It's just that people are used to their own preferred OS and find anything another OS does differently a flaw, instead of it just being different.
From a normal-user perspective I find macOS leaps ahead of both Windows and Linux. From a power-user perspective there are certain quirks you need to get used to with macOS. The macOS Terminal might be more powerful than the Windows command line.
Don't get me wrong, I still run all three, at this point probably more Linux than Windows. But I wanted a powerful small machine with a boatload of RAM (for VMs) while being extremely power efficient, and the Mac Mini M4 Pro (64GB) offered that; everything else was either WAY less powerful or was guzzling power like a drunk. I also needed a Mac as I support all three for clients as an IT contractor, and since the introduction of the M1, Mac 'marketshare' within multinationals has grown drastically over the last couple of years and is still growing.
daZK47@reddit
I want to get into the Linux rabbit hole sooner rather than later; do you know where the door is?
Cergorach@reddit
The one to enter, or the one to exit? Haven't found the latter... ;)
Linux is like a box of chocolates, you never know what you're going to get...
It really depends on what you want to use it for. I really like Mint MATE, but Ubuntu is generally better supported, and on my Steam Deck it's SteamOS all the way. On the Raspberry Pi something else is running, etc. Each niche has its own distribution.
daZK47@reddit
Great to know. I'm looking for something on the easier side but still with a lot of power and tools. I'm hoping to really dive into some local LLM models once I get my hands on the 512 M3 Studio
allegedrc4@reddit
As a Linux user, I find that simple features other OSes get right (for example multiple displays, or Thunderbolt dock MST support) that Mac users have wanted for years are all solved by paying for some third-party product instead of Apple just listening to their users.
Not great!
Cergorach@reddit
What about simple features macOS gets right and other OSes don't? In the last three months there have been plenty of times where a 48-year-old bald man went, like a little girl: "Oh! Neato!" at features that are native in macOS... ;)
It's not as if I haven't paid for software that did things Windows or Linux didn't do out of the box, or did them better. Some of that software also works on macOS (like the whole Affinity Suite) and other software works technically, but worse (like WinRAR).
I'm not saying it's easy to move from one OS to another; I had similar issues when I switched to Linux as my main OS for half a year ~20 years ago (I went back to Windows). Finding the right tools can often be a journey. Still looking for a replacement for Notepad++, will probably go for Beyond Compare and Sublime. Sure, it costs money, but if it works as well as or better than what I had, it's not that big of a deal. I previously paid for VMware Workstation Pro, which is now free for personal use, but I prefer Parallels, also paid. Well worth it!
smith7018@reddit
Sublime Text is actually free. It just asks you to purchase it every once in a while, but you're still able to use it forever without buying a license.
That_Em@reddit
Lol
Ok_Warning2146@reddit
I think it depends on your use case. If your case is full R1 running at useful prompt processing and inference speed, then the cheapest solution is Intel Granite Rapids-AP with 12x 64GB RAM at ~$18k.
M3 Ultra can do well for the inference part but dismal in prompt processing.
hurrdurrmeh@reddit
Can you elaborate on why it's slow at prompt processing?
Ok_Warning2146@reddit
GPU is not fast enough computationally.
The newer Intel CPUs support AMX instructions that can speed up prompt processing significantly.
hurrdurrmeh@reddit
So now CPU inference can be faster than GPU??
That is a new development.
Ok_Warning2146@reddit
Granite Rapids supports the new MRDIMM-8800 RAM, so its memory bandwidth is now 844.4GB/s. That's faster than the majority of GPUs.
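Where that figure comes from (assuming 12 memory channels per socket, 64 bits per channel, which is my understanding of the Granite Rapids-AP platform):

```python
# Theoretical peak bandwidth for a 12-channel socket populated with MRDIMM-8800.
channels = 12                # assumed channels per socket
transfer_rate_mt_s = 8800    # MRDIMM-8800
bytes_per_transfer = 8       # 64-bit channel width
peak_gb_s = channels * transfer_rate_mt_s * 1e6 * bytes_per_transfer / 1e9
print(f"~{peak_gb_s:.1f} GB/s peak")  # ~844.8 GB/s theoretical, matching the figure above
```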
hurrdurrmeh@reddit
I don't think that's even been released; the cost is going to be huge.
But with 128 channels and 64GB DIMMs (assuming they come in ECC) that's 8TB of RAM!!!
Western_Objective209@reddit
I'm extremely skeptical that a CPU with slow RAM will be anywhere near as fast as a machine that has a GPU and RAM that is like 4x faster
MasterShogo@reddit
It's important to remember that at the price point we're talking about here, you have to consider actual server platforms. Granite Rapids supports over 600GB/s of memory bandwidth with normal DDR5 and over 840GB/s with this new physical standard that I can't remember at this second. AMD Epycs are similar. The only question is, at that price, what performance will the CPUs actually have? Inference is still going to be largely memory-speed bound, but prompt processing is much more dependent on compute speed, and that is a specific issue with M-series SoCs.
Western_Objective209@reddit
is prompt processing actually a significant portion of the compute?
once you start adding GPUs, the cost will explode, and at that point why do you have so much RAM, why not just use the GPUs?
MasterShogo@reddit
Prompt processing is important, but exactly how much depends on the workload. For something like a chat bot with an unchanging history and incremental increases in token inputs, kv caching is going to save you tons of time and you only have to process the new prompts as they happen, and that is still very fast. But, if you have a workload where large prompts are provided and/or changed, then it will hurt badly, because it's just additional waiting time where absolutely nothing tangible is produced and you can't do anything. Interactive coding and RAG context filling are both examples of where this can happen.
On the other hand, I haven't looked up the actual compute specs on Granite Rapids. While I have no doubt it will do fine in token generation if it has enough cores, if the new instructions don't provide enough performance, or if libraries don't take advantage of them, then it will be no faster than an M-series chip, because memory bandwidth is comparatively unimportant during that phase.
And as for the GPUs, I'm primarily talking about flexibility. You can always add GPUs later and spread workloads across them to increase performance. It's not ideal, but it is possible. Or, you can look at one of these crazy setups where people just put the money into used 3090s and have as many of them as possible. You aren't going to build a 500GB inference machine with 3090s (or at least you aren't going to do that sanely), but you could build a smaller one. I saw a 16x 3090 setup on Reddit the other day! It may or may not be a good idea, but it is possible. On a Mac, it isn't.
And then there's the power usage. The Mac is going to be efficient and small. All of this is kind of wacky, but if a small business or extreme hobbyist is set on experimenting with these kinds of things without going out and trying to purchase a DGX rack, all of these options are viable to a point, and they all have tradeoffs. Having some amount of capability in a very small, very quiet machine is something.
FullOf_Bad_Ideas@reddit
But it's still a CPU, which usually has less parallel compute than a GPU. I feel like the Intel CPU would be even slower at prompt processing than the Mac M3 Ultra's GPU.
animealt46@reddit
If all you wanted was CPU, wouldn't there be cheap ways of renting that capacity? IDK I haven't tried.
Ok_Warning2146@reddit
If you rent that capacity, you shouldn't be in this sub. This is LocalLlama. ;)
Billy462@reddit
Could you explain why specifically that processor?
Ok_Warning2146@reddit
Because it has AMX instructions that are designed for LLM workloads.
Billy462@reddit
But does it have more memory bandwidth? I thought that was the limiting factor (compared to other servers like Epyc).
allegedrc4@reddit
No, actually! They designed this processor that people are talking about using for this purpose, with special instruction set extensions to boot, and neither Intel nor the people talking about it in this discussion thought about memory bandwidth even once. It's incredible!
Yes. It does. It would blow an Epyc out of the water.
Euchale@reddit
I could totally see someone smarter than me come up with something along the lines of "Load model into an SSD, do token gen on GPU" and suddenly we can run near infinitely large models really quickly.
MINIMAN10001@reddit
Then you're back to square one. Now you're bottlenecked by the speeds of the SSD so instead of the 1800 GB/s on a RTX 5090 you're now looking at 0.2 GB/s sustained random reads of an SSD.
Euchale@reddit
I see no reason why reading the model and doing the inference needs to happen in the same VRAM space. This is just how it is done currently. That's why I said, someone smarter than me. Transfer rates can be easily overcome by doing something like RAID.
danielv123@reddit
Uh what? For each token you do some math between the previous token and all your weights. So you need to read each weight once for each sequential token generated. R1 has 700GB of weights, reading that from an SSD takes 100 seconds. That's a low token rate.
For batch processing you can do multiple tokens per read operation which gets you a bit more reasonable throughput. You might even approach the speed of cpu inferencing, but nothing can make up for a 10 - 100x speed advantage.
eloquentemu@reddit
R1 is MoE with only 37B parameters needed per token. As a result, it's less slow than you think, but since it's a "random" 37B you can't really batch either.
Anyways, yeah, we already can run off SSD but it's basically unusably slow
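Rough per-token numbers for the off-storage case (assuming ~37GB of active weights at Q8 that all miss cache each token, which is close to the worst case; the storage speeds are typical ballpark figures):

```python
# Per-token read time if the active experts have to come off storage each step.
active_gb = 37  # ~37B active params at Q8
sources_gb_s = {
    "SATA SSD": 0.5,
    "PCIe 4 NVMe": 7,
    "DDR5 dual-channel RAM": 90,
}
for name, gb_s in sources_gb_s.items():
    print(f"{name}: ~{active_gb / gb_s:.1f} s per token")
# Even a fast NVMe lands around ~5 s/token, i.e. proof-of-concept territory.
```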
danielv123@reddit
Yes, I suppose my numbers are more relevant for the 405b models or something. I am very conflicted about Moe because the resource requirements are so weird for local use.
Healthy-Nebula-3603@reddit
You mean 4-12 GB/s.
ultrahkr@reddit
Apple already has a published paper talking about this, on an iPhone...
Adapting it to a desktop Mac should be easier for them...
Euchale@reddit
So did Phison https://www.tomshardware.com/pc-components/cpus/phisons-new-software-uses-ssds-and-dram-to-boost-effective-memory-for-ai-training-demos-a-single-workstation-running-a-massive-70-billion-parameter-model-at-gtc-2024
Creative-Size2658@reddit
So you won't even bother comparing similar spec?
How much memory do you have for $3k in the Digits?
Mac Studio M4 Max with 128GB is $3,499
mxforest@reddit
10k is the price for 512GB and 3k is for 128GB. Don't compare.
oldschooldaw@reddit
MountainGoatAOE@reddit
This post is silly. Apples and oranges. If you have money to spare, of course you just buy the most powerful thing out there. The advantage of the others is their price/value. Apple, as always, is not the best bang for buck, but provides certain value if you have money to spare.
These kinds of posts, "X is better than Y", are starting to sound more and more like paid ads.
Xyzzymoon@reddit
No, this is not correct. One of DIGITS' selling features is that you can link 4 of them up. When you buy 4 of them, it ends up being more expensive than the Macs.
More testing and benchmarking need to be done to confirm, but so far, Apple is actually the best value for the buck if you want 512 GB of VRAM on paper.
MountainGoatAOE@reddit
Again, that is apples and oranges. If you buy four of them, you are getting a whole lot more than just 4x the memory - you obviously get four whole machines.
Xyzzymoon@reddit
You are not making any sense at the moment.
How does "getting four whole machines", do any good when you want to load 1 model that only fits in 512GB of VRAM? If you want 4 different machines, sure, but in the given scenario, 1 machine is way better.
MountainGoatAOE@reddit
What? You were the one who suggested comparing with four machines. I am saying that's not the point to begin with. You don't buy DIGITS or Framework when you need 512GB of VRAM. It's a different class of product.
Apple is overpriced in the sense that they ask massive markups for additional memory. Purely looking at hardware cost, they take advantage of their position. It has always been like that. So yes - if you NEED 512GB, then you sadly have little other choice and you'll pay a markup.
Xyzzymoon@reddit
No, this isn't about DIGITS, this is about "Apple is actually the best value for the buck if you want 512 GB of VRAM on paper." when you said "Apple, as always, is not the best bang for buck, but provides certain value if you have money to spare."
It is not always the case.
MountainGoatAOE@reddit
I don't know why you keep deliberately misreading what I'm saying, so I'll try one last time.
Bang for buck (price per GB) it is not a good buy. It doesn't have competitors in its segment (apart from custom systems) but that doesn't mean it's the best value. It's like saying that a Porsche 911 is a better value than a Volkswagen Golf.
I started with apples and oranges, and it still is the case. The devices are in different segments so you can't compare the devices in terms of performance, and on top of that Apple does not provide competitive value for what you get (price/GB) if you purely look at the hardware but because of the lack of competition it's the only option in that segment.
Xyzzymoon@reddit
In that case, can you explain what you meant when you said "Apple, as always, is not the best bang for buck, but provides certain value if you have money to spare"? That is what you said.
My conclusion is that this is not correct. Apple sometimes is the best bang for the buck. Please explain how you want me to interpret what you said, if I'm understanding it incorrectly.
That is irrelevant: if you need this much VRAM, you have no alternative.
I definitely don't understand what you are trying to say. If you can't get the same thing elsewhere, or anything similar, how is it not the best bang for the buck?
Ok_Warning2146@reddit
Jensen never said you can stack four. Only two at most.
calcium@reddit
People always talk about there being some Apple tax, but for most workstations they're comparable to any other company like Dell or HP. I think people hyper-fixate on their stupid pairings, like $1k wheels for a machine.
Individual_Aside7554@reddit
Must be nice to live in a world where $10,000 = $3,000
BumbleSlob@reddit
You're comparing the 512GB Apple to the 128GB DIGITS. Apple's 128GB is $3,500.
gRagib@reddit
I'm not putting money down till I see actual performance numbers.
Common_Ad6166@reddit (OP)
Yeah specifically the comparison between these, and just going with a full EPYC server rack with a bunch of RAM... And maybe a GPU to speed up prompt processing. Maybe they will release MatMul specific ASIC cards as well??? A man can dream
gRagib@reddit
I need usable tokens/s. For my uses, that's at least 30 tokens/s. If one of these desktop systems can do it for 70b models, I'm all for it.
Nanopixel369@reddit
I'm still so confused by the conversation that Framework or the Mac Mini is even in the same league as DIGITS... Neither of them has tensor cores, especially Gen 5; neither has the new Grace CPU designed for AI inferencing; neither can handle a petaflop of performance. And who gives a shit if anything can fit up to a 200 billion parameter model if you have to wait forever for it to give you any outputs? DIGITS is not the standard hardware that people are used to seeing, so you guys are comparing something you don't even know yet, acting like you've owned the architecture for years. Framework and the Mac Mini are not even in the same league as Project DIGITS.... People pay $10
Common_Ad6166@reddit (OP)
I'm just trying to run and train 70B models at full FP16. With KV Cache, and long context lengths, the memory costs balloon, but the performance is not really limited at all, because the model itself will only be a quarter of the memory.
AnomalyNexus@reddit
I doubt it'll run at any reasonable pace at ~500GB size. Huge increase in size without a corresponding throughput change.
Common_Ad6166@reddit (OP)
I'm just trying to run and train FP16 70B models. Only a quarter of the memory will be for the model weights. The other half will be for KV cache, and the context length will scale this as well, so for a majority of the duration of the run, I should hopefully be getting ~20 t/s.
Forgot_Password_Dude@reddit
Is it even comparable? How does the Mac compare to cuda?
notsoluckycharm@reddit
It doesn’t, at all. Running inference on an llm is just a fraction of what you can do “with ai”. In the stable diffusion world, no one bothers with MPS. Then there’s what we used to call “machine learning.” That still exists. lol
staatsclaas@reddit
You sound like the “achieved AI nirvana” version of me. I’d love to pick your brain sometime. Feel like I’m doing it all wrong.
hishnash@reddit
Depends on what you're doing, but Metal (and for ML, MLX) is very comparable to CUDA in lots of ways.
illathon@reddit
Didn't AMD just release something like this as well?
ProfessionalOld683@reddit
I simply hope Nvidia DIGITS will support, or later develop, a way to cluster more than 2 units. If they can deliver a way to cluster them, it's all good. Tensor parallelism during inference will help with the bandwidth constraints.
If this is a product race, the first company to deliver a product that can enable us to run a trillion-parameter model (Q4) at reasonable tokens/s without drawing more than a kilowatt will win.
Ok_Warning2146@reddit
DIGITS can be competitive if they make a 256GB version at 576GB/s
CryptographerKlutzy7@reddit
You can stick two of them together to get that, but now it is twice the price, so....
DifficultyFit1895@reddit
Can the Mac Studios be stuck together too?
notsoluckycharm@reddit
Thunderbolt 5 bridging is 80Gb/s; that's what you're going to want to do. But yes, you can chain them. People have taken the Mac Mini and run the lowest DeepSeek quant across 5-6 of them.
DifficultyFit1895@reddit
would it be too slow to be practical?
notsoluckycharm@reddit
It won't match any of the commercial providers, so you have to ask yourself: do you need it to? Cline, pointed at a locally hosted 70B R1 Llama, was pretty unusable - a minute or so to start coming back per message, and that's before the message history starts to add up.
But I run my own hand rolled copy of deep research and I don’t need answers in a few minutes. 30m queries are fine for me when it’ll comb through 200 sources in that time period and spend 2-3 minutes over the final context.
Really large things I’ll throw to Gemini for that 1m context window. I wrote my thing to be resumable for that kind of event.
But yeah, it’s a fun toy to play with for sure. If you want to replace a commercial provider, not even close. If you just need something like a home assistant provider, or whatever, it’s great.
DifficultyFit1895@reddit
Thanks. What I have in mind is more of a personal assistant to use in conjunction with commercial models as needed. Ideally it would be a smaller, more efficient model with a bigger context window that I can use for managing personal and private research data (a relatively light volume of text). It would also be great if it could help coordinate interactions with the bigger expert models, knowing when to go for help and how to do it without exposing private info.
CryptographerKlutzy7@reddit
Not in the same way, the digits boxes are designed to be chained together like this, and have a special link to do so. You can only chain 2 of them though, and that is going to be pretty pricey.
I expect they will be better than Macs stuck together for running LLMs, but the Macs can be used for a lot more, so whether they're worth it depends on whether you have a lot of continuous LLM work. We do, but I can't see it being worth it for a lot of people over just buying datacenter tokens by the million.
2TierKeir@reddit
What’s the 128 version? The studio is like 800GB/s I think.
I’m pretty convinced that memory bandwidth is 75% of what matters with local AI. I’m sure at some point you’ll run into needing more GPU horsepower, but everything I’ve seen so far is basically totally dependent on bandwidth.
Ok_Warning2146@reddit
DIGITS has CUDA and tensor core, so its prompt processing is much faster than M3 Ultra. If it has 256GB version, we can stack two together to run R1.
johnnytshi@reddit
It's not just more memory bandwidth or more memory, but also compute.
Does it make sense to have 1TB of VRam on a 5090? Can it actually compute all of that?
I think it's memory / memory bandwidth / compute ratios that matter. Only pay for what's achievable, wait for reviews.
YearnMar10@reddit
If you compare prices you see that the Mac is not that cheap :) It's a remarkable piece of hardware for sure, though. But maybe with two 4090Ds at 96GB and a Xeon with 512GB of RAM you could achieve higher performance than the Mac for the same price?
nborwankar@reddit
Add in the cost of power and it doesn’t look so great. The Mac idle power is negligible.
daniele_dll@reddit
All that memory is pointless for inference.
What's the point of being able to load a 200/300/400GB model for inference if the memory bandwidth is constrained and you will only get a few tokens/s if you are lucky?
It doesn't apply to MoE models, but the vast majority are not MoE, and therefore having all that memory for inference is pointless.
Perhaps for distilling or quantizing models it makes a bit more sense, but it will be unbearably slow, and for that amount of cash you can easily rent H100/H200 GPUs for quite a while and be done with it in a day or two (or more if you want to do something you can't actually do on that hardware because it would be unbearably slow).
Sudden-Lingonberry-8@reddit
DEEPSEEK
daniele_dll@reddit
Meanwhile, you are free to spend your money as you prefer; I would take into account the evolution of the hardware and the models before spending $10k:
- DeepSeek is not the only model on the planet
- New non-MoE models are being released that are very effective
- In a few months you might be stuck on "old tech" because you can't run the newer models at a reasonable speed on the Apple HW
- Running DeepSeek R1 - the full model - online costs about $10 per 1M tokens (or less, depending on the provider).
On the Apple hardware you will most likely do about 15 t/s, which means about 18 hours to produce 1 million tokens, therefore to recover the cost of a $10k machine you would need to produce 15 t/s non-stop for about 2 years.
Sure, you can fine-tune a bit more if you can run it locally, but also... is it worth spending $10k just to run DeepSeek? Not entirely sure. Wouldn't it be better to buy different hardware that keeps the door open for the future? :)
Also, the DeepSeek Llama distills in Q8 work very, very well, and while they will be a bit slower (as they're not MoE), you also won't need to spend $10k for them :)
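Back-of-the-envelope for the 2-year figure (assuming the $10 per 1M tokens and 15 t/s round numbers above):

```python
# How long a $10k machine must generate tokens non-stop to "pay for itself"
# versus an API priced at ~$10 per 1M output tokens.
machine_cost = 10_000          # USD
api_cost_per_mtok = 10         # USD per million tokens
local_tok_per_s = 15

breakeven_mtok = machine_cost / api_cost_per_mtok       # 1,000 million tokens
seconds = breakeven_mtok * 1e6 / local_tok_per_s
print(f"~{seconds / 3600:.0f} hours ~= {seconds / 86400 / 365:.1f} years of non-stop generation")
# ~18,500 hours, roughly 2.1 years.
```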
AppearanceHeavy6724@reddit
You get privacy and sense of ownership. And macs have excellent resale.
No they all suck. I've tried, none below 32b were good. 32b+ were not impressive.
daniele_dll@reddit
> You get privacy and sense of ownership. And macs have excellent resale.
Anything related to GPUs has excellent resale, and something that doesn't cost $10k is easier to sell :)
Sure, you get privacy, but again, you don't need 512GB of RAM for that. I do care about my privacy, but it's silly to spend $10k UNLESS you do not use ANY cloud service AT ALL (sorry for the upper case but I wanted to highlight the point ;))
> No they all suck. I've tried, none below 32b were good. 32b+ were not impressive.
The Llama distill isn't 32B, it's 70B, which is why I mentioned Llama and not Qwen, which instead is 32B.
The DeepSeek R1 Llama distill 70B Q8 works well; it also seems to work well with tools (although I really did just a few tests).
Temporary-Size7310@reddit
Digits: • Can run native FP4 with blackwell • Has Cuda • We don't know the bandwidth at the moment • Is natively stackable • Not their first try (ie: Jetson AGX 64GB)
daZK47@reddit
CUDA is the standard now, but I don't want another Adobe Flash situation all over again.
Temporary-Size7310@reddit
At the moment there is no faster inference framework than TensorRT-LLM. Take a mid-sized company: it can deliver Llama 3 70B at FP4 and still have enough room for FLUX dev generation at FP4, and so on.
CUDA is the main reason why they are number 1 in AI; Flash Player was a really different situation.
xor_2@reddit
Even if CUDA falls out of fashion, DIGITS itself will become unusably slow/limited before that happens. And it's not like it only supports CUDA. That said, on Nvidia hardware almost no one tests other ways because CUDA, as Jensen said, "just works".
davewolfs@reddit
I think that the M3 Ultra will be underwhelming as well.
jeffwadsworth@reddit
It is 10 grand vs 3K. There is no underwhelming yet.
05032-MendicantBias@reddit
It's unfortunate that 512GB is still not enough to run the full DeepSeek R1. You can perhaps run Q6, more reasonably Q4.
tmvr@reddit
You can only really run up to Q4 with 512GB RAM to have space left for KV cache and context. Maybe Q5 as well, but realistically with only 820GB/s bandwidth (probably around 620-650GB/s real life) you may want to stick to the lowest usable quant anyway.
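Rough sizing behind that (671B total params at typical GGUF bits-per-weight figures, which are approximations; this ignores the extra headroom you still need for KV cache and context):

```python
# Approximate weight size of a 671B-parameter model at common quant widths.
params = 671e9
approx_bits_per_weight = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.8}
for quant, bits in approx_bits_per_weight.items():
    print(f"{quant}: ~{params * bits / 8 / 1e9:.0f} GB")
# Q4_K_M ~= 400 GB leaves room for KV cache in 512 GB; Q5 barely fits; Q6/Q8 do not.
```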
DifficultyFit1895@reddit
Does it help for the speed that it’s MoE so it’s only running one 37B at a time? If so would that allow higher quants?
tmvr@reddit
Being MoE only helps with speed, as only a part is active during inference, but you still need to access the whole model so it still needs to be loaded. What quant is OK to use depends on the amount of RAM.
Sudden-Lingonberry-8@reddit
you need to buy 2 of them. 20k
Xyzzymoon@reddit
Still better than buying 8 DIGITS - if you can even link that many together, if they're even in stock, and if you can buy that many at once.
mgr2019x@reddit
I do not get the digits and studio hype. Prompt processing will be slow. No fun for RAG usage. Some numbers: https://reddit.com/r/LocalLLaMA/comments/1he2v2n/speed_test_llama3370b_on_2xrtx3090_vs_m3max_64gb/
DifficultyFit1895@reddit
Just to be clear the linked numbers are for m3 max and m3 ultra is two of these stuck together, right? Would that be double the performance?
ThenExtension9196@reddit
The DIGITS has a ConnectX NIC (2x 100G) and a Blackwell GPU.
SteveRD1@reddit
If the Digits is a meh product, it will be available for $3,000.
If it's a decent product, you will be able to pick it up from scalpers from $5,000.
If it's a really good product, it will be unavailable to individuals...deep pocketed corporations will suck up all availably supply.
Welcome to Nvidia.
cellsinterlaced@reddit
I mean you’re either locked in Nvidia’s ecosystem or Apple, pick your poison and make the best of it.
a_beautiful_rhind@reddit
Yea, they kinda always were. DIGITS might have some prompt processing assistance though.
None of those options are very compelling for the price if you already have other hardware. They aren't "sell your 3090s" exciting.
RedditDiedLongAgo@reddit
Can we please stop talking about the Corpos? They don't deserve the mental real estate circle jerk session y'all spend all day pondering.
Apple/AMD don't give a fuck about the community. Certainly not us. Don't project their PR department onto yourself.
These rich kids' bread and circuses are the capitalist leash around our neck.
Secure_Reflection409@reddit
"No such thing as a bad product, just a bad price."
aikitoria@reddit
They have similar memory bandwidth. All of it is underwhelming and useless for running LLMs fast.
megadonkeyx@reddit
Stuff like QwQ-32B shows the way forward. My single 3090 is flexing like Schwarzeneggerrr.
tmvr@reddit
Yeah, I can fit the IQ4_KS version with Flash Attention and 16K context into the 24GB of my 4090 and it runs at about 33 tok/s in LM Studio which is a good speed.
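Rough fit check for that setup (assuming ~4.25 bits/weight for IQ4_KS and a Qwen2.5-32B-style GQA config; exact numbers depend on the model config and context settings):

```python
# Does QwQ-32B at IQ4_KS plus 16K context roughly fit in 24 GB?
params = 32.8e9
weights_gb = params * 4.25 / 8 / 1e9               # ~17.4 GB of weights

layers, kv_heads, head_dim = 64, 8, 128            # assumed QwQ-32B-ish config
ctx, bytes_per_val = 16_384, 2                     # FP16 KV cache
kv_gb = 2 * layers * kv_heads * head_dim * ctx * bytes_per_val / 1e9   # ~4.3 GB
print(f"weights ~{weights_gb:.1f} GB + KV ~{kv_gb:.1f} GB = ~{weights_gb + kv_gb:.1f} GB")
# ~21.7 GB total, which leaves a little headroom in 24 GB.
```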
zyeborm@reddit
I got a 10GB 3080 (got the 3090 later); combining the two gives huge context on the smaller models. Just sayin'. (Like 40K context on a 22B model.)
KoalaRepulsive1831@reddit
Capitalism is like war; this is how corps work. They only pull out their best cards when in need. I think the reason Apple released the 512GB-memory Mac right now is Framework's releases and Nvidia's DIGITS. To maintain their monopoly they must do this, and if we don't think strategically, we will have to endure their monopoly not just for the next decade but for many more years.
madaradess007@reddit
You guys don't consider that this half-assed hype train DIGITS will break in a year, while a Mac could serve your family for a few generations.
I know it's very counterintuitive, but Apple is the cheapest option.
zyeborm@reddit
Generations? Not many people still using their Apple 2e's or passing them on to their kids
hishnash@reddit
People with working Apple IIe's are keeping them in good condition, hoping to sell them for $$$ for their kids' college funds.
lothariusdark@reddit
That's just the poverty paradox, or whatever it's actually called: poor people buy twice while the well-off can afford to buy only once.
It's definitely not the cheapest option. You need to be able to afford the up-front cost to own a device that theoretically has specs that last a long time. And you also need to be able to afford the extreme repair prices should anything in that monolithic thing break.
It's not the cheapest, it's the premium version. You get good hardware, but it costs a lot of money.
I personally don't care about either DIGITS, Ryzen AI (Framework), or Apple, but I just had to correct this; it's simply not true.
AIMatrixRedPill@reddit
All my Macs, and I have plenty, are bricks that can't be maintained or upgraded. Never Apple again.
LiquidGunay@reddit
Not enough FLOPS compared to DIGITS
hishnash@reddit
Depends a LOT on what you're doing. If you're doing inference you might need the capacity more than the FLOPS, and even if you're doing training, if the dataset or training method you're using is latency-sensitive for data retrieval and you exceed DIGITS' memory, then the Studio will be orders of magnitude faster. (No point in having lots of FLOPS if you're not using them because you're stalled waiting on memory.)
FullOf_Bad_Ideas@reddit
I don't think either of them is a good all-rounder for diverse AI workloads.
I don't want to be stuck doing inference of MoE LLMs only; I want to be able to inference and train at least image-gen diffusion, video-gen diffusion, LLM, VLM and music-gen models. Both inference and training, not just inference. A real local AI dev platform. The options there right now are to do 3090-maxxing (I opt for 3090 Ti maxxing myself) or 4090-maxxing. Neither the Framework Desktop nor the Apple Mac really moves the needle there - they can run some specific AI workloads well, but they will all fail at silly stuff like training an SDXL/Hunyuan/WAN LoRA or doing inference of an LLM at 60k context.
shaolinmaru@reddit
"we" won't.
uti24@reddit
Apple already had a 192GB, 800GB/s Mac Studio Ultra in 2023, so DIGITS was "underwhelming" long before its release.
Well, it's not that underwhelming for me, it's just that the price is too steep anyway.
It's good there are multiple competitors for this niche though - DIGITS, Framework (or whatever), Mac - so prices will go down; it seems like this is what users will be inferencing on locally in the near future.
Calcidiol@reddit
I hope we'll see a renaissance of open systems where one can buy a powerful desktop / workstation and be able to expand its storage / RAM / peripherals widely and have the baseline compute performance we see in today's mid-range DGPUs coupled with the baseline RAM BW we see in today's mid-range DGPUs.
So something like a 4070 in terms of compute and RAM bandwidth, with up to TB-level replaceable, modular, commodity-standards-compliant RAM, and still with a choice of more or less powerful CPUs and powerful PCIe/CXL/whatever expansion and networking options.
AMD / Intel and all the ARM PC vendors have ignored the world's need for desktop computers with 500-1000 GB/s RAM bandwidth, expansion options to 1TB of RAM, and CPU performance adequate for most needs without buying a PCIe dGPU. It's unrealistic that desktops should not integrate the power that $400-$600 range dGPUs have delivered for several generations as, effectively, a required capability.
ab2377@reddit
I hope DIGITS and Llama 4 do release before the end of times. 🤞