Mac Mini M4 Pro (12/16) 48GB vs Mac Studio M4 Max 36GB (14/32): which one would you choose, assuming you will run <=32B 4-bit quantized (MLX) LLMs with a max. 16K context size? According to my experiments, for QwQ, say, one will output approximately 11-13 t/s, while the other will do either 17-19 or 22-24 t/s.
On the memory bandwidth question: I'm not sure whether this entry-level M4 Max Mac Studio has a memory bandwidth of 410 or 546 GB/s.
https://preview.redd.it/agq4au1sqvme1.jpeg?width=1290&format=pjpg&auto=webp&s=3b02abc558a7fe519500d1303b37fac24f7992ff
"Testing conducted by Apple in January and February 2025 using preproduction Mac Studio systems with Apple M3 Ultra, 32-core CPU, 80-core GPU, and 512GB of RAM, production Mac Studio systems with Apple M2 Ultra, 24-core CPU, 76-core GPU, and 192GB of RAM, and production Mac Studio systems with Apple M1 Ultra, 20-core CPU, 64-core GPU, and 128GB of RAM, each configured with 8TB SSD. LM Studio v0.3.9 tested by measuring token rate using a 174.63GB model. Mac Studio systems tested with an attached 5K display. Performance tests are conducted using specific computer systems and reflect the approximate performance of Mac Studio."
Don't forget that without setting `iogpu.wired_limit_mb`, the M2 Ultra only makes about 144GB available to the GPU by default, meaning it can't fully run a 174GB model on the GPU and uses the CPU for the rest, even if it doesn't have to swap like the M1 Ultra with 128GB. These results are skewed; wait for reviews...
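To make the point concrete, here is a rough sketch of the arithmetic. The 75% default fraction is an assumption inferred from the "144GB of 192GB" figure above, and the helper names are made up for illustration; only the `iogpu.wired_limit_mb` key comes from the comment itself.

```python
# Rough sketch: does Apple's 174.63GB test model fit under the default GPU limit?
# ASSUMPTION: the default GPU limit is ~75% of RAM (inferred from 144GB of 192GB).

def default_gpu_limit_gb(total_ram_gb: float, fraction: float = 0.75) -> float:
    """Estimated memory the GPU may wire by default, in GB."""
    return total_ram_gb * fraction


def wired_limit_command(limit_gb: float) -> str:
    """Build the sysctl command that raises the limit (the key takes a value in MB)."""
    return f"sudo sysctl iogpu.wired_limit_mb={int(limit_gb * 1024)}"


MODEL_GB = 174.63  # model size quoted in Apple's footnote

for name, ram_gb in [("M1 Ultra", 128), ("M2 Ultra", 192), ("M3 Ultra", 512)]:
    limit = default_gpu_limit_gb(ram_gb)
    fits = MODEL_GB <= limit
    print(f"{name}: default GPU limit ~{limit:.0f}GB, model fits on GPU: {fits}")

# Raising the limit on a 192GB machine so the model fits (leaves ~12GB for the OS):
print(wired_limit_command(180))
```

Under that assumption only the 512GB machine runs the model fully on the GPU with default settings, which is why the comparison looks so lopsided.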
I can't fault them. Everyone is doing it. At least Apple compares against itself. I disliked AMD marketing comparing Strix Halo to Nvidia GPUs even more.
Also it works. Screenshots like this are always shared massively on social media and news pages. Besides some nerds, no one is gonna bother to fact-check things, and if enough people see it, some will believe it. Probably also has to do with investors; same thing applies there.
Yep, and some people preorder a $10k computer because of it... I'll wait for the reviews and the independent benchmarks with details about how they tested.
You totally nailed it. If they test with an 80GB model it will be no different from the M2 Ultra. Why are these idiots comparing memory-overflow cases with within-memory cases? As if we wanted to test the usability of more RAM.
> If they test with an 80GB model it will be no different from the M2 Ultra.
I wouldn't say that, since the M2 Ultra is faster than the M1 Ultra even though they have the same memory bandwidth. Until now, there has been more memory bandwidth than the M1 can use. Time will tell if it's the same with the M2. So the M3 can be faster.
There's no reason to not set it there even if your general use case isn't inference. It's not like on an AMD system where it's a hard limit. It's not reserved on a Mac. That just sets the limit that the GPU can use if it needs to. If it doesn't, that memory is available for the CPU to use for anything. On a Mac, it's dynamic. It's not static like it is on an AMD system.
Well, one M2 Ultra did 14 tokens/s with the 1.58-bit dynamic quant, and 2 Ultras with EXO did 4-bit also at around 14 tokens/s … so if this 2x between M2 and M3 holds true, then brrrrr, 30 tokens/s is within reach 🤯
They have less time until they can be shoved down in the mines. A child can reasonably be utilized in mining operations once they hit the age of 8, so if you take a first-born at 6 years old vs a second-born at 4, it's an extra 2 years before Tim Apple can see an increase to his coal mining investment.
First borns also tend to be more compliant than subsequent children. The middle children are especially difficult to manage, often wanting higher portions of food, and slacking on the job to "play with friends." Apple has found that second borns cost an average of 18% more on disciplinary actions.
Overall, first borns just make more financial sense.
Completely false, this is not the real reason.
The first born is first in line for succession so he will inherit it and they could sell it again after 20.. 50 years.
Disagree strongly.
Having the line of succession is a "nice to have," but the idea that it's the primary motivator is a complete fake news conspiracy theory.
You see, the mortality rate is 86% by the time the mine worker reaches the age of 12, and 94% by the time the mine worker reaches 18; so inheritance usually isn't collected.
Add to this that the family selling the firstborn is doing this because they are poor (and ugly, but that's beside the point), and the 6% inheritance collection isn't the primary motivator of Tim Apple.
It is an important aspect, just not the primary motivator. Tim is honest when he says "I want to send your rat children down into the mines! You filthy ugly beasts. Buy my Apples bitches!"
Rumors from where? My guess is that Nvidia hasn't communicated the bandwidth because they wanted to see what they could get away with. Now that AMD and Apple are releasing directly competing products, Nvidia will feel more pressure to offer more bandwidth.
It is DOA for those who just want to use models. But not for anyone who is doing training, as it has full CUDA support in such an incredibly small form factor.
Which framework supports training with ARM CPU like what GH200 has?
Compute wise, it's gonna be at single 3090 level. That's not as powerful as you might think.
lower compute than single 3090? I think it should be around equal. It's almost 1000 FP4 sparse TOPS, once you convert it to real FP16 non-sparse you get 125 FP16 TFLOPS.
3090 has 142 FP16 TFLOPS as per [GA102 Whitepaper](https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf)
that's 3090-level compute I mentioned earlier. It's a bit lower, so your statement is true, and who knows if it will throttle, but it's very similar.
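For anyone following the conversion, a quick sketch of the arithmetic. It assumes the usual rules of thumb that the sparse figure is 2x the dense one and that throughput halves each time precision doubles (FP4 to FP8 to FP16); the function name is just for illustration.

```python
# Sketch of the "1000 FP4 sparse TOPS -> FP16 dense TFLOPS" conversion.
# ASSUMPTIONS: sparse = 2x dense, and each doubling of precision halves throughput.

def fp4_sparse_to_fp16_dense(fp4_sparse_tops: float) -> float:
    fp4_dense = fp4_sparse_tops / 2  # drop the structured-sparsity speedup
    fp8_dense = fp4_dense / 2        # FP4 -> FP8
    fp16_dense = fp8_dense / 2       # FP8 -> FP16
    return fp16_dense


digits_fp16 = fp4_sparse_to_fp16_dense(1000)  # 1 PFLOP FP4 from the spec sheet
rtx3090_fp16 = 142                            # figure cited from the GA102 whitepaper

print(f"DIGITS estimate: {digits_fp16:.0f} FP16 TFLOPS")
print(f"Relative to a 3090: {digits_fp16 / rtx3090_fp16:.0%}")
```

So the estimate lands a little under a 3090, and that's before any "up to" caveats or throttling.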
I think they have it listed as just 1 PFlop in specs.
>AI Performance 1 PFLOP FP4
https://www.nvidia.com/en-us/project-digits/
probably not more than a few percent off.
On their marketing slide it's also without the "up to" thingy.
It absolutely says "up to". From your link.
"Experience **up to** 1 petaflop of AI performance at FP4 precision with the Grace Blackwell architecture."
For sure. In some usecases, you want more VRAM. Sometimes you want more compute. I've been in both. I hope DIGITS will be good, but I think I'll be sticking with normal GPUs. Or if I make a switch, it will be to PCI-E NPUs. Something like what Tenstorrent is doing.
I am not the most qualified person to answer that, but I believe almost any that comes in form of source files that you can compile for any platform that has recent enough implementation of C++.
Right tool for the job. I think it's great news for everyone if that's true.
If DIGITS is worse at inference than the new Mac stuff (even the M4 Pro has more bandwidth), people can buy Macs; better availability than Nvidia stuff anyway.
For the folks doing training that means there is less of a run on the DIGITS product and they might actually get it at a normal price...
Do we know anything about the training speed on DIGITS? I haven't seen any benchmarks but I remember that the expectation was that it would be slower than 5090.
POC’ing on rented GPUs isn’t that bad either; I regularly rent 4x4090 machines for about $1.20 an hour.
I do my experiments locally on my MBP M2 or on my gaming rig (my precious) that has a 3070 and then POC in the cloud, usually no more than 12 hours for a test train.
(Then for the full training I dip into the 2xH200 $7-$8 an hour machines)
How are you running Microsoft Office on it? How are you running Davinci resolve? How are you running the vast library of software on both Windows and Mac OS that people use for general computing?
Intel offerings are 8 channel, AMD Genoa drops to 4800MHz if you use 12 channels. 80% of "DDR5 compatible servers" are out of the question to start with.
If you want to have 12channels on DDR5-6000MHz you need to use AMD Turin. Single CCD read memory bandwidth on Turin is 106GB/s
https://preview.redd.it/bdaahicwawme1.png?width=1055&format=png&auto=webp&s=96c31ad8f461c3c3e85b53eed99d7cca14c5469a
You need 5x CCD to go above 500GB/s. Cheapest one that has that is EPYC 9355P, it costs $2998. :)
So there you go with "cost a lot less too".
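The numbers above can be checked with a small sketch. It assumes a standard 64-bit (8-byte) DDR5 channel and takes the 106GB/s per-CCD read figure from the comment; the function names are hypothetical.

```python
import math

# Sketch of the EPYC bandwidth math.
# ASSUMPTIONS: 8 bytes per transfer per DDR5 channel; 106 GB/s read per CCD (from above).

def channel_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


def ccds_needed(target_gbs: float, per_ccd_gbs: float = 106) -> int:
    """Smallest number of CCDs whose combined read bandwidth exceeds the target."""
    return math.floor(target_gbs / per_ccd_gbs) + 1


print(channel_bandwidth_gbs(12, 6000))  # Turin, 12 channels at DDR5-6000
print(channel_bandwidth_gbs(12, 4800))  # Genoa, 12 channels at DDR5-4800
print(ccds_needed(500))                 # CCDs required to go above 500 GB/s
```

The DRAM side can supply 576GB/s on paper, but with fewer than 5 CCDs the cores can't actually pull more than about 424GB/s, which is why the cheap low-CCD parts don't count.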
Of course you can!
Now instead of $2998 for CPU you need to get two 9275F, two of these cost $7k.
If you use ktransformers (no-brainer for CPU inference) you also need to load weights twice, therefore instead of 512GB RAM you'll need 1TB.
Go ahead! :)
Because that's how ktransformers works.
"copy model into RAM *twice* for big dual socket systems (as cross NUMA nodes is bottleneck)"
[https://github.com/ubergarm/r1-ktransformers-guide/blob/main/README.md](https://github.com/ubergarm/r1-ktransformers-guide/blob/main/README.md)
>     # ONLY IF you have Intel dual socket and >1TB RAM to hold 2x copies of entire model in RAM (one copy per socket)
>     # Dual socket AMD EPYC NPS0 probably makes this not needed?
>     # $ export USE_NUMA=1
Also this is an MoE model not a dense model. You shouldn't need to load it twice. Even if you had to manually designate experts surely you could split the model instead of loading it twice.
Do you have couple of minutes?
[https://github.com/ggml-org/llama.cpp/discussions/11733](https://github.com/ggml-org/llama.cpp/discussions/11733)
Go here and teach them. One person gets 102.2% performance, while another has a better result of "only 105% compared to a single CPU benchmark run".
> "cost a lot less too".
About 2/3 the price of the $10k Mac m4 - or at least was. I will admit there now seems to be a big shortage of server memory. The Mac will be a bit faster, but you will get ecosystem locked, so it is kind of swings and roundabouts.
That $10k Mac Studio has **M3 Ultra** chip that does more than **800GB/s**.
It's not "a bit faster", Mac is **40% faster**. This is in best case scenario for server, as Mac has beefy GPU that will massively speed up prompt eval, effectively making responses like **2x faster** if you have longer context.
All that while eating like 1/4 of the power and sitting in random place on your desk barely making any sound.
I don't know why you underestimate the Mac so much, table has flipped and now Apple is the value king for performance across a lot of workloads.
I'm not commenting just about high end, this goes all way to bottom end, where on PC you're 2 generations behind on CPU (Ryzen 5 5500), 2 generation behind on GPU (RTX3050 6GB) and that 7nm+8nm+DDR4 combo is supposed to compete with 3nm M4 Mac Mini that costs $529 right now at Amazon.
Lots of moving parts need to go right to get this kind of speed. More than dual-channel is rare in the consumer world at consumer prices. Even with quad channel and higher, details like internal CPU die specs on AMD CPUs matter and drop your bandwidth below what you should get. AMD CPUs also seem to get less of the theoretical maximum bandwidth for some reason. Reasonably priced DIY computers top out at around 200GB/s; beyond that, costs are similar to what you'd pay for Macs or DIGITS.
It depends on what you want it for.
If it's going to be your DeepSeek R1/R2 backend and you make it work and produce income, it can be totally justifiable economically, regardless of whether it becomes obsolete in a few years. That's why people keep buying work machines and computers.
But if it's just for fun, jewels and the Mac with M4 Max are just a matter of taste.
Produce income? Unless you're talking about a programmer using it for work, I can't imagine what that'd be. And even then, it'd be so glacially slow compared to API, I just can't see it.
If you were trying to run it to serve an actual service to customers, you're not going to get the Studio IMO... so this purchase comes down to interest in LLMs and whether you can justify using the Mac for something else too.
It's gonna be insanely slow compared to online services, and extremely cost ineffective. This is for if you're doing something you absolutely don't want sent to any outsider or as a hobby.
Still cool it exists, but anyone with space for a server and this kind of money might be best served to go that route with older GPUs or GPUs mixed with RAM- at least to my understanding.
I doubt DeepSeek is going to go the Western route of just adding moar layers and parameters.
Maybe they do, maybe they don't. What I expect is that they will find better algorithms and optimizations to run rationalizing multimodal models, probably with *fewer* parameters or less execution overhead.
I hope that some streamer or another shows us what running a larger model looks like on this machine. From my perspective, $10k for a q5\_K\_M of DeepSeek R1 may not be a particularly terrible deal, as long as it runs at any form of acceptable speed.
Oh, it went the same as all the reviews out there. Basically 18t/s on deepseek r1. One thing I was impressed with that I didn't see mentioned is the speed of the nvme storage. Deepseek was loading at over 10GB/sec and loads fairly quickly, which surprised me.
I’m actually going to return it. The 900GB/s bandwidth is nice, but the RTX 6000 Pro with 96GB of GDDR7 is more what I’m looking for: the ability to run smaller, denser models at speed, rather than R1 at 18 t/s.
Yeah I think R1 is probably the best one to test; also try it with big context & different quants. If it gives a decent speed with big context and some decent quants, I bet people will be really interested.
Also try some big non MoEs, maybe the Llama's just to see how they perform although I assume a dense 405B will be extremely slow
Ran the 4bit MLX deepseek r1. Long story short, 18t/s like everyone else found out. But the longer the context, the longer the TTFT. That prompt processing is slow. How can I get an exact benchmark for that besides arbitrary contexts?
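One way to get repeatable numbers is to time the token stream yourself at a few fixed prompt lengths. Below is a minimal, backend-agnostic sketch: `measure_stream` and `fake_stream` are hypothetical names, and the fake generator only stands in for whatever streaming iterator your runtime exposes (it is not a real MLX or LM Studio API).

```python
import time
from typing import Iterable, Iterator


def measure_stream(token_stream: Iterable[str], prompt_tokens: int) -> dict:
    """Time a lazy token stream: time to first token (TTFT), then decode speed."""
    start = time.perf_counter()
    first_token_at = None
    generated = 0
    for _ in token_stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # prompt processing ends here
        generated += 1
    end = time.perf_counter()
    if first_token_at is None:
        raise ValueError("stream produced no tokens")

    ttft = first_token_at - start
    decode_time = end - first_token_at
    return {
        "ttft_s": ttft,
        "prompt_tps": prompt_tokens / ttft if ttft > 0 else float("inf"),
        # The first token is attributed to prompt processing, hence generated - 1.
        "decode_tps": (generated - 1) / decode_time if decode_time > 0 else float("inf"),
        "generated": generated,
    }


def fake_stream(n_tokens: int, prefill_s: float, per_token_s: float) -> Iterator[str]:
    """Stand-in for a real model: sleeps to mimic prompt processing, then decoding."""
    time.sleep(prefill_s)
    for _ in range(n_tokens):
        yield "tok"
        time.sleep(per_token_s)


result = measure_stream(fake_stream(20, prefill_s=0.05, per_token_s=0.001), prompt_tokens=1000)
print(result)
```

Run it with the same prompt at, say, 1K, 4K and 16K tokens and report prompt tokens/s separately from decode tokens/s; that separates the slow part (prompt processing) from the 18 t/s everyone quotes.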
What config did you get? I'm wondering if the +$1500 for more cores is worth it. Otherwise I will go with the 256GB memory and 4TB storage (which hurts, but I think I'd eventually need that much storage).
Remember you can get \~$600 off for a student discount if you qualify
Two maxed-out M2 Ultras run q4 at 17 tokens/s; expect less than that for q5 or q4 on one Mac. Maybe 8-10 tokens/s due to less memory bandwidth, but with a faster interconnect and higher FLOPS.
GPU compute is 2x faster than M2 Ultra and 2.6x faster than M1 Ultra per the press release. I also have doubt on this. But we just have to wait for more tests to confirm.
with LLMs it's really just about that memory speed. losing 20% compared to the m4 ultra that it was supposed to be was a big letdown when I saw the news.
> with LLMs it's really just about that memory speed.
That's completely not true. If that were the case then a RX580 would be competitive with a 4060. It's not.
It's only about memory speed if you have enough compute to use it. On Mac Ultras that hasn't been the case. They have more memory bandwidth than they have compute to use it. That's why the M2 Ultra is faster than the M1 Ultra even though they have the same memory bandwidth. There's no reason to believe that the M2 Ultra is using all available memory bandwidth. If not, then the M3 Ultra will be faster.
No, it won’t reach acceptable speed. You need GPU processing power too; on Macs, the bigger the model, the slower the prompt processing, and they can’t handle a big context window. So it's pointless for local usage.
Yes, yes I know you learnt how to use division)) And now look here [https://github.com/ggml-org/llama.cpp/discussions/4167](https://github.com/ggml-org/llama.cpp/discussions/4167)
It's MoE -- isn't it just a matter of being able to keep R1 in vram and then it runs whichever 36B model relatively quickly or am I missing something for decent speeds?
Someone ran R1 on 2x 192gb macs at about 17 tokens/second at a decent quant, so yeah, should be possible to get good usable speed with one of these 512gb rigs.
And it's important to make sure they try it with large text. One thing is when you ask about QStarberries and another - if you want to work with code or text summaries.
Agreed. With that said, I do want to make a post not long from now discussing this in a bit more detail. I've always been hard on Macs for the prompt processing speed; I don't mind waiting, but other people do, and if you look at my profile I've made sure to pin a post showing the real numbers of what large context looks like on an M2 Ultra.
With that said, I decided to test out ChatGPT's Deep Research by asking it to find me the ms per token numbers of inferencing 70b models on an a6000 (not the ada), and interestingly, it came back with results showing several posts putting the inference around 5ms per token in prompt processing.
I recently got my hands on the more powerful M2 Ultra, the 76 core GPU version, and it processes prompts on Qwen2.5 72b at 10ms per token. It's 2x slower on the Mac, but that's not as bad as what I think a lot of folks were imagining. And with speculative decoding it was a much smaller gap for prompt writing, so I want to try to do a bit more research and get a conversation going about just how big of a difference there is in response times on BIG models at big contexts between certain CUDA cards and a Mac.
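To put those ms-per-token figures in terms of actual waiting time, here is a tiny sketch. The 16K prompt length is just an example size, and it assumes the per-token cost stays flat as context grows, which is optimistic.

```python
# Sketch: how long you wait before the first output token at a given prompt size.
# ASSUMPTION: per-token prompt cost is constant (real cost tends to grow with context).

def prompt_seconds(prompt_tokens: int, ms_per_token: float) -> float:
    return prompt_tokens * ms_per_token / 1000


PROMPT_TOKENS = 16_000  # example context size

for label, ms in [("A6000 (reported)", 5), ("M2 Ultra 76-core", 10)]:
    wait = prompt_seconds(PROMPT_TOKENS, ms)
    print(f"{label}: {1000 / ms:.0f} prompt tokens/s, ~{wait:.0f}s for a 16K prompt")
```

That works out to roughly 80 seconds versus 160 seconds on a full 16K prompt: noticeable, but the same order of magnitude.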
https://huggingface.co/unsloth/DeepSeek-R1-GGUF/discussions/37
IQ3_M/IQ4_XS is all you need for V3/R1
I believe that a $3k RAM server with a 3090 and ktransformers would do about the same, around 13-15 tok/s. I may be wrong.
This thing is using LPDDR5 (not X) 6400, whose street price is $1.8 per GB; let's say $2 per GB (for 12/16/18GB modules).
So $921-$1024 for 512GB LPDDR5 6400.
Apple is selling $5000 the 256GB.
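The arithmetic behind that range, as a quick sketch. The per-GB street prices and the $5000 figure are the ones quoted above, not verified prices; the function name is illustrative.

```python
# Sketch of the raw-memory cost estimate (prices are the commenter's figures).

def ram_cost_usd(capacity_gb: int, usd_per_gb: float) -> float:
    return capacity_gb * usd_per_gb


low = ram_cost_usd(512, 1.8)   # at $1.8/GB
high = ram_cost_usd(512, 2.0)  # at $2/GB
apple_per_gb = 5000 / 256      # implied by "$5000 for the 256GB"

print(f"512GB of LPDDR5-6400: ${low:.0f} to ${high:.0f}")
print(f"Apple's implied price: ${apple_per_gb:.2f}/GB, about {apple_per_gb / 2.0:.0f}x the $2/GB street price")
```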
Can I ask a stupid question, why aren't there others offerng that kind of RAM for that price point? Do you suspect there will be some going forward?
Just no demand until now? Also does this apply to VRAM? I'm a bit of a tech novice and trying to understand, for these LLM's we want VRAM right?
Yes. Because there wasn't much demand up to now. And corporations are kinda slow to make decisions always been minimum 6-12 months behind. Also some manufacturers will be extremely reluctant to do that, like AMD & NVIDIA because it would cut from their professional and accelerators markets.
>Apple is selling $5000 the 256GB.
The CPU+GPU don't cost just $5.6k - $5000 = $600.
512GB ECC RAM @6000Mhz is $1.7k on newegg: https://www.newegg.com/p/1X5-0009-00A03
That's the price you pay to have ~~a cold soulless robot put it in and solder it in place~~ an Apple engineering expert handcraft and personally sign the ram stick in a process that takes days and the uttermost love and then carefully place it in the slot.
I wonder if we can get our hands on the 512GB BIOS and the PCB is the same with the cheapest version (64/32GB), if we can replace the LPDDR5 6400 modules with 512GB ones. It would cost less than $1000 to buy 512GB worth of modules, replace the cheapest version and flash the bios 🤔
That will buy more than a dozen 3090s, which would run rings around the mac. Like order-of-magnitude faster.
512GB in a "unified memory" machine like this with laughable GPU cores is objectively pointless. Even with 123b models at moderate bpw you're only looking at about 96GB memory needed, and the mac would already be horrifically bandwidth and compute bound at that point. You load up a model that actually needs 512GB of memory and that mac will be lucky to produce more than a dozen tokens *per minute*.
You wouldn't buy 12 3090s. You'd buy a reasonable number like 4. The point here is that it's factually impossible to actually take advantage of the "512GB" of memory in the mac. It's too slow in several metrics to run models that large at anything approaching usable speeds.
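For reference, the sizing math behind the "96GB" and "4 cards" figures can be sketched like this. The 6.25 bits per weight is an assumed value for "moderate bpw", and the estimate covers weights only (no KV cache or runtime overhead).

```python
import math

# Sketch: weight memory for a quantized model and how many 24GB cards hold it.
# ASSUMPTIONS: 6.25 bits per weight as "moderate bpw"; weights only, no KV cache.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8


def gpus_needed(model_gb: float, vram_per_gpu_gb: float = 24) -> int:
    return math.ceil(model_gb / vram_per_gpu_gb)


size = weights_gb(123, 6.25)
print(f"123b at 6.25 bpw: ~{size:.0f}GB of weights")
print(f"3090s needed at 24GB each: {gpus_needed(96)}")
```

In practice you'd want some headroom on top of this for context, so 4 cards is the floor rather than a comfortable fit.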
Very good point. It's likely that we'll get cheaper local options in the coming years (months 🙏), but given that most startups are about this small, their budget could probably accommodate one local LLM this way.
https://preview.redd.it/es871ccxsvme1.jpeg?width=1284&format=pjpg&auto=webp&s=d73170646c8281250a3c4219264efc2faad8d9d5
The wording here certainly aims to suggest that
Really curious about the performance for diffusion models. Stable Diffusion is running much better than I thought it would on my 24GB Mac mini... 512GB sounds… tasty
If I'm not mistaken, diffusion models are compute bound, so as long as the diffusion model fits in the RAM/VRAM (most image diffusion models fit in 24 gb of RAM), you shouldn't get faster generation if it's the same exact GPU.
Is it tho? For $10k you can buy a proper 12-channel DDR5 system with similar memory BW, expandability (i.e. an nvidia card for prompt processing, more than 512GB RAM), and far more CPU compute power. -or- you can just rent $10k of actual cloud on a proper hopper, blackwell, etc. system and get orders of magnitude the throughput.
I mean it's priced competitively to that once you factor in the apple tax, but it's not exactly a game changer in that price range.
That's theoretical though. The more kits you have, the higher the chance that they will run at lower clocks. I wouldn't be surprised if 12 modules barely managed 5000-5200.
Well, DDR5-6000 past 32GB are still pretty rare. There's Kingston https://www.kingston.com/unitedkingdom/de/memory/search/?partid=KVR64A52BD8-64
but i'm not sure if UDIMMs are officially supported
It's not a gaming PC. If you are buying a workstation or server-class CPU, you just look at the QVL for that CPU + motherboard and buy one of the Samsung or Hynix part numbers they actually tested in that config. Everything else is YMMV.
It’s tough because you really have to watch NUMA parameters at that point. Ktransformers makes shadow copies of critical matrices at each NUMA node to prevent this, but that kind of tuning is not generally available for all models.
> you can just rent $10k of actual cloud on a proper hopper, blackwell, etc. system and get orders of magnitude the throughput.
sir this is /r/localllama
Apple's "Ultra" chips can be thought of as two "Max" chips glued together. So, an M3 Ultra is like having two M3 Max chips working together. Unless the new M4 Max was somehow magically twice as powerful as the M3 Max, the older M3 Ultra (which again is two M3 Max chips) will still be faster.
The big mystery is why Apple didn't just release an M4 Ultra.
I don’t really keep track of T/s in training; I care more about time per iteration. We do LoRA fine-tuning with MLX and get the impact we want. I don’t get the obsession with full-sized models and training; I run a SaaS product that wouldn’t make sense without quantization in both training and inference.
Have you experimented with full fine-tuning and found you get similar results to LoRA?
For my fine tuning use cases (coding) I’ve found a significant difference in performance when doing a full fine-tune rather than going the LoRA/QLoRA route.
Also, I don’t know what size models you’re fine tuning, but the biggest one I use (commercially) is a 32b parameter model and the various serverless services out there make it pretty reasonable to fine tune and serve inference (in production) for my customers.
I’m doing a 70b, and I’m not fine-tuning for code; I’m fine-tuning language style, format and word choice preference, which probably makes a big difference. I have not tried this approach vs a full fine-tune.
What kind of models are you finetuning? T/s in training is just batch size × sample length / time per iteration. Same thing in the end, but it's more fundamental: since you can change the sample length in your dataset and the batch size, time per iteration isn't really meaningful on its own, unless we're talking about diffusion image models.
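Spelled out, training throughput is the number of tokens in one batch divided by the time one iteration takes. A minimal sketch, with made-up example numbers and a hypothetical function name:

```python
# Sketch: converting time per iteration into training tokens per second.
# The batch sizes, sample lengths and timings below are made-up examples.

def training_tokens_per_second(batch_size: int, sample_length: int, seconds_per_iteration: float) -> float:
    return batch_size * sample_length / seconds_per_iteration


# Two runs with the same 8s time per iteration but very different throughput:
print(training_tokens_per_second(batch_size=4, sample_length=2048, seconds_per_iteration=8.0))
print(training_tokens_per_second(batch_size=1, sample_length=512, seconds_per_iteration=8.0))
```

Same time per iteration, a 16x difference in tokens processed, which is why T/s is the easier number to compare across setups.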
Isn't memory bandwidth becoming the limiting factor here rather than memory size?
The M3 Ultra has a memory bandwidth of 800GB/s. Local R1 in Q4 is about 400GB.
Wouldn't that make for a terrible experience at roughly 2 tokens per second?
Is that good value for money at a minimum $9,499.00?
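The 2 tokens/s estimate holds if every weight has to be read for every token, which is the dense-model case. R1 is a mixture-of-experts model, so only the active parameters are read per token. A rough sketch of both bounds follows; the 37B active-parameter count and 4 bits per weight are assumptions here, and real speeds land well below the MoE bound.

```python
# Sketch: bandwidth-bound upper limit on decode speed, dense vs. MoE.
# ASSUMPTIONS: ~37B active parameters per token for R1, 4 bits per weight,
# and no compute, KV-cache or routing overhead.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8


def max_decode_tps(bandwidth_gbs: float, gb_read_per_token: float) -> float:
    return bandwidth_gbs / gb_read_per_token


BANDWIDTH = 800  # GB/s, M3 Ultra figure quoted above

dense_bound = max_decode_tps(BANDWIDTH, 400)               # read all ~400GB per token
moe_bound = max_decode_tps(BANDWIDTH, weights_gb(37, 4))   # read only active experts

print(f"If all 400GB were read per token: {dense_bound:.0f} t/s")
print(f"Reading only ~{weights_gb(37, 4):.1f}GB of active weights: up to {moe_bound:.0f} t/s")
```

So memory size decides whether the model fits at all, and bandwidth against the active parameters decides how fast it goes.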
MLX has solid KV cache quant options to boot. 6-8 feels near lossless. I’m not familiar enough with their backing algos to recommend 4-bit yet but at higher quants it’s great.
405B monolithic was always hubristic. Silly that we even considered it for hosted inference. MoE was in the wild when it dropped. Just Meta being silly and throwing compute at problems instead of brains.
True! Given the rumors that the LLAMA team scrambled after the R1 release, I think MoE is the way to go. Especially when thinking tokens need much higher tps to be usable.
I will note that MoEs process prompts a little differently than the active param size would imply, and you definitely feel it on Mac. I have an M2 Ultra and one of my favorite models used to be WizardLM2 8x22b. The prompt processing time was definitely longer than what I'd expect a 40 something b model to process at; it felt like it was closer to a 70b in prompt processing speed, and the full size of it was around 141b if I remember right.
Once it started writing, things sped up a lot.
It's still available, just not from the original repo. It was dropped under open source license, some folks forked the repo while it was up, and those repositories continued to exist and gguf kept going up.
You could still find it on huggingface if you were so inclined, but otherwise there wasn't a lot of buzz because without the official repo up, not many benchmarks wanted to run the numbers. Eventually, by the time they did, new models had come out that beat it pretty easily, so it wasn't worth the chatter anymore.
I do still have it, but I haven't done a hard benchmark of real numbers to compare. However, as much as I've used both, I can tell you that I feel that knowledge wise and coherence wise Qwen is better.
From my experience:
* Wizard 8x22b was absolute magic in terms of coding ability for its time, but it's been a while since then; Qwen2.5 32b Coder is better.
* Wizard sounded amazing in terms of speech quality and general understanding; it was exceptionally clever in terms of contextual reading between the lines. If you gave it requirements, it did a great job of really digging in to find what you actually wanted. It beats Qwen2.5 72b for me in that regard
* Qwen2.5 72b is far better at RAG/summarization for me. Wizard hallucinated more than I liked with in-context learning.
Just make sure you pay it forward.
I’ve given similar advice over the years and people are usually too scared to try. This is just how Apple works.
My work machine has 64gb of RAM but the way it’s looking I won’t ever be able to afford that much in a personal machine, much less something like 512gb.
Ok, dumb newbie question here. Will the M3 Ultra be enough to run the 671b DeepSeek? Also, I work in bioinformatics and have never used a Mac; is it hard to use it with Ubuntu? Almost all my pipelines are built for Linux.
It really can’t be overstated that we now have access to 256GB of unified RAM at 800GB/s and you don’t need to have an electrician fix your house up with 240V drops.
Sorry can you explain this to me. Is this regular RAM? I thought for AI applications we need VRAM.
Is the GPU able to access to the "regular" ram because it's "unified" so we're all good there?
Yes, "unified" in this case means the GPU and CPU have equal access to the same RAM at the same speed (800GB/s).
In a "standard" setup (3090 and Intel i7 for example), the 3090 will have roughly 900GB/s access to a small pool of 24GB of RAM. The Intel chip will have access to say a pool of 32GB of RAM at an anemic 70GB/s. (The GPU can technically access the Intel's RAM pool, but through the PCI-E lanes then through the sluggish DDR5 interface.) This means you realistically have access to only the 24GB near the GPU for "fast" inference.
Compare this with the M3 Ultra: the CPU and GPU are on the same chip, and share the same high-speed memory controllers. They both have access to the full 800GB/s at all times, with no PCI-E or NUMA traversal. I hope this all makes sense haha.
Yes it does! I'm learning more about how computers work, so it's very cool to see some of these terms in your answer.
>In a "standard" setup (3090 and Intel i7 for example), the 3090 will have 900GB/s access to a small pool of 24GB of RAM. The Intel chip will have access to say a pool of 32GB of RAM at an anemic 70GB/s. (The GPU can technically access the Intel's RAM pool, but through the PCI-E lanes then through the sluggish DDR5 interface.) This means you realistically have access to only the 24GB near the GPU for "fast" inference.
This explanation you gave is very clear! I'm sure there are design reasons for it. Is this something only Apple is able to do because they have full control of their chip?
Like, I try to run AI applications on my crummy 8GB VRAM computer and I'm so limited, but what's stopping others from imitating this and getting us super high VRAM/Unified memory?
This seems really exciting, because I mean even the high end NVDA chips are like what 24GB for consumer models like you said? But now you're saying I could potentially run something with like 512GB of vram?? I'm hoping more folks take the leap here, it seems the cost for high vram could come down quite quickly?
So, it's not really Apple specific; AMD does this in their APU line. It's more about having the CPU and the GPU on the same die (or processor, if that's easier to visualize). Once they're stuck together, they can use the same circuits to access the RAM (the memory controllers), and share the small, fast internal buffers all processors have (cache).
What makes Apple's approach somewhat interesting here is both that the CPU and GPU are on the same die, _and_ they have simply thrown a crazy number of memory controllers at the problem. RAM, as it happens, can only send so much data at once. You need one memory controller per RAM "lane." You might hear "dual-channel" memory; that literally means there are 2 memory controllers that can access data from two different RAM banks at once[1]. The M3 Ultra chip packs 48 memory controllers, which can each access data from different RAM chips in parallel. This is closer to a GPU's architecture than a standard CPU architecture.
What is stopping us from getting the perfect balance between these two approaches is, generally, this kind of insane memory bandwidth simply isn't needed for most computing workloads. So inexpensive computers without these requirements simply won't drive up costs to meet them. (AMD has historically positioned their APU line as budget hardware, which is why we're seeing them lag on this front.) Also, as you can imagine, 48 memory slots on a motherboard would look silly, take up too much space, and come with its own set of problems with traces running hither and yon. (DIMMs actually pack several RAM chips on one small board; GPU and Apple memory is soldered and accessed individually to support wider controller access.) Server boards can ship with 24 in dual processor configurations, but they're generally large, expensive and have bonkers power requirements to boot.
So yeah, I hope that clarifies some of this anyway. Let me know if there's anything else I can help clarify.
[1] This is reaching the edge of my knowledge of Intel memory layouts. It may be that a single controller can do dual issue, but the concept remains the same.
I never responded here, I'm sorry. This is absolutely tremendous, thank you very much, I've learned a lot reading this post and keep referencing it as I read more about this stuff the last few days.
Thank you!
CXL is used in some servers to enable you to have memory over PCIe but it is way way way slower than the memory in these systems.
The difficulty here is that the longer the trace and the more connectors, the higher the resistance and interference on the signal. These signals are very, very fast, and it is easy for RF interference to corrupt them so that you can't read the correct value. Even internal reflections on the wires themselves are an issue: when you're switching signals at these speeds, the copper trace stops behaving the way it would for a constant current flow.
Oh, I thought the interface was fast enough. Thank you for the educational reply. I assumed PCI was as fast as the GPU slot. I guess in my brain "slots are slots" and it's more about where, and size, than what kind. Like, we can plug in RAM, why can't we plug in more RAM, know what I mean? Simplistic I guess, ignorant XD Thanks again, I'll RTFM some more X)
PCI is as fast as a GPU slot, but a GPU does not access its VRAM over the PCI bus. The VRAM is typically on the GPU card and has much, much faster (and lower latency) connections.
The issues here all come down to trace length and trace quality. With the speeds we are dealing with these days for memory, the speed of light (electricity is light) in copper becomes a huge factor.
I’m not overly impressed with the speed of models I can run on my Apple silicon. I’m wondering if I’ll ever run locally at this point. Considering my monthly expense for ai is now over $40, it’s still cheap compared to one of these bad boys.
The main use case for these is if you're doing any personal model customization or dealing with data that you can't legally send to a third party.
Private company data, legal data, medical, mil, gov data etc.
It depends on the industry, but for many industries the paperwork and compliance needed to send the data off site, or to any server that it is not already approved on, is a f-ing nightmare (and ends up costing a lot in pointless hours of legal compliance contracts).
There are some cloud providers that have certification, but if your company has not yet gone through the steps to validate and approve that provider it is often not worth the effort.
On-prem deployments are returning to companies all over the world.
For example, I used to work for a SW company that built software for the mining industry. Sometimes clients would have issues and would share their projects (real-world locations of high-value deposits) with us. This data was considered extremely valuable, as the company may have spent millions if not billions on surveying the location to collect the data. When they provided it to us, it was explicitly provided to a named engineer and to that engineer's (air-gapped) machines only. The paperwork and insurance we would have had to go through if we wanted to, say, upload that to a cloud service was a no-go, so we had to have on-prem compute HW for doing the needed processing of this data (HW that would commonly be fully wiped between each customer's data being loaded).
Very disappointed that it’s not an M4 Ultra although 512GB instead of 256GB is very cool. Will have to wait for benchmarks to make any kind of decision though. If it can handle R1 at good speeds then it’ll make a great in house LLM host. I have a feeling that smaller dynamic quants of R1 might end up working better though in which case the 512 one might be overkill.
The main user of this chip is Apple itself within their ML data centers. I expect the reason they want this volume of memory is to have multiple separate models loaded and ready to go.
Don’t think Ultra is worth it. M4 max with 64GB is probably best choice. But still getting two 5090s would be best choice for local usage - big context window. Macs can’t handle that.
Where are you seeing the benchmark showing both models fitting into VRAM where the speed is comparable? Mac only wins when offloading is included, from what I see. Outside of that 4090 wins by a factor of 4.
Memory bandwidth matters for text generation speed, and at 819GB/s it's close to Nvidia. But prompt processing speed relies on the GPU's compute capabilities, and here the M3 Max was 10x slower than multiple 3090s. With the M3 Ultra that gap might halve. So the bigger the context window, the more you will feel it.
Dunno how to explain it -
See here: under Llama 3 there is a table with text generation speed in relation to context window, which is directly linked to GPU memory bandwidth, and then there is prompt processing speed, which on Mac is sometimes 10x slower than on an Nvidia GPU
https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
That is not the only thing determining LLM speed, see here for comparisons. https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
You can see that several Mac SKUs tested here have the same memory speed, but different processors impact the speed.
Yes. It will be interesting to see two Project Digits (128GB, $3000 each) connected with their high speed networking compete with a single $5600 Mac Studio M3 Ultra with 256GB RAM.
You can probably aggregate the three ports together between two Mac Studios for 240Gbit/s bidirectional bandwidth.
Or you can connect multiple Mac Studios with about 1-2 connections between each-other for up to 160Gbit/s bandwidth.
> You can probably aggregate the three ports together between two Mac Studios for 240Gbit/s bidirectional bandwidth.
You cannot.
> Or you can connect multiple Mac Studios with about 1-2 connections between each-other for up to 160Gbit/s bandwidth.
The 80 Gbps will still be the bottleneck.
I misspoke earlier, btw, the memory bandwidth isn't 10 times faster, since it's 800 GB/s, not 800 Gbit/s. It's actually 80 times faster
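For anyone who wants to check the bits-vs-bytes correction, here is a minimal sketch of the arithmetic. It assumes the round 800 GB/s memory figure and the 80 Gbps link figure quoted above; the function name is made up for illustration.

```python
def gbytes_to_gbits(gigabytes_per_second: float) -> float:
    """Convert GB/s to Gbit/s (8 bits per byte)."""
    return gigabytes_per_second * 8

# Figures taken from the comments above (rounded, not measured).
memory_gbit_s = gbytes_to_gbits(800)  # unified memory bandwidth in Gbit/s
link_gbit_s = 80                      # single link bandwidth in Gbit/s

ratio = memory_gbit_s / link_gbit_s
print(f"memory: {memory_gbit_s:.0f} Gbit/s, link: {link_gbit_s} Gbit/s, ratio: {ratio:.0f}x")
```

The factor of 8 between bytes and bits is what turns the "10 times" into "80 times".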
Probably won’t be released.
They (Apple) specifically stated (for the first time publicly) that “not all CPU generations will get the Ultra variant” = No M4 Ultra, that’s why we’re getting an M3 Ultra so deep into the M4 rollout.
That’s probably the point I’d float if the M4 Ultra wasn’t scheduled for at least another year or so. Otherwise, knowledge of superior specs would hurt M3 Ultra sales, which is pure kryptonite to Apple. Notice how they didn’t *specifically* say that there will be no M4 Ultra.
Not a computer guy here, but is there someone that could help me translate the computer power of m4max to the equivalent of what I have in my pc to know which apple chip will be the correct one for my wife to edit photos for her business and also play her guilty pleasure... WoW?
My pc
Intel i7-12700
Nvidia GeForce rtx 3070ti
1tb storage
32gb ddr4 ram
Thank you.
The RAM speed is disappointing. I'm not sure how practical the 512GB of RAM will be outside of niche MoE models that use smaller experts. It sounds great for a local DeepSeek at a decent quant, but I'd really like to see what the landscape of new 200B+ models looks like, architecture-wise, before wanting to invest in this device. Will Llama 4 405B be a MoE, or is Meta going to stick with monolithic models?
I'm particularly interested in the use case of extended context length. With enough context length, I can feed entire repositories into context and the model has to make fewer assumptions about how to use it.
OK, so the Max is an M4 Max but the Ultra is an M3 Ultra.
819GB/s for the RAM for the M3 Ultra.
German prices:
11874€ for the 512GB model
6999€ for the 256GB model (with the smaller CPU model)
> It's interesting to compare this to a RTX 4090 with 96GB VRAM for $6000 (with around 1TB/s mem bandwidth).
96GB 5090 (L50? or A6000?) with like 1.7TB/s.
So basically you get a much better amount of RAM, similar but materially slower speeds, and a full macOS front end for the same price? Is my interpretation there off base at all?
It just shows how overpriced these RTX 4090 96GB are.
The Mac memory is also overpriced, I'm sure, but it's hard to get 819GB/s of memory bandwidth for unified memory anywhere...
Mac 48GB doesn’t equal 4090 48GB in speed. Prompt processing and context window matter a lot for any serious use, and the Mac is simply orders of magnitude worse than Nvidia
Mac is still overpriced, like usual. However, put it next to Nvidia and suddenly it doesn't look that overpriced, when this 512GB Mac Studio costs the same as 1 A6000 48GB Ada
If you search Google you will find it (search for the Alex Ziskind YouTube channel). Memory bandwidth for the system runs at lower speeds, and only RAM used by the GPU can reach those speeds. That is why you see a correlation between bandwidth and the number of GPU cores/RAM size (both go up), hence the 60 GPU core version will have around 25% lower bandwidth. I haven’t seen anyone test whether this correlation is with GPU core count or with memory size, in other words whether a 32 GPU core Max with 48GB and 64GB will have the same VRAM bandwidth or different.
I believe it’s bound to GPUs, since only GPU accesses that bandwidth- and otherwise only 512GB version of M3 Ultra would be getting 819GB/s bandwidth which I don’t think would be the case (since 192GB M2 Ultra was).
Looks like it is exactly 2 M3 Max chips, connected:
“Apple says the M3 Ultra chip is essentially two M3 Max chips fused together with its "UltraFusion" technology, so the chip's specs are all doubled compared to the M3 Max. There was speculation last year about the M3 Max chip lacking UltraFusion technology, but Apple's announcement today has proven that rumor was false.”
https://www.macrumors.com/2025/03/05/apple-introduces-m3-ultra-chip/
I wish they released a chip which had like 100x the neural engine size. Like an ultra chip but all that extra space and compute goes only to a gigantic neural engine. On my m4 running the same language model purely on the neural engine takes 1.7W, on the GPU it takes 8W. And that 8W is already much more efficient than running on a "normal" GPU. Now imagine scaling up that neural engine 100x to work at the same power draw as an nvidia gpu. It would be like having your own groq chips at home.
You need M3 Ultra to get > 128GB unified memory, and M3 Ultra w/80 core GPU to get 512GB
$14099 for top spec (m3 ultra, 32 core cpu, 80 core gpu, 512GB unified memory, 16 TB SSD)
$9500 if you go with 1 TB SSD instead (cheapest config with 512GB memory)
$3500 for M4 Max w/40 core GPU, 512GB SSD, 128 GB unified memory (cheapest 128GB)
If the insides don't change as much, I presume someone will reverse-engineer the NAND flash carrier PCB, and we'll get replaceable storage again like what happened with the last Mac Studios.
I’ve been paying hundreds of dollars per week for Claude credits using Cline/RooCode. I’m considering getting an M3 Ultra maxed out except for SSD (so around the $9500 price point). Can someone explain to me what I can expect to see? I’ve read that I could run R1 Q4, but I don’t know what kind of experience it is. Would I be disappointed compared with Claude? Open to any other model suggestions and expectations. I’ve also heard that you can connect 3 together; if anyone has more information about doing that, I’d consider investing in that if it means I could run R1 or something similar fully. What I don’t want to have happen is make a big purchase and still need to use Claude for most of my coding. I’m not very experienced with hardware, so if anyone can explain how big of a jump it will be to M4 Ultra I’d appreciate it, because I don’t know if I should wait for a Mac Pro. If it’s only a marginally better or faster architecture then I’d rather buy a Mac Studio now.
I looked but didn’t find it there. I’ll try running it in the cloud though before I make the hardware purchase. That’s a good idea I can experience the capabilities of the model but I won’t know how the speed compares with local hardware.
Sure, it won't be as small or efficient, but with dual sockets we would have a theoretical bandwidth of **921.6 GB/s**, which is more than the M3 Ultra. And obviously you get the flexibility of adding more RAM. One isn't clearly better than the other, but for me, I would prefer the EPYC over the Apple
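A rough sketch of where a figure like 921.6 GB/s can come from. The channel count and memory speed here are assumptions (12 DDR5 channels per socket at 4800 MT/s, 8 bytes per transfer), not something stated in the comment, and the helper name is hypothetical. This is a theoretical peak; real-world dual-socket throughput is typically lower.

```python
def peak_bandwidth_gb_s(sockets: int,
                        channels_per_socket: int,
                        mega_transfers_per_s: int,
                        bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    MT/s times bytes per transfer gives MB/s per channel; divide by 1000 for GB/s.
    """
    total_mb_s = sockets * channels_per_socket * mega_transfers_per_s * bytes_per_transfer
    return total_mb_s / 1000

# Assumed configuration: 12 channels per socket, DDR5-4800.
single_socket = peak_bandwidth_gb_s(1, 12, 4800)
dual_socket = peak_bandwidth_gb_s(2, 12, 4800)
print(f"single socket: {single_socket} GB/s, dual socket: {dual_socket} GB/s")
```

Under those assumptions the single-socket number lines up with the roughly 460 GB/s quoted elsewhere in the thread for an EPYC build.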
Yeah, for around 6k you can get 6-8t/s with a dual socket build. I’m conflicted whether to pull the trigger or not, but I’ll hold off because they will announce the M4 Ultra soon. It has less bandwidth than a 4090, which isn’t promising.
I am considering buying the maxed out new Mac Studio with M3 Ultra and 512GB of unified memory as a CAPEX investment for a startup that will be offering a local LLM interfaced with a custom database of information for a specific application.
The hardware requirements appear feasible to me with a ~15k investment, and open source models seem built to be tailored for detailed use cases.
Of course this would be just to build an MVP, I don't expect this hardware to be able to sustain intensive usage by multiple users.
How feasible is that?
yeah you'd have to process one prompt at a time, there are ways to queue them but if you have a lot of people hitting the server it would be hard. Also not all models are optimized for MLX, most are cuda optimized so you may be limited when trying out some specific fine tunes
The question is how fast can it process my prompts when allocating 32k context for LLMs up to 123B, like Mistral Large. Given all that, if it can output 250 tokens at decent speeds, regardless of context size, I would fucking get one right away because, holy shit, this is what I have been waiting for.
These specs are good. I would like to know how they compare to the equivalent GPU. The advantage of GPUs is that you can batch requests. While a single individual prompt can run at 15 tokens per second in a GPU, you can run 20 prompts in parallel to achieve an effective throughput of hundreds of tokens per second. Can this be done on a Mac?
It is a slightly weakened 3090 with 512GB at max config as it gets 114.688TFLOPS FP16 vs 142.32TFLOPS FP16 for 3090 and memory bandwidth of 819.2GB/s vs 936GB/s.
I understand these things do quite well with simple prompts and little to no context. Is this device going to perform well when using 16k to 32k context, or will performance plummet?
This seems like a decent deal. I just saw a YouTube video where NetworkChuck tested 5 M2 Max Studios with 64GB of RAM running deepseek r2 with exo. That’s at least $10k in hardware and only gets you 320GB of RAM, and Thunderbolt 4 is nowhere near 800GB/s.
I will not buy it but this is cheap. I’ve only recently started using a MacBook Pro and it’s a beast. For anyone used to Linux getting a super powerful Mac is hugely appealing.
>Up to 16.9x faster token generation using an LLM with hundreds of billions of parameters in LM Studio when compared to Mac Studio with M1 Ultra, thanks to its massive amounts of unified memory.
Yeah, cause it fits and doesn't use disk (swap)... Can't wait for actual numbers
> biggest bottleneck for Macs is the memory bandwidth
Not in the context of LLM. A 4090, for example, only has 1008 GB/s. Slightly more than an M2 Ultra, but as long as the model fits, 4090 is around 4 times faster. Even underclocking the memory speed on the 4090 doesn't yield a significant drawback. This suggests that the bottleneck on the M2 Ultra is most likely Processing.
The Apple Chips are great machines but I am not convinced the hardware will be capable of running a model that large at adequate speed. I hope to be proven wrong but the M3 and M4 max chips don’t do that well with anything beyond 32B. The so called thinking models output way too many tokens for them not to be running at least 40-60 tokens per second if you want an adequate experience.
Shit. I didn't think they would go 512GB. But it's great that they are holding price line with the 256GB model. That's the same price as the M2 Ultra with 192GB.
Sorry for the noob question but how does this compare for training or fine tuning? Do these specs still only make it better for inference or does it make training easier/faster?
So you can run Unsloth DeepSeek R1 on the M3 Ultra / 256GB RAM at home for $7k (it needs 160GB (V)RAM), while still having room for smaller models to use in speculative decoding.
Very interested to see what real world tokens per second you could get out of this.
To be clear this is still super expensive but it’s getting DeepSeek R1 closer to hobbyist households.
I’d probably be willing to throw $5k at a solution that can run it at home at a reasonable throughput.
On my M1 Ultra Mac Studio I get 13.8 t/s with Llama 3.3 70B Q4 mlx.
M1 Max to M4 Max inference speed seems to roughly double, so let's assume the same for M1 Ultra to M3 Ultra.
Accounting for 2x faster performance, \~9.5x more parameters, Q2 vs Q4, it seems like you'd get closer to 5.8 t/s for R1 Q2 on M3 Ultra?
It's definitely awesome that you can run this at home for <$8k, but I feel like using cloud infrastructure becomes more attractive at this point.
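The back-of-envelope estimate above can be written out as a small sketch. It assumes token rate scales linearly with hardware speed and inversely with the bytes read per token (parameter count times bits per weight), and it treats R1 as if it were dense, which ignores that MoE models only read their active experts. The function name and parameters are made up for illustration.

```python
def scaled_tokens_per_second(baseline_tps: float,
                             hardware_speedup: float,
                             parameter_ratio: float,
                             baseline_bits: int,
                             target_bits: int) -> float:
    """Scale a measured token rate to a different machine, model size and quant.

    Assumes throughput is inversely proportional to the bytes read per token.
    """
    quant_speedup = baseline_bits / target_bits
    return baseline_tps * hardware_speedup * quant_speedup / parameter_ratio

# Inputs from the comment: 13.8 t/s on M1 Ultra (70B Q4), assumed 2x faster
# hardware, ~9.5x more parameters, Q2 instead of Q4.
estimate = scaled_tokens_per_second(13.8, 2.0, 9.5, baseline_bits=4, target_bits=2)
print(f"estimated: {estimate:.1f} t/s")
```

Every input is a rough guess, so this is a sanity check on the arithmetic rather than a prediction.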
Sales taxes (VAT) depend on the state and they tend to be a lot lower than in the EU. I'm from the Netherlands and we pay 21% VAT, and that's not even the highest in the EU.
The US version is \~$9500, the Dutch one is almost €12k.
$1.00 is worth about €0.93, so when we do the conversion and add the VAT, our prices are around 10% higher than in the US.
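Here is a quick sketch of that comparison. It assumes the 0.93 figure means euros per dollar, uses the 21% Dutch VAT mentioned above, and plugs in €11,874 (the German list price quoted earlier in the thread) as a stand-in for "almost €12k". All three are assumptions for illustration, and the function name is hypothetical.

```python
def usd_to_eur_with_vat(usd_price: float, eur_per_usd: float, vat_rate: float) -> float:
    """Convert a pre-tax USD price to EUR and add VAT."""
    return usd_price * eur_per_usd * (1 + vat_rate)

us_price_usd = 9500       # US price before sales tax, from the thread
eu_list_price_eur = 11874 # assumed EU list price including VAT

converted = usd_to_eur_with_vat(us_price_usd, eur_per_usd=0.93, vat_rate=0.21)
premium = eu_list_price_eur / converted - 1
print(f"US price converted with VAT: €{converted:.0f}, EU premium: {premium:.0%}")
```

With these inputs the premium lands close to the "around 10%" in the comment.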
This makes me wonder what NVIDIA will do with Project DIGITS. I know that Nvidia limits their consumer GPUs so that they can charge a fortune for their enterprise GPUs. There's also quite a bit of buzz, at least in my little curated feed, for systems that can run and even train larger local models such as EXO Labs. It seems like NVIDIA could really crush it in the Local Model market if they wanted to.
This will crush with reasoning MoE’s like Deepseek R1. The bandwidth for generating hundreds if not thousands of reasoning tokens at 37 billion active parameters I think will put both the M4 Max and M3 Ultra ahead of Strix Halo and Project Digits.
Let’s wait for tests from first owners.
But I’m doubtful it will be any good for any serious usage. Macs do suck with fine tuning or big context window or bigger prompts.
The 512GB option is a bigger deal on the ultra than having only 2xM3 Max instead of 2xM4 Max.
Looking forward to getting my hands on one of these refurbished in a few years lol
512 gb RAM is amazing for huge MoE models like R1. Not so good for huge dense models like 405B.
Price is terrible. I don't think it's worth that price tbh. But it's Apple. I guess no one is surprised. I'm sad that affordable 64/96 gb mac studio is no longer an option like it used to be for M2 Max one.
[EPYC 9334 CPU + Motherboard](https://www.ebay.com/itm/186024089736?_skw=epyc%20motherboard%20cpu%20combo%20sp5&itmmeta=01JNKK0S8CYSFPETQ4KPPPNNVC&itmprp=enc%3AAQAKAAABAFkggFvd1GGDu0w3yXCmi1cv6BJxhVmKioCpkwhXSOagZn3aap%2F2ZO6q8rZK%2BMtaHiWtbiV3LzoQdWQgLwk8FSJf%2BwuLnXrbbLYKlm9N%2FxOPXHWLNE%2F2M3g%2FkyvGvutipUDcZxoStxIfcjJ4jFd5%2FcAwdSPewTE%2F3BdiJbDo7W97BsZ28pGGpwXuj82XSmOzDea%2FmiXCfsjyE%2BgK5Wfbp4Wkur%2BxXfxCYAo%2BR5O5oyHo7JLdUwgJMd0eGzC1PDwRoWEdjzWEQxFKv7SQyE4o0QemR1XcuDCyuvU%2FzDdW6w0dyT%2BJ1bGdEHQpTvBMx9rkQun0fQ%2FzRdePSN0F0mPzcGo%3D%7Ctkp%3ABFBMrpSD86xl) \- $1.500
12x [32gb DDR5 RAM](https://www.ebay.com/itm/116454950032?_skw=ddr5%20ram%2032gb&itmmeta=01JNKK95NRR7J2NMVSP62X9QHN&itmprp=enc%3AAQAKAAAA0FkggFvd1GGDu0w3yXCmi1erSEmmMgbYq7x2QOAq52alOd2HX9EHxvdjyh27g5emCVyGaMhBYcCrZsITCQLGkkJR1ExvsJEMcsbx6FB%2F%2BIp9eu15Y2RE%2B2SvZjJhRnIFR3YQLFBb%2Bbl8V5eMP1AzyEcKtzP9A9RaJVkoxPRnQL916GT5oVfGVGC9w7YyljtSmCE9wz6BrhD%2Bn5cOtcTnlHuLOoVxNm4KmBkGUPhU5TElKipxahhd5IVRc0BWyu4RW27mgOmi0nZ1r5hKabnafTQ%3D%7Ctkp%3ABk9SR4LbpPOsZQ) \- $1.000
Other stuff like cooler, 1 TB nvme etc. - $250
= 2.750$ - 384gb RAM, 460gb/s bandwidth, 1 TB ssd - **384gb RAM should be enough for running MoE models like R1, can get 64gb sticks instead if need more RAM, can upgrade RAM and storage yourself, can use linux instead of macos**
M4 Max = 3.700$ - 128gb RAM, 409gb/s bandwidth, 1 TB ssd - **costs more for less bandwidth**
M3 Ultra = 9.500$ - 512gb RAM, 800gb/s bandwidth, 1 TB ssd - **costs 3.5x more, theoretically only 50% faster at inference**
I'm assuming the limiting factor for running a full model like R1 will be its memory bandwidth at this point? How many tok/s can the maxed out memory config expect?
Tim Cooked with this one. Based on RAM configs and their examples. It seems to be aimed directly as an R1 machine without saying it out loud to avoid Backlash from 🥭 for supporting China.
Consider that if you have a bona fide business _need_ for that much memory, then this is probably well within a reasonable budget.
If this is a _want_ then the price probably seems absurd and that's ok.
546 GB/sec memory bandwidth. So just over one token per second if you run the largest model that fits in the unified memory (with no mixture of experts or speculative decoding).
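The reasoning behind "just over one token per second" can be sketched as a bandwidth-bound upper limit. This assumes a dense model whose weights fill the entire 512 GB and are all read once per generated token, with the 546 GB/s figure taken from the comment. Both are simplifications, and the function name is made up.

```python
def bandwidth_bound_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed when every weight is read once per token."""
    return bandwidth_gb_s / model_size_gb

# Figures from the comment: 546 GB/s bandwidth, model filling 512 GB of memory.
upper_bound = bandwidth_bound_tokens_per_second(546, 512)
print(f"upper bound: {upper_bound:.2f} tokens/s")

# The same formula for a hypothetical smaller model that only reads 40 GB per token.
smaller = bandwidth_bound_tokens_per_second(546, 40)
print(f"40 GB model: {smaller:.1f} tokens/s")
```

MoE models and speculative decoding break the "read everything per token" assumption, which is why the comment excludes them.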
> offers an up to 80-core GPU, more than any Apple silicon chip; a powerful 32-core Neural Engine for on-device AI and machine learning (ML)
Holy hell!! MLX go brrrrrrr
Really looking forward to the benchmarks. Let's hope someone reviews the 512GB variant with R1, you can probably fit Q6 in there.
It's definitely more power efficient than the cpumax or gpumax way. But I'm not sure about the performance. Realistically you can probably fit 8(?) 3090s in a rack, but that's less than half the VRAM, and it will cost around 9K for a setup like that.