Tinygrad Driver testing!
Posted by Street-Buyer-2428@reddit | LocalLLaMA | View on Reddit | 58 comments
Boutta thrash some MoE speeds on a Blackwell + M3 Ultra RDMA cluster. There's a bit less than 2TB of RAM here. I want to exchange ideas with you guys and run some cool experiments. What benches would you guys like to see?
Evening_Ad6637@reddit
Nice!
Can you try one of the DeepSeek-V4 models, or both? I'm wondering what maximum context size you can squeeze into this stack, and how TG & PP speeds look at that maximum.
Street-Buyer-2428@reddit (OP)
2x M5 Max, 128GB. If you guys want to experiment with those as well, lmk lol.
ElementNumber6@reddit
Not as interesting as the capacity to run Deepseek v4 Pro. I'd focus on that for now.
superdariom@reddit
Can you explain what I'm looking at here?
Street-Buyer-2428@reddit (OP)
Apple approved a driver to plug in some GPUs through Thunderbolt 5. I wanna use the Blackwell for prefill and the M3 Ultras for KV caching/decode.
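Roughly the split being described, as a toy sketch (every name and number here is made up; a real stack would run the actual model layers and ship the KV cache over RDMA):

```python
# Toy sketch of disaggregated prefill/decode: a fast "prefill device"
# (the Blackwell) processes the whole prompt in one parallel pass and
# hands its KV cache to a high-memory "decode device" (the M3 Ultras)
# that generates tokens one at a time.

def prefill(prompt_tokens):
    """Stand-in for the GPU pass: one big parallel pass over the prompt."""
    # A real system would run the transformer here; we just keep a list
    # standing in for the per-layer KV cache.
    return {"len": len(prompt_tokens), "kv": list(prompt_tokens)}

def transfer(kv_cache):
    """Stand-in for the RDMA / Thunderbolt copy to the Mac side."""
    return dict(kv_cache)  # in reality this copy is the expensive step

def decode(kv_cache, n_new):
    """Stand-in for the Mac side: sequential, memory-bound token loop."""
    out = []
    for _ in range(n_new):
        tok = sum(kv_cache["kv"]) % 100  # dummy "next token"
        kv_cache["kv"].append(tok)       # KV cache grows every step
        out.append(tok)
    return out

kv = transfer(prefill([3, 1, 4, 1, 5]))
print(decode(kv, 4))  # → [14, 28, 56, 12]
```

The point of the split is that prefill is compute-bound (good fit for the GPU) while decode is memory-bandwidth-bound (good fit for the Macs' unified memory), so each box does the half it's best at.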
cleversmoke@reddit
Wait a minute, did they really do it?? Finally on M devices?? 😱
Street-Buyer-2428@reddit (OP)
Yeah, but there's definitely a lot to optimize. This isn't fast enough. I'm trying to see if I could use the driver's mapping technique and optimize it, but this definitely needs work.
segmond@reddit
I often see these posts, then they never come back to tell us what they did.
Street-Buyer-2428@reddit (OP)
I’m actually gonna do it. Currently setting up. Add me on X @mlx_reaper for updates.
segmond@reddit
ok, I want to see the difference the 6000 makes in prompt processing. Load a 100B model, say MistralMedium3.5-128b, on both Macs and test, then load it on the 6000 and one Mac and test.
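A minimal harness for that kind of A/B comparison could look like this (the lambda is a dummy stand-in for whatever decode step the backend exposes; nothing here is tied to a real inference API):

```python
import time

def tokens_per_sec(step_fn, n_tokens):
    """Time n_tokens calls of step_fn and report throughput."""
    t0 = time.perf_counter()
    for _ in range(n_tokens):
        step_fn()
    dt = time.perf_counter() - t0
    return n_tokens / dt if dt > 0 else float("inf")

# Dummy workload standing in for a real decode step; swap in the
# two-Mac and 6000+Mac setups here to get comparable tok/s numbers.
rate = tokens_per_sec(lambda: sum(range(1000)), 256)
print(f"{rate:.0f} tok/s")
```

Running the same harness with a fixed prompt on both configurations is what makes the PP/TG numbers directly comparable.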
polandtown@reddit
whaaat? very cool - go apple!
Street-Buyer-2428@reddit (OP)
Hell yeah. Have a feeling Apple's new CEO is gonna kill it.
super1701@reddit
How much was this total? Looking at my own "jarvis" setup and this seems like a dream for it lol.
Street-Buyer-2428@reddit (OP)
'Bout $30k for the Studios (yes, I know; sourced refurb a year ago for a great price), $13k for the M5 Maxes, and $7k for the Blackwell, so all in 'bout $50k. It's worth way more in today's market tho.
super1701@reddit
God. Guessing you own your own business for that. Jealous af.
Street-Buyer-2428@reddit (OP)
Yeah, I do local AI for small to medium businesses that need to handle sensitive information. I literally just spend all the money they give me on buying shit like this lol.
super1701@reddit
How'd you get into that? Doing a cloud, or make the rigs and hand it to them?
Street-Buyer-2428@reddit (OP)
Started on the cloud, benefited from some of those free credits Google/MS are giving out, and customers just kept asking to get out of the cloud, so I started doing it with Macs because Macs are easier to sell.
danish334@reddit
eGPU
redmctrashface@reddit
Nice setup, are you a millionaire?
Street-Buyer-2428@reddit (OP)
Lol unfortunately not
Creepy-Bell-4527@reddit
I hate to break it to you...
But the tinygrad driver usually performs about the same as the M3 Ultra CPU.
That is to say, completely ass.
Street-Buyer-2428@reddit (OP)
Yeah, noticed that. A bit disappointed here. I'm checking to see if I could use Vulkan or retrofit something through the new JACCL backend to process the matmuls.
6969its_a_great_time@reddit
Is it going to get enough airflow in one of those?
MisticRain69@reddit
i think it has a blower
6969its_a_great_time@reddit
Really? Couldn't tell from the picture. It just looked like a data center GPU, with that gold plating at the top similar to an L40S or A100, which don't have fans.
Technical-Earth-3254@reddit
RTX 5000 definitely has a blower
Street-Buyer-2428@reddit (OP)
I have a liquid cooler; I can probably tap into it. I think it has one fan though.
6969its_a_great_time@reddit
Interested to see the final setup
Street-Buyer-2428@reddit (OP)
Awesome! I'm trying to structure the content since this got so much interest, so add me on X @mlx_reaper for updates. I'll also be posting here.
Adrian_Galilea@reddit
Would love to see content about this, let us know what sticks after testing.
Also, what specs?
What gpu?
One-Pain6799@reddit
Nice setup!
CheatCodesOfLife@reddit
Which thunderbolt -> PCIe product is that?
Street-Buyer-2428@reddit (OP)
egpu
MatlowAI@reddit
Razer Core X V2? Depending on the m5 ultra I plan on heading this direction.
CheatCodesOfLife@reddit
Thanks
Street-Buyer-2428@reddit (OP)
I think so. It's the latest TB5 one.
Pixer---@reddit
How much does that cuda gpu speed up prompt processing ?
madsheepPL@reddit
tinygrad doesn't use CUDA
lots_of_apples@reddit
For your macs I know exo works to run them all as a cluster, but does exo support egpus?
Street-Buyer-2428@reddit (OP)
Exo is unfortunately not good for production workflows. I even had to build my own backend to actually use the RDMA stably over long contexts. I tried reaching out to them to help out and see if I could collaborate, but I never received a reply.
Longjumping_Crow_597@reddit
Let's collab! I tried sending an email but it bounced.
Street-Buyer-2428@reddit (OP)
Huh that’s weird. I’ll hit you up on PM.
Cosack@reddit
That's a used car worth of hardware sitting in this corner here...
Street-Buyer-2428@reddit (OP)
More like a used 2020 911 lol
Cosack@reddit
Guess no choice now. Gonna have to set some agents loose to hack Google and then run Genie 3 locally to drive a pretend 911
Street-Buyer-2428@reddit (OP)
Lol. I heard world models are getting better anyways, so maybe it won't make a difference.
Objective-Picture-72@reddit
You putting any content on YouTube or medium? would love to follow your work
Street-Buyer-2428@reddit (OP)
I should, right? I've been doing this by myself for months, and I feel like there's definitely a gap for this type of content.
FullOf_Bad_Ideas@reddit
Which inference engines would support offloading attention, shared experts and kv cache to GPU while keeping sparse experts on unified memory? I'd like to see performance on that, especially prefill at high context.
Street-Buyer-2428@reddit (OP)
Yes, yes, and yes. Added to the list. This is exactly what I was looking for.
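The placement being asked about can be sketched as a simple routing rule (module names and device strings here are invented for illustration, not any engine's real API): dense every-token work (attention, shared experts, KV cache) goes to the GPU, while the bulky but sparsely-hit routed experts stay in unified memory.

```python
# Hypothetical device-placement rule for a MoE model: sparse routed
# experts hold most of the weights but each token only activates a few,
# so they tolerate slower memory; everything hit on every token stays
# on the fast device.

GPU, UNIFIED = "cuda:0", "unified"

def place(module_name):
    # Routed (sparse) experts -> unified memory on the Macs.
    if ".experts." in module_name and ".shared" not in module_name:
        return UNIFIED
    # Attention, shared experts, embeddings, KV cache -> GPU.
    return GPU

layers = [
    "layers.0.attn.q_proj",
    "layers.0.moe.shared_expert.w1",
    "layers.0.moe.experts.17.w1",
    "layers.0.moe.experts.42.w2",
]
placement = {name: place(name) for name in layers}
print(placement)
```

High-context prefill is the interesting case here because the attention and KV-cache work that dominates it is exactly the part pinned to the GPU.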
cheapybastard@reddit
Cool!
Technical-Earth-3254@reddit
Nice setup. I would be interested in some smaller, current models like DS V4 Flash or MiMo V2.5, in addition to the full-size DS V4 Pro, Kimi K2.6, MiMo V2.5 Pro, and maybe GLM 5.1.
Street-Buyer-2428@reddit (OP)
added to the list!
xornullvoid@reddit
Nice, which card is that?
Street-Buyer-2428@reddit (OP)
Blackwell 5000, 72GB
xornullvoid@reddit
Nice, looked familiar. I have the little brother 48GB.
Do let us know the benchmarks, not seen many Apples combined with Blackwell here.
Street-Buyer-2428@reddit (OP)
I don't understand why people haven't gone apeshit over it, ngl.