Tinygrad Driver testing!
Posted by Street-Buyer-2428@reddit | LocalLLaMA | View on Reddit | 58 comments
Boutta thrash some MoE speeds on a Blackwell + M3 Ultra RDMA cluster. There's a bit less than 2TB of RAM here. I want to exchange ideas with you guys and run some cool experiments. What benches would you guys like to see?
Evening_Ad6637@reddit
Nice!
Can you try one of the DeepSeek-V4 models, or both? I'm wondering what maximum context size you can squeeze into this stack, and how TG & PP speeds look at that maximum.
Street-Buyer-2428@reddit (OP)
2x M5 Max, 128GB. If you guys want to experiment with those as well, lmk lol.
ElementNumber6@reddit
Not as interesting as the capacity to run Deepseek v4 Pro. I'd focus on that for now.
superdariom@reddit
Can you explain what I'm looking at here?
Street-Buyer-2428@reddit (OP)
Apple approved a driver to plug in some GPUs through Thunderbolt 5. I wanna use the Blackwell for prefill and the M3 Ultras for KV caching/decode.
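Roughly the split being described, as a toy sketch (every name and number here is made up; a real stack would run the actual model layers and ship the KV cache over RDMA):

```python
# Toy sketch of disaggregated prefill/decode: a fast "prefill device"
# (the Blackwell) processes the whole prompt in one parallel pass and
# hands its KV cache to a high-memory "decode device" (the M3 Ultras)
# that generates tokens one at a time.

def prefill(prompt_tokens):
    """Stand-in for the GPU pass: one big parallel pass over the prompt."""
    # A real system would run the transformer here; we just keep a list
    # standing in for the per-layer KV cache.
    return {"len": len(prompt_tokens), "kv": list(prompt_tokens)}

def transfer(kv_cache):
    """Stand-in for the RDMA / Thunderbolt copy to the Mac side."""
    return dict(kv_cache)  # in reality this copy is the expensive step

def decode(kv_cache, n_new):
    """Stand-in for the Mac side: sequential, memory-bound token loop."""
    out = []
    for _ in range(n_new):
        tok = sum(kv_cache["kv"]) % 100  # dummy "next token"
        kv_cache["kv"].append(tok)       # KV cache grows every step
        out.append(tok)
    return out

kv = transfer(prefill([3, 1, 4, 1, 5]))
print(decode(kv, 4))  # → [14, 28, 56, 12]
```

The point of the split is that prefill is compute-bound (good fit for the GPU) while decode is memory-bandwidth-bound (good fit for the Macs' unified memory), so each box does the half it's best at.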
cleversmoke@reddit
Wait a minute, did they really do it?? Finally on M devices?? 😱
Street-Buyer-2428@reddit (OP)
Yeah, but there's definitely a lot to optimize. This isn't fast enough. I'm trying to see if I could use the driver's mapping technique and optimize it, but this definitely needs work.
segmond@reddit
I often see these posts, then they never come back to tell us what they did.
Street-Buyer-2428@reddit (OP)
I’m actually gonna do it. Currently setting up. Add me on X @mlx_reaper for updates.
segmond@reddit
ok, I want to see the difference the 6000 makes in prompt processing. Load a 100B model, say MistralMedium3.5-128b, on both Macs and test, then load it on the 6000 and one Mac and test.
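A minimal harness for that kind of A/B comparison could look like this (the lambda is a dummy stand-in for whatever decode step the backend exposes; nothing here is tied to a real inference API):

```python
import time

def tokens_per_sec(step_fn, n_tokens):
    """Time n_tokens calls of step_fn and report throughput."""
    t0 = time.perf_counter()
    for _ in range(n_tokens):
        step_fn()
    dt = time.perf_counter() - t0
    return n_tokens / dt if dt > 0 else float("inf")

# Dummy workload standing in for a real decode step; swap in the
# two-Mac and 6000+Mac setups here to get comparable tok/s numbers.
rate = tokens_per_sec(lambda: sum(range(1000)), 256)
print(f"{rate:.0f} tok/s")
```

Running the same harness with a fixed prompt on both configurations is what makes the PP/TG numbers directly comparable.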
polandtown@reddit
whaaat? very cool - go apple!
Street-Buyer-2428@reddit (OP)
Hell yeah. Have a feeling Apple's new CEO is gonna kill it.
super1701@reddit
How much was this total? Looking at my own "jarvis" setup and this seems like a dream for it lol.
Street-Buyer-2428@reddit (OP)
'Bout $30k for the Studios (yes, I know; sourced refurb a year ago for a great price), $13k for the M5 Maxes, and $7k for the Blackwell, so all in 'bout $50k. It's worth way more in today's market tho.
super1701@reddit
God. Guessing you own your own business for that. Jealous af.
Street-Buyer-2428@reddit (OP)
Yeah, I do local AI for small to medium businesses that need to handle sensitive information. I literally just spend all the money they give me on buying shit like this lol.
super1701@reddit
How'd you get into that? Doing a cloud, or make the rigs and hand it to them?
Street-Buyer-2428@reddit (OP)
Started on the cloud, benefited from some of those free credits Google/MS are giving out, and customers just kept asking to get out of the cloud, so I started doing it with Macs because Macs are easier to sell.
danish334@reddit
eGPU
redmctrashface@reddit
Nice setup, are you a millionaire?
Street-Buyer-2428@reddit (OP)
Lol unfortunately not
Creepy-Bell-4527@reddit
I hate to break it to you...
But the tinygrad driver usually performs about the same as the M3 Ultra CPU.
That is to say, completely ass.
Street-Buyer-2428@reddit (OP)
Yeah, noticed that. A bit disappointed here. I'm checking to see if I could use Vulkan or retrofit something through the new JACCL backend to process the matmuls.
6969its_a_great_time@reddit
Is it going to get enough airflow in one of those?
MisticRain69@reddit
i think it has a blower
6969its_a_great_time@reddit
Really? Couldn't tell from the picture. It just looked like a data center GPU, with that gold plating at the top similar to an L40S or A100, which don't have fans.
Technical-Earth-3254@reddit
RTX 5000 definitely has a blower
Street-Buyer-2428@reddit (OP)
I have a liquid cooler; I can probably tap into it. I think it has one fan though.
6969its_a_great_time@reddit
Interested to see the final setup
Street-Buyer-2428@reddit (OP)
Awesome! I'm trying to structure the content since this got so much interest, so add me on X @mlx_reaper for updates. I'll also be posting here.
Adrian_Galilea@reddit
Would love to see content about this, let us know what sticks after testing.
Also, what specs?
What gpu?
One-Pain6799@reddit
Nice setup!
CheatCodesOfLife@reddit
Which thunderbolt -> PCIe product is that?
Street-Buyer-2428@reddit (OP)
egpu
MatlowAI@reddit
Razer Core X V2? Depending on the m5 ultra I plan on heading this direction.
CheatCodesOfLife@reddit
Thanks
Street-Buyer-2428@reddit (OP)
I think so. It's the latest TB5 one.
Pixer---@reddit
How much does that cuda gpu speed up prompt processing ?
madsheepPL@reddit
tinygrad doesn't use CUDA
lots_of_apples@reddit
For your macs I know exo works to run them all as a cluster, but does exo support egpus?
Street-Buyer-2428@reddit (OP)
Exo is unfortunately not good for production workflows. I even had to build my own backend to actually use the RDMA stably over long contexts. I tried reaching out to them to help out and see if I could collaborate, but I never received a reply.
Longjumping_Crow_597@reddit
Let's collab! I tried sending an email but it bounced.
Street-Buyer-2428@reddit (OP)
Huh that’s weird. I’ll hit you up on PM.
Cosack@reddit
That's a used car worth of hardware sitting in this corner here...
Street-Buyer-2428@reddit (OP)
More like a used 2020 911 lol
Cosack@reddit
Guess no choice now. Gonna have to set some agents loose to hack Google and then run Genie 3 locally to drive a pretend 911
Street-Buyer-2428@reddit (OP)
Lol. I heard world models are getting better anyways, so maybe it won't make a difference.
Objective-Picture-72@reddit
You putting any content on YouTube or medium? would love to follow your work
Street-Buyer-2428@reddit (OP)
I should, right? I've been doing this by myself for months, and I feel like there's definitely a gap for this type of content.
FullOf_Bad_Ideas@reddit
Which inference engines would support offloading attention, shared experts and kv cache to GPU while keeping sparse experts on unified memory? I'd like to see performance on that, especially prefill at high context.
Street-Buyer-2428@reddit (OP)
Yes, yes, and yes. Added to the list. This is exactly what I was looking for.
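The placement being asked about can be sketched as a simple routing rule (module names and device strings here are invented for illustration, not any engine's real API): dense every-token work (attention, shared experts, KV cache) goes to the GPU, while the bulky but sparsely-hit routed experts stay in unified memory.

```python
# Hypothetical device-placement rule for a MoE model: sparse routed
# experts hold most of the weights but each token only activates a few,
# so they tolerate slower memory; everything hit on every token stays
# on the fast device.

GPU, UNIFIED = "cuda:0", "unified"

def place(module_name):
    # Routed (sparse) experts -> unified memory on the Macs.
    if ".experts." in module_name and ".shared" not in module_name:
        return UNIFIED
    # Attention, shared experts, embeddings, KV cache -> GPU.
    return GPU

layers = [
    "layers.0.attn.q_proj",
    "layers.0.moe.shared_expert.w1",
    "layers.0.moe.experts.17.w1",
    "layers.0.moe.experts.42.w2",
]
placement = {name: place(name) for name in layers}
print(placement)
```

High-context prefill is the interesting case here because the attention and KV-cache work that dominates it is exactly the part pinned to the GPU.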
cheapybastard@reddit
Cool!
Technical-Earth-3254@reddit
Nice setup. I would be interested in some smaller, current models like DS V4 Flash or MiMo V2.5, in addition to the full-size DS V4 Pro, Kimi K2.6, MiMo V2.5 Pro, and maybe GLM 5.1.
Street-Buyer-2428@reddit (OP)
added to the list!
xornullvoid@reddit
Nice, which card is that?
Street-Buyer-2428@reddit (OP)
Blackwell 5000, 72GB
xornullvoid@reddit
Nice, looked familiar. I have the little brother 48GB.
Do let us know the benchmarks, not seen many Apples combined with Blackwell here.
Street-Buyer-2428@reddit (OP)
I don't understand why people haven't gone apeshit over it, ngl.