oMLX just implemented DFlash

Posted by butterfly_labs@reddit | LocalLLaMA | View on Reddit | 13 comments

https://github.com/jundot/omlx/commit/28fab9fc28f0c0013ffb307f3b21d30658ae1a72

[-]

Ok_Technology_5962@reddit

Sigh too bad cant use for glm 5.1....

[-]

layer4down@reddit

Looks like someone may already be working on it

[-]

I do want to try it but I am not sure why they are not working on Gemma-4, looks like they have undertaken GLM5.1 which few can run locally instead of taking on gemma-4 which seems like 2 of the viable local models atm.

[-]

layer4down@reddit

Undertaking GLM5.1 would be a massive win particularly for those slummin it in the 1-2bit range. I'm grateful to already be getting \~17tps+ TG and 180-250tps PP from qwen3.5-397b-a17b-2.6bit in oMLX. I'm so surprised at how well it works it even seems to outperform qwen3.5-27b-fp16 (which is a very smart model). I was baselining at \~9.5tps TG for 27b@bf16 but just pulled of 38.5 tps with my own DFlash upgrade. Though I'm eager to try the official release now.

[-]

Dany0@reddit

Why do you think zlab owes you anything in the first place. Train your own diffusion model. A small one you could do on a free colab. 2b-4b my hand estimate says is like 100-200$ of compute

[-]

Specter_Origin@reddit

"Why do you think zlab owes you anything in the first place. Train your own diffusion model. A small one you could do on a free colab. 2b-4b my hand estimate says is like 100-200$ of compute"

Never said they own me anything... also they have not opened up their code yet.

[-]

butterfly_labs@reddit (OP)

I'm going to wait as well... releases are already pretty unstable for me.

[-]

Beginning-Window-115@reddit

this guy is the goat