oMLX just implemented DFlash
Posted by butterfly_labs@reddit | LocalLLaMA | View on Reddit | 13 comments
https://github.com/jundot/omlx/commit/28fab9fc28f0c0013ffb307f3b21d30658ae1a72
Ok_Technology_5962@reddit
Sigh, too bad it can't be used for GLM 5.1...
layer4down@reddit
Looks like someone may already be working on it
Specter_Origin@reddit
I do want to try it, but I'm not sure why they aren't working on Gemma-4. It looks like they've undertaken GLM 5.1, which few can run locally, instead of taking on Gemma-4, which seems like one of the two most viable local models atm.
layer4down@reddit
Undertaking GLM 5.1 would be a massive win, particularly for those slummin' it in the 1-2 bit range. I'm grateful to already be getting ~17+ tps TG and 180-250 tps PP from qwen3.5-397b-a17b-2.6bit in oMLX. I'm surprised at how well it works; it even seems to outperform qwen3.5-27b-fp16 (which is a very smart model). I was baselining at ~9.5 tps TG for 27b@bf16, but just pulled off 38.5 tps with my own DFlash upgrade. Though I'm eager to try the official release now.
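For anyone curious what the basic trick looks like, here's roughly the draft-and-verify shape of it (a minimal greedy sketch, not oMLX's actual DFlash code; `draft_next`/`target_next` are toy stand-in models I made up so it runs as-is):

```python
# Minimal greedy draft-and-verify speculative decoding sketch. This is NOT
# oMLX's DFlash implementation; the "models" below are toy stand-ins so the
# control flow runs end to end.
import numpy as np

VOCAB = 32  # toy vocabulary

def toy_logits(ctx, salt):
    # Deterministic pseudo-logits derived from the context.
    rng = np.random.default_rng(sum(ctx) * 7919 + salt)
    return rng.normal(size=VOCAB)

def draft_next(ctx):
    # Cheap "draft" model: greedy token from its own logits.
    return int(np.argmax(toy_logits(ctx, salt=0)))

def target_next(ctx):
    # Expensive "target" model: mostly agrees with the draft.
    return int(np.argmax(toy_logits(ctx, salt=0) + 0.1 * toy_logits(ctx, salt=1)))

def speculative_decode(prompt, n_new, k=4):
    """Generate n_new tokens: draft k ahead, then verify with the target."""
    ctx = list(prompt)
    done = 0
    while done < n_new:
        # 1) Draft proposes k tokens autoregressively.
        proposal, dctx = [], list(ctx)
        for _ in range(k):
            t = draft_next(dctx)
            proposal.append(t)
            dctx.append(t)
        # 2) Target checks each position (a single batched forward pass in a
        #    real system); keep draft tokens until the first disagreement.
        accepted, vctx = [], list(ctx)
        for t in proposal:
            want = target_next(vctx)
            if want != t:
                accepted.append(want)  # take the target's token and stop
                break
            accepted.append(t)
            vctx.append(t)
        ctx += accepted
        done += len(accepted)
    return ctx[len(prompt):len(prompt) + n_new]

print(speculative_decode([1, 2, 3], n_new=12))
```

The win comes from the target model verifying several tokens per forward pass instead of producing one at a time, which is where that ~9.5 → 38.5 tps jump can come from when the draft is accepted often.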
Dany0@reddit
Why do you think zlab owes you anything in the first place? Train your own diffusion model. A small one you could do on a free Colab; my back-of-the-envelope estimate says a 2B-4B model is something like $100-200 of compute.
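(Rough version of that napkin math, using the common ~6·N·D training-FLOPs rule; every concrete number below is an assumption I picked, not a measurement:)

```python
# Back-of-the-envelope training cost via the common ~6*N*D FLOPs rule.
# All numbers below are illustrative assumptions.
params = 2e9            # 2B-parameter model
tokens = 10e9           # assumed training tokens for a small drafter
train_flops = 6 * params * tokens          # ~1.2e20 FLOPs

gpu_peak = 312e12       # assumed A100 bf16 peak FLOPs/s
mfu = 0.40              # assumed model FLOPs utilization
hours = train_flops / (gpu_peak * mfu) / 3600

dollars = hours * 0.50  # assumed $/GPU-hour (spot/preemptible pricing)
print(f"~{hours:.0f} GPU-hours, ~${dollars:.0f}")  # ~267 GPU-hours, ~$134
```

Scale tokens or params and the cost moves linearly, so the $100-200 range holds only for a small model with a modest token budget.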
Specter_Origin@reddit
"Why do you think zlab owes you anything in the first place. Train your own diffusion model. A small one you could do on a free colab. 2b-4b my hand estimate says is like 100-200$ of compute"
Never said they owe me anything... also, they haven't opened up their code yet.
layer4down@reddit
Oh, son of a gun, I just spent all day implementing it myself 😂 Should've checked LocalLLaMA.
maschayana@reddit
Only up to 2k tokens? Otherwise it makes no sense.
Dany0@reddit
M3 Max, 64GB. I get 20 tok/s decode on 27b without DFlash in oMLX, and 11-13 tok/s with DFlash on. What am I doing wrong? I tried the mlx-community 8-bit (exceeded memory and went to swap) and 4-bit models (used like 44GB iirc).
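My working theory, using the standard speculative-decoding speedup math (Leviathan et al., 2023; the concrete numbers here are made up just to show the effect):

```python
# Expected speedup from speculative decoding.
# alpha: per-token draft acceptance rate, gamma: draft length,
# c: draft-step cost relative to one target-model step.
def speedup(alpha, gamma, c):
    expected_tokens = (1 - alpha ** (gamma + 1)) / (1 - alpha)
    cycle_cost = gamma * c + 1       # gamma draft steps + one verify pass
    return expected_tokens / cycle_cost

for alpha in (0.3, 0.6, 0.9):
    print(f"alpha={alpha}: {speedup(alpha, gamma=4, c=0.2):.2f}x")
# alpha=0.3 -> 0.79x (a net slowdown), alpha=0.9 -> 2.27x. And if the model
# is swapping to disk, both the draft and verify passes slow down on top.
```

So a low acceptance rate plus the memory pressure from swap could easily explain DFlash netting out slower than plain decoding here.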
Zestyclose_Yak_3174@reddit
Happy to see exciting new developments with oMLX, early days for me using it but it looks really promising
dametsumari@reddit
Yep. The speculative decoding work was brewing in a branch for a month, and I'm looking forward to 0.35 to try this out :) I'm not in a hurry, so I'm not going to use git main.
butterfly_labs@reddit (OP)
I'm going to wait as well... releases are already pretty unstable for me.
Beginning-Window-115@reddit
this guy is the goat