DFlash Doubles the T/S Gen Speed of Qwen3.5 27B (BF16) on Mac M5 Max

Posted by MiaBchDave@reddit | LocalLLaMA

In an initial test, the new DFlash support in oMLX 0.3.5 RC1 more than doubled (!!!) the generation speed of Qwen3.5 27B (BF16): it went from 9 T/S to 22 T/S!
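For anyone who wants the ratio spelled out, here is a quick sketch of the arithmetic behind the 9 and 22 T/S figures. The 1000-token response length is just an illustrative assumption, not something measured in this test, and the helper names are made up for the example.

```python
# Speedup implied by the two generation speeds reported in the post.
# The 1000-token response length below is an illustrative assumption.

def speedup(baseline_tps: float, new_tps: float) -> float:
    """Ratio of new to baseline generation speed."""
    return new_tps / baseline_tps

def seconds_for(tokens: int, tps: float) -> float:
    """Wall-clock seconds to generate `tokens` at a steady `tps`."""
    return tokens / tps

baseline_tps = 9.0   # without DFlash (from the post)
dflash_tps = 22.0    # with DFlash (from the post)

ratio = speedup(baseline_tps, dflash_tps)
print(f"speedup: {ratio:.2f}x")
print(f"1000 tokens without DFlash: {seconds_for(1000, baseline_tps):.1f} s")
print(f"1000 tokens with DFlash:    {seconds_for(1000, dflash_tps):.1f} s")
```

So the gain is closer to 2.4x than a flat 2x, at least on this single run.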

Models used (HuggingFace)

Main Model: Jackrong/MLX-Qwopus3.5-27B-v3-bf16
Draft Model: z-lab/Qwen3.5-27B-DFlash

System: M5 Max 128GB

DFlash on Github: https://github.com/bstnxbt/dflash-mlx?tab=readme-ov-file

oMLX (v0.3.5 RC1): https://omlx.ai

I'm not affiliated with any of the developers. Qwen3.5 27B is so good for its size, with speed being the only thing holding it back, that I thought this might help people run it locally at higher quants or full weights.
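As a rough back-of-the-envelope check on what "full weights" means for memory, here is a sketch that estimates weight size only. It assumes a nominal 27 billion parameters and ignores the draft model, the KV cache, and the per-group scale overhead that real quantized formats carry, so actual usage will be higher. The function name is hypothetical.

```python
# Rough weight-only memory estimate at different precisions.
# Assumptions: nominal 27e9 parameters; no draft model, KV cache,
# or quantization metadata included.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight size in decimal GB: params (billions) * bytes per weight."""
    return params_billion * bits_per_weight / 8

PARAMS_B = 27.0  # nominal parameter count, in billions (assumption)

for name, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{name:>5}: ~{weight_gb(PARAMS_B, bits):.1f} GB of weights")
```

By this estimate the BF16 weights come to about 54 GB, which leaves headroom on a 128GB machine.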

I have yet to test it with OpenCode or any other harness.