Luce Megakernel: Why is nobody talking about this?
Posted by PaceZealousideal6091@reddit | LocalLLaMA | View on Reddit | 9 comments
Everyone has been talking about Luce DFlash and PFlash. I just came across their megakernel, which it seems was released along with DFlash and PFlash. It's reportedly giving them 1.8x greater speed with much better power efficiency on NVIDIA GPUs, comparable to the efficiency achieved on Apple silicon! How is it that nobody is talking about this? They say they developed a method of avoiding CPU dispatches at every layer boundary. In llama.cpp, there are about 100 kernel launches per token in the CUDA implementation. The amount of power being used is crazy, especially as people are running powerful multi-GPU setups. Isn't this really huge? Am I missing something?
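A rough sketch of why ~100 launches per token matters. The per-launch dispatch cost below (~5 µs) is an assumed typical CUDA ballpark, not a figure from the post:

```python
# Back-of-the-envelope estimate of per-token kernel launch overhead.
# LAUNCH_OVERHEAD_US is an assumption (a common CUDA ballpark),
# not a measurement from the Luce release.
LAUNCHES_PER_TOKEN = 100      # the llama.cpp CUDA figure cited in the post
LAUNCH_OVERHEAD_US = 5.0      # assumed CPU-side cost per kernel dispatch

overhead_ms_per_token = LAUNCHES_PER_TOKEN * LAUNCH_OVERHEAD_US / 1000
print(f"dispatch overhead: {overhead_ms_per_token:.2f} ms/token")
# For a tiny model decoding in ~1 ms/token of actual compute, 0.5 ms of
# pure dispatch is a big slice; a fused megakernel collapses it to ~1 launch.
```

Under these assumptions the CPU spends about half a millisecond per token just launching kernels, which is where both the speed and the power-efficiency gains would come from.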
stoppableDissolution@reddit
Because handwriting kernels per-model (not even per-family) is not remotely feasible?
JumpyAbies@reddit
I think that if, for example, it has support for qwen3.6-27b or gemma-4, it becomes a very attractive option for those who use those models. It would be a solution focused on a smaller scope of models.
dinerburgeryum@reddit
The post goes on to say: "Megakernel fusion benefits shrink as model size grows and compute begins to dominate over launch overhead." Sounds like diminishing returns.
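That diminishing-returns argument is easy to see with toy numbers: launch overhead is roughly fixed per token, while compute time scales with model size. All figures below are illustrative assumptions, not measurements from the post:

```python
# Why megakernel gains shrink as models grow: dispatch cost is roughly
# constant per token, but compute per token grows with parameter count.
# overhead_ms and all compute_ms values are illustrative assumptions.
overhead_ms = 0.5  # assumed fixed per-token dispatch cost

for name, compute_ms in [("0.6B", 1.0), ("7B", 8.0), ("70B", 60.0)]:
    frac = overhead_ms / (overhead_ms + compute_ms)
    print(f"{name}: dispatch is {frac:.0%} of token time")
```

With these made-up numbers, dispatch is about a third of token time for a 0.6B model but around one percent for a 70B one, which matches why the kernel targets small models.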
JumpyAbies@reddit
Why didn't you let me dream?? 😆
foomanchu89@reddit
Because
Miserable-Dare5090@reddit
Because it's only working with Qwen 0.6B right now, AFAIK.
Training-Web7861@reddit
The kernel launch overhead is real. 100 launches per token adds up fast on power budget. Curious if the fused delta approach would bring it down to single-digit launches.
Ok-Measurement-1575@reddit
I was very excited until I read this:
Single model, single architecture. The kernel is hand-written for Qwen 3.5-0.8B's specific layer pattern (18 DeltaNet + 6 Attention). It does not generalize to other models without rewriting.
NickCanCode@reddit
I think they know. They just don't have the time to do everything. Just look at the pull request count on those other projects.