How to SFT a diffusion large language model?

Posted by ProfessionalGuess884@reddit | LocalLLaMA | 4 comments

I’m wondering if there’s any way to perform SFT (Supervised Fine-Tuning) on a diffusion-based large language model.
If anyone has experience with this, could you please share your insights?