How is everyone doing DPO on Gemma 3 using Unsloth/TRL?

Posted by CartographerFun4221@reddit | LocalLLaMA | 5 comments

I'm going around in circles fighting TRL, which picks up on Gemma 3's multimodality and expects images in the DPO dataset even though I'm training on text only. I've turned vision off, yet it still expects the image tags to be present. Including them but leaving them empty doesn't work either.
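To make the setup concrete, here is a minimal sketch of what a text-only preference record could look like once any image field is stripped. It assumes the usual `prompt` / `chosen` / `rejected` column layout for preference data; the `images` key, the `to_text_only` helper, and the sample strings are hypothetical and only for illustration. This shows the data shape in question, not a confirmed fix for the trainer behaviour described above.

```python
from typing import Any

# Assumed column names for a text-only preference dataset.
TEXT_KEYS = ("prompt", "chosen", "rejected")


def to_text_only(record: dict[str, Any]) -> dict[str, str]:
    """Keep only the text columns, dropping any other fields (e.g. an empty 'images' list)."""
    missing = [k for k in TEXT_KEYS if k not in record]
    if missing:
        raise KeyError(f"record is missing required keys: {missing}")
    return {k: str(record[k]) for k in TEXT_KEYS}


# Hypothetical record with an empty image field, as described in the post.
example = {
    "prompt": "Translate to Urdu: Good morning",
    "chosen": "صبح بخیر",
    "rejected": "Good morning",
    "images": [],
}

cleaned = to_text_only(example)
print(cleaned)
```

If the dataset is a Hugging Face `datasets.Dataset`, a helper like this could presumably be applied per row with `map`, but whether the trainer then skips its image handling depends on how the model and processor are loaded.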

Is there an easy way to run DPO on just text with Gemma 3? I'd hate to lose two stages of SFT progress on this. I chose it specifically for its strong Urdu abilities (its tokenizer is twice as efficient on Nastaliq text as Llama 3.1's).