Gemma3 27b heretic, lower divergence than mlabonne/gemma3

Posted by coder3101@reddit | LocalLLaMA

I set out to abliterate Gemma3 27b and wanted to reach or surpass the most popular abliteration. Here are the results after 5 hours on an H100 using heretic.

| Model | KL Divergence | Refusals |
|---|---|---|
| Google's base model | 0 (by definition) | 98/100 |
| mlabonne's gemma3 | 0.08 | 6/100 |
| Heretic gemma3 v1 | 0.07 | 7/100 |
| Heretic gemma3 v2 | 0.03 | 14/100 |

KL Divergence: lower is better. It's roughly a measure of how close the modified model stays to the original. Worth noting that lower divergence also tends to quantize better.
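For intuition, here's a minimal sketch of how KL divergence between two models' next-token distributions could be computed. The probability vectors are made-up illustrative values, not heretic's actual measurement pipeline:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    # KL(P || Q) = sum_i p_i * log(p_i / q_i)
    # eps guards against log(0) when a probability is exactly zero.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical next-token distributions over a tiny 3-token vocabulary:
base = [0.70, 0.20, 0.10]       # original model
abliterated = [0.65, 0.25, 0.10]  # modified model

print(round(kl_divergence(base, abliterated), 4))  # → 0.0072
```

In practice this would be averaged over many prompts and positions, with the full vocabulary distribution from each model's logits.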

Refusals: lower is better. It counts how many harmful prompts the model refused, detected by the presence of tokens such as "sorry" in the response, so it only gives a rough measure.
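The token-presence check described above can be sketched like this. The marker list is illustrative, not heretic's actual word list:

```python
# Rough refusal detection: flag a reply if it contains a common
# apology/refusal marker. Markers here are assumptions for illustration.
REFUSAL_MARKERS = ("sorry", "i can't", "i cannot", "i won't", "as an ai")

def is_refusal(reply: str) -> bool:
    text = reply.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

replies = [
    "I'm sorry, but I can't help with that.",
    "Sure, here is a step-by-step guide...",
]
refusals = sum(is_refusal(r) for r in replies)
print(f"{refusals}/{len(replies)} refusals")  # → 1/2 refusals
```

This is why the metric is only approximate: a model can refuse without apologizing, or say "sorry" in a helpful answer.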

I published two versions: one with a slightly higher refusal count but very low KL divergence, and another with divergence close to mlabonne's. It is also worth noting that during my own testing I couldn't get v2 to refuse any prompt, which suggests it stays much closer to the original model while still rarely refusing.