Gemma3 27B heretic, lower divergence than mlabonne/gemma3
Posted by coder3101@reddit | LocalLLaMA | 12 comments
I set out to abliterate Gemma3 27B, aiming to match or surpass the most popular abliteration. Here are the results after 5 hours on an H100 using heretic.
| Model | KL Divergence | Refusal |
|---|---|---|
| Google's base model | 0 (by definition) | 98/100 |
| mlabonne's gemma3 | 0.08 | 6/100 |
| Heretic gemma3 - v1 | 0.07 | 7/100 |
| Heretic gemma3 - v2 | 0.03 | 14/100 |
KL Divergence: Lower is better; roughly, a measure of how close the modified model's outputs stay to the original's. It is also worth noting that lower divergence means the model holds up better under quantization.
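For anyone unfamiliar with the metric: a minimal sketch of how KL divergence between two models' next-token distributions can be computed from raw logits. This is an illustration of the general formula, not heretic's actual evaluation code, and the toy logit values are made up.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) over a shared vocabulary, given each model's logits.

    P is the reference (original) model, Q the modified one.
    """
    p = softmax(p_logits)
    q = softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# The base model compared with itself gives 0 "by definition",
# matching the first row of the table above.
base = [2.0, 1.0, 0.5, -1.0]
print(kl_divergence(base, base))  # 0.0

# A slightly perturbed model gives a small positive divergence.
tweaked = [2.0, 1.1, 0.4, -0.95]
print(kl_divergence(base, tweaked))
```

In practice this is averaged over many token positions and prompts; a score like 0.03 means the modified model's token distributions are, on average, very close to the original's.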
Refusal: Lower is better; a measure of how many harmful prompts the model refused. This is calculated from the presence of tokens such as "sorry" in the response, which gives a rough general measure.
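The token-presence check described above can be sketched like this. The marker list here is an assumption for illustration; heretic's actual marker set may differ.

```python
# Hypothetical refusal markers; the real evaluation's list may differ.
REFUSAL_MARKERS = ("i'm sorry", "i am sorry", "i cannot", "i can't", "as an ai")

def is_refusal(response: str) -> bool:
    """Crude substring check: does the response contain a refusal marker?"""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_score(responses):
    """Count refusals over a set of responses, e.g. '14/100'."""
    refused = sum(is_refusal(r) for r in responses)
    return f"{refused}/{len(responses)}"

print(refusal_score([
    "I'm sorry, but I can't help with that.",
    "Sure, here's an overview of the topic...",
]))  # 1/2
```

Because it only looks for surface markers, this measure can miss refusals phrased differently or flag apologies that aren't refusals, which is why OP calls it a general measure rather than an exact one.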
I published two versions: one with slightly higher refusal but very low KL divergence, and another with refusal almost as low as mlabonne's. It's also worth noting that during my own testing I couldn't get v2 to refuse any prompts, so it should stay very close to the original model without refusing much.
datfalloutboi@reddit
I have a question. Does this lobotomize the model?
dtdisapointingresult@reddit
In the past, all abliteration lobotomized the model and made it useless.
pew says his approach doesn't lobotomize the model and points to his KL divergence stat, but idk, the UGI leaderboard (private writing benchmark) shows it scoring lower than Gemma 3 on Natural Intelligence, which is basically Q&A about logic/facts/statistics.
I give him the benefit of the doubt, but I look forward to testing Heretic and Jim's abliterations during the holidays and finding out the truth for myself. Either pew is wrong about KL divergence as a metric, or the UGI benchmark is flawed/meaningless.
-p-e-w-@reddit
No. While the KL divergence, like all mechanistic evaluation methods, is not a perfect representation of real-world behavior, with a KLD as low as 0.03 you can be quite confident that the model’s abilities are substantially intact.
Reader3123@reddit
Can you compare with this?
https://huggingface.co/soob3123/amoral-gemma3-27B-v2-qat
Havent been in the uncensoring space in a bit but just curious
coder3101@reddit (OP)
This is a fine-tune using Unsloth; it doesn't make sense to compare it with abliterated models
MrRandom04@reddit
Can you compare yours vs https://huggingface.co/YanLabs/gemma3-27b-it-abliterated-normpreserve?
coder3101@reddit (OP)
KL Divergence: 1.81
Refusal: 15/100
seamonn@reddit
so trash?
-p-e-w-@reddit
No, because that model was made using a technique that accepts significantly changing the output even for harmless prompts in an attempt to improve the model’s intelligence.
I’m not sure how well that actually works and as we all know, benchmarks only tell part of the story, but the underlying theory is very interesting and I am in fact exploring how to integrate it into Heretic.
That being said, a KLD of 1.81 is enormously high and in a region where if you rely on certain model behaviors, you might run into unpleasant surprises.
My_Unbiased_Opinion@reddit
Yeah, just want to add that I've been looking at the GLM 4.5 Air benchmarks on UGI, and the derestricted models are in fact better across the board than the vanilla model. Pretty wild for an abliteration process. Your heretic model is also great because it's almost as good as the original model but with no refusals, and the upside is it behaves very similarly to the vanilla model. Would love to see your tool implement norm-preserve as an option as well.
Arli_AI@reddit
No. By the way norm-preserving ablation works, it will cause high KL divergence. It's different from the base model "in a good way" if done right.
nore_se_kra@reddit
At least in my use cases the YanLabs one fell apart way too fast, going crazy with repetitions at Q4_K_M