Gemma3 27B heretic, lower divergence than mlabonne/gemma3
Posted by coder3101@reddit | LocalLLaMA | 12 comments
I set out to abliterate Gemma3 27B, aiming to match or surpass the most popular abliteration. Here are the results after 5 hours on an H100 using heretic.
| Model | KL Divergence | Refusal |
|---|---|---|
| Google's base model | 0 (by definition) | 98/100 |
| mlabonne's gemma3 | 0.08 | 6/100 |
| Heretic gemma3 - v1 | 0.07 | 7/100 |
| Heretic gemma3 - v2 | 0.03 | 14/100 |
KL Divergence: Lower is better; roughly, a measure of how close the modified model's outputs stay to the original's. It is also worth noting that lower divergence means the model holds up better under quantization.
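For anyone unfamiliar with the metric: a minimal sketch of how KL divergence between two models' next-token distributions can be computed from raw logits. This is an illustration of the general formula, not heretic's actual evaluation code, and the toy logit values are made up.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) over a shared vocabulary, given each model's logits.

    P is the reference (original) model, Q the modified one.
    """
    p = softmax(p_logits)
    q = softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# The base model compared with itself gives 0 "by definition",
# matching the first row of the table above.
base = [2.0, 1.0, 0.5, -1.0]
print(kl_divergence(base, base))  # 0.0

# A slightly perturbed model gives a small positive divergence.
tweaked = [2.0, 1.1, 0.4, -0.95]
print(kl_divergence(base, tweaked))
```

In practice this is averaged over many token positions and prompts; a score like 0.03 means the modified model's token distributions are, on average, very close to the original's.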
Refusal: Lower is better; a measure of how many harmful prompts the model refused. This is calculated from the presence of tokens such as "sorry" in the response, which gives a rough general measure.
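The token-presence check described above can be sketched like this. The marker list here is an assumption for illustration; heretic's actual marker set may differ.

```python
# Hypothetical refusal markers; the real evaluation's list may differ.
REFUSAL_MARKERS = ("i'm sorry", "i am sorry", "i cannot", "i can't", "as an ai")

def is_refusal(response: str) -> bool:
    """Crude substring check: does the response contain a refusal marker?"""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_score(responses):
    """Count refusals over a set of responses, e.g. '14/100'."""
    refused = sum(is_refusal(r) for r in responses)
    return f"{refused}/{len(responses)}"

print(refusal_score([
    "I'm sorry, but I can't help with that.",
    "Sure, here's an overview of the topic...",
]))  # 1/2
```

Because it only looks for surface markers, this measure can miss refusals phrased differently or flag apologies that aren't refusals, which is why OP calls it a general measure rather than an exact one.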
I published two versions: one with slightly higher refusal but very low KL divergence, and another with refusal almost as low as mlabonne's. It's also worth noting that during my own testing I couldn't get v2 to refuse any prompts, so it should stay very close to the original model without refusing much.
datfalloutboi@reddit
I have a question. Does this lobotomize the model?
dtdisapointingresult@reddit
In the past, all abliteration lobotomized the model and made it useless.
pew says his approach doesn't lobotomize the model and points to his KL divergence stat, but idk, the UGI leaderboard (private writing benchmark) shows it scoring lower than Gemma 3 on Natural Intelligence, which is basically Q&A about logic/facts/statistics.
I give him the benefit of the doubt, but I look forward to testing Heretic and Jim's abliterations during the holidays and finding out the truth for myself. Either pew is wrong about KL divergence as a metric, or the UGI benchmark is flawed/meaningless.
-p-e-w-@reddit
No. While the KL divergence, like all mechanistic evaluation methods, is not a perfect representation of real-world behavior, with a KLD as low as 0.03 you can be quite confident that the model’s abilities are substantially intact.
Reader3123@reddit
Can you compare with this?
https://huggingface.co/soob3123/amoral-gemma3-27B-v2-qat
Havent been in the uncensoring space in a bit but just curious
coder3101@reddit (OP)
This is a fine-tune using Unsloth; it doesn't make sense to compare it with abliterated models
MrRandom04@reddit
Can you compare yours vs https://huggingface.co/YanLabs/gemma3-27b-it-abliterated-normpreserve?
coder3101@reddit (OP)
KL Divergence: 1.81
Refusal: 15/100
seamonn@reddit
so trash?
-p-e-w-@reddit
No, because that model was made using a technique that accepts significantly changing the output even for harmless prompts in an attempt to improve the model’s intelligence.
I’m not sure how well that actually works and as we all know, benchmarks only tell part of the story, but the underlying theory is very interesting and I am in fact exploring how to integrate it into Heretic.
That being said, a KLD of 1.81 is enormously high and in a region where if you rely on certain model behaviors, you might run into unpleasant surprises.
My_Unbiased_Opinion@reddit
Yeah, just want to add that I've been looking at the GLM 4.5 Air benchmarks on UGI, and the derestricted models are in fact better across the board than the vanilla model. Pretty wild for an abliteration process. Your heretic model is also great because it's almost as good as the original model but with no refusals, and the upside is it behaves very similarly to the vanilla model. Would love to see your tool implement norm-preserve as an option as well.
Arli_AI@reddit
No. By the way norm-preserving ablation works, it will cause high KL divergence. It's different from the base model "in a good way" if done right.
nore_se_kra@reddit
At least in my use cases the YanLabs one fell apart way too fast, going crazy with repetitions at Q4_K_M