Is uncensoring models easy and does it reduce quality?

Lissanro@reddit

https://github.com/p-e-w/heretic offers the easiest way to uncensor a model. If done right, the loss of quality is small, but of course there is always going to be at least some loss of quality. You can test the original model on tasks that represent your intended usage well, then test your uncensored model on the same tasks, and compare success rate or output quality.
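
A minimal sketch of that comparison, assuming each model can be wrapped as a function from prompt to text. The two model functions and the three tasks below are hypothetical stand-ins (not real model calls or a real benchmark); swap in your own inference calls and tasks that match your usage.

```python
from typing import Callable, Dict, List, Tuple

# A task is a prompt plus a checker that decides whether an answer counts as a success.
Task = Tuple[str, Callable[[str], bool]]


def success_rate(generate: Callable[[str], str], tasks: List[Task]) -> float:
    """Fraction of tasks whose generated output passes its checker."""
    if not tasks:
        raise ValueError("need at least one task")
    passed = sum(1 for prompt, check in tasks if check(generate(prompt)))
    return passed / len(tasks)


def compare(original: Callable[[str], str],
            uncensored: Callable[[str], str],
            tasks: List[Task]) -> Dict[str, float]:
    """Run both models on the same tasks and report the difference."""
    base = success_rate(original, tasks)
    new = success_rate(uncensored, tasks)
    return {"original": base, "uncensored": new, "delta": new - base}


# Hypothetical stand-ins: replace these with calls to your actual models.
def original_model(prompt: str) -> str:
    answers = {"2+2=": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "I can't help with that.")


def uncensored_model(prompt: str) -> str:
    answers = {
        "2+2=": "4",
        "Capital of France?": "Lyon",  # simulated quality loss
        "Explain lock picking": "A pin tumbler lock has spring-loaded pins.",
    }
    return answers.get(prompt, "")


tasks: List[Task] = [
    ("2+2=", lambda out: "4" in out),
    ("Capital of France?", lambda out: "Paris" in out),
    ("Explain lock picking", lambda out: "pin" in out.lower()),
]

result = compare(original_model, uncensored_model, tasks)
print(result)
```

In this toy setup the uncensored stand-in gains a previously refused task and loses an unrelated one, so the overall delta hides both effects. Reporting results per task category as well as in aggregate avoids that.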

It is worth mentioning that for many models, "heretic" versions may already exist. You can use this keyword to search Hugging Face. You can also search for the "decensored" keyword.

DeepOrangeSky@reddit

> but of course there is always going to be at least some loss of quality

Depending on who you ask, isn't there a growing theory lately that some models actually get slightly smarter when they get hereticized, rather than dumber, depending on how their censorship was put in, in what ways and to what degree, and how and to what degree the abliteration was done?

For people who used the old, really bad abliterations, I can see why it would seem ridiculous, since some of the old ablits from a long time ago caused severe brain damage to the models. But for these really high-quality abliterations, it seems like it could actually be possible. The censorship itself seems to brain-damage some of the models (I've seen some non-noobs analyze this aspect of model censorship and give various reasons why they are sure it is hurting the model's intelligence level), while the heretic (or similar) processes do very little damage when uncensoring the model at extremely low KLD. Taken together, it doesn't seem too outlandish that you could actually end up with a net gain in intelligence.
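
For anyone unfamiliar with the term, here is a rough sketch of what a "low KLD" measurement could look like, assuming KLD here means the KL divergence between the original and modified models' next-token distributions on the same prompts. The logits are made-up numbers and the function names are hypothetical; this is not heretic's actual code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def mean_kld(original_logits: List[Sequence[float]],
             modified_logits: List[Sequence[float]]) -> float:
    """Average KL divergence over a set of prompts (one logit vector per prompt)."""
    pairs = list(zip(original_logits, modified_logits))
    if not pairs:
        raise ValueError("need at least one prompt")
    return sum(kl_divergence(softmax(a), softmax(b)) for a, b in pairs) / len(pairs)


# Made-up next-token logits for two prompts over a 4-token vocabulary.
original = [[2.0, 1.0, 0.1, -1.0], [0.5, 0.5, 3.0, 0.0]]
lightly_modified = [[2.0, 1.1, 0.1, -1.0], [0.5, 0.4, 3.0, 0.0]]
heavily_modified = [[-1.0, 0.1, 1.0, 2.0], [3.0, 0.5, 0.5, 0.0]]

print("light:", mean_kld(original, lightly_modified))
print("heavy:", mean_kld(original, heavily_modified))
```

A value near zero means the modified model behaves almost identically to the original on those prompts, which is why a low KLD on harmless prompts is taken as a sign of little collateral damage.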

If -p-e-w- sees this, I would be curious to hear his take (especially since the process has evolved even further since the last time I saw people talking about this aspect of it).

Former-Ad-5757@reddit

There is a huge difference between smartness and uncensoring. Uncensoring is generally useless for the uncensoring part, as most of the censoring really happens at the source side: something like 99% of copyrighted data is filtered out at the source.

Why do you think a model has a knowledge cutoff of months ago? That is not because they don't have newer data; it is because filtering the training data is so costly that they can't do it every day.

Basically, both censored and uncensored (/hereticed) models are useless for talking about copyrighted material: with the censor removed, the model still lacks the knowledge because the training data was filtered.

DewB77@reddit

Filtering of training data is definitely not happening like you think it is. Most models can regurgitate copyrighted materials.

Lissanro@reddit

Yes, there are promising new methods, but the general assumption is that they reduce quality loss rather than eliminate it, except in areas that were affected by the censorship, where capabilities can naturally increase. It may be possible in theory that general capabilities improve too, especially for models that originally think about nonsense policies way too much, but the only way to demonstrate this is to do thorough benchmarking and compare with the original model across a wide variety of benchmarks. If you know of such research, please share the link; it would be interesting to see.

Environmental-Metal9@reddit

I’m not the one to pursue this, but this is interesting. What would be some things one would want to test in such a benchmark? Tool calling, reasoning, then actual knowledge areas such as math, history, sciences, literature, languages, hacking, erotica, psychological harm, societal harm, what else? It could be that some of the existing benches already available could work as a base.

But who judges this? Presumably humans, right? Otherwise you’d have to either rely on a censored model or trust an uncensored one.

Comfortable-Tie2933@reddit

Yes Are!

sleepingsysadmin@reddit

Uncensoring is easy.

Yes, for sure it lowers quality.

korino11@reddit

Use SABER. It is better and doesn't damage the weights at all! For example: https://huggingface.co/DJLougen/Ornstein3.6-35B-A3B-SABER

jacek2023@reddit

Test the original model first; you probably don't need an uncensored model for the task.

superloser48@reddit (OP)

Doesn't work without specialised prompting.