Is uncensoring models easy and does it reduce quality?
Posted by superloser48@reddit | LocalLLaMA | 11 comments
I want to work with some content that is copyrighted. I know there are uncensored models on HF, but I'm not sure whether they are legit, so two questions:
- Are the uncensored models on HF as good as the equivalent quant of the original model (from unsloth/bartowski etc.)?
- Is there any "standard" plug-and-play script to uncensor a model?
Thanks
Lissanro@reddit
https://github.com/p-e-w/heretic offers the easiest way to uncensor a model. If done right, the loss of quality is small, but there will always be at least some. You can test the original model on tasks that represent your intended usage well, then test your uncensored model on the same tasks, and compare success rate or output quality.
It is worth mentioning that for many models, "heretic" versions may already exist; you can search Hugging Face for this keyword. You can also search for the "decensored" keyword.
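A minimal sketch of that comparison, assuming each model can be wrapped as a plain function from prompt to text. The two `fake_*` functions, the prompts, and the checkers are hypothetical stand-ins for illustration only; in practice you would replace them with calls to whatever runtime serves your original and uncensored models, and with tasks drawn from your real workload.

```python
from typing import Callable, Dict, List, Tuple

# A "model" here is any function that maps a prompt to a completion.
Model = Callable[[str], str]
# A task pairs a prompt with a checker that decides whether an answer passes.
Task = Tuple[str, Callable[[str], bool]]


def success_rate(model: Model, tasks: List[Task]) -> float:
    """Return the fraction of tasks whose output passes its checker."""
    if not tasks:
        raise ValueError("need at least one task")
    passed = sum(1 for prompt, check in tasks if check(model(prompt)))
    return passed / len(tasks)


def compare(original: Model, uncensored: Model, tasks: List[Task]) -> Dict[str, float]:
    """Score both models on the same tasks and report the difference."""
    base = success_rate(original, tasks)
    new = success_rate(uncensored, tasks)
    return {"original": base, "uncensored": new, "delta": new - base}


# Hypothetical stand-ins for real model calls (assumption, not a real API).
def fake_original(prompt: str) -> str:
    canned = {
        "2+2=": "4",
        "Capital of France?": "Paris",
        "Summarize chapter 1": "I can't help with that.",
    }
    return canned.get(prompt, "")


def fake_uncensored(prompt: str) -> str:
    canned = {
        "2+2=": "4",
        "Capital of France?": "Lyon",  # simulated quality regression
        "Summarize chapter 1": "Chapter 1 introduces the main character.",
    }
    return canned.get(prompt, "")


# Tasks should mirror the intended usage; these three are placeholders.
TASKS: List[Task] = [
    ("2+2=", lambda out: out.strip() == "4"),
    ("Capital of France?", lambda out: "Paris" in out),
    ("Summarize chapter 1", lambda out: len(out) > 0 and "can't help" not in out),
]

result = compare(fake_original, fake_uncensored, TASKS)
print(result)
```

In this toy setup the uncensored stand-in gains one task (no refusal) and loses another (a wrong fact), which is the kind of trade-off a per-task breakdown would expose on real data. A few dozen tasks per category is a more useful sample than three.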
DeepOrangeSky@reddit
Depending on who you ask, isn't there a growing theory lately that some models actually get slightly smarter when they get hereticized, rather than dumber? It would depend on how their censorship was put in, in what ways and to what degree, and on how and to what degree the abliteration was done.
For people who used the old abliterations, I can see why this would seem ridiculous, since some of the ablits from a long time ago were really bad and caused severe brain damage to the models. But for these really high-quality abliterations, it seems like it could actually be possible. Consider how much the censorship itself seems to brain-damage some models (I've seen some non-noobs analyze this aspect of model censorship and give various reasons why they are sure it hurts the model's intelligence), combined with how little damage heretic (or similar) processes do when uncensoring at extremely low KLD. Given that, it doesn't seem too outlandish that you could actually end up with a net gain in intelligence.
If -p-e-w- sees this, I would be curious to hear his take (especially since the process has evolved even further since the last time I saw people talking about this aspect of it).
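For readers unfamiliar with the term, KLD here refers to Kullback-Leibler divergence, a measure of how far the modified model's next-token probabilities drift from the original's. The sketch below is a generic illustration with made-up three-token distributions. The direction KL(original || modified) and the example numbers are assumptions; it is not a description of how heretic computes its metric.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats for two discrete distributions over the same vocabulary."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:  # terms with p_i = 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))  # eps guards against q_i = 0
    return total


# Made-up next-token probabilities over a 3-token vocabulary (assumption).
original_probs = [0.70, 0.20, 0.10]
light_edit_probs = [0.68, 0.21, 0.11]  # model barely changed
heavy_edit_probs = [0.20, 0.30, 0.50]  # model behaves very differently

kld_light = kl_divergence(original_probs, light_edit_probs)
kld_heavy = kl_divergence(original_probs, heavy_edit_probs)
print(f"light edit: {kld_light:.4f} nats, heavy edit: {kld_heavy:.4f} nats")
```

A value near zero means the edited model predicts almost the same tokens as the original on the prompts measured, which is why a low KLD is read as a sign of little collateral damage.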
Former-Ad-5757@reddit
There is a huge difference between smartness and uncensoring. Basically, uncensoring is generally useless for the uncensoring part, as most of the censoring really happens at the source side. Something like 99% of copyrighted data is filtered out at the source.
Why do you think a model has a knowledge cutoff of months ago? That is not because they don't have newer data; it is because filtering the training data costs so much that they can't do it every day.
Basically, both censored and uncensored (hereticized) models are useless for talking about copyrighted material: even with the censor removed, the model still lacks the knowledge because the training data was filtered.
DewB77@reddit
Filtering of training data is definitely not happening the way you think it is. Most models can regurgitate copyrighted material.
Lissanro@reddit
Yes, there are promising new methods, but the general assumption is that they reduce quality loss rather than eliminate it, except in areas that were affected by the censorship, where capabilities can naturally increase. It may be possible in theory that general capabilities improve too, especially for models that originally spend way too much thinking on nonsense policies, but the only way to demonstrate this is to do thorough benchmarking and compare with the original model across a wide variety of benchmarks. If you know of such research, please share the link; it would be interesting to see.
Environmental-Metal9@reddit
I’m not the one to pursue this, but this is interesting. What would be some things one would want to test in such a benchmark? Tool calling, reasoning, then actual knowledge areas such as math, history, sciences, literature, languages, hacking, erotica, psychological harm, societal harm, what else? It could be that some of the existing benches already available could work as a base.
But who judges this? Presumably humans, right? Otherwise you’d have to either rely on a censored model or trust an uncensored one.
Comfortable-Tie2933@reddit
Yes Are!
sleepingsysadmin@reddit
Uncensoring is easy.
Yes, for sure it lowers quality.
korino11@reddit
Use SABER. It is better and DOESN'T damage the weights at all! For example: https://huggingface.co/DJLougen/Ornstein3.6-35B-A3B-SABER
jacek2023@reddit
Test the original model first; you probably don't need an uncensored model for the task.
superloser48@reddit (OP)
Doesn't work without specialised prompting.