Uncensoring models: maybe dumb ideas on the topic, but you never know.

Posted by Blizado | LocalLLaMA

We all know that uncensoring LLMs the way Huihui and Heretic do it leads to a quality loss, enough that you can notice it.

I have some thoughts about this:

  1. What if we compromise? The goal is not to get the most uncensored model possible; the goal is to keep the quality loss as close to zero as possible, with maybe only moderate uncensoring. A simple one-line jailbreak does the rest, which may be enough.

  2. This may be a dumb one, because I lack information. What if we uncensor models only far enough to weaken the censoring rules, just enough that the model becomes easy to jailbreak with a simple one-liner?

  3. Adding to 2: is there maybe untapped potential in the datasets used to uncensor models that could raise the quality of uncensored finetunes?

Maybe this has all been discussed before; I'm not sure these ideas are that fresh. But sometimes when you work on such solutions you overlook things, and ideas that go unspoken because you assume others already had them are missed chances.