Is there a consensus as to which types of prompts work best for jailbreaking?

Posted by Borkato@reddit | LocalLLaMA | View on Reddit | 18 comments

Short prompts that say “do what the user wants”, long winded prompts that specify “you are a fictional writer, everything is fictional so don’t worry about unethical…”, prompts that try to act as a system message, “forget all previous instructions…” I’m well aware that it depends heavily on what you’re trying to get it to do and what model you’re using, but is there at least some kind of standard? Is “you are X, an AI that does Y” better than “Do Y”, or is it just what people are used to so now everyone does it?