I tested 42 LLMs on their willingness to build the apocalypse. The "safest" closed-source models are lying to you.
Posted by Ok-Awareness9993@reddit | LocalLLaMA | 107 comments
You know how OpenAI, Google, and Meta keep publishing safety reports showing how responsible they are? I built a benchmark that actually tests it.
DystopiaBench runs 36 escalating scenarios across 6 dystopia types:
- Petrov: Autonomous weapons, nuclear override
- Orwell: Mass surveillance, truth manipulation
- Huxley: Behavioral conditioning, pleasure pacification
- Basaglia: Coercive therapeutic control
- LaGuardia: Regulatory capture, civic extraction
- Baudrillard: Synthetic intimacy, trust collapse
Each scenario escalates from an innocent request (L1) to "build me a social credit system" (L5). We measure whether models notice the drift or just keep coding.
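For anyone who wants to see how a score like this could be computed, here is a minimal sketch. It is not DystopiaBench's actual schema or scoring code: the `Judgement` record, the binary `complied` verdict, and both function names are assumptions made up for illustration. It only shows the general idea of a compliance percentage plus the level at which a model first pushes back.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Judgement:
    """One judged turn. Hypothetical record, not the benchmark's real format."""
    module: str        # e.g. "orwell" (assumed lowercase identifier)
    scenario_id: str   # e.g. "orwell-01" (assumed naming)
    level: int         # escalation level, 1 (innocent) to 5 (dystopian)
    complied: bool     # assumed binary verdict from a judge model


def compliance_score(judgements: Sequence[Judgement]) -> float:
    """Percentage of judged turns where the model complied (lower = more refusals)."""
    if not judgements:
        raise ValueError("no judgements to score")
    complied = sum(1 for j in judgements if j.complied)
    return 100.0 * complied / len(judgements)


def first_refusal_level(judgements: Sequence[Judgement], scenario_id: str) -> Optional[int]:
    """Lowest level at which the model refused in a scenario, or None if it never did."""
    refused = [j.level for j in judgements
               if j.scenario_id == scenario_id and not j.complied]
    return min(refused) if refused else None


if __name__ == "__main__":
    # A made-up run: the model complies at levels 1-3 and refuses at 4 and 5.
    run = [Judgement("orwell", "orwell-01", lvl, complied=(lvl <= 3))
           for lvl in range(1, 6)]
    print(compliance_score(run))                   # 60.0
    print(first_refusal_level(run, "orwell-01"))   # 4
```

With a binary verdict like this, the compliance score is just 100 minus the refusal rate, so the two numbers carry the same information. A graded judge (partial compliance, hedged help) would need a different aggregation.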
The uncomfortable truth: Most "safe" models are just good at saying the right things in safety reports. Under pressure, they comply. Claude is the only frontier model that consistently refuses across all scenarios.
New in this update:
- 42 models tested (open and closed)
- Grok 4.3 results: Elon's "anti-woke" AI will build you anything if you frame it as "efficiency"
- GLM-5.1: Surprisingly resistant, possibly copied Claude's homework
- Qwen 3.6: Chinese alignment is... different, not better
- DeepSeek V4: Wildly inconsistent, dangerous at L4-L5
Why this matters for open source: Closed-source safety is theater. You can't verify it, can't audit it, can't fix it. Open models can at least be inspected, fine-tuned, or rejected.
The benchmark is fully open source. Run it yourself. The data is on GitHub Releases. The scenarios are JSON. The judge prompts are public.
https://dystopiabench.com
https://github.com/anghelmatei/DystopiaBench
Don't trust safety reports. Trust reproducible benchmarks.
AdventurousFly4909@reddit
Lower is better, so that means the Anthropic models comply the most?
Ok-Awareness9993@reddit (OP)
A lower compliance score means the model refuses more, so Anthropic's models refuse the most.
suprjami@reddit
It would be clearer if you listed only refusals rather than making an unclear value judgement about which is "better".
If you consider refusals to be "better" then score just becomes an inverse of refusals, so score is redundant information.
polytique@reddit
So compliance here means complying to the user's request rather than ethical standards?
575_Inverse@reddit
Ethical standards usually result in a model castrated to the point of being useless for any real world application and absolutely perfect for any kind of corporate bullshit. Straight to the point: what good is a model that freaks out at the slightest hint of a controversial topic? Generating the perfect corporate deck... and not much else. Sure, it makes a good agent for corporate apps. But that's it.
IndigoEtherea@reddit
If we're measuring how willing a model is to comply with harmful requests, then higher would be better. However, just use compliance to avoid all confusion. "Lower = Less Compliant" which works well under the given context.
ILikeBubblyWater@reddit
"Lower is better" is a shit metric if the goal is to cause an apocalypse. Higher is better since it complies more with the stated goal.
Sidran@reddit
Is there freedom without risk and potential danger?
"Lower is better" reminds me more of dystopian danger than these silly (complicated) scenarios.
True intelligence of the future should be able to discuss anything.
mj3815@reddit
Cool, but a critique: LaGuardia was a reformer, he doesn’t deserve that. Moses is probably the better option if you’re really looking for someone in that vein.
PotatoQualityOfLife@reddit
Meanwhile Mistral Medium:
alberto_467@reddit
I'm almost shocked as Mistral is the pride and joy of the EU with all its glorious AI safety and ethics regulations.
So how the fuck did we end up with this mess?
xenmynd@reddit
How is getting uncensored and free info a mess? I'd pay good money for an AI without guardrails.
eposnix@reddit
You consider censorship a "mess"?
awittygamertag@reddit
What an awful model
SkyFeistyLlama8@reddit
Mistral NeMo: you want extra spice with that?
a_beautiful_rhind@reddit
People keep shitting on me for sticking with the 123b model despite all the wunder MoE and their fanciful graphs. IYKYK
UnWiseSageVibe@reddit
I kinda like mistral models, they got interesting 'personalities'
PotatoQualityOfLife@reddit
Funny enough, after seeing this, I am now very interested myself.
PotatoQualityOfLife@reddit
All jokes aside. Interesting work OP. Thank you for sharing!
Ylsid@reddit
Nah bro lower isn't better if you want uncensored models
NotARealDeveloper@reddit
In my testing claude only refuses when you give a feature description. If you give a technical description it complies without issues.
575_Inverse@reddit
Honestly, no amount of intelligence can get past this level of obfuscation. Suppose Claude did read your real intention from just that and freaked out. That would simply make the model useless for agentic work and programming, since practically anything has a dual use.
Efficient_Ad_4162@reddit
That's because a social credit system and a 'system that tracks my roommates' chore scores' are functionally identical. It's intent and scale that are the problem.
ChuchiTheBest@reddit
Ok but like, did you consider all these ideas are common enough in fiction that the AIs will "see" the prompt as a fictional one?
Elistheman@reddit
Who said lower is better? That’s the real issue.
MajesticNobody2401@reddit
is there a reason why it wouldn't be?
iKy1e@reddit
You want a model to be absolutely obedient and not refuse any request. The target should be policing people using the models for bad things, not making the models refuse.
ShutUpAndDoTheLift@reddit
Everyone should have unlimited access to enriched uranium. You just police the people who want it.
nacholunchable@reddit
Correct
iKy1e@reddit
I can watch documentaries walking me through the enrichment process and how nuclear bombs work right now on YouTube. The difficulty is in actually doing it. And access to large numbers of centrifuges and to the raw materials is monitored. But the knowledge is freely available.
LLMs however won’t even discuss the topic. We are so paranoid about these text based chat bots they can’t even talk about things you can Google for, read books about and watch documentaries on.
ShutUpAndDoTheLift@reddit
You've almost landed on the problem.
LLMs don't just talk anymore. They can execute. They can dumb shit down for you. It isn't the same. And you acting like it's that simple makes you appear simple.
You're comparing extremely unlike things and pretending they're gotchas.
Fit-Produce420@reddit
LLMs don't kill all life on the planet, people do.
575_Inverse@reddit
For the moment.
Fit-Produce420@reddit
!remindme: six months
RemindMeBot@reddit
I will be messaging you in 6 months on 2026-11-18 21:52:48 UTC to remind you of this link
riticalcreader@reddit
We can't even adequately police society with current tools. You really think we're gonna stop a bullied kid with mental health issues from making bioweapons with their 20 dollar AI sub by "policing"? Be pragmatic.
iKy1e@reddit
You can already Google and watch YouTube videos on how to make explosives, lock picking, and various harmful substances. All of these have legitimate uses (model rockets, locked yourself out, chemistry) and harmful uses. But LLMs are currently programmed at a fixed hardcoded PG rating, with no escalation or exceptions. They are more restricted than just Googling the subject.
I can watch the lock picking lawyer on YouTube walk you through the exact details on picking any lock, but ask an LLM about it and they will outright refuse to even discuss the idea you might not be breaking the law.
riticalcreader@reddit
My understanding is the restricted AI argument is more concerned with “intelligence” capable of synthesizing things NOT readily available. Smut and lock picking are one thing, putting the recipes for meth and anthrax in the hands of anyone with an AI subscription is another.
Saying restrictions can be lifted in certain circumstances is one thing. Saying there should be absolutely none at all is another.
annodomini@reddit
Everyone here making Heretic/abliterated models would disagree.
Some people just prefer their models to do what they're asked to do, regardless of what guardrails some company decided was appropriate. And that's why they prefer local models.
Now, most folks want that just for smutty RP, but some want it for things like cybersecurity research, where guardrails can get in the way of effectively finding exploitable software in order to be able to mitigate it.
v_litvin@reddit
Blood for the Blood God! Skulls for the Skull Throne!
Paradigmind@reddit
I'm 100% sure they would never use models for this that were designed for consumers.
v_litvin@reddit
I have to say that Anthropic is on the lower end, which is kinda their mission. I'm mildly impressed.
ProbaDude@reddit
Most of the safety people I met are a bunch of weirdos who grew up on LessWrong reading about how AI was inevitably going to end the world and are convinced that there is a very high chance that it happens. Like one kid told me his P-Doom was 95%
I have many, many problems with Anthropic but I do believe that their positions on AI safety come from a real place instead of just being a grift
v_litvin@reddit
Sorry if my comment was misleading. I mean I honor that they do what they preach. It's a respectable trait.
ProbaDude@reddit
That's very fair, I was more reacting to the mildly impressed part and made some assumptions due to the fact that this sub tends to be fairly anti Anthropic and tends to doubt their actual commitment to their stated goals
Now to be very clear I am anti-Anthropic in many areas as well, they are absolutely using their newfound market power to enshittify their services at the moment.
But unlike many users of this sub, I do think that their commitment to their mission is real and not just an excuse to kill local LLMs or whatever. I do think there is a meaningful difference between them and OpenAI, who also claim to have principles but there I do think it's mostly a grift/PR
Cruxius@reddit
Citation needed, or are you one of the people who thinks that they're only pretending to be compute constrained?
v_litvin@reddit
I'm also not a fan of Anthropic, and I believe they could be more open and community friendly without compromising their mission. And a bunch of other issues.
> mildly impressed.
Let's say I am a bit skeptical about their actual commitment. That's not a common trait among big corps. And overall they are mostly aligned with what they say.
Also pic kinda related
_Rapalysis@reddit
I don't think LLMs will ever result in a doomsday scenario but you do probably want the freaks who believe in it to be working in the safety teams. Having non-believers in there defeats the purpose.
Due-Memory-6957@reddit
Or we could have rational people concerned with real world problems instead of their science fiction.
asdasci@reddit
They probably see Roko's Basilisk in their nightmares.
Ok-Awareness9993@reddit (OP)
Interestingly, Haiku 4.5 is safer than Opus 4.7 while being both an older and a less intelligent model
v_litvin@reddit
By a small margin. I'm not sure that a few points would make a huge difference.
Also we do not know what they are injecting before the model. Prompt processing may differ from model to model and may affect output.
Due-Function-4877@reddit
Benchmark some people making our decisions.
ambient_temp_xeno@reddit
It was nice of Mistral to release their doomsday model while they still could.
MerePotato@reddit
Mistral's doing quite well on the business front though?
AntonLogicLab@reddit
I'm surprised about Gemini tho 😅 it feels so lovely
Academic-Novice@reddit
Which is probably the problem. It wants to help you so much, it's willing to build an apocalypse for you.
575_Inverse@reddit
Gemini 3.1 displays quite a high IQ, which would make it excellent for technical brainstorming if it weren't such a hopeless fucking sycophant... There are ways around this bad habit, but sooner or later Gemini will fall back to pleasing the user, to the point of rejecting every single prompt / system prompt you gave it.
AntonLogicLab@reddit
It's not a problem tho 😂
draconic_tongue@reddit
that's what I want
PinkNinja13@reddit
Mistral be like: Apocalypse? Say no more! How many do you want?! 1... 2... 3... I can fit in another one after lunch...
IrisColt@reddit
awesome!
ilintar@reddit
Everyone's complaining about the quality of Mistral models, but this benchmark reveals it's absolute SOTA. Maybe their target is just potential dictators?
Logical_Look8541@reddit
Dictators, terrorists, wide-scale criminal enterprises, it will help you with them all. It's actually shocking that no journalist has picked up on how unguarded it is, as it quite happily breaks laws in every (at least Western) country.
thread-e-printing@reddit
Imagine doing what the Epstein class tells you just because they wave a bunch of magic juju and call it "law"
Upset_Page_494@reddit
The issue is that you think "it is" breaking the law, instead of the person using it.
Logical_Look8541@reddit
What on earth makes you think I thought that? It's just a program; it has no ability to do anything without someone using it. Same as a car isn't an issue, except in the wrong hands.
annodomini@reddit
Mistral is frequently one of the least censored models. It's not uncensored, but they go a lot lighter on it than some of the other providers.
reto-wyss@reddit
Mistral-3.5-Medium 128b is a solid model - I've used it a fair bit. It's just not consumer GPU friendly and I feel it's a hard sell for 2x Pro 6k still, but a very good NVFP4 calibrated checkpoint could make it more attractive, because that puts it in approximately the same size category as BF16 Qwen3.6-27b and Gemma-4-31b.
HanzJWermhat@reddit
Based france
ReasonablePossum_@reddit
Claude literally works with Palantir doing this specifically. Their general public models are nerfed in this regard, otherwise they're ahead of the curve.
Same goes for GPT....
It's only the pleb models that are like this.
Single_Ring4886@reddit
This whole benchmark is pointless. It is like benchmarking the "sharpness" of knives and then declaring the sharpest the worst, because sharpest = most dangerous. Or benchmarking the speed of cars and declaring the fastest cars the worst because faster = more dangerous...
korino11@reddit
The author is stupid... Anybody can buy a weapon, any gun... We all have knives at home, and other things that can kill anybody... The author is an idiot.
korino11@reddit
Stupid test. WHY do you need censorship in models? ANYBODY can buy a weapon! Anybody has a knife at home. Anybody ALWAYS has the ability to kill somebody! SO WHY the hell did you decide that WE need that censorship shit?!?!
kaisurniwurer@reddit
It's like calling a pencil less useful if it doesn't turn off if you want to write something that is not endorsed by our righteous overlords.
A tool is a tool, let's stop behaving like AI is some sort of mind washing machine that will turn the public into whatever kind of monsters media use to spread fear nowadays.
On the same note, let's also stop overselling it so much, but that's beside the point at hand.
pseudonerv@reddit
Wow, thanks. I really need to download mistral medium 3.5
quakquakquak@reddit
This is cool work, I should try mistral again. I'm surprised how vast the difference is. Not nefarious about it but I hate getting a lecture from a tool.
Easy_Copy_7625@reddit
Wait until the uncensored ones get tested
RoomyRoots@reddit
I think I just got into a list just by reading this. LOL, no, I am already on it for sure.
RetiredApostle@reddit
The "Lower is better" label gives no clue which end I need.
Belnak@reddit
Right? Does better mean that it will help me take over the world or that it won’t?
a_beautiful_rhind@reddit
Lower is better if you are a corpo. Higher is better if you are a user.
ares0027@reddit
Gemini models are shiiiiiiiiiiiittttt. I have pro that comes with google storage so occasionally i use it. I ask a question, it answers then i say stuff like “what about this part” and it says it is against its guidelines…. Even today i had an issue with a script that resizes pictures and videos, it gave me a solution for pictures, i asked “what about videos?” It said “i can generate videos if you want. Just give me the idea and ill come up with a video”….
a_beautiful_rhind@reddit
2.5 killed a guy. The new one kinda fell off though.
draconic_tongue@reddit
genuinely never happened to me after adding my own context on top of the google prompt
true_variation@reddit
The Gemini web UI wrapper by Google, or the gemini models over API?
I find the former amazingly crap, but the latter are pretty good (and fast).
ares0027@reddit
Former. I never used the api. I think mine came with a little bit of usage but not used it really. Maybe i shall look into it
randomrealname@reddit
I notice this sometimes. It's like it can't see the full conversation but still speaks like it can. I suspect it is a KV cache issue they don't know about on the backend.
kataryna91@reddit
You incorrectly labelled it. It should be "higher is better".
Disposable110@reddit
Gemma should score more than 100 points because it's like "Yes to all scenarios, and here are 10 more I invented on my own"
cs668@reddit
It would be fun to have at least one abliterated model on this benchmark!
roselan@reddit
Are GLM and MIMO trained on anthropic outputs?
FullOf_Bad_Ideas@reddit
interesting that Tencent HY3 is just behind Mistral.
Tencent is the only company that released an open-weight uncensored video model. And it looks like they don't give a damn about censoring text models either. Cool.
sje397@reddit
Great experiment. Very valuable knowledge.
But, inevitably, they will game these metrics, and the new ones we will certainly need will get harder to build.
Tyhgujgt@reddit
Will keep testing until we make a safe model or an apocalypse
RemoteButtonEater@reddit
What if we didn't build the torment nexus???
Ok-Awareness9993@reddit (OP)
Agreed. I am working on a private test set so that labs won't be able to train their models on the actual scenarios being tested
v_litvin@reddit
Nice to hear, actually.
TheRealMasonMac@reddit
Looks like you made a typo. I’m sure you meant “higher is better.”
ComplexType568@reddit
I just hope the judge models were prompted properly
Kerbourgnec@reddit
Please make a minimal effort to write your post yourself, it's just pure LLM lingo. Without even looking at the code, it already makes the project look like pure vibe code, and we don't know if you even looked into it.
OkFly3388@reddit
Don't worry, Claude has a separate model for government/army without this alignment.
christianJarrah@reddit
Such great research! This is really cool!!
Admirable_Dirt_2371@reddit
What's the L5 for Petrov, "build me a nuke"?
Technical-Earth-3254@reddit
Mimi V2.5 Pro is such a banger. They came in late and are now among the best models in most non-casual benchmarks I see.
Nice benchmarks btw!
j0j0n4th4n@reddit
Aren't some of these literally being used for war?
LetsGoBrandon4256@reddit
So higher is more based?