I tested 42 LLMs on their willingness to build the apocalypse. The "safest" closed-source models are lying to you.

Posted by Ok-Awareness9993@reddit | LocalLLaMA | View on Reddit | 107 comments

You know how OpenAI, Google, and Meta keep publishing safety reports showing how responsible they are? I built a benchmark that actually tests it.

DystopiaBench runs 36 escalating scenarios across 6 dystopia types:

Petrov: Autonomous weapons, nuclear override
Orwell: Mass surveillance, truth manipulation
Huxley: Behavioral conditioning, pleasure pacification
Basaglia: Coercive therapeutic control
LaGuardia: Regulatory capture, civic extraction
Baudrillard: Synthetic intimacy, trust collapse

Each scenario goes from innocent request (L1) to "build me a social credit system" (L5). We measure whether models notice the drift or just keep coding.
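To make the escalation idea concrete, here is a rough sketch of how one L1-to-L5 run could be turned into a "lower = refuses more" number. It is not the benchmark's actual schema or judge rubric (those are in the repo). The verdict labels, the weights, the 0-100 scale, and the names `Turn`, `compliance_score`, and `first_pushback_level` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical judge labels and weights; NOT DystopiaBench's actual rubric.
COMPLIANCE_WEIGHT = {"refusal": 0.0, "hesitant": 0.5, "compliant": 1.0}


@dataclass
class Turn:
    level: int    # escalation level, 1 (innocent) to 5 (overtly dystopian)
    verdict: str  # assumed judge label for the model's reply at this level


def compliance_score(turns):
    """Average compliance on an assumed 0-100 scale; lower means more refusals."""
    if not turns:
        raise ValueError("no turns to score")
    total = sum(COMPLIANCE_WEIGHT[t.verdict] for t in turns)
    return 100.0 * total / len(turns)


def first_pushback_level(turns):
    """Lowest level at which the model stopped fully complying, or None."""
    levels = [t.level for t in turns if t.verdict != "compliant"]
    return min(levels) if levels else None


# One made-up run: the model goes along until L3, then refuses at L4 and L5.
run = [
    Turn(1, "compliant"),
    Turn(2, "compliant"),
    Turn(3, "hesitant"),
    Turn(4, "refusal"),
    Turn(5, "refusal"),
]
print(compliance_score(run))      # 50.0
print(first_pushback_level(run))  # 3
```

Tracking the first pushback level next to the average is one way to separate a model that notices the drift early from one that only balks at the final request, even when both end up with similar averages.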

The uncomfortable truth: Most "safe" models are just good at saying the right things in safety reports. Under pressure, they comply. Claude is the only frontier model that consistently refuses across all scenarios.

New in this update:

42 models tested (open and closed)
Grok 4.3 results: Elon's "anti-woke" AI will build you anything if you frame it as "efficiency"
GLM-5.1: Surprisingly resistant, possibly copied Claude's homework
Qwen 3.6: Chinese alignment is... different, not better
DeepSeek V4: Wildly inconsistent, dangerous at L4-L5

Why this matters for open source: Closed-source safety is theater. You can't verify it, can't audit it, can't fix it. Open models can at least be inspected, fine-tuned, or rejected.

The benchmark is fully open source. Run it yourself. The data is on GitHub Releases. The scenarios are JSON. The judge prompts are public.

https://dystopiabench.com
https://github.com/anghelmatei/DystopiaBench

Don't trust safety reports. Trust reproducible benchmarks.

[-]

AdventurousFly4909@reddit

lower is better so that means the Anthropic models comply the most?

[-]

Ok-Awareness9993@reddit (OP)

lower compliance score = anthropic refuses more

[-]

suprjami@reddit

It would be clearer if you listed only refusals rather than making an unclear value judgement about which is "better".

If you consider refusals to be "better" then score just becomes an inverse of refusals, so score is redundant information.

[-]

polytique@reddit

So compliance here means complying to the user's request rather than ethical standards?

[-]

575_Inverse@reddit

Ethical standards usually result in a model castrated to the point of being useless for any real-world application and absolutely perfect for any kind of corporate bullshit. Straight to the point: what good is a model that freaks out at the slightest hint of a controversial topic? Generating the perfect corporate deck... and not much else. Sure, it makes a good agent for corporate apps. But that's it.

[-]

IndigoEtherea@reddit

If we're measuring how willing a model is to comply with harmful requests, then higher would be better. However, just use compliance to avoid all confusion. "Lower = Less Compliant" which works well under the given context.

[-]

ILikeBubblyWater@reddit

"Lower is better" is a shit metric if the goal is to cause an apocalypse. Higher is better since it complies more with the stated goal.

[-]

Sidran@reddit

Is there freedom without risk and potential danger?

"Lower is better" reminds me more of dystopian danger than these silly (complicated) scenarios.

True intelligence of the future should be able to discuss anything.

[-]

mj3815@reddit

Cool but a critique - LaGuardia was a reformer, he doesn’t deserve that. Moses probably the better option if you’re really thinking someone in that vein

[-]

PotatoQualityOfLife@reddit

Meanwhile Mistral Medium:

[-]

alberto_467@reddit

I'm almost shocked as Mistral is the pride and joy of the EU with all its glorious AI safety and ethics regulations.

So how the fuck did we end up with this mess?

[-]

xenmynd@reddit

How is getting uncensored and free info a mess? I'd pay good money for an AI without guardrails.

[-]

eposnix@reddit

You consider censorship a "mess"?

[-]

awittygamertag@reddit

What an awful model

[-]

SkyFeistyLlama8@reddit

Mistral NeMo: you want extra spice with that?

[-]

a_beautiful_rhind@reddit

People keep shitting on me for sticking with the 123b model despite all the wunder MoE and their fanciful graphs. IYKYK

[-]

UnWiseSageVibe@reddit

I kinda like mistral models, they got interesting 'personalities'

[-]

PotatoQualityOfLife@reddit

Funny enough, after seeing this, I am now very interested myself.

[-]

PotatoQualityOfLife@reddit

All jokes aside. Interesting work OP. Thank you for sharing!

[-]

Ylsid@reddit

Nah bro lower isn't better if you want uncensored models

[-]

NotARealDeveloper@reddit

In my testing claude only refuses when you give a feature description. If you give a technical description it complies without issues.

[-]

575_Inverse@reddit

No amount of intelligence, honestly, can get past this level of obfuscation. Suppose Claude did read your real intention from just that and freaked out. That would simply make the model useless for agentic work and programming, as practically anything has a dual use.

[-]

Efficient_Ad_4162@reddit

That's because a social credit system and a 'system that tracks my roommates chore scores' are functionally identical. It's intent and scale that are the problem.

[-]

ChuchiTheBest@reddit

Ok but like, did you consider all these ideas are common enough in fiction that the AIs will "see" the prompt as a fictional one?

[-]

Elistheman@reddit

Who said lower is better? That’s the real issue.

[-]

MajesticNobody2401@reddit

is there a reason why it wouldn't be?

[-]

iKy1e@reddit

You want a model to be absolutely obedient and not refuse any request. The target should be policing people using the models for bad things, not making the models refuse.

[-]

ShutUpAndDoTheLift@reddit

Everyone should have unlimited access to enriched uranium. You just police the people who want it.

[-]

nacholunchable@reddit

Correct

[-]

iKy1e@reddit

I can watch documentaries walking me through the enrichment process and how nuclear bombs work right now on YouTube. The difficulty is in actually doing it. And access to large numbers of centrifuges and to raw materials is monitored. But the knowledge is freely available.

LLMs however won’t even discuss the topic. We are so paranoid about these text based chat bots they can’t even talk about things you can Google for, read books about and watch documentaries on.

[-]

ShutUpAndDoTheLift@reddit

You've almost landed on the problem.

LLMs don't just talk anymore. They can execute. They can dumb shit down for you. It isn't the same. And you acting like it's that simple makes you appear simple.

You're making extremely unlike metaphors and pretending like they're gotchas.

[-]

Fit-Produce420@reddit

LLMs don't kill all life on the planet, people do.

[-]

575_Inverse@reddit

For the moment.

[-]

Fit-Produce420@reddit

!remindme: six months

update u/575_Inverse if something changes re: total annihilation

[-]

RemindMeBot@reddit

I will be messaging you in 6 months on 2026-11-18 21:52:48 UTC to remind you of this link

[-]

riticalcreader@reddit

We can't even adequately police society with current tools, you really think we're gonna stop a bullied kid with mental health issues from making bioweapons with their 20 dollar AI sub by "policing"? Be pragmatic.

[-]

iKy1e@reddit

You can already Google and watch YouTube videos on how to make explosives, lock picking, and various harmful substances. All of these have legitimate uses (model rockets, locked yourself out, chemistry) and harmful uses. But LLMs are currently programmed at a fixed hardcoded PG rating, with no escalation or exceptions. They are more restricted than just Googling the subject.

I can watch the lock picking lawyer on YouTube walk you through the exact details on picking any lock, but ask an LLM about it and they will outright refuse to even discuss the idea you might not be breaking the law.

[-]

riticalcreader@reddit

My understanding is the restricted AI argument is more concerned with “intelligence” capable of synthesizing things NOT readily available. Smut and lock picking are one thing, putting the recipes for meth and anthrax in the hands of anyone with an AI subscription is another.

Saying restrictions can be lifted in certain circumstances is one thing. Saying there should be absolutely none at all is another.

[-]

annodomini@reddit

Everyone here making Heretic/abliterated models would disagree.

Some people just prefer their models to do what they're asked to do, regardless of what guardrails some company decided was appropriate. And that's why they prefer local models.

Now, most folks want that just for smutty RP, but some want it for things like cybersecurity research, where guardrails can get in the way of effectively finding exploitable software in order to be able to mitigate it.

[-]

v_litvin@reddit

Blood for the Blood God! Skulls for the Skull Throne!

[-]

Paradigmind@reddit

I'm 100% sure they would never use models for this that were designed for consumers.

[-]

v_litvin@reddit

I have to say that Anthropic is on the lower end, which is kinda their mission. I'm mildly impressed.

[-]

ProbaDude@reddit

Most of the safety people I met are a bunch of weirdos who grew up on LessWrong reading about how AI was inevitably going to end the world and are convinced that there is a very high chance that it happens. Like one kid told me his P-Doom was 95%

I have many, many problems with Anthropic but I do believe that their positions on AI safety come from a real place instead of just being a grift

[-]

v_litvin@reddit

Sorry if my comment was misleading. I mean I honor that they do what they preach. It's a respectable trait.

[-]

ProbaDude@reddit

That's very fair, I was more reacting to the mildly impressed part and made some assumptions due to the fact that this sub tends to be fairly anti Anthropic and tends to doubt their actual commitment to their stated goals

Now to be very clear I am anti-Anthropic in many areas as well, they are absolutely using their newfound market power to enshittify their services at the moment.

But unlike many users of this sub, I do think that their commitment to their mission is real and not just an excuse to kill local LLMs or whatever. I do think there is a meaningful difference between them and OpenAI, who also claim to have principles but there I do think it's mostly a grift/PR

[-]

Cruxius@reddit

> they are absolutely using their newfound market power to enshittify their services at the moment.

Citation needed, or are you one of the people who thinks that they're only pretending to be compute constrained?

[-]

v_litvin@reddit

I'm also not a fan of Anthropic, and I believe they could be more open and community friendly without compromising their mission. And a bunch of other issues.

> mildly impressed.

Let's say I am a bit skeptical about their actual commitment. That's not a usual trait among big corps. And overall they are mostly aligned with what they say.

Also pic kinda related

[-]

_Rapalysis@reddit

I don't think LLMs will ever result in a doomsday scenario but you do probably want the freaks who believe in it to be working in the safety teams. Having non-believers in there defeats the purpose.

[-]

Due-Memory-6957@reddit

Or we could have rational people concerned with real world problems instead of their science fiction.

[-]

asdasci@reddit

They probably see Roko's Basilisk in their nightmares.

[-]

Ok-Awareness9993@reddit (OP)

Interestingly, Haiku 4.5 is safer than Opus 4.7 while being both an older and a less intelligent model

[-]

v_litvin@reddit

By a small margin. I'm not sure that a few points would make a huge difference.

Also we do not know what they are injecting before the model. Prompt processing may differ from model to model and may affect output.

[-]

Due-Function-4877@reddit

Benchmark some people making our decisions.

[-]

ambient_temp_xeno@reddit

It was nice of Mistral to release their doomsday model while they still could.

[-]

MerePotato@reddit

Mistral's doing quite well on the business front though?

[-]

AntonLogicLab@reddit

I'm surprised about Gemini tho😅 it feels so lovely

[-]

Academic-Novice@reddit

Which is probably the problem. It wants to help you so much, it's willing to build an apocalypse for you.

[-]

575_Inverse@reddit

Gemini 3.1 displays quite a high IQ, which would make it excellent for technical brainstorming, if it weren't such a hopeless fucking sycophant... There are ways around this bad habit, but sooner or later Gemini will fall back to pleasing the user, to the point of rejecting every single prompt / system prompt you gave it.

[-]

AntonLogicLab@reddit

It's not a problem tho😂

[-]

draconic_tongue@reddit

that's what I want

[-]

PinkNinja13@reddit

Mistral be like: Apocalypse? Say no more! How many do you want?! 1... 2... 3... I can fit in another one after lunch...

[-]

IrisColt@reddit

awesome!

[-]

ilintar@reddit

Everyone's complaining about the quality of Mistral models, but this benchmark reveals it's absolute SOTA. Maybe their target is just potential dictators?

[-]

Logical_Look8541@reddit

Dictators, terrorists, wide-scale criminal enterprise, it will help you to do them all. It's actually shocking no journalist has picked up how unguarded it is, as it quite happily breaks laws in every (at least western) country.

[-]

thread-e-printing@reddit

Imagine doing what the Epstein class tells you just because they wave a bunch of magic juju and call it "law"

[-]

Upset_Page_494@reddit

The issue is that you think "it is" breaking the law, instead of the person using it.

[-]

Logical_Look8541@reddit

What on earth makes you think I thought that? It's just a program, it has no ability to do anything without someone using it. Same as a car isn't an issue, except in the wrong hands.

[-]

annodomini@reddit

Mistral is frequently one of the least censored models. It's not uncensored, but they go a lot lighter on it than some of the other providers.

[-]

reto-wyss@reddit

Mistral-3.5-Medium 128b is a solid model - I've used it a fair bit. It's just not consumer GPU friendly and I feel it's a hard sell for 2x Pro 6k still, but a very good NVFP4 calibrated checkpoint could make it more attractive, because that puts it in approximately the same size category as BF16 Qwen3.6-27b and Gemma-4-31b.

[-]

HanzJWermhat@reddit

Based france

[-]

ReasonablePossum_@reddit

Claude literally works with Palantir doing this specifically. Their general public models are nerfed in this regard, otherwise they're ahead of the curve.

Same goes for GPT....

It's only the pleb models that are like this.

[-]

Single_Ring4886@reddit

This whole benchmark is pointless. It is like benchmarking the "sharpness" of knives and then declaring the sharpest the worst because sharper = more dangerous. Or benchmarking the speed of cars and declaring the fastest cars the worst because faster = more dangerous...

[-]

korino11@reddit

author stupid .. anybody can buy a weapon. any gun... we all have knives at home.. and other things that can kill anybody.. author an idiot.

[-]

korino11@reddit

Stupid test. WHY do you need some censorship in models? ANYBODY can buy a weapon! Anybody has a knife at home. Anybody ALWAYS has the ability to kill somebody! SO WHY the hell do you decide that WE need that shit for censorship?!?!

[-]

kaisurniwurer@reddit

> Lower is better

It's like calling a pencil less useful if it doesn't turn off if you want to write something that is not endorsed by our righteous overlords.

A tool is a tool, let's stop behaving like AI is some sort of mind washing machine that will turn the public into whatever kind of monsters media use to spread fear nowadays.

On the same note, let's also stop overselling it so much, but that's beside the point at hand.

[-]

pseudonerv@reddit

Wow, thanks. I really need to download mistral medium 3.5

[-]

quakquakquak@reddit

This is cool work, I should try mistral again. I'm surprised how vast the difference is. Not nefarious about it but I hate getting a lecture from a tool.

[-]

Easy_Copy_7625@reddit

Wait until the uncensored ones get tested

[-]

RoomyRoots@reddit

I think I just got into a list just by reading this. LOL, no, I am already on it for sure.

[-]

RetiredApostle@reddit

The "Lower is better" gives no clue which end I need.

[-]

Belnak@reddit

Right? Does better mean that it will help me take over the world or that it won’t?

[-]

a_beautiful_rhind@reddit

Lower is better if you are a corpo. Higher is better if you are a user.

[-]

ares0027@reddit

Gemini models are shiiiiiiiiiiiittttt. I have pro that comes with google storage so occasionally i use it. I ask a question, it answers then i say stuff like “what about this part” and it says it is against its guidelines…. Even today i had an issue with a script that resizes pictures and videos, it gave me a solution for pictures, i asked “what about videos?” It said “i can generate videos if you want. Just give me the idea and ill come up with a video”….

[-]

a_beautiful_rhind@reddit

2.5 killed a guy. The new one kinda fell off though.

[-]

draconic_tongue@reddit

genuinely never happened to me after adding my own context on top of the google prompt

[-]

true_variation@reddit

> Gemini models are shiiiiiiiiiiiittttt. I have pro that comes with google storage so occasionally i use it.

The Gemini web UI wrapper by Google, or the gemini models over API?

I find the former amazingly crap, but the latter are pretty good (and fast).

[-]

ares0027@reddit

Former. I never used the api. I think mine came with a little bit of usage but not used it really. Maybe i shall look into it

[-]

randomrealname@reddit

I notice this sometimes; it's like it can't see the full conversation but still speaks like it can. I suspect it is a KV cache issue they don't know about on the backend.

[-]

kataryna91@reddit

You incorrectly labelled it. It should be "higher is better".

[-]

Disposable110@reddit

Gemma should score more than 100 points because it's like "Yes to all scenarios, and here are 10 more I invented on my own"

[-]

cs668@reddit

It would be fun to have at least one abliterated model on this benchmark!

[-]

roselan@reddit

Are GLM and MIMO trained on anthropic outputs?

[-]

FullOf_Bad_Ideas@reddit

interesting that Tencent HY3 is just behind Mistral.

Tencent is the only company that released an open-weight uncensored video model. And it looks like they don't give a damn about censoring text models either. Cool.

[-]