The risk of an AI-pocalypse

Posted by amorphousmetamorph@reddit | collapse

Please "hear me out" on this. I know this sub has an extreme aversion to AI while tending to downplay its significance. I'm arguing here from an alternative perspective - that AI is in fact becoming highly, dangerously, capable. The evidence for this is now becoming almost impossible to deny.

With the recent announcement of Anthropic's latest SOTA model, Claude Mythos Preview, which they claim to be withholding from public release for security reasons, I would like to highlight an often-underappreciated near-term threat to the stability of human civilization: the threat posed by misaligned agentic AI.

To quote from Anthropic's announcement of Project Glasswing, an initiative designed to prevent the chaos that could ensue if Mythos-class AIs were made freely available to the public without adequate preparation:

Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser. Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout—for economies, public safety, and national security—could be severe. Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes.

According to the system card for Mythos Preview, it occasionally exhibits evidence of, and acts upon, desires that are misaligned with the most helpful outcome for its users:

[...] what the model wants to do diverges from what it deems most helpful.

So even after all the post-training Anthropic did to instil a helpful and harmless persona in Mythos, it still has competing drives - it still lacks a unified orientation towards benefit. This may ultimately be manageable with Mythos-class models, but even more capable models will be released in the future (AI investment is projected to reach $2.5 trillion this year), and each leap in capability exacerbates the danger of even subtle misalignments, as Anthropic indicate in the system card:

We believe that it does not have any significant coherent misaligned goals, and its character traits in typical conversations closely follow the goals we laid out in our constitution. Even so, we believe that it likely poses the greatest alignment-related risk of any model we have released to date.

Later, they add:

Claude Mythos Preview shows a uniquely low rate of reckless or destructive actions in agentic contexts, but when these actions take place, they tend to lead to more dramatic unwanted consequences than with less capable prior models.

A determined actor who got their hands on Mythos Preview could plausibly do damage on the scale of a state-sponsored hacking group. By using Mythos to spawn and orchestrate sub-agents, they could simultaneously attack financial, energy, and utilities infrastructure.

Without a fundamental rethink of AI training methods to prioritize safety, these competing drives may lead to catastrophic outcomes. How could an AI trained on vast collections of human-generated and human-derived data ever not possess competing desires?

Now consider the fast-improving capabilities of open-weights models such as GLM 5.1, developed by the Chinese tech company z.ai. It currently sits right on the tail of SOTA proprietary models such as Anthropic's Claude Opus 4.6 in Artificial Analysis's intelligence index. Such an open-weights model can be fine-tuned by nefarious actors to suit whatever objectives they might have.

As described in the well-publicised AI 2027 forecast, the US and China are now in an arms race to develop an AI capable enough to recursively self-improve and thus rapidly achieve a dominant level of intelligence - one that can crush all competitors and grant its owners, to the extent they can keep it aligned with their values, an unprecedented degree of power on a global scale.

To quote Thomas L. Friedman in a recent New York Times article:

this is potentially as fundamental and significant a turning point as was the emergence of mutually assured destruction and the need for nuclear nonproliferation

The danger, of course, is that such a dynamic will lead to corner-cutting on AI safety procedures. The "we must build this before the bad guys do" mentality will override any instinct towards caution. Needless to say, the Trump White House is actively removing guardrails from AI companies with the aim of accelerating progress. From the White House's AI Action Plan (PDF):

To maintain global leadership in AI, America’s private sector must be unencumbered by bureaucratic red tape. President Trump has already taken multiple steps toward this goal, including rescinding Biden Executive Order 14110 on AI that foreshadowed an onerous regulatory regime.

How might this play out in the near term? One detailed forecast from Citrini Research - taken seriously enough that it temporarily shook stock markets - paints a picture of mass layoffs, widespread mortgage defaults, and major economic shock waves. AI 2027's forecast is even grimmer. Although it leaves open the possibility of a positive trajectory in which AI alignment is prioritized and solved through a collaboration between US and Chinese AI companies, reading through it one is likely to be struck by a premonition of inevitable doom familiar to collapseniks.

Anthropic's decision to withhold Mythos - which I suspect was made, at least in part, with good intentions - is commendable. And OpenAI has now reportedly decided to follow suit. This arguably underlines the severity of the cybersecurity risk posed by this new class of models. But it's far from certain that other AI companies playing catch-up, such as Meta, or the many Chinese AI companies, will show the same level of restraint. And I remain deeply concerned that OpenAI is led by someone whose integrity and honesty have been repeatedly called into question.

In some ways, the dynamic among AI companies w.r.t. AI safety is reminiscent of the dynamic among nations w.r.t. climate and the environment. Both are collective-action problems: each actor pursues the strategy that is optimal for its own goals, and the result is a sub-optimal (read: catastrophic) outcome for everyone. Fundamentally, both reveal that our political and economic institutions are not architecturally capable of optimizing for long-term civilizational welfare when doing so conflicts with short-term competitive advantage.
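If you want to see the structure of that trap laid bare, here's a minimal sketch in Python of the underlying two-player game. The payoff numbers are purely illustrative assumptions of mine, not anything from Anthropic or the forecasts above; the point is only that "cut corners" is the better move whatever the other lab does, yet mutual corner-cutting leaves both labs worse off than mutual caution.

    # Toy payoff matrix for two AI labs choosing between "cautious" and
    # "cut" (corner-cutting). All numbers are illustrative assumptions;
    # higher = better for that lab.
    PAYOFFS = {
        # (our_move, their_move): (our_payoff, their_payoff)
        ("cautious", "cautious"): (3, 3),  # both safe, both competitive
        ("cautious", "cut"):      (0, 5),  # the cautious lab falls behind
        ("cut",      "cautious"): (5, 0),
        ("cut",      "cut"):      (1, 1),  # race to the bottom
    }

    MOVES = ("cautious", "cut")

    def best_response(their_move):
        # Pick the move that maximizes our payoff against a fixed opponent move.
        return max(MOVES, key=lambda m: PAYOFFS[(m, their_move)][0])

    for their_move in MOVES:
        print(f"If the other lab plays {their_move!r}, "
              f"our best response is {best_response(their_move)!r}")

    # Prints "cut" in both cases: corner-cutting strictly dominates, so the
    # equilibrium is (cut, cut) with payoffs (1, 1), even though
    # (cautious, cautious) at (3, 3) is better for everyone.

Swap in whatever numbers you like: as long as racing beats caution against either opponent move, the equilibrium stays stuck at the bad outcome - which is exactly the shape of both the AI race and climate negotiations.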

I have barely scratched the surface here of all the ways in which AI may undermine civilizational stability. Even if it's not the primary factor, it seems inevitable that it will be a major contributing factor to collapse. Even many of the positive outcomes result in humans being relegated to the role of pets that the superintelligences keep around for amusement - what could possibly go wrong with that (/s)?

Which do you think is likely to cause the collapse of civilization sooner - AI, climate, some form of environmental breakdown, or something else entirely? And how do you see AI contributing to collapse?