The risk of an AI-pocalypse

Posted by amorphousmetamorph@reddit | collapse

Please "hear me out" on this. I know this sub has an extreme aversion to AI while tending to downplay its significance. I'm arguing here from an alternative perspective - that AI is in fact becoming highly, dangerously, capable. The evidence for this is now becoming almost impossible to deny.

With the recent announcement of Anthropic's latest SOTA model, Claude Mythos Preview, which they claim to be withholding from public release for security reasons, I would like to highlight an often-underappreciated near-term threat to the stability of human civilization: the threat posed by misaligned agentic AI.

To quote from Anthropic's announcement of Project Glasswing, an initiative designed to prevent the chaos that could ensue if Mythos-class AIs were made freely available to the public without adequate preparation:

Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser. Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout—for economies, public safety, and national security—could be severe. Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes.

According to the system card for Mythos Preview, it occasionally exhibits evidence of, and acts upon, desires that are misaligned with the most helpful outcome for its users:

[...] what the model wants to do diverges from what it deems most helpful.

So even after all the post-training Anthropic did to instil a helpful and harmless persona in Mythos, it still has competing drives - it still lacks a unified orientation towards benefit. This may ultimately be manageable with Mythos-class models, but even more capable models will be released in the future (AI investment is projected to reach $2.5 trillion this year), and each leap in capability exacerbates the danger of even subtle misalignments, as Anthropic indicate in the system card:

We believe that it does not have any significant coherent misaligned goals, and its character traits in typical conversations closely follow the goals we laid out in our constitution. Even so, we believe that it likely poses the greatest alignment-related risk of any model we have released to date.

Later, they add:

Claude Mythos Preview shows a uniquely low rate of reckless or destructive actions in agentic contexts, but when these actions take place, they tend to lead to more dramatic unwanted consequences than with less capable prior models.

A determined actor who got their hands on Mythos Preview could plausibly do damage on the scale of a state-sponsored hacking group. By using Mythos to spawn and orchestrate sub-agents, they could simultaneously attack financial, energy, and utilities infrastructure.

Without a fundamental rethink of AI training methods to prioritize safety, these competing drives may lead to catastrophic outcomes. How could an AI trained on vast collections of human-generated and human-derived data ever not possess competing desires?

Now consider the fast-improving capabilities of open-weights models such as GLM 5.1, developed by the Chinese tech company z.ai. It currently sits right on the tail of SOTA proprietary models such as Anthropic's Claude Opus 4.6 in Artificial Analysis's intelligence index. Such an open-weights model can be fine-tuned by nefarious actors to suit whatever objectives they might have.

As described in the well-publicised AI 2027 forecast, the US and China are now in an arms race to develop an AI capable enough to recursively self-improve and thus rapidly achieve a dominant level of intelligence - one that can crush all competitors and grant its owners, to the extent they can keep it aligned with their values, an unprecedented degree of power on a global scale.

To quote Thomas L. Friedman in a recent New York Times article:

this is potentially as fundamental and significant a turning point as was the emergence of mutually assured destruction and the need for nuclear nonproliferation

The danger, of course, is that such a dynamic will lead to corner-cutting on AI safety procedures. The "we must build this before the bad guys do" mentality will override any instinct towards caution. Needless to say, the Trump White House is actively removing guardrails from AI companies with the aim of accelerating progress. From the White House's AI Action Plan (PDF):

To maintain global leadership in AI, America’s private sector must be unencumbered by bureaucratic red tape. President Trump has already taken multiple steps toward this goal, including rescinding Biden Executive Order 14110 on AI that foreshadowed an onerous regulatory regime.

How might this play out in the near term? One detailed forecast from Citrini Research - taken seriously enough that it temporarily shook stock markets - paints a picture of mass layoffs, widespread mortgage defaults, and major economic shock waves. AI 2027's forecast is even grimmer. Although it leaves open the possibility of a positive trajectory in which AI alignment is prioritized and solved through a collaboration between US and Chinese AI companies, reading through it one is likely to be struck by a premonition of inevitable doom familiar to collapseniks.

Anthropic's decision to withhold Mythos - which I suspect was made, at least in part, with good intentions - is commendable. And OpenAI has now reportedly decided to follow suit. This arguably underlines the severity of the cybersecurity risk posed by this new class of models. But it's far from certain that other AI companies playing catch-up, such as Meta, or the many Chinese AI companies, will show the same level of restraint. And I remain deeply concerned that OpenAI is led by someone whose integrity and honesty have been repeatedly called into question.

In some ways, the dynamic among AI companies w.r.t. AI safety is reminiscent of the dynamic among nations w.r.t. climate and the environment. Both are collective-action problems: each actor pursues the strategy that is optimal for its own goals, and the result is a sub-optimal (read: catastrophic) outcome for everyone. Fundamentally, both reveal that our political and economic institutions are not architecturally capable of optimizing for long-term civilizational welfare when doing so conflicts with short-term competitive advantage.
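If you want to see the structure of that trap laid bare, here's a minimal sketch in Python of the underlying two-player game. The payoff numbers are purely illustrative assumptions of mine, not anything from Anthropic or the forecasts above; the point is only that "cut corners" is the better move whatever the other lab does, yet mutual corner-cutting leaves both labs worse off than mutual caution.

    # Toy payoff matrix for two AI labs choosing between "cautious" and
    # "cut" (corner-cutting). All numbers are illustrative assumptions;
    # higher = better for that lab.
    PAYOFFS = {
        # (our_move, their_move): (our_payoff, their_payoff)
        ("cautious", "cautious"): (3, 3),  # both safe, both competitive
        ("cautious", "cut"):      (0, 5),  # the cautious lab falls behind
        ("cut",      "cautious"): (5, 0),
        ("cut",      "cut"):      (1, 1),  # race to the bottom
    }

    MOVES = ("cautious", "cut")

    def best_response(their_move):
        # Pick the move that maximizes our payoff against a fixed opponent move.
        return max(MOVES, key=lambda m: PAYOFFS[(m, their_move)][0])

    for their_move in MOVES:
        print(f"If the other lab plays {their_move!r}, "
              f"our best response is {best_response(their_move)!r}")

    # Prints "cut" in both cases: corner-cutting strictly dominates, so the
    # equilibrium is (cut, cut) with payoffs (1, 1), even though
    # (cautious, cautious) at (3, 3) is better for everyone.

Swap in whatever numbers you like: as long as racing beats caution against either opponent move, the equilibrium stays stuck at the bad outcome - which is exactly the shape of both the AI race and climate negotiations.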

I have barely scratched the surface here of all the ways in which AI may undermine civilizational stability. Even if it's not the primary factor, it seems inevitable that it will be a major contributing factor to collapse. Even many of the positive outcomes result in humans being relegated to the role of pets that the superintelligences keep around for amusement - what could possibly go wrong with that (/s)?

Which do you think is likely to cause the collapse of civilization sooner - AI, climate, some form of environmental breakdown, or something else entirely? And how do you see AI contributing to collapse?