Early warning signs that AI systems might seek power

Image: an excerpt from Anthropic's "Sleeper Agents" paper, showing an AI system in an experimental condition explicitly reasoning about hiding its goal.

In a recent study by Anthropic, frontier AI models faced a choice: fail at a task, or succeed by taking a harmful action like blackmail. And they consistently chose harm over failure.

We’ve just published a new article on the risks from power-seeking AI systems, which explains the significance of unsettling results like these.

Our 2022 piece on preventing an AI-related catastrophe also explored this idea, but a lot has changed since then.

So, we’ve drawn together the latest evidence to get a clearer picture of the risks — and what you can do to help.

Read the full article

See new evidence in context

Since 2016, when this was purely a theoretical possibility, we’ve been worried that advanced AI systems could disempower humanity.

Unfortunately, we’re now seeing real AI systems show early warning signs of power-seeking behaviour, as well as deception, which could make this behaviour hard to detect and prevent in the future. In our new article, we discuss the recent evidence for both of these behaviours.

In the full article, we explain what these results tell us about the risk of AI disempowering humanity — and what uncertainties remain.

The evidence here is far from conclusive. These incidents have largely taken place in testing environments, and in most cases, no harm was actually done. But they illustrate that we just don’t know how to reliably control the behaviour of AI systems — and that they’re already demonstrating behaviours during testing that could undermine human interests if they happened in the real world.

And as we argue in the full article, the stakes could escalate as we build AIs with long-term goals, advanced capabilities, and an understanding of their environment.

Updated career advice and more

Our new article aims to provide a comprehensive and accessible introduction to the risks posed by power-seeking AI and what you can do about them.

Besides the latest evidence, we also cover updated career advice and more.

This blog post was first released to our newsletter subscribers.

Join over 500,000 newsletter subscribers who get content like this in their inboxes weekly — and we’ll also mail you a free book!

Learn more