AI-enabled power grabs

Summary
Advanced AI technology may enable its creators, or others who control it, to attempt and achieve unprecedented societal power grabs. Under certain circumstances, they could use these systems to take control of whole economies, militaries, and governments.
A power grab of this kind by a single person or small group would pose a major threat to the rest of humanity.
Profile depth
Exploratory
This is one of many profiles we've written to help people find the most pressing problems they can solve with their careers. Learn more about how we compare different problems and see how this problem compares to the others we've considered so far.
Why is this a pressing problem?
New technologies can drastically shift the balance of power in society. Great Britain’s early dominance in the Industrial Revolution, for example, helped empower its global empire.1
With AI technology rapidly advancing, there’s a serious risk that it might enable an even more extreme global power grab.
Advanced AI is particularly concerning because it could be controlled by a small number of people, or even just one. An AI could be copied indefinitely, and with enough computing infrastructure and a powerful enough system, a single person could control a virtual or literal army of AI agents.
And since advanced AI could potentially trigger explosive growth in the economy, technology, and intelligence, anyone with unilateral control over the most powerful systems might be able to dominate the rest of humanity.
One factor that enhances this threat is the possibility of secret loyalties. It may be possible to create AI systems that appear to have society’s best interests in mind but are actually loyal to just one person or small group.2 As these systems are deployed throughout the economy, government, and military, they could constantly seek opportunities to advance the interests of their true masters.
Here are three possible pathways through which AI could enable an unprecedented power grab:
- AI developers seize control — in this scenario, actors within a company or organisation developing frontier AI systems use their technology to seize control. This could happen if they deploy systems widely across the economy, military, and government while those systems remain secretly loyal to them. Or they could build systems internally that are powerful enough to amass the wealth and resources needed to launch a hostile takeover of other centres of power.
- Military coups — as militaries incorporate AI for competitive advantage, they introduce new vulnerabilities. AI-controlled weapons systems and autonomous military equipment could be designed to follow orders unquestioningly, stripping away the formal and informal checks on power that human personnel traditionally provide — such as the potential for mutiny in the face of unlawful orders. A military leader or another actor (including a potentially hostile foreign government) could find a way to ensure military AI systems are loyal to them and use them to assert far-reaching control.
- Autocratisation — political leaders could use advanced AI systems to entrench their power. Whether they come to power through election or otherwise, they could use these systems to undermine any potential political challenger. For example, they could use enhanced surveillance and law enforcement to subdue the opposition.
Extreme power concentrated in the hands of a small number of people would pose a major threat to the interests of the rest of the world. It could even undermine the potential of a prosperous future, since the course of events may depend on the whims of those who happened to have dictatorial aspirations.
AI could also plausibly be used to broadly improve governance, but we'd expect scenarios in which it enables hostile or illegitimate power grabs to be bad for the future of humanity.
What can be done to mitigate these risks?
We’d like to see much more work done to figure out the best methods for reducing the risk of an AI-enabled power grab. Several approaches that could help include:
- Safeguards on internal use: Implement sophisticated monitoring of how AI systems are used within frontier companies, with restrictions on access to “helpful-only” models that will follow any instructions without limitations.
- Transparency about model specifications: Publish detailed information about how AI systems are designed to behave, including safeguards and limitations on their actions, allowing for external scrutiny and identification of potential vulnerabilities.
- Sharing capabilities broadly: Ensure that powerful AI capabilities are distributed among multiple stakeholders rather than concentrated in the hands of a few individuals or organisations. This creates checks and balances that make power grabs more difficult. Note, though, that there are also risks to distributing powerful AI capabilities widely, so the competing considerations need to be carefully weighed.
- Inspections for secret loyalties: Develop robust technical methods to detect whether AI systems have been programmed with hidden agendas or backdoors that would allow them to serve interests contrary to their stated purpose.
- Military AI safeguards: Require that AI systems deployed in military contexts have robust safeguards against participating in coups, including principles against attacking civilians and multiple independent authorisation requirements for extreme actions.
For much more detail on this problem, listen to our interview with Tom Davidson.
Learn more
- AI-Enabled Coups: How a Small Group Could Use AI to Seize Power by Tom Davidson, Lukas Finnveden, and Rose Hadshar
- Podcast: Will MacAskill on AI causing a “century in a decade” — and how we’re completely unprepared
- Our career review on AI governance and policy
- Our problem profile on stable totalitarianism
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training by Evan Hubinger et al. of Anthropic
Notes and references
- This article is largely based on the research paper AI-Enabled Coups: How a Small Group Could Use AI to Seize Power by Tom Davidson, Lukas Finnveden, and Rose Hadshar.↩
- Anthropic demonstrated in a paper that it’s possible to train AI systems to be “sleeper agents” — that is, they can appear friendly and function normally, but respond to hidden “triggers” that cause them to behave in unexpected and harmful ways. These triggers can persist even through the standard training procedures meant to make models helpful, harmless, and honest.
Though it wasn’t the purpose of the paper, the findings suggest that installing “secret loyalties” in models may be possible.↩