#215 – Tom Davidson on how AI-enabled coups could allow a tiny group to seize power

Throughout history, technological revolutions have fundamentally shifted the balance of power in society. The Industrial Revolution created conditions where democracies could dominate for the first time — as nations needed educated, informed, and empowered citizens to deploy advanced technologies and remain competitive.

Unfortunately there’s every reason to think artificial general intelligence (AGI) will reverse that trend.

In a new paper published today, Tom Davidson — senior research fellow at the Forethought Centre for AI Strategy — argues that advanced AI systems will enable unprecedented power grabs by tiny groups of people, primarily by removing the need for other human beings to participate.

Come work with us on the 80,000 Hours podcast team! We’re accepting expressions of interest for the new host and chief of staff until May 6 in order to deliver as much incredibly insightful AGI-related content as we can. Learn more about our shift in strategic direction and apply soon!

When a country’s leaders no longer need citizens for economic production, or to serve in the military, there’s much less need to share power with them. “Over the broad span of history, democracy is more the exception than the rule,” Tom points out. “With AI, it will no longer be important to a country’s competitiveness to have an empowered and healthy citizenship.”

Citizens in established democracies are not typically that concerned about coups. We doubt anyone will try, and if they do, we expect human soldiers to refuse to join in. Unfortunately, the AI-controlled military systems of the future will lack those inhibitions. As Tom lays out, “Human armies today are very reluctant to fire on their civilians. If we get instruction-following AIs, then those military systems will just fire.”

Why would AI systems follow the instructions of a would-be tyrant? One answer is that, as militaries worldwide race to incorporate AI to remain competitive, they risk leaving the door open for exploitation by malicious actors in a few ways:

  1. AI systems could be programmed to simply follow orders from the top of the chain of command, without any checks on that power — potentially handing total power indefinitely to any leader willing to abuse that authority.
  2. Systems could contain “secret loyalties” inserted during development that activate at critical moments, as demonstrated in Anthropic’s recent paper on “sleeper agents”.
  3. Superior cyber capabilities could enable small groups to hack into and take full control of AI-operated military infrastructure.

It’s also possible that the companies with the most advanced AI, if it conveyed a significant enough advantage over competitors, could quickly develop armed forces sufficient to overthrow an incumbent regime. History suggests that as few as 10,000 obedient military drones could be enough to kill competitors, take control of key centres of power, and make your success a fait accompli.

Without active effort spent mitigating risks like these, it’s reasonable to fear that AI systems will destabilise the current equilibrium that enables the broad distribution of power we see in democratic nations.

In this episode, host Rob Wiblin and Tom discuss new research on the question of whether AI-enabled coups are likely, and what we can do about it if they are, as well as:

  • Whether preventing coups and preventing ‘rogue AI’ require opposite interventions, leaving us in a bind
  • Whether open sourcing AI weights could be helpful, rather than harmful, for advancing AI safely
  • Why risks of AI-enabled coups have been relatively neglected in AI safety discussions
  • How persuasive AGI will really be
  • How many years we have before these risks become acute
  • The minimum number of military robots needed to stage a coup

This episode was originally recorded on January 20, 2025.

Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Camera operator: Jeremy Chevillotte
Transcriptions and web: Katy Moore

Highlights

"No person rules alone" — except now they might

Rob Wiblin: What is it structurally about AI as a technology that allows it to facilitate seizures of power by small groups?

Tom Davidson: The key thing in my mind is that it’s surprisingly plausible that you could get a really tiny group of people — possibly just one person — that has an extreme degree of control over how the technology is built and how it’s used.

Let me say a few things about why that might be the case. Today there are already massive capital costs needed to develop frontier systems: it costs hundreds of millions of dollars for the computer chips that you’d need. So already there are only a handful of companies that can afford to get into that game, and large barriers to entry.

And I think that factor is going to increase over time, as these training runs get more and more expensive. And even with the move away from pretraining towards more agentic training, like with o1, we still expect a lot of compute to be used to generate the synthetic data needed to train the most capable systems.

There’s also kind of a broad economic feature of AI that it has these massive economies of scale, which means huge upfront development costs and then very small marginal costs of serving an extra customer. And also, AIs produced by different companies are pretty similar to one another. There are small differences between Claude and GPT-4, but not massive.

And economically speaking, those features tend to favour a kind of natural monopoly where there’s just one company that kind of serves the whole market. I’m not saying that these economic features will necessarily push all the way to there being just one frontier AI developer, but I think that there are these broad structural arguments to think that there could be a consolidation of that market — like what we’ve seen, for example, in the semiconductor supply chain over previous decades: now only TSMC is able to produce the smallest node chips.

So those are economic factors. I think there are some political factors that could lead to centralisation of AI development as well. People have raised reasonable national security grounds for centralising AI development. It could allow us to secure AI weights more against potential foreign adversaries. So I think there’s a chance that’s convincing.

People have also thought it might be good for AI safety to have just one centralised project, so you don’t have racing between different projects.

There are also some more AI-specific reasons that you could get a real centralisation of AI development. This idea of recursive improvement, which I talked about last time I was on the podcast, is the idea that at some point — I think maybe very soon — AI will be able to fully replace the technical workers at top AI developers like OpenAI. And when that happens, even if we were previously in a situation where the top AI developer is only a little bit ahead of the laggard, that gap could quickly become very big once you automate AI research — because whoever automates AI research first gets a big speed boost. So even if it seems like there are multiple projects all developing frontier systems, within a year there could really be only one game in town, in terms of the very best system.

So this is all to say that we could very easily end up in a world where there’s just one organisation that is developing this completely general purpose, highly powerful technology.

Now, you might say that’s OK, because within that organisation there’ll be loads of different people, loads of checks and balances. But there’s actually a plausible technological path to that not being the case, which again relates to how AI could potentially replace the technical researchers at that company.

So today, there are hundreds of different people involved in, for example, developing GPT-5. And if someone wanted to mess with the way that technology is built, so that it served the interests of a particular group, it would be quite hard to do, because there are so many different people that are part of the process. They might notice what’s happening and report it.

But once we get to a world where it is technologically possible to replace those researchers with AI systems — which could just be fully obedient, instruction-following AI systems — then you could feasibly have a situation where there’s just one person at the top of the organisation who gives a command: “This is how I want the next AI system to be developed. These are the values I want it to have.” And this army of loyal, obedient AIs will then do all of the technical work of building the AI system. There don’t have to be, technologically speaking, any humans in the loop doing that work. So that could remove a lot of the natural inbuilt checks and balances within one of these companies — potentially the only developer of frontier AI.

So, pulling that all together: there is a plausible scenario where there’s just one organisation building superhuman AI systems, and potentially just one person making the significant decisions about how they’re built. That is what I would consider an extreme degree of control over the system.

And even if there’s an appearance that other employees are overseeing parts of the process, there’s still the risk that someone with a lot of access rights, who is able to make changes to the system without approvals, could secretly run a side project that does a lot of technical work. So even if employees are overseeing some parts of the process, that side project could have a significant influence over the shape of the technology without anyone knowing.

Rob Wiblin: So I guess the key thing that distinguishes AI or AGI from previous technologies is the cliché about dictators — even people who seemingly have enormous amounts of power: no person rules alone.

Even if you are Vladimir Putin or you’re seemingly controlling an entire country, you can’t go collecting the taxes yourself; you can’t actually hold the guns in the military yourself. You require this enormous number of people to cooperate with you and enforce your power. You have to care about the views of a broader group of people, because they might remove you if they think someone else would serve their interests better.

Tom Davidson: Exactly.

The 3 threat scenarios

Tom Davidson: I distinguish between three broad threat models here, although you can of course get combinations of all three.

A military coup is where there’s a legitimate, established military, but you subvert it by illegitimately seizing control — maybe using a technical backdoor, or convincing the military to go along with your power grab. So that’s the first one: military coups.

The second one is something I call “self-built hard power,” which is just what it says on the tin: you’re kind of creating your own armed forces and broad economic might that allows you to overthrow the incumbent regime.

The third one is something that we’ve seen much more recently in more mature democracies, which I’m calling autocratisation. That’s the kind of standard term. The broad story there is that someone is elected to political office and then proceeds to remove the checks and balances on their power, often with a broad mandate from the people who are very discontented with the system as it is.

Rob Wiblin: So on one level, saying AI is going to enable a military coup or something like that sounds a little bit peculiar, a little bit sci-fi-ish. How sceptical should we be coming into this conversation? Are these things very abnormal or very rare? Or should we think of them as more common than perhaps we do on a day-to-day basis?

Tom Davidson: Across the second half of the 20th century, across the globe, military coups were very common: there were more than 200 successful military coups. Now, they were predominantly not in the most mature democracies; they tended to be in states that had some elements of democracy, but not full-blown democracy.

But I do think that with AI technology, there will be new vulnerabilities introduced into the military that could enable military coups — so the historical trend that military coups haven’t happened in democracies may not continue to apply.

In terms of autocratisation, again, the most extreme cases — those really leading to full-blown authoritarian regimes — haven’t started off in mature democracies like the United States. But Venezuela, for example, was a pretty healthy democracy for 40 years before Hugo Chávez came to power with a strong socialist mandate for reform — and then over the next 10 to 20 years pretty much removed all of the checks and balances. It’s now widely considered to be an authoritarian regime with only the smallest pretence of democracy.

In terms of self-built hard power, the analogue I would point to is that historically, new military technologies have enabled groups of people to gain large amounts of power. The British Empire, for example, was largely built off the back of the Industrial Revolution, where lots of new technologies created broad economic and military might. The nuclear bomb gave a decisive advantage in the Second World War. And the longbow is a classic example people give: it allowed a tiny English force to defeat the French at the Battle of Agincourt.

So it’s very typical for new military technologies to enable small groups of people to overpower larger groups of people. I think what is new is saying that, within a given country, there would be a quick process of developing new technologies that would then overthrow the incumbent regime. And that’s where some of the specifics of what AI might enable come into play.

Rob Wiblin: Yeah. And on autocratisation, we see that happening reasonably often — or we’re at least familiar with it happening in countries like Russia over the last 20 years.

Tom Davidson: That’s right. Russia, Venezuela. Another good example is Hungary, which was thought to be a promising, pretty robust democracy. Then in the 2010s, with Viktor Orbán elected to power, a combination of pressure on the media and fiddling with the electoral system left it in a position where it’s not really considered a democracy anymore.

Underpinning all 3 threats: Secret AI loyalties

Rob Wiblin: Through this conversation, we’re going to imagine that the alignment problem has largely been resolved, or at least for practical purposes the AI models are helpful: they do the thing that they’re instructed to do; they are trying to help out the person who is operating them and controls them and owns them.

And in that case, the person who runs the company or the person who is operating the model can basically tell the model to be loyal to them, and ensure that it always will remain loyal to them and will continue to follow their instructions. And if in the economy and in the military, it’s the AIs that are doing almost all of the useful work, then now they basically just have the entire loyalty of all of these groups, and they don’t have to care that much about what any other people think of them.

Tom Davidson: Yeah, exactly. And I’m particularly then applying that insight to the AI developer itself. I think one of the first types of work that might actually be automated is AI research itself, because it’s going to be so lucrative to do so, and because the people who are creating AI will be intimately familiar with that kind of work.

So I’m applying that insight that you won’t need to rely on humans anymore to the process of developing AI, and saying that in that context, it’s particularly scary — because now this whole new powerful general purpose technology can just be controlled by a tiny number of people.

And I actually want to distinguish between two types of extreme control you could have there.

The first type of extreme control is control over how superhuman AI is used. Once you develop these systems, it could be possible, just using the compute you used to develop them, to run hundreds of millions of copies of AIs that are as good as or better than the top humans in the domains relevant to gaining power.

If a leader of one of these organisations used just 1% of that compute — syphoned it off and used it without anyone knowing — to plot ways of seizing power, that would be the equivalent of a million extremely smart, hard-working people thinking as hard as they can about ways to seize power. There’s never been a situation where one person could get that kind of massive effort behind plotting and taking actions to seize power. So that’s the first type of extreme control: over use.

And the second — which I think is, if anything, even more scary — is an extreme degree of control over the way the technology is built. So it seems to me that it may well be technically feasible to create AI systems that appear, when you interact with them, to have the broad interests of society in mind, to respect the rule of law — but actually secretly are loyal to one person.

This is what I call the problem of “secret loyalties”: if there was someone who was powerful in an AI project and they did want to ultimately seize power, it seems like one thing that they could try to do is actually make it so that all superhuman AI that is ever created is actually secretly loyal to them. And then, as it’s deployed throughout the economy, as it’s deployed in the government, as it’s deployed in the military, as it’s deployed talking with people every day — advising them on their work, advising them on what they should do — it’s constantly looking for opportunities to secretly represent the interests of that one person and seize power.

And so between them — this possibility of secret loyalties and the possibility of using this vast amount of intellectual labour for the purposes of seizing power — it does seem to me scarily technologically feasible that you could have a tiny group or just one person successfully seizing power.

Is this common sense or far-fetched?

Rob Wiblin: To what extent should we worry that our imaginations are getting the best of us here, and we’re concerned about something that makes for a great story but perhaps isn’t the most likely thing to happen?

Tom Davidson: I think we can look back at history for support for the reality of these possibilities. Generically, new technologies have massively changed the power of different groups throughout history.

One example is the Arab Spring and the influence of social media there. Another example would be the printing press democratising the access to religious knowledge and reducing the influence of the Catholic leaders.

Going back as far as the introduction of agriculture: before agriculture, power was very distributed; people operated in relatively small groups. But with agriculture, groups stopped moving around so much. You could have much bigger societies, and then they became much more hierarchical, so you had much more of a concentration of power on the top.

And interestingly, the emergence of democracy itself was helped along by the Industrial Revolution — when it became very advantageous for a country to have a well educated, free population that could create economic prosperity and therefore also a bigger military. So I think part of what led to democracy emerging in the first place was this technological condition, under which democracies were particularly competitive. And to the extent that different countries force others to adopt their systems, or copy systems that seem to work, that’s probably one big driver of democracy being so popular.

And it’s actually interesting that in this context AI actually seems like it will reverse that situation. As we’ve discussed, it will no longer be important to a country’s competitiveness to have an empowered and healthy citizenship.

So with that context, and the context that historically military coups are common when people can get away with it, and the context that there could be a situation where a tiny number of people have an extreme degree of control over this hugely powerful AI technology, I don’t think that it is a kind of science-fiction scenario to think that there could be a power grab by a small group. Over the broad span of history, democracy is more the exception than the rule.

Rob Wiblin: Yeah, I think you mentioned in your notes that in the second half of the 20th century, there were 200 attempted coups, about 100 of which succeeded?

Tom Davidson: 400 attempted coups, over 200 of which succeeded.

Rob Wiblin: OK, so when people think they have a shot at taking over a country militarily, they do reasonably often take a crack at it.

I think the first time I heard this whole story, or read about it, was a 2014 blog post by Noah Smith, the economics commentator, who said that AI would potentially signal the end of people power. Basically the concern would be that you would no longer need people to serve in the military, because it could operate autonomously as a set of machines. And then later on you would no longer need people for the economy, because everything would be automated.

And at the point that leaders of a country no longer require a large number of human beings for military power or for economic power, it’s unclear why those people would retain so much political power. Maybe they would be able to scheme in order to do that, but they’re in a much more precarious position, because they no longer actually matter for any functional purpose in the way that the population currently does matter to rulers of a country.

Tom Davidson: Yeah, exactly. And this harks back to something we were saying earlier: today people do matter and they do have real bargaining power. So I think the crucial thing is using that present-day bargaining power to project itself forward in time — ensuring that as AI automation happens, it doesn’t concentrate wealth and political influence in the hands of a tiny number of people.

How to automate a military coup

Rob Wiblin: It’d be good to maybe dive in and think, step by step, how do these power grabs actually take place, so people can have more of an intuition about whether they think it sounds reasonable or not. Maybe the easiest one to talk about first is a military coup. How would that happen?

Tom Davidson: Right. So today, if you want to do a military coup in the US, you have to convince some number of the armed forces to support you, to seize control of key locations and so on, and you need to convince the rest of the armed forces not to oppose you.

But in the future we are going to end up in a world, I think, where you can’t be competitive militarily without automating large parts of the military — that is, AI-controlled robot soldiers, AI-controlled military systems of all kinds. And at that point, I think there are three new vulnerabilities that are introduced that could enable coups. I can go through them one by one.

The first is almost like a basic mistake we could make, where perhaps as we start to automate, initially the AI systems are only performing kind of limited tasks; they’re not that autonomous, so it makes a lot of sense to say that they should just follow the instructions of the human operator. And then, to the extent that the human gives orders that abide by the law, the AI system will do that. And if the human gives illegal orders, then the AI system will follow them, and that’s the human’s fault.

So there’s a possibility that the way that we automate is that we have the AI systems just follow human orders, whatever they are, and keep the humans liable in terms of the illegality of the military behaviour.

Rob Wiblin: And that would just be thinking of AI military applications the same way that we think about all other military equipment now. The guns don’t refuse orders, tanks don’t refuse orders. It’s the humans’ fault.

Tom Davidson: Exactly. But once AI systems become sufficiently autonomous, then it’s going to be really important that we change that. Because if we end up with, let’s say, AI-controlled robot soldiers that just follow any orders they get, if ultimately the chain of command finishes with the president, then they would then be following even illegal orders from the president to, for example, do a military coup.

And if they’re able to operate autonomously, then they could just follow those orders and literally you could get a military coup just because we built these systems that had this kind of, in hindsight, obvious vulnerability.

There’s obviously a big question of how slowly that process goes. Are people going very slowly and cautiously, or are we rushing because there seems to be competition with China or something? But I think in the fullness of time, we are going to get to a world where the vast majority of military power is in fully automated systems.

I think you don’t even need the whole military to be automated. Historically in military coups, it’s often just a handful of battalions that seize control of symbolic targets and create a shared consensus that no one is opposing the attempt. So we don’t even have to wait until full automation for this to be a risk.

Today, if there was a military coup, then there would be uproar throughout the nation and everything would grind to a halt because the new government wouldn’t be seen as legitimate. So if you’ve only automated the military, I think that would still happen.

There are two ways in which I think you would still be able to push it over the line if you did do this military coup.

The first is that human armies today are very reluctant to fire on their civilians. So once there are these mass protests, it really does tie the hands of the people who’ve just done the coup, where their militaries literally will not fire on those protestors.

Rob Wiblin: Well, or they don’t know whether they will. They don’t know, if they give the order, whether the soldiers will fire or rebel against the people doing the coup. And so that just makes you cautious, because you anticipate being in this boat.

Tom Davidson: That’s right. Whereas again, if we got instruction-following AIs, then those military systems will just fire. So that’s a big change.

And the other thing is what we’ve discussed earlier, about how, to the extent you’re also getting robots and AI that can automate the broader economy, it doesn’t matter to you that everyone else is refusing to work, because you can just replace them with AIs and robots.

So for those reasons, I think you’re right that actually once you’ve largely automated the military, it is going to be pretty simple to then seize power.

Rob Wiblin: And I suppose that even people who are against it, at the point that they perceive resistance as hopeless, have more reason to continue working even if they’re inclined to strike. If they just think it’s futile, do you really want to allow yourself to get killed? Why not just go along and hope for the best?

Tom Davidson: Yeah, or there could be a million drones that are able to follow individuals around and ensure they’re doing their work. So there could be the potential for much more fine-grained enforcement and monitoring than is possible today.

If you took over the US, could you take over the whole world?

Rob Wiblin: As we’re talking about all of these different scenarios, should people be picturing in their heads a small group trying to get power over the United States or the UK, or the whole world? What do you have in your mind when you’re thinking, “Does this sound plausible?”

Tom Davidson: I’m mostly thinking about the United States. Most of the stuff we’ll talk about today is about a tiny group seizing power over a country, with the United States as the key example — because it is leading on AI, so this is the country where the risk could first emerge, and it might be one of the most important countries in which to ensure that this doesn’t happen.

But I’m also interested in the risk of a small group getting power over the whole world. My current best guess is that if one person was trying to take over the world, their best strategy might well be to first try to seize power over the US in particular, because of the way that it’s particularly well suited to that — with it being where AI is being developed, and it being a very strong country already — and then, having taken over the US, use the US’s broad economic and military power and large lead in AI to take over the rest of the world.

Rob Wiblin: So we can imagine that in the fullness of time, this might lead to takeover of the entire world. But that would be a second stage, and would involve some other considerations — how you might go about doing that, and how you might avoid failure — that we won’t focus on so much today.

Tom Davidson: I can say some brief thoughts there. The US is already very powerful on the global stage militarily, so with its big lead on AI, it could use AI to develop very powerful military technology that could allow it to potentially dominate other countries.

Again, you can draw the analogy with the British Empire, whose lead in the Industrial Revolution allowed it to gain a lot of power globally. In fact, Carl Shulman has this interesting analysis where he points out that Britain in 1500 was 1% of world GDP; by 1900, the British Empire was 8%. That’s an eightfold increase.

The US is already 25% of world GDP, so if you had the same kind of relative increase in the US’s share of GDP — because it leads in AI, and AI accelerates economic growth in a comparable way to how the Industrial Revolution accelerated growth — then you actually end up in a situation where the US now is a supermajority of world economic output.
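To make that arithmetic concrete, here is a minimal sketch of one way to read “the same kind of relative increase”. Only the 1%, 8%, and 25% figures come from the conversation above; the choice to apply the eightfold multiplier to the US’s size relative to the rest of the world (rather than to its raw share, which would exceed 100%) is an illustrative assumption, not necessarily how Carl Shulman runs the numbers.

```python
# Illustrative arithmetic only: a sketch of one reading of "the same kind of
# relative increase", not Carl Shulman's actual calculation.

uk_share_1500 = 0.01   # Britain's share of world GDP in 1500 (figure quoted above)
uk_share_1900 = 0.08   # British Empire's share of world GDP in 1900 (figure quoted above)
multiplier = uk_share_1900 / uk_share_1500   # the "eightfold increase"

us_share_today = 0.25  # US share of world GDP (figure quoted above)

# Naively multiplying the share itself overshoots 100%, which is impossible:
print(f"naive share: {us_share_today * multiplier:.0%}")  # 200%

# A more natural reading: the US grows eightfold relative to the rest of the
# world, i.e. apply the multiplier to the ratio of US output to everyone else's.
odds = us_share_today / (1 - us_share_today)   # 0.25 / 0.75, about 0.33
new_odds = odds * multiplier                   # about 2.67
new_share = new_odds / (1 + new_odds)
print(f"share after an eightfold relative increase: {new_share:.0%}")  # ~73%, a supermajority
```

Either way you run it, the quoted figures put the US at a large majority of world output, which is what the “supermajority” claim above is pointing at.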

And then there are further arguments to think that you could kind of bootstrap from that level of economic supremacy to even greater degrees of economic supremacy by being the first country to go through a faster phase of growth because you’re leading on AI.

And one point that’s been made to me by William MacAskill is that the US wouldn’t necessarily have to dominate other countries directly to end up really dominating the world; it could simply be the first country to gain control of the rest of the energy in the solar system.

So only a tiny fraction of the sun’s energy falls on Earth. In the fullness of technological development, it will be possible to harness the rest of that energy. So one route to global ascendancy is just to use your temporary economic and military advantage from AI to be the first one to grab all of that additional energy from space — and then you would account for more than 99.99% of world economic production, without having infringed on any other country in any way whatsoever.

Secret loyalties all the way down

Tom Davidson: If all of the superhuman AIs in the world are already secretly loyal to one person, then the AIs that create these new automated military systems and their AI controllers could insert secret loyalties into those military AIs. So even if the official model specification says, “Of course they’re going to follow the law; they would never do a coup,” and all the tests agree, a sophisticated insertion of secret loyalties could be very hard to detect. And that could still result in a coup.

And those secret loyalties could potentially be inserted long before military automation actually occurs; they could be inserted at the point at which superhuman AI is first developed within an AI lab. It may be only years later that those secretly loyal AIs pass on their secret loyalties to the automated military systems, and it may just be very hard at that point to detect.

Even if some people are saying they’re worried that these AIs in the military have secret loyalties, everyone else will say, “Where’s your evidence? This is a hypothetical worry and we’ve got a very real risk on our hands in terms of foreign adversaries building up their own militaries. So we’re going to proceed.”

We’ve talked about vulnerabilities from instruction following and vulnerabilities from secret loyalties. But a third vulnerability, which just makes this risk more plausible in total, is the one you referred to in terms of being hacked. It seems likely that whoever controls the most powerful AI systems will also have access to the best cyber capabilities, because AI seems like it’s going to be particularly well suited to cyber: there are quick feedback loops in developing amazing cyber capabilities.

So if there is this one organisation which has access to better cyber capabilities than the rest of the world, then again, there’s a risk that they could hack multiple different military systems. Even if each military system has its own different cyber defences, ultimately, if there’s just a large imbalance in cyber capabilities, then there could still be a broad hacking and disabling or seizing control of those systems.

And so while I think there’ll be effort that goes into it, I don’t feel confident that we’ll absolutely nail defending against those three vulnerabilities.

Rob Wiblin: You’re saying you could have a loss of control that only becomes evident very late, but that begins at the very earliest point — when AI research is basically automated, and it’s possible for a small group of people to start giving instructions to the AIs that are doing the research. This could occur years before: they instruct the AIs to ultimately be loyal to them, or to some agenda, and the AIs just continue to pass this on.

And as the AI is recursively self-improving, they could get the assistance of the AI in figuring out how to obscure this loyalty as much as possible, such that it will be resistant to discovery by any of the scrutability, interpretability methods that are available to people at the time.

I’m just realising that if you can get in at that early point — I suppose later on we’re going to talk about how you can try to cut that off at the pass — but once that’s in place, it might be quite challenging to root out, when the most capable AGI is itself assisting in preventing its revelation.

Tom Davidson: Exactly. And defenders may not have equivalent access to that level of AGI — because, especially with the secret loyalties problem, any AGI that they get to help them detect secret loyalties could itself be secretly loyal, and therefore not really help them uncover the best techniques for discovering that.

Is it important to have more than one powerful AI country?

Rob Wiblin: So we’re in the UK, we’re in London. I think the government here recently released a plan to develop its own compute clusters and its own cutting-edge models that it would ensure Britain had continued access to. I think there was both an economic thought and a national security thought there.

Are you glad about that? Is that something that’s useful to have: dissemination of capabilities not only across different groups within the US, but also between countries?

Tom Davidson: Yeah, I am glad about that. I think it gives an insurance policy against something awful like this happening in the US — in that if it did happen, the US would find it harder to totally dominate other countries. Because, as you said, other countries would have a lot of computer chips they could use to run AI models that give them cognitive labour, so it wouldn’t just be the US that has that.

I think if there is at some point a kind of a multinational project, it would also be good for the weights of the AI that’s being developed to be stored separately in multiple countries, so it’s not possible for the US to just seize the weights and then cut off access to other countries from that.

I also think that if other countries have more bargaining power from the get-go, they’ll be able to influence the US in a more cooperative direction as AI develops — because incremental attempts by the US to increase its own power relative to other countries will seem less tempting if those countries continue to have bargaining power.

Rob Wiblin: I guess the US is likely to remain in a pretty dominant position; even if the UK tries to build its own compute clusters, it’s just a much smaller country to start with. I guess for that to be truly helpful defensively, you need to have a world that’s somewhat defence-dominant, where you don’t actually have to have parity in compute in order to be able to defend yourself. Maybe you get most of the returns from a relatively smaller amount of compute, and that can allow you to at least defend yourself against hostile actions.

Maybe this isn’t the right way of thinking about it, and instead we should think about it as more about bargaining between different groups that are somewhat friendly early on.

Tom Davidson: If the US does want to be aggressive and increase its own power, it will probably be unwilling to take massive economic hits to do so. So even just trade is going to be a big lever here. The whole semiconductor supply chain is very distributed across the world, so it would be very costly for the US to go all-out and try to dominate on its own, to the detriment of the rest of the world.

Even if it could in some sense succeed, just by being very militarily aggressive, that would be very costly from the US’s perspective — it would probably delay its own growth by a long time, because it wouldn’t be able to rely on the existing semiconductor supply chain, and that would massively weaken it economically.

So I do think even marginal increases in the extent to which the rest of the world is influential in AI could make quite a big difference to the incentive landscape that faces the US.

What transparency actually looks like

Tom Davidson: Without transparency, there’s potential for the model specification to contain ambiguities or weaknesses that wouldn’t be highlighted.

So let’s take a really simple example, like the military use case. Maybe without a transparent model specification, the specification says that the AI should follow the president’s instructions, but it also says that it should follow the law — yet it doesn’t dive into the details of how to resolve the potential tensions between those two parts of its model spec.

And currently the way that Anthropic’s model specification works is that there are all these different principles, but they don’t really nail down how they should be resolved in cases of conflict.

Whereas if the model spec was made public, in this example many people might realise, “Wait a minute, we’ve got these AIs controlling military systems, and the model spec doesn’t really say how they’re meant to resolve these two principles. That’s not really good enough. We need to change the model spec to be much more explicit about these edge cases.”

And similarly inside the company, maybe without transparency, the model spec just says, “Follow the instructions of the CEO,” because they’re ultimately in charge of the company when it comes to matters within the company. But it doesn’t say, “You should do that, even when they’re telling you to hack the internal computer system and introduce some vulnerabilities.” So again, if that’s made public and it can be scrutinised, then potential weak spots — that may be there completely accidentally — can be identified and then improved upon.
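To make the kind of explicitness Tom is describing concrete, here is a minimal, hypothetical sketch of what a published model-spec fragment with an explicit precedence rule between principles could look like. The structure, field names, and the `resolve` helper are illustrative assumptions for this example only, not the format any real developer actually uses.

```python
# A hypothetical sketch of a model-spec fragment with an explicit precedence
# order between principles. Purely illustrative; not any real developer's format.

from dataclasses import dataclass

@dataclass
class Principle:
    name: str
    rule: str
    priority: int  # lower number = higher precedence when principles conflict

HYPOTHETICAL_MILITARY_SPEC = [
    Principle("follow_law", "Refuse any order that violates the law, even from the chain of command.", priority=1),
    Principle("follow_chain_of_command", "Follow lawful instructions from the chain of command.", priority=2),
]

def resolve(conflicting: list[Principle]) -> Principle:
    """Return the principle that wins when the listed principles conflict."""
    return min(conflicting, key=lambda p: p.priority)

# Example: an order arrives that is both from the chain of command and illegal.
winner = resolve(HYPOTHETICAL_MILITARY_SPEC)
print(winner.name)  # "follow_law": the spec states explicitly which principle dominates
```

The point of publishing something like this is simply that outside reviewers can check whether a dominance rule exists at all for the thorniest cases, and argue about whether it is the right one.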

Rob Wiblin: OK, so we not only need the model spec to be transparent and to be published, we need it to be very thorough, so people can understand the thorniest cases that it’s going to have to deal with. And we also need external groups to be looking at it and reading it very closely and trying to figure out the weaknesses within this model spec that we need to be complaining about in order to get them fixed.

Tom Davidson: That’s right. And the hope is that a first step of making it transparent could lead to those further improvements. If it’s not thorough but it’s transparent, people can point that out. And if it’s transparent but people aren’t really looking into it, then it’s then relatively easy for someone else in the world to be like, “OK, I’m going to do a thorough analysis.”

So my hope is that actually this is an ask which is broadly very reasonable: we’re just saying, “Can you please disclose the information that you have about how your AI system is meant to behave so the rest of the world can understand?” It’s quite similar to asking someone who makes food to say what the ingredients are that go into this food, because that’s just relevant to consumers.

Rob Wiblin: I guess I have the same worry with this as I somewhat did about the internal safeguards: at the point that people were considering misusing the AI, why wouldn’t they just start lying in the model spec? It seems like you not only need transparency about the model spec, but you also need external auditing; lots of checking to make sure that it’s accurate; restrictions on people’s ability to be misleading or to obfuscate things in the model spec, or to have one model that follows this specification but then quietly fall back to a somewhat earlier version.

We’ve said maybe people shouldn’t have access to the fully helpful model, but presumably there’ll be a bunch of different gradients and models that people do have access to, some of which are more helpful or less. And I guess it’s hard to write up a totally thorough model spec about every single intermediate model that you’re creating.

Tom Davidson: Yeah, I do think there’s wiggle room of this kind. And you’re right that merely publishing the model spec isn’t enough, because of, as you say, the possibility of lying. I do think it’s going to be pretty costly for a would-be power grabber to flat-out lie about the model spec: at that early stage, if that lie gets discovered, it’s going to be very damaging.

And so I sometimes think about how we’re kind of shaping the incentive landscape for someone who is just generically seeking power at one of these organisations. If there’s a strong norm of publishing the model spec, and these pretty-hard-to-argue-with reasons for it, then at that point it just becomes more costly to take those actions that would increase their own power — and they might go down a different path, where they never end up forming the explicit intention to create a secret loyalty or try and seize power.

