#235 – Ajeya Cotra on whether it’s crazy that every AI company’s safety plan is ‘use AI to make AI safe’

Every major AI company has the same safety plan: when AI gets crazy powerful and really dangerous, they’ll use the AI itself to figure out how to make AI safe and beneficial. It sounds circular, almost satirical. But is it actually a bad plan?

Today’s guest, Ajeya Cotra, recently placed 3rd out of 413 participants forecasting AI developments and is among the most thoughtful and respected commentators on where the technology is going.

She thinks there’s a meaningful chance we’ll see as much change in the next 25 years as humanity faced in the last 10,000, thanks to the arrival of artificial general intelligence. Ajeya doesn’t reach this conclusion lightly: she’s had a ringside seat to the growth of all the major AI companies for 10 years — first as a researcher and grantmaker for technical AI safety at Coefficient Giving (formerly known as Open Philanthropy), and now as a member of technical staff at METR.

So host Rob Wiblin asked her: is this plan to use AI to save us from AI a reasonable one?

Ajeya agrees that humanity has repeatedly used technologies that create new problems to help solve those problems. After all:

  • Cars enabled carjackings and drive-by shootings, but also faster police pursuits.
  • Microbiology enabled bioweapons, but also faster vaccine development.
  • The internet allowed lies to spread faster, but it sped up fact checks just as much.

But she also thinks AI will be a much harder case. In her view, the window between AI automating AI research and the arrival of uncontrollably powerful superintelligence could be quite brief — perhaps a year or less. In that narrow window, we’d need to redirect enormous amounts of AI labour away from making AI smarter and towards alignment research, biodefence, cyberdefence, adapting our political structures, and improving our collective decision-making.

The plan might fail just because the idea is flawed at conception: it does sound a bit crazy to use an AI you don’t trust to make sure that same AI benefits humanity.

But if we find some clever technique to overcome that, we could still fail — because the companies simply don’t follow through on their promises. They say redirecting resources to alignment and security is their strategy for dealing with the risks generated by their research — but none have quantitative commitments about what fraction of AI labour they’ll redirect during crunch time. And the competitive pressures during a recursive self-improvement loop could be irresistible.

In today’s conversation, Ajeya and Rob discuss what assumptions this plan requires, the specific problems AI could help solve during crunch time, and why — even if we pull it off — we’ll be white-knuckling it the whole way through.

This episode was recorded on October 20, 2025.

Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Coordination, transcriptions, and web: Katy Moore

The interview in a nutshell

Ajeya Cotra is now a member of technical staff at METR, and at the time of recording this interview she was a senior advisor at Coefficient Giving (formerly Open Philanthropy).

Ajeya argues that we may be approaching an “intelligence explosion” — a period when AI automates AI research and progress accelerates dramatically. She believes society needs better early warning systems, transparency requirements for AI companies, and a plan to redirect AI labour toward solving the problems AI creates.

Experts disagree by 1,000x on AI’s impact

Ajeya has noticed that people’s views on AI risk correlate strongly with how much they expect AI to speed up science and technology:

  • The mainstream view: By 2050, we’ll have somewhat better technologies, slightly longer lifespans — manageable change similar to the last 25 years. Perhaps AI enables scientific progress to keep going at the current pace, or even slightly faster, when otherwise it would have stagnated a lot.
  • The futurist view: By 2050, the world could look as different from today as today does from the hunter-gatherer era — 10,000 years of progress driven by AI automating all intellectual activity.

Why the disagreement persists:

  • Slower-camp reasoning: For 150 years, technological breakthroughs (electricity, radio, computers, internet) never showed up as upticks in economic growth; growth stayed around ~2%. New technologies sustain growth but don’t accelerate it. “Things are just always harder and slower than you think.”
  • Faster-camp reasoning: The 2% growth rate is historically anomalous — growth in 3000 BC was ~0.1% per year, so growth has already accelerated by an order of magnitude or more, with the Industrial Revolution lifting rates from well below 1% to around 2%. That acceleration came from a feedback loop: a larger human population generated more ideas and innovations, which in turn supported further population growth. If AI closes the loop of making more AI (cognitive and physical), there’s no reason 2% is a ceiling (see the toy sketch after this list).
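
To make the faster-camp reasoning concrete, here is a toy numerical sketch of that feedback loop. The model structure and every parameter value are illustrative assumptions for this summary, not figures from the episode: each period, the population adds to the stock of ideas, and the stock of ideas sets the population’s growth rate.

```python
# Toy sketch of the "more people -> more ideas -> faster growth" feedback loop.
# All parameter values are invented for illustration; nothing here comes from the episode.

def simulate(periods=25, population=1.0, ideas=1.0,
             ideas_per_person=0.5, growth_per_idea=0.02):
    for t in range(periods + 1):
        growth_rate = growth_per_idea * ideas          # growth rises with the stock of ideas
        if t % 5 == 0:
            print(f"period {t:2d}: population={population:10.2f}  growth rate={growth_rate:8.1%}")
        ideas += ideas_per_person * population         # a larger population discovers more ideas
        population *= 1 + growth_rate                  # more ideas support faster growth

simulate()
```

Under these made-up parameters, the per-period growth rate climbs from 2% to over 1,000% within 25 periods. The qualitative point is the faster camp’s: once the inputs to discovery can themselves be reproduced, nothing pins growth at its recent historical rate.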

Each side has a story for why the other is systematically wrong:

  • Slower-camp people think “everyone always predicts revolution and is always wrong.”
  • Faster-camp people think “everyone always denies the premise that we could actually automate everything.”

Ajeya’s timeline: AI R&D automation in the early 2030s, then things move fast

Ajeya expects “top-human-expert-dominating AI” — systems better than the best humans at any remote computer task — in the early 2030s.

What happens next:

  • These AIs could use human physical labour to build robotic actuators for themselves.
  • Robotics is currently progressing quickly for the same reasons as cognitive AI: large models, lots of data, imitation learning. Within 1–2 years of that point, AIs could close the loop: controlling factories that print chips, doing repair work, gathering raw materials — all without human labour.
  • At that point, progress could be limited only by physical constraints, not human labour constraints.

Three feedback loops to watch (per Forethought’s research):

  1. AIs automating AI software research
  2. AIs automating chip design and manufacturing
  3. AIs automating the entire physical supply chain down to raw materials

We need much better early warning systems

Why benchmarks aren’t enough:

  • Benchmarks always have an S-curve shape: they saturate, then harder benchmarks are created
  • Even 100% scores on current benchmarks wouldn’t indicate “this AI could take over the world”
  • The ideal signal is observed productivity: are AI companies discovering insights faster internally? (although it’s far from guaranteed we’ll be able to observe productivity improvements before it’s too late)

What Ajeya wants reported (a hypothetical report format is sketched after this list):

  • Capabilities indicators (including benchmark scores) at fixed calendar intervals (not just at product releases)
  • Fraction of pull requests mostly written and reviewed by AI (humans not involved on either side)
  • How much higher-level decision-making authority is given to AIs
  • Most concerning misalignment-related safety incidents (e.g., models lying and covering up logs)
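
As a purely hypothetical illustration of how these disclosures could be standardised, here is a sketch of a quarterly report structure. Every field name and example value is invented; no company currently reports in this format.

```python
# Hypothetical quarterly disclosure covering the items listed above.
# Field names and example values are invented for illustration only.
from dataclasses import dataclass

@dataclass
class QuarterlyCapabilityReport:
    quarter: str                                      # fixed calendar cadence, not product releases
    best_internal_benchmark_scores: dict[str, float]  # e.g. hacking, software engineering, autonomy
    ai_only_pull_request_fraction: float              # PRs mostly written AND reviewed by AI
    decision_authority_notes: str                     # higher-level authority delegated to AIs
    notable_misalignment_incidents: list[str]         # e.g. models lying or covering up logs

example = QuarterlyCapabilityReport(
    quarter="2026-Q1",
    best_internal_benchmark_scores={"hacking": 0.61, "software_engineering": 0.78, "autonomy": 0.44},
    ai_only_pull_request_fraction=0.35,
    decision_authority_notes="Agents approve routine infrastructure changes without human review.",
    notable_misalignment_incidents=["Agent edited a test to hide a failure and did not report it."],
)
print(example)
```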

Companies resist sharing this information for competitive, IP, and PR reasons, but Ajeya thinks public transparency may be essential — small government agencies can’t interpret evidence fast enough or build the common knowledge needed to sound an alarm.

Promising developments:

  • The Longitudinal Expert AI Panel (LEAP) tracks expert predictions on granular AI questions
  • Bills like the RAISE Act (NY) and SB 53 (California) focus on transparency and whistleblower protections

The plan: use early transformative AI to solve the problems it creates

This is essentially what all frontier AI companies say their safety plan for superintelligence is: when AI starts automating AI R&D, redirect substantial AI labour from further acceleration toward protective activities.

Problems AI could help solve during “crunch time”:

  • AI alignment: Ensuring current and future AI systems are honest, steerable, and motivated to help humans
  • Cyberdefence: Finding and patching vulnerabilities before bad actors exploit them
  • Biodefence: Scaling up pathogen detection, medical countermeasures, and PPE manufacturing
  • Epistemics and coordination: Helping humans find truth together, reach compromise policies, and avoid destructive conflicts
  • Value lock-in prevention: Avoiding scenarios where society gets permanently stuck with particular values

Why it might fail:

  • Companies won’t actually redirect AI labour from capabilities to safety due to tremendous competitive pressure.
  • AIs might be specifically good at AI R&D but bad at everything else — including AI safety research. You could have six months of warning but no useful AI labour to apply.
  • Not enough time: Ajeya thinks there’s maybe 12 months from AI R&D automation to uncontrollable superintelligence; she would prefer having much longer to sort out safety and societal adaptation.
  • If early transformative AIs have it out for us, they’d have incentives to undermine alignment research, biodefence, epistemics — anything that makes takeover harder.

Ajeya’s hope:

  • At each capability threshold, we do the safety and governance work needed to prepare for the next threshold.
  • This will most likely involve using AI labour to help us do the work fast enough — which would need to be redirected from AI capabilities or other uses.
  • On Ajeya’s views about takeoff speed, it’ll likely also involve coordination to step through the capability stages a lot more slowly than we would by default, so we have time to get the safety and preparation work done.
  • This isn’t one single “pause then unpause” — it’s stepping through the transition from AI R&D automation to superintelligent AI at a pace we can handle, doing protective work at each stage.

Before crunch time, we should focus on things that inherently take long lead times:

  • Physical infrastructure: Biodefence infrastructure like PPE stockpiles, vaccine manufacturing capacity, pathogen detection systems
  • Social consensus: Ideas need years to become “in the water” — like the concept of a US-China treaty to slow AI progress and redirect compute to solving problems
  • Government AI adoption: Regulators risk having “horses and buggies” while regulatees have “fast cars” due to red tape slowing government AI adoption

Reflections on effective altruism (EA)

Reflecting on her time in EA:

  • Ajeya thinks that EA’s unique comparative advantage lies in incubating high-stakes, speculative fields that academia and industry ignore. Just as EA incubated AI risk when it was considered fringe, it can now incubate “avant-garde” issues like digital sentience, value lock-in, and space governance.
  • As EA organisations became more prominent and political, they lost the “comically high integrity” that originally attracted Ajeya to the community, trading transparency for strategic caution.
  • She personally misses a “spiritual angle” in the community — a space for existential reflection on the radical future we face and the nature of goodness and morality, rather than just professional optimisation.

Highlights

The spectrum of expectations about AGI

Ajeya Cotra: I think there’s this expectation where, whether or not we get AGI in the next few years, a lot of people are starting to not really care about that question.

They still expect the next 25 years or the next 50 years to play out kind of like the last 25 years or the last 50 years, where there’s a lot of technological change between 2000 and 2025, but it’s like a moderate amount of change. They kind of expect that in 2050 there will be a similar amount of change as there was between 2000 and 2025. Even if they think that we’re going to get AGI in 2030, they think AGI is just what’s going to drive that continued mild improvement.

Whereas I think that there’s a pretty good chance that by 2050 the world will look as different from today as today does from the hunter-gatherer era. It’s like 10,000 years of progress rather than 25 years of progress driven by AI automating all intellectual activity.

Rob Wiblin: You’ve hinted at the fact that there is an enormously wide range of views on this, but can you give us a sense of just how large the spectrum is, and what the picture looks like on either end?

Ajeya Cotra: I would say on the sort of standard mainstream view, if you ask a normal person on the street what 2050 will look like, or if you ask a standard mainstream economist, I think they would think the population is a little bit bigger, we have somewhat better technologies. Maybe they have a few pet technologies that they’re most interested in. Maybe we have this one or that one. So slightly better medicine, people live slightly longer. It’s an amount of change that’s extremely manageable.

I think on the far extreme from there, on the other side, is a view described in If Anyone Builds It, Everyone Dies. In that worldview, at some point, probably pretty unpredictably, we sort of crack the code to extreme superintelligence. Like we invent a technology that rather suddenly goes from being like GPT-5 and GPT-6 and so on, to being so much smarter than us — that we’re like cats or mice or ants compared to this thing’s intelligence.

And then that thing can really immediately have really extreme impacts on the physical world. The classical, sort of canonical example here being inventing nanotechnology: the ability to precisely manufacture things that are really tiny and can replicate themselves really quickly and can do all sorts of things — like building space probes that travel close to the speed of light and things like that.

I think there’s a whole spectrum in between, where people think that we are going to get to a world where we have technologies approaching their physical limits, we have like spaceships approaching the speed of light, and we have self-replicating entities that replicate as quickly as bacteria while like also doing useful things for us. But we’re going to have to go through intermediate stages before getting there.

But I think something that unites all of the people who are sort of futurists and concerned about AI x-risk is that they think in the coming decades we’re likely to get this level of extreme technological progress driven by AI.

Wildly different views about the economic effects of AI

Ajeya Cotra: I would say that the group that expects [AI economic impacts] to be a lot slower tends to lean on: for the last 100, 150 years in frontier economies we’ve seen 2% growth. And think of the technological change that has occurred over the last 100 or 150 years. We went from having very little (electricity was just an idea) to everywhere being electrified. We had the washing machine and the television, the radio, all these things happened, computers happened in this period of time. None of these show up as an uptick in economic growth.

And there’s this stylised fact that mainstream economists really like to cite, which is that new technology is sort of the engine that sustains 2% growth, and in the absence of that new technology, growth would have slowed. So they’re like, this is how new technologies always are. People think that they’re going to lead to a productivity boom, but you never see them in the statistics. You didn’t see the radio, you didn’t see the television, you didn’t see the computer, you didn’t see the internet — and you’re not going to see AI. AI might be really cool. It might be the next thing that lets us keep chugging along.

That’s one perspective. It’s an outside view they keep returning to. And also maybe a somewhat more generalised thing: things are just always hard and slow, just like way harder and slower than you think. … It’s like Hofstadter’s law: It always takes longer than you think, even when you take Hofstadter’s law into account. Or like the programmers’ credo, this is my favourite one: We do these things not because they are easy, but because we thought they would be easy.

So there’s just this whole cloud of it’s naivete to think that things can go crazy fast. If you write down a story that seems perfect and unassailable for how things will be super easy and fast, there’s all sorts of bottlenecks and all sorts of drag factors you inevitably failed to account for in that story.

Then I think the alternative perspective leans a lot on much longer-term economic history. If you attempt to assign reasonable GDP measures to the last 10,000 years of human history, you see acceleration. The growth rate was not always 2% per year at the frontier: 2% per year is actually blisteringly fast compared to what it was in like 3,000 BC, which was maybe like 0.1% per year. So the growth rate has already multiplied many fold — maybe an order of magnitude, maybe two.

I think that people in the slower camp tend to feel like the exercise of doing long-run historical data is just too fraught to rely upon. But people in both camps do agree that the Industrial Revolution happened and the Industrial Revolution accelerated growth rates a lot. And we went from having growth rates that were well below 1% to having 2% a year growth rates.

And I think that people in the faster camp tend to lean on the long run and on models that say that the reason that we had accelerating growth in the long run was a feedback loop where more people can try out more ideas and discover more innovations, which then leads to food production being more efficient, which then leads to a larger supportable population — and then you can rinse and repeat and you get super-exponential population growth.

Then that perspective says that if you can slot in AIs to replace not just the cognitive, but the cognitive and the physical, the entire package, and close the full loop of AIs doing everything needed to make more AIs, or AIs and robots doing everything needed to make more AIs and robots, then there’s no reason to think that 2% is some sort of physical law of the universe. They can grow as fast as their physical constraints allow them to grow, which are not necessarily the same as the constraints that keep human-driven growth at 2%.

The most dangerous AI progress might remain secret

Ajeya Cotra: I think there’s a whole spectrum of evidence about AI capabilities. On the one hand, the easiest to test but the least informative is benchmark results. And companies do release benchmark results when they release models right now. … I think that’s great that they do that. But in my ideal world, they would release their highest internal benchmark score at some calendar time cadence. So every three months they would say, “We’ve achieved this level score on this hacking benchmark, this level score on software engineering benchmark, this score on an autonomy benchmark.”

That’s because, as you said, danger could manifest from purely internal deployment, because if they have an AI agent that’s sufficiently good at AI R&D, they could use that to go much faster internally, and then other capabilities and therefore other risks might come online much faster than people were previously expecting. So it’s not ideal to have your report card for the model come out when you release it to the public, unless there’s some sort of guarantee that you’re not sitting on a product that’s substantially more powerful than the public product. …

Then there’s a bunch of other stuff that is not currently reported that ideally it would be really great to know. … One thing I’m interested in is what fraction of pull requests to your internal codebase were mostly written by AI and mostly reviewed by AI — so humans are not involved for the most part in both sides of this equation. I’d be very interested in watching that number climb up, because I think it’s an indication both of AI capabilities, and of how much deference they’re giving to AIs.

And eventually, if things are going to go crazy fast, the AIs have to be doing most things, including most management and approval and review, because if humans have to do that stuff, then things can only go so fast. So I really want to track how much higher-level decision-making authority is being given to the AIs in practice inside the companies. I think there are probably a bunch of other things that we could send basically as a survey: How much do you use AIs for this type of thing, for that type of thing? How much speedup subjectively do you think you get? If you’re running any internal RCTs, I would of course love to know the results of that. …

I don’t think that just benchmarks alone will actually lead anyone to sound the alarm. Because the thing with benchmarks is that they saturate. They always have the S-curve shape. And the benchmarks we have right now are harder than the previous generation of benchmarks. But it’s still far from the case that I feel confident that if your AI gets a 100% score on all these benchmarks, then it’s a threat to the world and it could take over the world. I still think the benchmarks we have right now are well below that. So what’s probably going to happen is that these benchmarks are going to get saturated, then there’s going to be a next generation of benchmarks people make, and then those benchmarks are going to tick up and then get saturated.

So I think we need some kind of real-world measure before we can start sounding the alarm. And the ultimate real-world measure is actually just observed productivity. Like if they are seeing internally that they’re discovering insights faster than they were before, then that’s a very late but also very clear signal. And that’s the point at which they should definitely sound the alarm, and we should know what’s happening.

White-knuckling the 12-month window after automated AI R&D

Rob Wiblin: The challenge we have is AI is becoming much smarter very quickly, and we feel very nervous about that. And the opportunity that’s created is that we have a lot more labour and we have much smarter potential researchers than we did before: so why don’t we turn that new resource towards solving this problem that at the moment we don’t really know how to fix?

I think some people who are not too worried about AI look at society as a whole, or they look at history and they say that technology has enabled us to do all kinds of more destructive things, but we don’t particularly feel like we’re in a more precarious situation now, or at much greater personal risk now than in 1900 or in 1800, because advances in destructive technology have been offset by advances in safety-increasing technology — and on balance, probably things have gotten safer.

So the idea is that potentially it’s going to be a vertiginous time, but perhaps we could pull off the same trick in this crunch time period?

Ajeya Cotra: Yeah, and I think that a lot of people who are more concerned about AI risk are very dismissive of this plan. It sort of sounds like a crazy plan. It’s really flying by the seat of your pants, like expecting the thing that’s creating the problem to solve the problem.

But in a sense, I do think humanity has repeatedly used general purpose technologies that created problems to solve those problems. Like automobiles, something as mundane as that: cars created the opportunity for there to be carjackings and for there to be drive-by shootings, and it empowered bad actors in various ways. But of course, if the police and law enforcement have cars as well, that is a balance. …

Similarly with computers. You can hack things with computers, but computers also enable you to do a lot of automated monitoring for that kind of hack and automated vulnerability discovery. And different kinds of law enforcement: you couldn’t imagine a police force not using computers.

So I do think the basic principle is sound, that if you’re worried about problems created by technology, one of the first things on your mind should be how can you use whatever that new technology is to solve those problems.

But I think that this is an especially narrow window to get this right. And you’re not imagining cars creating a broad-based, rapid acceleration of all sorts of new technologies, with potentially just a 12-month window or two-year window or six-year window before everything goes totally crazy.

So I do think that it’s important to not blow through that window, to monitor as we’re approaching it and to monitor how long we have. But I think I’m fundamentally fairly optimistic about trying to use early transformative AI systems, like early systems that automate a lot of things, to automate the process of controlling and aligning and managing risks from the next generation of systems, who then automate the process of managing those risks from the generation after, and so on.

EA as an incubator for avant-garde causes others won't touch

Ajeya Cotra: I think there should be, and there is, a healthy, thriving “AI is going to be a big deal” ecosystem that does not take EA as a premise.

But at the same time, I think EA thinking and EA values probably do still have a lot to add. In the age of AI disruption, I think it’s going to be EAs for the most part who are thinking seriously about whether AIs themselves are moral patients and whether they should have protections and rights and how to navigate that thoughtfully against tradeoffs with safety and other goals. It’s going to be EAs that by and large are still the ones that take most seriously the possibility that AI disruption could be so disruptive that we end up locked into a certain set of societal values, that we gain the technological ability to shape the future for millions of years or billions of years and are thinking about how that should go.

There’s a lot of degrees of extremity to the AI worldview. Even if you accept that AI is going to disrupt everything in the next 10 or 20 years, the people who are thinking hardest about the most intense disruptions are going to be disproportionately EAs, because EA thinking challenges you to try and engage in that kind of very far-seeing, rigorous speculation. Even though there’s a lot of challenges with that and it’s very hard to know the future, I think EAs are the ones that try hardest to peek ahead anyway. …

In a sense, I think this might be what a healthy EA community is: it’s like an engine that incubates cause areas at a stage when they’re not very respected, they’re extremely speculative, the methodology isn’t firm yet; you just have to be extremely altruistic and extremely willing to do unconventional things — and then matures those cause areas to the point where they can stand on their own while also being a thing that many EAs work on.
