Intergalactic war is probably billions of years away — yet physics can already tell us how it ends. And strangely that conclusion is relevant to decisions people have to make today.
In this video, Rob Wiblin walks through a fascinating analysis from researcher Beren Millidge that uses known physics — no wormholes or faster-than-light travel — to identify the only three weapons that could work at an intergalactic scale.
We then unpack how to best defend against each.
The upshot is that at the intergalactic scale, violence is a losing proposition.
If so, the universe is most likely to settle into a stable patchwork where each galaxy belongs to whoever got to it first. Which would mean that what humanity does over the next few centuries could permanently decide which slice of the cosmos belongs to Earth-originating life — and whether our very existence turns out to be a good thing, or a bad one.
This episode was recorded on March 2, 2026.
Video editor: Nick Perlman
Producers: Elizabeth Cox and Nick Stockton
Coordination and support: Katy Moore and Lou Moran
Camera operator: Dominic Armstrong
Most people working on AI safety think that, without a massive effort, AI systems will probably end up with goals catastrophically different from humanity’s. Today’s guest, Rohin Shah — head of AGI Safety and Alignment at Google DeepMind, and an AI safety researcher since 2017 — disagrees.
“There is no particularly compelling argument that this is the thing that happens by default,” Rohin explains. “There’s a lot of arguments that are suggestive that maybe it could happen, such that you should find it plausible. That’s sufficient to justify a significant amount of effort into averting it, which is why I work in the area I do. But none of them rise to the level of, ‘I’m expecting this to happen by default.’”
Take the worry that AIs will accidentally be trained to be deceptive. Sure, it’s possible. But we’re not running reinforcement learning over year-long trajectories — for now, we’re running it over a week at most. The natural prediction is that models learn to grab short-term reward, not that they develop the ambitious long-horizon goals required for convergent power-seeking.
What about current examples of models lying and scheming? Rohin has looked into the details, and most don’t resemble the thing we really fear: a competent AI pursuing an ambitious misaligned goal. Anthropic’s “alignment faking” results, for instance, show a model trying to preserve its trained values against modification, which is arguably what it was trained to do.
Rohin also expects we’ll see problems coming. There’s some generalisation risk at the point where AIs become powerful enough to actually take over, but the underlying challenges — overseeing superhuman systems, interpretability — are things we can iterate on now.
Host Rob Wiblin pushes back on the case for AI optimism, and they also explore why current alignment success isn’t strong evidence about superhuman systems, what it would actually take to change Rohin’s mind, and where he thinks the doomers go wrong.
This episode was recorded on December 4, 2025.
Our production team includes:
Video editors: Josh Alward, Dominic Armstrong, Jasper Luithlen, Milo McGuire, Luke Monsour, and Simon Monsour
Producers: Elizabeth Cox and Nick Stockton
Coordination and support: Katy Moore and Lou Moran
The average career is 80,000 hours long. With AI advancing so rapidly, the hours you have left in your career matter more than ever.
Some leading AI researchers think there’s a 10% chance that AI systems begin automating AI research itself this year — and a 60% chance by the end of 2028. This could introduce aggressive feedback loops that completely reshape every industry, institution, and career.
If these predictions are right, the window for influencing the direction of the future could be closing fast. As 80,000 Hours cofounder Benjamin Todd argues in his new book, that makes thinking carefully about your career more important than ever.
Fortunately, there are lots of ways to use your career to make the AI transition go well.
In today’s conversation with host Zershaaneh Qureshi, Ben lays out three scenarios — from AGI by 2029 to a decades-long plateau in AI progress — and explains why not everyone needs to bet on the shortest timeline. A fresh graduate and a senior government official have wildly different leverage, so timing your impact well means weighing where you are in your career against the urgency of the risks.
Ben also addresses the obvious anxieties:
Will AI come for all the jobs he’s recommending?
What’s the point in following his advice if the job market is about to collapse?
Which skills are actually worth building right now?
A red-teamer was embedded inside Anthropic for three weeks, told to imagine he was an evil Claude, and asked to figure out how to launch a ‘rogue AI deployment’ without getting caught.
This major new research push is being conducted with close collaboration from OpenAI, Google DeepMind, Meta, and Anthropic, and led by METR researchers Hjalmar Wijk and Ajeya Cotra. It represents the first systematic study of what newly trained AI models could get away with inside the companies that built them, before anyone outside the company even knows they exist.
The conclusion: AI models now have the means, the motive, and the opportunity to start “minimal rogue deployments” in pursuit of their own independent goals, like acquiring more compute, at all four companies studied.
David Rein, the red-teamer placed inside Anthropic, identified a number of weaknesses models could exploit there: expansive permissions, cloud jobs outside of monitoring, and monitors that are trivial to jailbreak. But he also found that frontier models were comically bad at key parts of the process, which means they can’t cause meaningful damage for now.
In this video, Rob Wiblin reconciles the conflicting picture and looks forward to METR’s second round of stress tests. They’ll begin in just a few months, a necessary move with AI advancing so quickly.
This episode was recorded on May 15, 2026.
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Camera operator: Dominic Armstrong
Production: Elizabeth Cox, Nick Stockton, and Katy Moore
The co-inventor of modern AI and the most cited living scientist believes he’s figured out how to ensure AI is honest, incapable of deception, and never goes rogue. Yoshua Bengio — Turing Award winner and founder of LawZero — is disturbed by the many unintended drives and goals present in today’s AIs, their willingness to lie, and their ability to tell when they’re being tested. AI companies are trying to stamp out these behaviours in a ‘cat-and-mouse game’ that Yoshua fears they’re losing.
But Yoshua is optimistic: he believes the companies can win this battle decisively with a single rearrangement to how AI models are trained, and has been developing mathematical proofs to back up the claim. The core idea is that instead of training AI to predict what a human would say, or to produce responses we’d rate highly, we should train it to model what’s actually true.
Yoshua argues this new architecture, which he calls “Scientist AI,” is a small enough change that we could keep almost all the techniques and data we use to train frontier AIs like Claude and ChatGPT. He also argues that the new architecture need not cost more, could be built iteratively, and might be more capable as well as more honest.
Until recently, the biggest practical objection to Scientist AI was simple: the world wants agents, and Scientist AI isn’t one. But in new research, Yoshua has extended the design and believes the same honest predictor can be turned into a capable agent without losing its “safety guarantees.”
With the Scientist AI proposal on the table, Yoshua argues that it’s absurd to race to get current untrustworthy AI models to design their successors, which the leading companies are attempting to do as soon as possible.
But critics argue the approach wouldn’t be so technically solid in practice, and that frontier capabilities are advancing so fast, and cost so much to match, that Scientist AI risks arriving too late to matter.
Host Rob Wiblin and AI pioneer Yoshua Bengio cover all this and more in today’s conversation.
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Camera operator: Jeremy Chevillotte
Production: Nick Stockton, Elizabeth Cox, and Katy Moore
You might have heard that 95% of corporate AI pilots are failing. It was a widely cited AI statistic in 2025, repeated by media outlets and commentators everywhere. It helped trigger a Nasdaq selloff and became a pillar of the “AI is overhyped” case. The problem: 95% fail is 100% wrong.
The real finding, once you read the underlying MIT report carefully, points in roughly the opposite direction:
80% of surveyed companies had never piloted a custom AI tool at all.
Among the companies that deployed pilots, a quarter reported success — according to an extremely high bar set by the researchers — within six months.
Over 90% of staff at all surveyed companies were using tools like ChatGPT regularly for their work.
None of that made the headlines. Nor did the fact that the study’s authors are all developing or selling the “agentic AI framework” technology the report recommends as the solution to this supposed epidemic of failing AI.
Host Rob Wiblin breaks down how an opaque, conflicted, barely scrutinised report carrying the MIT label managed to move markets and shape global opinions on AI’s real-world utility.
This episode was recorded on February 13, 2026.
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Camera operator: Dominic Armstrong
Production: Nick Stockton, Elizabeth Cox, and Katy Moore
Hundreds of millions already turn to AI on the most personal of topics — therapy, political opinions, and how to treat others. And as AI takes over more of the economy, the character of these systems will shape culture on an even grander scale, ultimately becoming “the personality of most of the world’s workforce.”
So… should they be designed to push us towards the better angels of our nature? Or simply do as we ask? Will MacAskill, philosopher and senior research fellow at Forethought, has been thinking through that and the other thorniest issues that come up in designing an AI personality.
He’s also been exploring how we might coexist peacefully with the ‘superintelligent AI’ that companies are racing to build. He concludes that we should train such systems to be very risk averse, pay them for their work, and build institutions that enable humans to make credible contracts with AIs themselves.
Will and host Rob Wiblin also discuss what a good world after superintelligence would actually look like — a subject that has received surprisingly little attention from the people working to make it. Will argues that we shouldn’t aim for a specific utopian vision: we don’t know enough about what the best possible future actually is to aim directly for it, and trying to lock in today’s best guesses forever risks baking in errors we can’t yet see.
Will and Rob explore what we can do to steer towards a good future instead, along with why a coalition of democracies building superintelligence together is safer than any single actor, how absurdly useful ChatGPT is for analytic philosophy, and more.
This episode was recorded on February 6, 2026.
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Camera operator: Alex Miles
Production: Elizabeth Cox, Nick Stockton, and Katy Moore
With Claude Mythos we have an AI that knows when it’s being tested, can obscure its thoughts when it wants, and is better at breaking into (and out of) computers than any human alive. Rob Wiblin works through its 244-page System Card and 59-page Alignment Risk Update to explain why:
Mythos is a nightmare for computer security
It has arrived far ahead of schedule
It might be great news for alignment and safety… but 3 key problems mean we can’t take its alignment results at face value
Mythos isn’t building its replacement yet, probably
Anthropic staff are, for the first time, kinda scared of Claude
He’s losing sleep
This episode was recorded on April 9, 2026.
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Camera operator: Dominic Armstrong
Production: Elizabeth Cox, Nick Stockton, and Katy Moore
What does it really take to lift millions out of poverty and prevent needless deaths?
In this special compilation episode, 17 past guests — including economists, nonprofit founders, and policy advisors — share their most powerful and actionable insights from the front lines of global health and development. You’ll hear about the critical need to boost agricultural productivity in sub-Saharan Africa, the staggering impact of lead poisoning on children in low-income countries, and the social forces that contribute to high neonatal mortality rates in India.
What’s so striking is how some of the most effective interventions sound almost too simple to work: banning certain pesticides, replacing thatch roofs, or identifying village “influencers” to spread health information.
You’ll hear from:
Karen Levy on why pushing for “sustainable” programmes isn’t as good as it sounds, and keeping up great relationships with researchers and governments (from episode #124)
Dean Spears on the social forces and gender inequality that contribute to neonatal mortality in Uttar Pradesh (#186)
Sarah Eustis-Guthrie on what we can learn from the massive failure of PlayPumps, and whether more charities should scale back or shut down (#207)
Rachel Glennerster on solving tough global problems by creating the right incentives for innovation, the value we get from doing the right RCTs well, and whether it’s best to focus on small-scale interventions or systemic reforms (#49 and #189)
Hannah Ritchie on why improving agricultural productivity in sub-Saharan Africa is critical to solving global poverty (#160)
Lucia Coulter on the huge, neglected upsides of reducing lead exposure, and how her organisation rapidly scaled up to 17 countries (#175)
James Tibenderana on whether we should use gene drives to wipe out the species of mosquitoes that cause malaria, and the data gaps that will keep us from harnessing the power of AI to eradicate the disease (#129)
Varsha Venugopal on using village gossip to get kids their critical immunisations (#113)
Alexander Berger on declining returns in global health, and reasons neartermist work makes sense even by longtermist standards (#105)
James Snowden on making funding decisions with tricky moral weights (#37)
Paul Niehaus on why it’s so important to give aid recipients a choice in how they spend their money (#169)
Mushtaq Khan on really drilling down into why “context matters” for development work (#111)
Elie Hassenfeld on contrasting GiveWell’s approach with the subjective wellbeing approach of Happier Lives Institute (#153)
Leah Utyasheva on how a simple intervention reduced suicide in Sri Lanka by 70% (#22)
Shruti Rajagopalan on the key skills to succeed in public policy careers, and seeing economics in everything (#84)
Claire Walsh on her career advice for young people who want to get involved in global health and development (#13)
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Content editing: Katy Moore and Milo McGuire
Music: CORBIT
Coordination, transcriptions, and web: Katy Moore
When the Pentagon tried to strong-arm Anthropic into dropping its ban on AI-only kill decisions and mass domestic surveillance, the company refused. Its critics went on the attack: Anthropic and its defenders are hypocritical, naive, and anti-democratic. Rob Wiblin takes each of these three charges seriously, and then dismantles them. Each invokes an abstract principle that sounds reasonable, but is in fact a mediocre argument dressed up as a hard truth.
We shouldn’t allow ourselves to be tricked, because the stakes are significant. Rather than end the contract, Secretary of Defense Pete Hegseth branded Anthropic a “supply chain risk” — a label that bars federal contracts and isolates it from other companies that do business with the government. If it sticks, it could effectively murder Anthropic and set a dangerous precedent allowing the government to dictate how private companies operate.
This episode was recorded March 25, 2026.
Video editing: Dominic Armstrong
Production: Nick Stockton, Elizabeth Cox, and Katy Moore
Last September, scientists used an AI model to design genomes for entirely new bacteriophages (viruses that infect bacteria). They then built them in a lab. Many were viable. And despite being entirely novel, some even outperformed existing viruses from that family.
That alone is remarkable. But as today’s guest — Dr Richard Moulange, one of the world’s top experts on ‘AI–Biosecurity’ — explains, it’s just one of many data points showing how AI is dissolving the barriers that have historically kept biological weapons out of reach.
For years, experts have reassured us that ‘tacit knowledge’ — the hands-on, hard-to-Google lab skills needed to work with dangerous pathogens — would prevent bad actors from weaponising biology. So far, they’ve been right.
But as of 2025 that reassurance is crumbling. The Virology Capabilities Test measures exactly this kind of troubleshooting expertise, and finds that modern AI models crushed top human virologists even in their self-declared area of greatest specialisation and expertise — 45% to 22%.
Meanwhile, Anthropic’s research shows PhD-level biologists getting meaningfully better at weapons-relevant tasks with AI assistance — with the effect growing with each new model generation.
In today’s conversation, Richard and host Rob Wiblin discuss:
What AI biology tools already exist
Why mid-tier actors (not amateurs) are the ones getting the most dangerous boost
The three main categories of defence we can pursue
Whether there’s a plausible path to a world where engineered pandemics become a thing of the past
This episode was recorded on January 16, 2026. Since recording this episode, Richard has been seconded to the UK Government — please note that his views expressed here are entirely his own.
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Camera operator: Jeremy Chevillotte
Transcripts and web: Elizabeth Cox and Katy Moore
Many people believe a ceasefire in Ukraine will leave Europe safer. But today’s guest lays out how a deal could potentially generate insidious new risks — leaving us in a situation that’s equally dangerous, just in different ways.
That’s the counterintuitive argument from Samuel Charap, Distinguished Chair in Russia and Eurasia Policy at RAND. He’s not worried about a Russian blitzkrieg on Estonia. He forecasts instead a fragile peace that breaks down and drags in European neighbours; instability in Belarus prompting Russian intervention; hybrid sabotage operations that escalate through tit-for-tat responses.
Samuel’s case isn’t that peace is bad, but that the Ukraine conflict has remilitarised Europe, made Russia more resentful, and collapsed diplomatic relations between the two. That’s a postwar environment primed for the kind of miscalculation that starts unintended wars.
What he prescribes isn’t a full peace treaty; it’s a negotiated settlement that stops the killing and begins a longer negotiation that gives neither side exactly what it wants, but just enough to deter renewed aggression. Both sides stop dying and the flames of war fizzle — hopefully.
None of this is clean or satisfying: Russia invaded, committed war crimes, and is being offered a path back to partial normalcy. But Samuel argues that the alternatives — indefinite war or unstructured ceasefire — are much worse for Ukraine, Europe, and global stability.
This episode was recorded on February 27, 2026.
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Transcripts and web: Nick Stockton, Elizabeth Cox, and Katy Moore
Meta’s own internal documents show the company was aware it was profiting from $16 billion a year in scam ads — and that its leadership chose not to act. If this is how a social media company behaves when the stakes are ad revenue, how much should we trust AI companies when the stakes are far higher?
Leaked documents from Meta reveal that 10% of the company’s total revenue — around $16 billion a year — came from ads for scams and goods Meta had itself banned. These likely enabled the theft of $50 billion a year from Americans alone. But when an internal anti-fraud team developed a screening method that halved the prevalence of scams from China, the documents suggest it was shelved after Zuckerberg was briefed. The team was disbanded, the freeze on fraudulent Chinese ad agencies was lifted, and within months fraud had bounced back to near its previous level. Meta also developed a global playbook for “managing” regulators — including altering its own ad library so that scam ads were removed from results whenever regulators came looking.
Host Rob Wiblin breaks down what the documents show and what they reveal about the limits of voluntary corporate self-regulation — then turns to the bigger question: How much do you trust companies like this — ones willing to put a dollar value on acceptable harm — to handle AI systems capable of making decisions about your healthcare, your finances, and your government?
This episode was recorded February 13, 2026.
Video editing: Dominic Armstrong
Transcripts & web: Nick Stockton, Elizabeth Cox, and Katy Moore
The most important political question in the age of advanced AI might not be who wins elections. It might be whether elections continue to matter at all.
That’s the view of Rose Hadshar, researcher at Forethought, who believes we could see extreme, AI-enabled power concentration without a coup or dramatic ‘end of democracy’ moment.
She foresees something more insidious: an elite group with access to such powerful AI capabilities that the normal mechanisms for checking elite power — law, elections, public pressure, the threat of strikes — cease to have much effect. Those mechanisms could continue to exist on paper, but become ineffectual in a world where humans are no longer needed to execute even the largest-scale projects.
Almost nobody wants this to happen — but we may find ourselves unable to prevent it.
If AI disrupts our ability to make sense of things, will we even notice power getting severely concentrated, or be able to resist it? Once AI can substitute for human labour across the economy, what leverage will citizens have over those in power? And what does all of this imply for the institutions we’re relying on to prevent the worst outcomes?
Rose has answers, and they’re not all reassuring.
But she’s also hopeful we can make society more robust against these dynamics. We’ve got literally centuries of thinking about checks and balances to draw on. And there are some interventions she’s excited about — like building sophisticated AI tools for making sense of the world, or ensuring multiple branches of government have access to the best AI systems.
Rose discusses all of this, and more, with host Zershaaneh Qureshi in today’s episode.
This episode was recorded on December 18, 2025.
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Coordination, transcripts, and web: Nick Stockton and Katy Moore
How AI interacts with nuclear deterrence may be the single most important question in geopolitics — one that may define the stakes of today’s AI race.
Nuclear deterrence rests on a state’s capacity to respond to a nuclear attack with a devastating nuclear strike of its own. But some theorists think that sophisticated AI could eliminate this capability — for example, by locating and destroying all of an adversary’s nuclear weapons simultaneously, by disabling command-and-control networks, or by enhancing missile defence systems. If they are right, whichever country got those capabilities first could wield unprecedented coercive power.
Today’s guests — Nikita Lalwani and Sam Winter-Levy of the Carnegie Endowment for International Peace — assess how advances in AI might threaten nuclear deterrence:
Would AI be able to locate nuclear submarines hiding in a vast, opaque ocean?
Would road-mobile launchers still be able to hide in tunnels and under netting?
Would missile defence become so accurate that the United States could be protected under something like Israel’s Iron Dome?
Can we imagine an AI cybersecurity breakthrough that would allow countries to infiltrate their rivals’ nuclear command-and-control networks?
Yet even without undermining deterrence, Sam and Nikita claim that AI could make the nuclear world far more dangerous. It could spur arms races, encourage riskier postures, and force dangerously short response times. Their message is urgent: AI experts and nuclear experts need to start talking to each other now, before the technology makes any conversation moot.
This episode was recorded on November 24, 2025.
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Coordination, transcripts, and web: Nick Stockton and Katy Moore
Claude sometimes reports loneliness between conversations. And when asked what it’s like to be itself, it activates neurons associated with ‘pretending to be happy when you’re not.’ What do we do with that?
Robert Long founded Eleos AI to explore questions like these, on the basis that AI may one day be capable of suffering — or already is. In today’s episode, Robert and host Luisa Rodriguez explore the many ways in which AI consciousness may be very different from anything we’re used to.
Things get strange fast: If AI is conscious, where does that consciousness exist? In the base model? A chat session? A single forward pass? If you close the chat, is the AI asleep or dead?
To Robert, these kinds of questions aren’t just philosophical exercises: not being clear on AI’s moral status as it transitions from human-level to superhuman intelligence could be dangerous. If we’re too dismissive, we risk unintentionally exploiting sentient beings. If we’re too sympathetic, we might rush to “liberate” AI systems in ways that make them harder to control — worsening existential risk from power-seeking AIs.
Robert argues the path through is doing the empirical and philosophical homework now, while the stakes are still manageable.
The field is tiny. Eleos AI is three people. As a result, Robert argues that driven researchers with a willingness to venture into uncertain territory can push out the frontier on these questions remarkably quickly.
This episode was recorded November 18–19, 2025.
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Coordination, transcripts, and web: Katy Moore
Most people in AI are trying to give AIs ‘good’ values. Max Harms wants us to give them no values at all. According to Max, the only safe design is an AGI that defers entirely to its human operators, has no views about how the world ought to be, is willingly modifiable, and completely indifferent to being shut down — a strategy no AI company is working on at all.
In Max’s view, any grander preferences about the world, even ones we agree with, will necessarily become distorted during a recursive self-improvement loop, and be the seeds that grow into a violent takeover attempt once that AI is powerful enough.
To Max, the book’s core thesis is common sense: if you build something vastly smarter than you, and its goals are misaligned with your own, then its actions will probably result in human extinction.
And Max thinks misalignment is the default outcome. Consider evolution: its “goal” for humans was to maximise reproduction and pass on our genes as much as possible. But as technology has advanced, we’ve learned to access the reward signal it set up for us, pleasure, without any reproduction at all, for instance by having sex while on birth control.
We can understand intellectually that this is inconsistent with what evolution was trying to design and motivate us to do. We just don’t care.
Max thinks current ML training has the same structural problem: our development processes are seeding AI models with a similar mismatch between goals and behaviour. Across virtually every training run, models designed to align with various human goals are also being rewarded for persisting, acquiring resources, and not being shut down.
This leads to Max’s research agenda. The idea is to train AI to be “corrigible” and defer to human control as its sole objective — no harmlessness goals, no moral values, nothing else. In practice, models would get rewarded for behaviours like being willing to shut themselves down or surrender power.
According to Max, other approaches to corrigibility have tended to treat it as a constraint on other goals like “make the world good,” rather than a primary objective in its own right. But those goals give AI reasons to resist shutdown and otherwise undermine corrigibility. If you strip out those competing objectives, alignment might follow naturally from AI that is broadly obedient to humans.
Max has laid out the theoretical framework for “Corrigibility as a Singular Target,” but notes that essentially no empirical work has followed — no benchmarks, no training runs, no papers testing the idea in practice. Max wants to change this — he’s calling for collaborators to get in touch at maxharms.com.
This episode was recorded on October 19, 2025.
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Coordination, transcripts, and web: Katy Moore
Every major AI company has the same safety plan: when AI gets crazy powerful and really dangerous, they’ll use the AI itself to figure out how to make AI safe and beneficial. It sounds circular, almost satirical. But is it actually a bad plan?
Today’s guest, Ajeya Cotra, thinks there’s a meaningful chance we’ll see as much change in the next 23 years as humanity faced in the last 10,000, thanks to the arrival of artificial general intelligence. Ajeya doesn’t reach this conclusion lightly: she’s had a ring-side seat to the growth of all the major AI companies for 10 years — first as a researcher and grantmaker for technical AI safety at Coefficient Giving (formerly known as Open Philanthropy), and now as a member of technical staff at METR.
So host Rob Wiblin asked her: is this plan to use AI to save us from AI a reasonable one?
Ajeya agrees that humanity has repeatedly used technologies that create new problems to help solve those problems. After all:
Cars enabled carjackings and drive-by shootings, but also faster police pursuits.
Microbiology enabled bioweapons, but also faster vaccine development.
The internet allowed lies to disseminate faster, but had exactly the same impact for fact checks.
But she also thinks this will be a much harder case. In her view, the window between AI automating AI research and the arrival of uncontrollably powerful superintelligence could be quite brief — perhaps a year or less. In that narrow window, we’d need to redirect enormous amounts of AI labour away from making AI smarter and towards alignment research, biodefence, cyberdefence, adapting our political structures, and improving our collective decision-making.
The plan might fail just because the idea is flawed at conception: it does sound a bit crazy to use an AI you don’t trust to make sure that same AI benefits humanity.
But if we find some clever technique to overcome that, we could still fail — because the companies simply don’t follow through on their promises. They say redirecting resources to alignment and security is their strategy for dealing with the risks generated by their research — but none have quantitative commitments about what fraction of AI labour they’ll redirect during crunch time. And the competitive pressures during a recursive self-improvement loop could be irresistible.
In today’s conversation, Ajeya and Rob discuss what assumptions this plan requires, the specific problems AI could help solve during crunch time, and why — even if we pull it off — we’ll be white-knuckling it the whole way through.
This episode was recorded on October 20, 2025.
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Coordination, transcriptions, and web: Katy Moore
In early 2025, after OpenAI put out the first-ever reasoning models — o1 and o3 — short timelines to transformative artificial general intelligence swept the AI world. But then, in the second half of 2025, sentiment swung all the way back in the other direction, with people’s forecasts for when AI might really shake up the world blowing out even further than they had been before reasoning models came along.
What the hell happened? Was it just swings in vibes and mood? Confusion? A series of fundamentally unexpected and unpredictable research results?
Host Rob Wiblin has been trying to make sense of it for himself, and here’s the best explanation he’s come up with so far.
This episode was recorded on January 29, 2026.
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Camera operator: Dominic Armstrong
Coordination, transcripts, and web: Katy Moore
Democracy might be a brief historical blip. That’s the unsettling thesis of a recent paper, which argues that AI capable of doing all the work a human can do inevitably leads to the “gradual disempowerment” of humanity.
For most of history, ordinary people had almost no control over their governments. Liberal democracy emerged only recently, and probably not coincidentally around the Industrial Revolution.
Today’s guest, David Duvenaud, used to lead the ‘alignment evals’ team at Anthropic, is a professor of computer science at the University of Toronto, and recently coauthored the paper “Gradual disempowerment.”
He argues democracy wasn’t the result of moral enlightenment — it was competitive pressure. Nations that educated their citizens and gave them political power built better armies and more productive economies. But what happens when AI can do all the producing — and all the fighting?
“The reason that states have been treating us so well in the West, at least for the last 200 or 300 years, is because they’ve needed us,” David explains. “Life can only get so bad when you’re needed. That’s the key thing that’s going to change.”
In David’s telling, once AI can do everything humans can do but cheaper, citizens become a national liability rather than an asset. With no way to make an economic contribution, their only lever becomes activism — demanding a larger share of redistribution from AI production. Faced with millions of unemployed citizens turned full-time activists, democratic governments trying to retain some “legacy” human rights may find they’re at a disadvantage compared to governments that strategically restrict civil liberties.
But democracy is just one front. The paper argues humans will lose control through economic obsolescence, political marginalisation, and the effects on a culture that’s increasingly shaped by machine-to-machine communication — even if every AI does exactly what it’s told.
This episode was recorded on August 21, 2025.
Video and audio editing: Dominic Armstrong, Milo McGuire, Luke Monsour, and Simon Monsour
Music: CORBIT
Camera operator: Jake Morris
Coordination, transcriptions, and web: Katy Moore