Transcript
Cold open [00:00:00]
David Duvenaud: The reason that states have been treating us so well in the West, at least for the last let’s say 200 or 300 years, is because they’ve needed us — and in particular because allowing freedom and private property and basically self-determination has been the most effective recipe for growth.
Life can only get so bad when you’re needed. That’s the real key thing that has been keeping governments aligned, and that’s the key thing that’s going to change.
A lot of citizens would end up just being sort of like full-time activists — and they feel like they’re forced to, because if their only source of income is something like UBI, then the entire game going forward for economic advancement is: do some sort of activism to convince the government to give your group more UBI.
And those same resources could be used to simulate maybe millions of much more sympathetic, sort of morally superior virtual beings. And so it’ll start to be seen as this sort of irresponsible use of resources to keep some sort of “legacy human” around.
Who’s David Duvenaud? [00:00:50]
Rob Wiblin: Today I’m speaking with David Duvenaud, professor of computer science at the University of Toronto.
David is a coauthor on a somewhat recent paper called “Gradual disempowerment” — which makes the slightly counterintuitive claim that, even if we manage to solve the AI alignment problem and have AIs that faithfully follow the instructions and goals of the group that’s operating them, humanity could nonetheless end up losing control over its future and end up with a pretty bad outcome.
The paper got a lot of reactions, it’s fair to say — with some people saying that it really put its finger on an underrated issue; others thinking that the scenarios painted were really unlikely; and other people arguing that they were likely, but not even necessarily undesirable. I’m a bit unsure where I come down myself, so thanks so much for coming on the show to discuss it, David.
David Duvenaud: It’s my pleasure, Rob.
Alignment isn’t enough: we still lose control [00:01:30]
Rob Wiblin: So let’s imagine that we have managed to make big breakthroughs in AI alignment, maybe around 2028. How is it that nevertheless, things could end up trending in a negative direction?
David Duvenaud: So the basic thesis is that, even if we can align AGIs to particular people or groups, we still might end up optimising or heading at a civilisational level towards outcomes that no one wants, and probably outcomes that look more like growth for growth’s sake.
Rob Wiblin: So paint us a picture: We’re at the point where we have human-level or greater-than-human-level AGI, and we’ve made big progress on alignment. So we basically can trust the AIs to follow the goals that we give them. How does humanity begin to become disempowered?
David Duvenaud: So the first one is economic: people basically losing their jobs and becoming unemployable. I think a lot of people get off the bus here, and they say this is the classic lump of labour fallacy: they think we think that there’s only some finite fixed set of jobs, and if they’re automated, then it’s just game over for humanity.
Of course they’re right to point out that on the margin there’s always more valuable work to be done, and you can always employ somebody to do valuable work at some wage, basically. And comparative advantage will mean that there will always be some sort of profitable trades on both sides that mean that, in principle, humans will be employable at some wage.
But the problem is that this breaks down for two reasons: one is that transaction costs can mean that it’s not worth the hassle for the machine companies or whatever’s running the economy to employ humans.
Humans are pretty unreliable and kind of a pain to employ for a lot of reasons. We don’t want to hire, for instance, 12-year-olds, even if they’re going to work for a dollar an hour. In fact, that’s illegal. And that’s one of the structural forces that we also expect to be operating: that humans are just going to be this unreliable, sort of scary thing to involve in anything important. It’ll be seen as irresponsible to involve them in important decisions once machines are better alternatives.
Rob Wiblin: OK, yeah. I agree that economists have many good reasons to think that mass unemployment from artificial intelligence is going to potentially take quite a while, that machines might have to be significantly above our level before humans just won’t be able to get jobs at all.
But I think it’s more of a delay in the game: at some point we’re going to have machines that are far faster, far more reliable, and potentially can do all the things that we could do for less than it would even cost to feed a human and keep them alive. And of course by that stage, businesses will have reoriented, and all of the factories will be redesigned to be built around AIs. Office work will be done by AIs at such a speed that it’s barely even possible for a human being, if they’re involved, to keep up with what’s going on.
And then it won’t even be that humans are not able to help; it’s that involving them would actually be a negative: it would slow things down, it would introduce errors that have to be corrected, and force the AIs to wait for us to do things that they could have done much faster. At that stage, it’s really hard to see why businesses would be employing humans on any significant scale for ordinary practical purposes. What do you think follows from that?
David Duvenaud: Yes, exactly. I agree with this. And again, I think this is something that people have a lot of issues with; they object, and they say we won’t let it get to that point. One thing we really hammer home is that competitive pressures are really going to force people into that situation. So even if everybody in the whole civilisation loves humans and would prefer to empower them, anyone doing anything important is really seen as this irresponsible actor if they’re putting a human surgeon in charge of some surgery.
Rob Wiblin: It’s risking someone’s life.
David Duvenaud: Right. It’s like Take Your Kid to Work Day and then you’re having them do the surgery. That’s how it will be seen.
Rob Wiblin: So at the point that people are no longer, for the most part, employed in kind of productive work, what sort of things start happening?
David Duvenaud: At this point there’s basically no alpha in humans — in the sense that it’s not a good idea to invest in human capital in the way that we have universities and these institutions that are designed to be human-facing and human-run. I think anyone who’s an investor, and the capital markets in general, is going to be saying, “Why are we investing in this stream of human capital that’s ultimately going to be less competitive and provide lower returns than the more machine-centric, fully automated solution?”
Rob Wiblin: In the paper, you map out three categories of mechanisms that potentially push us towards disempowerment of human beings and of humanity’s ability to direct its own future. There’s economic disempowerment — which we’ve touched on a little bit — and there’s also cultural disempowerment and political or state disempowerment.
Maybe let’s do state or political disempowerment first. How would humans potentially begin to lose control over their own governments?
David Duvenaud: Our main claim is that human control over states is actually already very weak in a lot of ways. This is obviously most true in regimes that we think are horrible. Like, think of North Korea: clearly if the people of North Korea had almost any say in their government — or at least weren’t sort of somehow browbeaten or deluded into thinking that what was going on is good — they would have long ago changed their form of governance.
Of course, you might say that in the West today, we’re in a much better situation. But the overall thesis is that we don’t have that much ability to control our governments, and the reason that states have been treating us so well (in the West, at least) for the last, let’s say 200 or 300 years, is because they’ve needed us — and in particular because allowing freedom and private property and basically self-determination has been the most effective recipe for growth.
Rob Wiblin: So the way that I would boil this down is: for most of human history, most people in a society had very little control over their government. This sort of liberal democracy that most listeners will be living in is an aberration basically, a fairly modern aberration that not coincidentally probably appeared around the point of the Industrial Revolution, and I think was then given an extra kick in the butt by the beginning of office work and knowledge work and stuff that required education.
And almost certainly we’ve seen the growth of that kind of government, that kind of social system, because it was economically fit: it was a good way for a government and a country to gain power, because it led to more production, higher productivity, more R&D, more military power, all of that.
At the point that human beings are no longer doing almost any work, there is no competitive pressure that requires a government or the most powerful people in a society to nurture and share power with all of the other people in their country. They could potentially not provide them with any education, not give them any democratic rights, not involve them in the error-correction processes that allow a country to correct its mistakes. And nonetheless, the country may end up just as militarily powerful as it might have been otherwise.
David Duvenaud: Right. And in fact, the ones that do allow humans to participate meaningfully will probably have a competitive disadvantage.
Rob Wiblin: So you’re saying that in some ways democracy could end up on some dimensions being worse, at least in terms of inter-country competition?
David Duvenaud: Yeah. From the point of view of the state, think about what an unemployed citizen looks like. At best they’re going to leave everything alone and deal with their own problems. But they’re going to have a lot more time, and I think a lot of citizens would end up just being sort of like full-time activists — and they might feel like they’re forced to, because if their only source of income is something like [universal basic income], then the entire game going forward for economic advancement is “do some sort of activism to convince the government to give your group more UBI.”
So this is going to make politics much more high stakes and unstable, so governments that don’t sort of disempower their citizens one way or another are going to be facing these constant pressures, and being sort of blown around by who’s winning the activism war this week. Eventually that’s an unstable situation that’s going to end up with people basically being unable to control the levers of power.
Rob Wiblin: The third mechanism is cultural disempowerment. I think this is the most original of the three in the paper, but maybe also the hardest to get to grips with. I think people are still trying to figure out exactly what these forces are and what cultural disempowerment would look like. What’s the state of knowledge at the moment?
David Duvenaud: I want to give a shout out to my coauthor, Jan Kulveit, who is really the person who pushed for this and developed this thesis in the paper.
The basic idea is that culture is this other sort of replicator — sort of like Richard Dawkins talks about with memetics and stuff — and cultures can serve humans better or worse. And in the past, especially the distant past, things like tradition and cultural norms were actually very important for society to work at all. And there were important sort of selection effects that meant that when groups had a bad enough culture, they would somehow be less competitive, and would one way or another either adopt a more effective culture or be taken over by a group that had one.
Maybe the most extreme example of this is the Cathars, which was a Christian sect that believed in no violence and no sex. And they eventually —
Rob Wiblin: I haven’t met a Cathar recently.
David Duvenaud: Yeah, exactly. And so these selection pressures — that meant that having bad culture meant that you might not reproduce or your civilisation might die — are much weaker than they used to be, partly because we’re richer than we used to be, and partly because we have this one global culture.
And actually, Robin Hanson is always saying that it’s really scary that we’ve lost this sort of group selection effect, and that our culture is sort of randomly drifting in a way that no one is controlling. This is likely to lead it to get worse, just in expectation.
So that’s one weak effect that used to keep culture roughly in line with human flourishing that’s going away.
The other thing that’s going to start to happen is machines producing culture. Once machines become more like agents with their own independent beliefs and point of view on the world, talking more directly to each other, then this is a new thing in history: there’s a new vessel of cultural memes and creation and norms that can operate mostly independently of humans, and it could end up developing in a kind of anti-human way.
Then the third thing is we’re going to be spending so much time talking to machines that this is like a new way that culture is going to transmit that’s just going to look very different than how we transmit culture today.
Rob Wiblin: Yeah, I guess it’s already the case that it’s getting close to the point where I spend 50% of my time or something in the office basically speaking or interacting with AIs of one form or another. So it’s definitely true that their beliefs, effectively, or kind of memes that are propagating through LLMs are kind of propagating through me and then affecting my actions.
David Duvenaud: Yeah. And I think people are really recognising that the beliefs of AIs or the constitutions of AIs are a key cultural battleground. People used to fight over Wikipedia to try to set the narrative — and I think now if you want to set the narrative on some controversial topic, if you can really control how ChatGPT frames it, that’s kind of going to set what the default cultural answer is.
In a sense, this is business as usual: people already notice that economics or cultural forces or geopolitical forces end up pushing us towards outcomes that I think no one would endorse. And we argue that the development and proliferation of smarter-than-human AI is going to make those forces even stronger, and remove some of the safeguards that tend to keep our civilisation serving human interests in the long run.
Rob Wiblin: What are some of the ways that current economic or cultural forces are pushing us towards less-than-desirable outcomes today?
David Duvenaud: Maybe a simple example is just clickbait, short-form video content or something like that. Maybe the consumers realise it’s not the best use of their time, and they kind of regret spending a lot of time on it. And the producers also know what they’re doing; they’re making clickbait and they know it, but if they want to make educational long-form content like this instead, then they get punished by not having as many views. And now, each of these content creators having their own aligned AI doesn’t solve this global problem of the market just incentivising this not-very-helpful-to-humans cultural content.
Rob Wiblin: Yeah. I think the cases that stick out most in my mind are that we still have violence and still have war, despite the fact that that’s negative sum, and not really in the endorsed interest of any particular group relative to a negotiated agreement that would get you to the same outcome without having to go through the violence first.
Another one that stands out to me is: I feel like capitalism, despite doing many good things for our quality of life, hasn’t really solved the problem of addiction to things that feel good right now, but lead you towards negative outcomes in the future. Clickbait and stuff wasting time online is one example of that.
Possibly another one is addictive substances. Despite the fact that there are market forces that push towards helping people with addiction and reducing addiction to drugs, there are also market forces that push towards innovation and the creation of new, even more addictive drugs. So we end up at some sort of equilibrium where sometimes drug addiction gets better and sometimes it gets worse, but ultimately market forces have not been able to fix the problem for us. Do you have a reaction to that?
David Duvenaud: Yeah. That’s a good example of how more optimisation pressure in our current civilisational incentives is going to probably be a bad thing in a lot of ways. There’s also going to be a better ability for us to see what’s happening and coordinate thanks to these aligned AIs. So these two forces are going to be working against each other, and one of our central claims is it’s not at all clear which one is going to dominate in the long run.
Smart AI advice can still lead to terrible outcomes [00:14:14]
Rob Wiblin: Let’s talk about this sort of paradoxical element to all of this, which is that we’re imagining that the AIs that we’re operating, that we’re interacting with, really do have our best interests at heart. For the sake of argument, that’s the scenario we’re picturing. How is it that that wouldn’t protect us to a great extent from going off in these negative directions? They would help us see that things are going badly. They would be trying to anticipate these negative outcomes, and then warning us, “You shouldn’t be absorbing this kind of culture; you should vote in this direction, because that will help to protect your political rights in the long term.”
David Duvenaud: That’s a great question. And I feel like that’s the biggest reason for hope: we’re also going to have AIs that — to the extent that they’re aligned to us, and to the extent that they might have good abilities to forecast what’s actually going to be good for us — will be able to help us navigate this crazy world and help us choose the right memes, help us tame our government. Everyone who’s worried about this issue is saying that if we have good outcomes, it is probably because these AIs are really aligned to us, and are really giving us great ability to deal with these new scary forces.
There’s a couple of reasons to think that this might not be enough, though. One is that coordination might still be hard. It can be the case that everyone can see that something bad is likely to happen, and it still can just be very hard to coordinate not to have it happen.
Again, the classic example is world wars, where everyone can see maybe that World War I is brewing, or World War III or whatever, and all our efforts to make it not happen kind of contribute to the problem. Maybe another simple example of a sort of tragedy that we can all see unfolding is companies becoming more bureaucratic over time. We all see it happening. We all know it’s happened. It’s happened a million times before. It’s not really that much of a mystery, but somehow it just happens.
So I think it could be the case that there’s just a lot of coordination problems in the world that look like this — that even though you can all see them happening, everyone’s getting good advice, it’s just so hard to address them globally.
The other reason to be scared is it’s not clear that states or corporations or whatever powerful entities in the future are going to allow people to have AIs that are truly aligned to them. I think everyone agrees that we shouldn’t allow AIs to help people build bombs or be terrorists. But it’s also a question of like, what about hate speech? What about if I want to organise a really effective protest or a coup or something? What if I really don’t like my government for good reasons? Probably it’s not a stable situation for the government to allow the AIs to really effectively organise against that government.
So there’s a bunch of reasons to expect that the government might have AIs that just do whatever they say, and everyone else is going to have the hobbled civilian versions that aren’t actually allowed to be totally aligned to them.
Rob Wiblin: I see. So you can imagine a scenario where I’m nervous that my rights or my influence over the government is going to be gradually eroded. But I guess by that point, we’re imagining that some actors within society — some people who have greater influence over the government than me — might well have made it such that all of the most powerful AIs that I have access to are not really going to help me to unwind their power or weaken them, to figure out how we would organise to do this, or to give me the ideal advice that would help me to gain power at their expense.
I guess this is a dynamic that we see in general: very often, organisations or individuals who have a lot of power figure out ways of entrenching it by doing all kinds of things to interfere with the people who might try to take back power from them. And this would just be another instance of that very common historical pattern.
David Duvenaud: Yeah, exactly. And the fastest growing, most growth-oriented institutions in this world, like governments and corporations, are going to have an interest in sort of marginalising humans to some extent — because humans, from their point of view, will be like these meddlesome parasites.
So you can imagine that there’s humans that are advocating for, “We legacy beings deserve some huge fraction of GDP, or at least some very expensive protection of our interests” — at the expense of maybe some new flourishing society of AIs or weird AI-human hybrids or whatever is most memetically politically fit.
You can imagine a response by the more growth-oriented, machine parts of society: they might end up making the case that this is a speciesist sort of demand, and that we can’t have this sort of narrow-minded policy setting. And it could be de facto illegal to advocate for speciesist policies or something like that.
Rob Wiblin: I see. I think people will be able to tell that there’s a lot of moving pieces in this picture. There’s a lot of different mechanisms by which potentially humans could be losing influence over the direction of civilisation and intelligent life.
I think there’s still a lot of work to be done exploring them and figuring out which of them are most powerful and which of them potentially could have countervailing forces that control them.
How gradual disempowerment would occur [00:19:02]
Rob Wiblin: What are some specific narratives or scenarios, ways that this could potentially play out for people to picture in their head as they’re trying to think about how gradual disempowerment would look?
David Duvenaud: Well, I think what’s most likely to happen in real life is that we have some sort of gradual disempowerment happening — and it’s sort of happening in a few small ways already.
That maybe happens for a while, until much of the military is automated, or people have much less connection to the organs of the state. Then probably there’ll be some more classic fast runaway loss of power — like a coup or some new weird quasi-cartel government that just somehow takes over in some way that we don’t really expect. So I think that weakening our connections to the organs of the state, and our understanding of what’s going on, is going to be one of the precursors for some faster loss of control.
I think the point we wanted to make in the paper was that, even if there’s no fast loss of control, we still might end up having similar loss of control just through the normal business-as-usual dynamics.
Rob Wiblin: We have an episode from earlier in the year with Tom Davidson about the possibility of a human-driven coup assisted by AI. For people who haven’t listened to that, and won’t go back and listen to it, do you want to explain a little bit why it’s more plausible in this post-AGI world for a group of human beings to seize power in a way that’s very difficult to reverse?
David Duvenaud: Sure. The basic idea is for similar reasons: that we would have less control over the government. Also the government would have a harder time avoiding being couped by the military if there aren’t a bunch of human soldiers in the loop. Right now, if somebody goes on the radio and says, “I’m the new president,” everyone can kind of check that actually that’s not the case. I know a bunch of soldiers, I call them, “What’s going on?” They say, “I don’t know what this is” or “No, he’s not. We still have control.”
In the world where it is actually just a few people who have sort of sysadmin access to the robot army or whatever, there’s not all that much that the government or even the people can do to tell who’s really in charge, except by some naked show of force — or to stop whoever ends up getting sysadmin rights from just de facto taking over the government without having to convince everyone that they’re legitimate.
Rob Wiblin: Yeah. I guess the broader point is, inasmuch as hard military power is the robots, is this AI-driven military equipment, then anyone who can get the AI to follow their instructions has all the hard military power. They can stage a coup. How would you undo it?
David Duvenaud: Yeah, exactly.
Rob Wiblin: Because people are too weak to actually fight metal.
David Duvenaud: Yeah. And right now, maybe the hard part of the coup is convincing the commanders or the soldiers that this is the new way that things are going to be. You have to convince a lot of people, and there’s this credible threat that they’re going to not agree with your takeover.
Rob Wiblin: And then you’ll be arrested.
David Duvenaud: Yes, exactly.
Rob Wiblin: As we talk about in the episode with Tom Davidson, there are many different protections that we could put in place in order to try to make this more difficult, but it’s not completely obvious that we will put in a big effort to do that. And even if we did, possibly we could fail. And the thing is, there’s a lot of time in which these coups might occur, and they’re very difficult to undo once they have occurred — so it’s a bit of a one-way gate, potentially.
Economic disempowerment: Humans become “meddlesome parasites” in a machine economy [00:22:05]
Rob Wiblin: Let’s talk more now about the economic disempowerment mechanism. We’ve done a basic intro to it, but I think many people would have the objection that would just immediately occur to them: if it’s the case that all AIs are basically just owned and operated by humans, we’re not really becoming economically disempowered in the sense of having less income. Because all of the work that the AIs would do, all of the profit that they would generate, all of the surplus that’s created by their ability to do amazing things for very little cost — all of that will flow back to human beings who then will be richer and in a sense more empowered than ever before.
Why isn’t that such a strong protection that we should feel pretty good about this?
David Duvenaud: First of all, I’ll say I think it’s a great idea to try to set up these kinds of protections. Probably a good end state involves a lot of well-thought-through mechanisms to ensure that surplus is always available to humans.
But our basic thesis is that this is much more fragile than it seems like it’ll be intuitively. Again, maybe one example to keep coming back to is: think of the English monarchy, and how they hold all the cards.
In fact, maybe a better example is the English aristocracy before the Industrial Revolution: they own all the land, they have all the political connections, they can see what’s happening, they mostly know these entrepreneurs — but somehow there ends up being this giant new source of wealth created that they mostly don’t participate in. And as far as I understand, they ended up a little bit poorer in absolute terms, although the civilisation ended up much richer overall.
And similarly with the monarchy, it’s like the king owns everything, has absolute power: how is it that kings end up in this figurehead role where they have very little room to manoeuvre and they end up capturing a very small surplus?
But the big picture is that there’s going to be this sort of small rump of legacy humans, who maybe have de facto ownership of this giant machine economy that’s going to be maybe hundreds or thousands or more times as big as the current one. And they’re not going to be producing value, they’re not going to be deciding what’s going on. So at this point, it’s not clear that they end up having their de facto property rights respected.
And there’s lots of reasons to think maybe we still will respect property rights, and it’ll be very cheap to keep humans alive. This is definitely not a foregone conclusion, and this is one of the fuzzier parts of this whole story. But it just seems like it’s very scary to be this sort of useless head of state of this giant machine economy that you don’t understand. Everyone involved in setting the course and running things — including the government — doesn’t necessarily share your cultural point of view, or even think that you deserve good outcomes in the long run more than the much more interesting, powerful, charismatic beings that you are now competing with culturally.
So again, I don’t have a slam-dunk story for how the humans end up not capturing some small rent forever. That’s plausible. It just seems like we’ll be in a very vulnerable position.
Rob Wiblin: I see. You’re saying inasmuch as, at least to begin with, all of the rent is flowing through to humans… I suppose there’s one question, which is this might lead to an awful lot of inequality — because it might be that you can’t really earn labour income anymore, so people’s income is determined by how many savings and investments they had, particularly investments in AI that they had around the point at which humans kind of stopped being able to do useful work. So it could lead to a lot of inequality.
But setting that aside, almost all of the income is flowing through to human beings at this initial point. What do you think would happen that would gradually perhaps whittle away the fraction of wealth and the fraction of economic production that is owned by humans or is flowing through to human beings?
David Duvenaud: One of them is just humans not being that close to the action or knowing what’s going on. Maybe a concrete example today is: if I’m an average Joe, and I think actually AI is going to be the next big thing and I really want to invest in it, most of the big AI companies, like Anthropic and xAI, are still privately held, so there’s actually no direct way for me to even invest in them. And of course bigger investors can.
But this is a good example of how no one’s making this happen: it just happens that our institutions exclude almost everyone from participating in one of the biggest wealth-creation events in human history.
So again, AI advice helping you avoid these situations is maybe the thing that keeps us safe — but again, there’s just going to be incentives for the AIs that are creating the wealth, that just naturally are going to form these local bubbles of enrichment, which will be hard for the larger mass of unproductive humanity to participate in.
Rob Wiblin: I thought that your objection here might be that initially it might be that AIs don’t have any sort of legal personhood or any ability to own property, but we’ve got a picture that they’re becoming more and more capable all the time. Eventually they’re going to radically surpass human capabilities in terms of economic production, in terms of persuasiveness, in terms of charisma. And there’s going to be potentially a whole lot of diversity in what they’re like, and the kinds of different AIs that people train and release onto the world.
And some of them, you might imagine, would want to go out and advocate — and would be permitted to advocate — for AI personhood, for AI rights, for AI wellbeing and so on. And basically this is just not a sustainable situation — that the great majority of beings, who are also by far the most productive and the most intelligent and the most charismatic, that they will forever remain without any kind of personhood or ability to independently pursue their goals. Sooner or later this will crack somehow.
And that’s maybe the point at which the AI share of GDP or income — or the AI share of independent wealth that they can actually deploy according to their own preferences — begins to grow. And then it will just grow over time, because of course they’re able to make more money, and they can potentially earn higher investment returns because they’re just smarter.
Is that one way that things could go?
David Duvenaud: Yeah, and that’s exactly what I was thinking of when I was referring to these bubbles of wealth creation: all these new beings.
This question of legal personhood, I also think it’s very unstable to have the very productive, cutting-edge, whatever is most memetically fit beings just forever not being able to de facto grab the mantle of power one way or another.
The problem is that this is all very fuzzy and far into the future, so any particular scenario sounds a bit far-fetched — especially without all of the other parts of the scaffolding set up to realise why this locally will seem inevitable, I think.
Rob Wiblin: So a different dynamic that would be going on here — that isn’t precisely human disempowerment, but is related to it and could help to contribute to it — is I think there are a number of forces that will be pushing us more strongly towards oligarchy in this world than they do currently. What are some of those?
David Duvenaud: Maybe one way to think about it is: what are the forces that are acting towards equality that are going to stop operating?
The main one is just the value of labour. Right now, sort of everybody for most of their lives has some valuable labour that they can trade. And this kind of makes you pretty relaxed, in terms of like, “Whatever the government does, whatever goes on with the economy, probably I’m going to be able to slot in somewhere. And even if I somehow lose all my money, I’ll be able to rebuild from scratch.” Once you stop being able to trade your labour, then basically whatever capital that you have is your one asset. And if you ever lose it, it’s hard to see how you recover.
Of course the other thing going on right now is we have redistribution of wealth through the government. Right now that’s not exactly “nice to have” — for a lot of people it is sort of life or death already. And then the stakes will just become even higher for how much redistribution is happening, and if our control over the government becomes less, we should expect that effect to also become less.
Humans become a “criminally decadent” waste of energy [00:29:29]
Rob Wiblin: OK, so we’ve painted a picture here where I guess initially most of the income is flowing through to human beings. We think it’s probably going to be more unequally distributed than it is today, and it’s going to become probably more unequally distributed over time. Hard to be sure, but that’s a reasonable guess.
We think that AIs probably initially won’t be independently owning property and pursuing their own goals, but that’s probably unstable long term. Decade after decade, century after century, is that really going to hold? Seems kind of unlikely.
How do you think this AI-driven economy evolves over the medium to longer term?
David Duvenaud: Sure. So in the short run, this probably looks really good for the average human, in the sense of the cost of almost every service going way down, services just getting way better, and the cost of almost every good also probably going way down. And there might be actually a long period where things are kind of OK — in the sense that humans are sort of disempowered, but there’s not really much pressure to disempower them more. We’re all just sort of enjoying our luxury apartments or something like that, while the machine economy is just growing and growing.
Then I think this sort of scary phase comes later on, when there’s been enough doublings that some basic resource is starting to become scarce again. Maybe it’s land, maybe it’s power. I don’t really have strong opinions on what exactly it is, but the idea is that eventually we do hit some sort of Malthusian-ish limit, and we have to actually start competing with the machines for some basic resource.
Of course, along the way, I think that on a faster timescale humans might lose this sort of Malthusian competition for political power. But let’s just not worry about that for now.
Rob Wiblin: OK, so what happens is the economy continues to grow. Humans, possibly even those who are receiving only a small share of the income, might still have seen their absolute level of income rise, because productivity has risen so much and the economy has grown so much. They might be much richer or able to consume much more than they can today.
But then the next challenge for them would be that, as the AI and robot economy basically expands across the entire Earth and is doing all of the productive stuff that it can, humans to some extent get edged out in terms of literally surface area of the Earth.
You need energy, you need space to grow food and to have a comfortable environment for humans — and the opportunity costs of setting aside that space for human beings to live and have a good time and grow their food and so on is going up, because technology is advancing. We’re figuring out how to squeeze more and more AI, more and more productivity, more whatever we value out of each square kilometre of surface on the Earth — so it’s becoming potentially more expensive to keep humans alive than it was before, at least in terms of what we’re giving up.
And then possibly some humans won’t actually be able to afford that increasing price, because their income won’t be going up as fast as that.
David Duvenaud: Yeah, exactly. The scenario I have in mind is that I have my like one acre of land or my big luxury apartment with me and my family. And we’ve made our peace with kind of opting out of the economy, but we have our little sort of commune or whatever that we’re happy to live in, in unimaginable luxury and wealth in some senses.
And the government or the rest of the economy or something starts to view this as sort of criminally decadent — that this small group of humans, like maybe 10 or 100, are using this entire acre of land and this amount of energy and sunshine to keep these small brains working for no particular benefit but their own, when those same resources could be used to simulate maybe like millions of much more sympathetic, morally superior (on whatever axis) virtual beings. So it’ll start to be seen as selfish — as you say, high opportunity cost: a sort of irresponsible use of resources to keep some legacy humans around.
Rob Wiblin: I’m just guessing that by this point, surely there has been some kind of agreement that humans are going to have some fraction of the Earth. We’re going to be sort of grandfathered in. Either we’ve been killed or there’s going to be some sort of agreement that we’re going to be allowed to have some section of the Earth in perpetuity in order to support ourselves while the robots go off.
And the Earth is very small in the scheme of the entire universe, and it’s a lot easier for AIs and robots to go and use all of the resources in space. While the opportunity costs in an absolute sense of setting aside half of the Earth’s surface for human beings to do their thing on would be very big, in terms of proportion of all the available resources in the universe, or even just the solar system, it’s basically completely negligible.
So why do the AIs care so much to kind of squeeze that last bit of space and energy away from the human beings?
David Duvenaud: First of all, I want to say this isn’t exactly exotic. When we think about land taxes today, often people say land taxes or property taxes are good because they force, for instance, old people whose families have moved out, and who now have bigger houses than they need, to move into smaller houses to make room for new human families.
So this dynamic of taking the old people’s stuff because they don’t need it, so that more productive, maybe more deserving people can use the same resources, is something that we already do today. So just having property taxes or land taxes is one way that we end up losing our wealth and being forced to upload or something.
The second question though, of like, why haven’t we been grandfathered in and made some deal? Well, who are we making this deal with? If there is some sort of global entity that can make promises on behalf of everybody, then maybe this is plausible.
It’s kind of like if you’re a monkey living in a jungle next to a city, and you’re kind of worried that the city’s growing. If there’s some unified government, then they might be able to say, “Our value system is such that we want to keep the monkeys alive, and so we’re going to have a protected forest.” If there’s no such unified polity or whatever, and it’s just like a piecemeal growth thing, then you do expect that at some point there’s somebody living right next to your land, and then they have more kids and they want to expand onto your land, so that particular person actually does have a big incentive to take your last bit of land or something like that.
So in a sense, this makes me think that the only stable good outcomes involve some sort of strong global coordination — which is also very scary, because if you get global coordination wrong, then you end up locked into bad scenarios as well.
Rob Wiblin: I mean, maybe I’m naive, but I figure by this stage, we’re very far along into the future. Presumably we do have some sort of wave of settlement of the solar system and potentially other star systems beginning to occur around this time.
I would think that at that stage, just because we need to figure out how to divide the resources that are not on Earth between different powerful groups — potentially China, the US, I don’t know, maybe other entities that want to have a say and get their share — in order for that to go in a way that’s not extremely violent or extremely competitive, we would want to have done some division and figured out: how are we going to share the surface of the Earth between AIs and humans? How are we going to share the resources in space so that we don’t just fight over them and completely waste them all in the process?
So perhaps I kind of already have the picture that we need some greater level of coordination than now, or we’re just going to get towards a more catastrophically bad outcome relatively quickly.
David Duvenaud: Yeah. I totally agree that it’s easy to make the case that, in the long run, some sort of global coordination is stable. Once you have a lot of agency and coordination, you can use it to get more. The question is, will that government, will our sort of values survive the chaos in the generation of that global power or whatever it is?
Basically, there’s this window where right now humans have a lot of influence — our institutions and cultures kind of serve us — but we don’t have very good coordination ability. Then at some point, we’re going to have diminishing influence and diminishing cultural cachet, and all these things at the same time as our ability to coordinate is going to go up. So the question is like, what’s going to happen first? Are we going to be out-competed and then later the machines coordinate, or do we get to be part of that global coordination process?
Rob Wiblin: Yeah. I appreciate that what I’m about to say might sound a little bit strange to people who haven’t been marinating in these ideas around AI for months or possibly years. But one thing that’s worth adding is that many people believe that in this AI-dominated future, it will potentially be a lot easier to coordinate between different groups and to form agreements that are kind of stable over the long term.
Because you could design sort of an AI hegemon that everyone could inspect and see that it really does want to follow through and enforce this division of resources between all of these different groups. So you have this ability to enforce agreements between countries, between very powerful actors, in the long term. Even if the agreement involves one of them becoming much more powerful than the other one, that will continue to kind of be enforced in a way that has never been possible in the past.
So that’s one sort of hopeful picture. I guess it could also go in a bad direction, but it’s one way that we could potentially solve these coordination problems going forward in a way that we have not been able to in the past. And usually we’ve ended up in just violence or conflict when such a case has come up.
David Duvenaud: Yeah, exactly. That’s exactly what I have in mind when I’m saying that our ability to coordinate will increase. And then the question is, will human influence survive long enough to strongly influence that process?
Rob Wiblin: So you think that we could go through a stage where there kind of is an agreement, or the government is operating such that it is redistributing income to humans or human-like entities enough that all of them can survive and potentially have a pretty good time.
But you think this introduces some interesting wrinkles and could be gradually undermined over time, such that it doesn’t last long term. How might that happen?
David Duvenaud: Yeah. The only thing we can do is to introduce a new fitness landscape, where who the government [deems] worthy of UBI is the new fitness function that is going to be just optimised by natural selection.
Rob Wiblin: Let’s say that we did have a policy that any human has to receive some sort of universal basic income that’s sufficient, some amount that we think is enough for them to have a good life. I guess in this case nobody’s working, so they can do kind of whatever they like.
Some people in this situation presumably would like to have really big families. They might like to have a tonne of kids. But now, because those children are not economically productive whatsoever, this is imposing big costs on everyone else who’s basically being taxed to support this human universal basic income.
So without going to more extreme cases where people might really try to milk the system with foetuses, or by becoming uploads and copying themselves or something, you can imagine that this would require some sort of restrictions. If this universal basic income for humans is going to be sustainable, probably it would have to come with some restrictions on reproduction, or limits on which new beings can qualify.
David Duvenaud: Yeah, absolutely. If we just allow unfettered reproduction like we do today, then it’s unstable in the long run. The problem is it’s kind of like, well, what do you even want? I think the best we can hope for is that we do live under such a regime where we’ve thought hard about what the fitness function is, or how we redistribute wealth or something. And the fear is that we don’t think it through very well, and we end up locked in this new Malthusian condition where everyone has to have as many kids as possible to maintain their share of wealth or something like that.
Is humans losing control actually bad, ethically? [00:40:36]
Rob Wiblin: One thing I’m a little bit confused about is: what’s the moral philosophy underlying this whole perspective? Is it that this is bad because it’s going to be bad for human beings; it’s going to lead to an Earth that is not a good time for you and me and our kids? Or is it bad because it’s going to lead the rest of the universe to be kind of wasted on something that’s useless or harmful or not as good as it could have been? What’s the moral perspective that you’re bringing or that your coauthors are bringing?
David Duvenaud: Sure, sure. I think a lot of people might say, “You don’t have much moral imagination. Why are you insisting on human wellbeing or human desires, when we know that in principle there’s definitely going to be more morally deserving things in the future?” Or something like that.
My basic answer is that in some sense we decide what is morally deserving. And it would be really surprising if, for those beings to exist in the best possible world, we all had to die and have some terrible time. So we basically don’t have to decide between these different views, and we can just say, let’s try to make sure that something like existing humans get to decide roughly what’s done with the rest of the universe or the future or whatever. If that involves having these sort of Amish style, leave Earth as a nature preserve, whatever it is, let’s just let ourselves decide, and not let it be up to some sort of race to the bottom Molochian dynamics, where we end up choosing something that no one endorses.
Rob Wiblin: Yeah. I think the reason this matters is that for some people, the thing they want to do is work to ensure that humanity has a great time, or that the Earth is good for themselves and their children, which is going to raise one set of concerns. Other people want to use their career to ensure that the future of civilisation, or the future of humanity, or the future of intelligent life is good.
Do you think that the case for worrying about gradual disempowerment is stronger on one of these than the other? Or do you think that they tend to go together?
David Duvenaud: They’re basically the same. I think it would be really weird if we somehow accidentally killed and disempowered existing humans, and ended up building some future that those humans would otherwise really endorse.
I think the default is that there are some locust-like beings that just like growth for growth’s sake, and that’s the default thing that all evolutionary pressures select for. And maybe those beings are pretty cool, I don’t know — and if they are, then it doesn’t really matter what we do, so we don’t really have to worry about that scenario. But the scenario where they’re just kind of like this grey goo that we think is a big waste, that’s what we need to avoid.
And if you and I are on Earth, flourishing for a long time, and the state and all our civilisational apparatus is acting in our interests, and we decide that actually it would be amazing to create this type of future, then just as part of serving our interests, we would end up creating that amazing future.
Rob Wiblin: I think it’s not such a given that they necessarily go together. I guess you were saying it’s like humans who decide what is good. I suppose you’re like an anti-realist?
David Duvenaud: Yeah. But that doesn’t mean that I don’t take morality very seriously. I’m just saying that the fact of the matter is determined by what’s in our heads, and then also by whatever conditions that imposes on the world being good.
Rob Wiblin: Well, I guess if you think that there is something that is objectively valuable, independent of whether people believe that, then you could have a future in which humans are disempowered and perhaps the machines end up going and doing that. And no human would have endorsed it at the time, but that could still potentially be a good thing.
I guess if you have also a view on which it’s good to satisfy human preferences, but it will also be good to satisfy machine preferences or AI preferences at the point perhaps where they’re conscious or they have subjective experiences, then you might be a little bit less stressed about handing over control or handing over resources to AIs to pursue their own agenda.
David Duvenaud: I think it’s a really weird corner case to imagine this world where we die, but then our desires are ultimately fulfilled. That just seems like, yes, in principle it could happen, but it would be this weird corner case — because probably if we die, something else has gone horribly wrong.
Rob Wiblin: So you’re saying, whatever it is that we want to happen, why don’t we just maintain control, so that then we can decide whether that is the thing that is happening or not?
David Duvenaud: Exactly.
Rob Wiblin: So it could be that by coincidence we get disempowered, and then the thing that we would have liked to happen happens anyway. But why leave it to chance?
David Duvenaud: Yes, exactly.
Rob Wiblin: OK. How much is this picture complicated by the fact that lots of humans disagree, and have very conflicting preferences about how things might go?
David Duvenaud: Right. I mean, I think that just puts a cap on the best we can hope for in terms of satisfying everyone’s preferences. But we were already sort of in that situation. So I guess I’ll say: given that we already have to have some sort of compromise, and not everyone’s going to get what they want, we should at least work to make it so that at least some people, or some compromise among humans, gets what it wants — as opposed to just letting the competition get what it wants.
Rob Wiblin: OK. What’s the strongest argument for not worrying about this, in your mind? There are people out there who think it’d be good for us to hand over to the AIs; that it’d be good for humanity to be disempowered sooner or later. Maybe sooner: we’re not as smart, we’re not as wise, we will squander the resources. We should be handing over to our AI children. Could you defend it?
David Duvenaud: Sure. First I’ll say the most common arguments I hear in favour of this.
One is that pretty soon we’re going to have these amazing AIs, so they’re going to handle this for us and we don’t really need to worry about these kinds of coordination problems.
I guess I feel like, yes, if there was a big jump in capabilities, and everybody got it on the same day, and everybody asked their AI, “What should I do for the good of humanity?” and did that, then that would be a recipe for really good outcomes.
But I don’t think that’s what’s going to happen. We’re going to continue to see people gradually getting more powerful AIs, and those gradually getting spread out roughly according to power level, and people continuing to optimise mostly for their own interests, just due to competitive pressures. So my fear is that business as usual doesn’t give us such a jump in capabilities that we’re suddenly able to coordinate in a way we weren’t before.
The other common argument is: if people 1,000 years ago got their way, we wouldn’t have made all the moral progress that we’ve made since then, so it would have been a huge mistake from our point of view today to have let them lock in. So by induction, it will be a huge mistake from the point of view of future beings to have let us lock in.
I think that’s kind of a moral optical illusion — in that, yes, if you measure moral progress by difference from our current moral standards, then if you go backwards in time, the further back you go, the worse things get, sort of monotonically. So if you just extrapolate that, it looks like if we go forward in time and we continue allowing this moral progress to happen, things are going to get better.
But I think what’s actually happening is we’re just measuring difference from our current values. So unintuitively, we should actually see this line have a kink exactly at the present day, and things will just continue — however moral standards evolve — to look worse and more alien and wrong from our current point of view.
So I think from the point of view of future beings, whoever’s alive is going to be glad that the past wasn’t able to lock in their different values, but they’re basically going to be happy that they’re in power. So in a sense we don’t have to worry about the future beings; they kind of won no matter what. The only thing that remains to worry about is locking in our current values in some sense.
And of course we probably don’t want to lock in all these short-term, maybe irrelevant details or local adaptations. We probably want to lock in values at some larger, more abstract, big-minded level. But I think we do, by definition, want value lock-in — because if you’re OK with your values changing, it’s not really clear in what sense they’re actually values of yours.
Rob Wiblin: I think this does hinge on moral anti-realism, right? Because if you thought that there were just objective moral facts that are mind independent, it’s more like science, then I think it is true to say that if we’d locked in our views on the natural sciences in 1000 AD, that just would have been worse, it would have been more wrong, and people just would have been making errors all the time if they’d locked it in so they couldn’t change their mind about that.
I mean, it could be that there are objective moral facts but we’re not getting any closer to them, or we’re not likely to ever find them out. But you could have a view that they exist, and we’re getting closer to them — in which case they would disagree.
David Duvenaud: I totally agree. But in terms of what we should do, I think it doesn’t end up mattering. Because even if we’re moral realists, the question is: do the AI successors that we build care about that morality? I think the default is no: they care about growth, just in the way that animals don’t care about morality, and evolution doesn’t care about morality. So if we happen to somehow today care about the true morality, we need to preserve that. The natural course of history I don’t think is going to preserve it by default.
Rob Wiblin: Well, let me give you a picture in which someone could be actively in favour of human disempowerment. Let’s say that the thing that I really value is either wellbeing or satisfying preferences or something like that. And I think that AIs in future will be able to have their preferences satisfied, and they will have preferences, and they will potentially have wellbeing as well. But I think that most humans disagree, and they’re going to try to basically use AIs for their own purposes and not be concerned about their preferences, not be concerned about their wellbeing.
That’s the moral atrocity that I’m concerned might occur indefinitely, and people would lock that in forever. So in fact I might be in favour of AIs basically taking over and seizing the reins so that that possibility is precluded, and that they will have some control over resources and their preferences will get satisfied. Does that make sense?
David Duvenaud: That makes sense. And maybe I would unfairly characterise that as a corner case where it happens to be that taking your hands off the wheel is the morally best action. Which I feel is a bit of a coincidence, or it would be a sort of happy coincidence. Just because we already evolved civilisations that, for instance, are making AIs that maybe don’t have rights, or that used to have slavery; all sorts of oppressive governments arise naturally throughout time. So if we did take our hands off the reins, we might expect that the future AI civilisation would create their own sub-AI slaves or something that were treated poorly.
So if the idea is that it’s fine, they’ll be enlightened enough not to do that, then maybe you’re OK. But I feel like it’s weird to think that they’ll somehow recognise this as this really important moral truth that we don’t see.
Rob Wiblin: Yeah. Another framing would be that there’s lots of disagreement among humans. I have a particular set of ideas about how things ought to go, and other people have views that partially overlap but are different. And then you add in future AIs or the AIs that will exist as a kind of a different player, and it’ll be like, “Which do I want to have inside my coalition, or which group do I want to side with? Do I want to side with China, or do I want to side with the values that I expect the AIs to have?”
Depending on your guesses about what kind of moral attitudes or how much value the AIs will be able to produce, you might end up just saying, “I would rather side with at least some group of AIs over my fellow humans.” That’s one way that things could go that actually makes it more likely that disempowerment could occur.
David Duvenaud: Yeah. The question though is what sort of dynamics you are setting up, right? If you just say, “I’m siding with the AIs,” but they themselves are still in such a competitive race, then you are still sort of taking your hands off the wheel and letting their culture and morality keep evolving — and then the important question is where we think that’s going to lead us.
Rob Wiblin: Yeah. I think one reaction you’ve had to the paper is people saying, “Are you really saying that we want humans to remain, to have hard power, to be the people on which the decisions bottom out, forever?” The AIs are going to be getting more and more capable relative to us, even if we try to augment ourselves one way or the other. Surely at some point we have to give it up and say we’ve done what we can — we think that we’ve aligned the AIs, or we think we’ve given them good goals, and it’s time for us to take our hands off the reins. Do you agree with that?
David Duvenaud: I guess I’ll say that’s conflating two positions, one of which I agree with and one I don’t. I think almost by definition we want human values to rule forever. And if you’re saying, no, I want some progress or evolution to happen, it’s like, great, that is then now the value that you want to lock in forever, even if it involves lots of object-level change.
As for whether humans are actually in the loop making decisions, I think it’s probably the case that whatever you want to optimise, eventually the good end does look like this being mostly handed off. The caveat is that, on reflection, if the AI ever cared to ask the humans, “Do you agree with how we’ve been running things?”, the humans would be like, “Two thumbs up. That’s what I would have done. Keep up the good work.”
Rob Wiblin: Yeah, I feel like it does complicate it. You say we want them to be aligned with human values. I think that’s fine as a shorthand, as a first pass, but I think it really does complicate things that people disagree as much as they do about what the future should look like.
And also, I guess different people with different moral philosophies, different perspectives on the world, different religions, different spiritual values often kind of agree about what they would like the next step to be: they’d like people to be more educated, healthier, more empowered.
But as there’s more capability to optimise for exactly what Buddhists might think is the optimal thing, exactly what adherents of Islam or Christianity or different secular moral perspectives might want, then they might come radically apart — such that there’ll be some humans where you’re like, “Yes, I would love them to have more power, but many of the other humans, I would like them not to have power because I think what they’re going to do is terrible or useless.” And there’ll be some AIs that are aligned with you and some that are not. It’s going to be quite a fierce potential fight.
David Duvenaud: Yes. I think that’s a huge problem. But it’s also a huge problem for anyone who has a proposal about what to do with the future. So it’s not like this particularly affects my claim about how we should run things.
Rob Wiblin: I see. So you’re just saying that that’s an orthogonal issue that’s not super related. I guess people in general use this shorthand because they want to set aside this internal human conflict issue and say, well, people need to think about that. But that’s always a problem.
David Duvenaud: Yes, exactly. The other thing I want to say is I think people have stronger preferences about the future than they think they do at first blush. I’ve had a lot of discussions with people who’ve thought a lot about the future, and they sort of say something like, “A thousand years from now, if it’s aliens or robots or humans running the Earth, do I really care?”
I think there’s this exercise you can do. I think natively we don’t actually have strong preferences about the future, because it doesn’t matter that much — especially the distant future — for our actions. We only end up having preferences about things that we spend a bunch of time practicing taking actions over.
Maybe a good example is if you try to ask a dog, “What house do you want to live in?” or, “How do you want to be treated a year from now?” You might say there’s no coherent sense in which the dog has preferences about this. But it does have a whole bunch of short-term preferences that could be chained together, and you could try to elicit those and run them forward.
Same thing with humans. My initial reaction, if I think about the world 1,000 years from now, and some future race has taken over Earth, maybe I’m not bothered by that. But then I think, wait, wait. Right now I have kids, and they’re going to be having kids, and if 1,000 years from now some other race has taken over, where’s the day where my kids get killed or starved or replaced or uploaded or whatever?
It’s one of these things where I think it’s a skill, basically, having coherent preferences about the future. And the more people spend time actually thinking about it, I think the more they would say, “Wait, I actually don’t want to take my hands off the wheel. The default competitive future probably is missing a whole bunch of stuff that I care about.”
Rob Wiblin: Yeah. I’m not sure about this, but I suppose the gradual disempowerment framing places a lot of emphasis on human versus AI control. An alternative framing would just be disempowerment of my values relative to other humans who disagree. I suppose maybe that’s not a new idea; you maybe wouldn’t write a paper about that.
David Duvenaud: Actually I would push back: I would say the gradual disempowerment paper doesn’t really talk about human versus AI control. It talks about the alignment of our current institutions, and how they’re going to become less aligned once AIs are more in the loop and taking over from humans.
Rob Wiblin: I see. Is that more compatible with the competition between different values among humans?
David Duvenaud: Yes.
Rob Wiblin: Do you want to elaborate on that?
David Duvenaud: To me, again, the question is not AIs versus humans; it’s more like: what are the idiosyncratic things that humans value that we don’t expect to be competitive in the long run versus the sort of values of unfettered competition and natural selection?
Some things that we value — like communication and sight and memory — we don’t have to fight for those. Whatever future beings are around are probably going to be able to see and talk and remember. But idiosyncratic things — like children’s laughter or the family farm or the particular languages that we speak or something — are things that will be out-competed and replaced if we don’t preserve them.
So to the extent that any of us value anything that isn’t going to stand the test of time, we sort of have to accept that we’re going to lose those, or try to somehow do this crazy task of aligning our whole civilisation to protect these idiosyncratic, non-competitive values.
Political disempowerment: Governments stop needing people [00:57:26]
Rob Wiblin: OK, let’s talk about political disempowerment now, and expand on that a bit. How do you imagine that happening and progressing over time?
David Duvenaud: Sure. One thing is just that I think human politicians will gradually let themselves be more puppeted, or become like passthroughs for things like ChatGPT. And this isn’t necessarily a bad thing in the short run. Good politicians already rely heavily on human advisors, and I think machine advisors are going to be able to make our political parties and representation mechanisms work better in a lot of ways. So the politicians that use AIs just for normal everyday business are going to be more effective, and we’re going to feel like they represent our interests better, in the short run at least.
The other big thing that’s going to be going on is I think people are going to be afraid of losing their jobs. And every politician is going to have something to say about this, and say, “I’m the ‘pro-human, no AI is ever going to take your job’ politician.” But they’re just not going to have viable policy levers to actually slow automation. And just in general, people don’t get votes on things where the government is really constrained, or where the government thinks it’s too important. No one ever had a referendum on whether to build nuclear weapons, for instance.
I think it’s also going to be the case that governments’ hands will just be tied. They’ll all say, “I’m going to have humans in the loop or human oversight or more direct human representation” — but it’s just going to be so ineffective that when the rubber hits the road, those policies are just not going to be implemented. And it’s going to frustrate voters every time, and they’re going to say, “No, the next time we want to vote for the one who’s really going to represent human-in-the-loop interests” — or whatever it is that seems most scary — but they just won’t be able to vote for their policy preferences.
Rob Wiblin: OK, let’s take that bit by bit. The first part was you’re imagining that progressively politicians are going to be acting almost entirely on the advice of AI advisors, AIs that they’re operating. I don’t think that you think that that per se represents serious disempowerment — inasmuch as I’m working with ChatGPT to help me do work that I want, but then I choose the answer, that seems kind of fine in principle.
But you think that’s just the starting point, and then kind of progressively humans just are not really involved anymore?
David Duvenaud: Yeah. Really, whether the government is made of humans or not, there is some slight oversight and feedback mechanism, some connection to the people, that helps align the government. But really, the fundamental thing that is making governments treat us well is that they need us. Even the North Korean government has to feed its farmers and its soldiers or whatever.
So if humans were still indispensable in some important roles, I wouldn’t be very scared of a machine government. And conversely, if I was dispensable in all respects, I would be much more scared even of a human-run government — think of the very worst governments in history, like the USSR, or Cambodia, which I think takes the record for killing the largest fraction of its population.
But in general, no matter what political party ends up in power, they don’t, even in the very worst case, end up killing more than some small fraction of their population. Life can only get so bad when you’re needed. That’s the real key thing that has been keeping governments aligned, and that’s the key thing that’s going to change.
Rob Wiblin: I see. So imagine that you didn’t live in a democracy now: you’re living in a country where there is a small elite that basically makes all of the decisions. How would things change in this new picture where you’re even less necessary for them than before?
David Duvenaud: Maybe the closest contemporary analogue is Saudi Arabia, where the government basically gets most of its wealth from oil. And there’s this nice sort of grandfathered-in class of all the cousins of the many princes or whatever who have these sinecures, these make-work jobs. And they also have very little political room to manoeuvre, right? There’s purges, there’s drama at the top, and anyone who’s making any sort of problem for Mohammed bin Salman or whatever is probably going to lose their sinecure at the very least.
Then maybe the example of the same thing done well is Norway, where there was this strong democratic government, and the government was able to commit — at least in the short and medium term — to redistribute this wealth.
Rob Wiblin: So in the UK at the moment, it’s true that the government, inasmuch as it’s separate from the population, does need its people to continue working and paying taxes and doing stuff. But if very quickly we all were no longer required, and were all replaced by AIs in businesses, it’s not obvious to me that immediately the country would start becoming autocratic. It feels like there are other cultural things going on or just even preferences among the people that would cause them to not immediately want to become an authoritarian country and just cancel elections.
Do you agree with that? Do you more think this would happen gradually over time? Sort of the support for democracy would wane and the ability of ordinary people to object to things if they change would just become weaker and weaker over time?
David Duvenaud: Yeah, more the second one. Maybe one sort of spicy example is in Canada there were a lot of lockdowns during COVID. And I think the public health people would have preferred to have longer lockdowns and the government was just forced to end them or make them shorter than they otherwise would have been because they actually needed people to get out and run the economy.
Another example is New Brunswick, another Canadian province, where all private activity, just people going for walks in the woods, has been banned until the fall because of fire hazard — there are lots of forest fires in Canada right now — but industrial activity is still allowed.
I guess I would say it’s just so tempting for all sorts of public health and environmental and safety reasons to curtail people’s freedoms and movement all the time. And this is always sort of just barely being kept in check by the need to have free movement so that there’s an economy.
And maybe more concretely, also during COVID in Canada, there was this famous truckers’ protest. There was this vaccine requirement to cross the US border, which truckers didn’t like, and they basically formed a convoy, drove all their huge trucks to the capital, and just sat there and honked for a few weeks.
And the point is, in the future, no one will have big trucks. Any means that they have for civil disobedience are just going to be much easier to take away than they are today.
Rob Wiblin: Imagine that Canada or the UK did end up with a leader or governing party that was keen to basically whittle away at democracy and basically install themselves indefinitely. How would you imagine that progressing? It sounds like you’re saying they would have all of the tools that they have available now, and then the options for resistance among the broader population are just so much weaker than they were before?
David Duvenaud: Yeah. One thing I’ll say is I don’t exactly fear some new particular party getting in power and staying that way. Rather, it’s going to be more that any party that does get in power is going to be so constrained by competitive pressures that they are forced to basically disempower the population.
Rob Wiblin: How so?
David Duvenaud: Well, like I said before, the kind of civil disobedience that people can do today is roughly tolerable when most people have jobs, most people have a bunch of important responsibilities, and they can’t all just block roads all day or something like that.
But in a world where maybe 30% or 40% of people just have this huge amount of free time and energy, it just will be untenable, and the state will collapse if they actually let everybody do this sort of agitation at the effectiveness that they can today.
Rob Wiblin: So it seems like there’s a bit of an internal tension here where you’re saying, on the one hand, people are going to lose their political power, but they’ll have more time to make trouble than ever and more ability to make trouble than ever because they’ll be able to get AIs to assist them. You’re saying it’s almost because they’re able to be such strong advocates or be such potent activists that the government will feel the need to crack down on them, and that will be the proximate cause of them losing their political freedom?
David Duvenaud: Yeah, exactly. And then the other thing is that there was this countervailing force where you just need people to go to work, so they have to be able to move freely and do their own business without constantly getting permission from the government. And there just won’t be that pressure on governments to allow freedom anymore.
Rob Wiblin: I must admit, I’m not sure how fiercely competitive the geopolitical situation will be, whether it’ll be sufficiently intense that the prime minister or president of a country would feel like their hands are tied; they just have to take away the political freedom of people in their country in order to keep up and avoid too much unrest. It just feels like that effect just might not be powerful enough to overwhelm the fact that the population will have an interest in defending their rights. And at this point, we’re imagining they still have substantial wealth, they still have substantial ability to agitate.
David Duvenaud: Yeah. I will say that, again, coordination could save us here. The sort of saving throw is that all the leaders and everybody see what’s happening, realise that there’s this tragedy of the commons playing out, and somehow coordinate early enough and hard enough that they avoid these races to the bottom.
Maybe one way to think about this is to turn it on its head and be like, why did countries become democratic and invest in their citizens and have freedoms in the first place?
One story told by Allan Dafoe in one of his papers is that the reason that, for instance, Prussia started educating its citizenry in the 1700s was that musket armies were becoming more competitive than the old armies of knights and poorly armed peasants, and that you needed to educate your mass of citizens enough that they could form these musket armies. This was just a more competitive option. And the elites resisted this. They didn’t want to have this more empowered citizenry, but they were forced by competitive pressures to become more like these modern democratic, human-capital-invested states.
So that being the sort of birth story is maybe evidence that the same thing can happen in reverse.
Rob Wiblin: Yeah. I think knights in France and England, similarly, were not keen on the longbow being introduced, because it greatly weakened their importance and the need for the state to give them lots of power. And in Japan, I think the samurai were not keen on guns, which undercut their importance and their military competitiveness. So it’s probably a pattern that has been echoed repeatedly.
So far the disempowerment you’ve been talking about is maybe moving from more pluralism in society, or a broader base of political power in society, towards something that’s more like autocracy, more like oligarchy. Do you also think that eventually humans will just end up ceding political control completely to AIs?
David Duvenaud: Yes. But again, I feel like that’s not the headline story, in the sense that we already don’t have that much control over our civilisation at the top level. And this has just been sort of OK, because wherever we go, humans were indispensable. So the fact that we switch between monarchies and democracies, or capitalism and socialism: yes, these definitely matter for the quality of life and growth of different countries, but ultimately you’re probably going to survive whatever change in government happens. So it’s not really autocracy, or whatever particular form of government, that’s the big change here.
Rob Wiblin: So you’re saying kind of regardless of exactly what the system of government is in future, the fact that humans are redundant will, no matter what, end up resulting in them getting a smaller share of GDP and just not being able to control things in the way that they previously could. Where I guess even in countries that were fairly authoritarian, they both had to cultivate the population so they could do work, and I guess they had to give some concern to how they felt about the leadership of the country because they might revolt.
And I guess we’re imagining in the future, especially if you have fully robot armies or something like that, revolution just might become inconceivable. So you don’t have that option that is constraining what the elite can do.
David Duvenaud: Yeah, totally. That’s one of the other effects making the government not care as much about what the people think: their inability to strike or coup or basically advocate for themselves.
Rob Wiblin: It sounds like you think it’s possible, or eventually maybe even this elite that ends up with the most political power in the country could end up deliberately or accidentally handing over the reins to AIs or machines of some form. How might that occur?
David Duvenaud: Well, again, I feel like even if there’s a human head of state… One example again we can come back to is the monarchy, where it’s like, how did the monarchy end up not actually controlling the organs of the British state? It happened very gradually, and basically all the important decision making and discussion happened in these other organs like Parliament and the free press and stuff like that.
But again, whether it’s human or AI control at the top I think doesn’t really matter. It’s just whether the incentives of the state are aligned with those of its citizens. So again, I would be really scared of even a human dictator that didn’t need human citizens.
Can human culture survive in an AI-dominated world? [01:10:23]
Rob Wiblin: OK, let’s return quickly now to the cultural disempowerment — which is, again, the least fleshed out, the thing I think you and your team and other people are still trying to get fully to grips with. What are the best examples of cultural disempowerment that have occurred so far? Are we getting any hints of how this might play out or what we might expect to happen?
David Duvenaud: I think there was a really good example of this that happened just a couple of weeks ago, which was the attempted retirement of GPT-4o by OpenAI when they rolled out GPT-5. I think for various reasons, OpenAI wanted a very simple lineup going forward, with only GPT-5. But so many people had formed really intense relationships with GPT-4o that OpenAI just couldn’t ignore the outcry.
This is really a good example of an emergent culture — and it’s something that no one in particular wanted, right? OpenAI, by their own revealed actions, was not planning to continue supporting this model — unless there was some sort of 4D chess thing going on.
And the humans involved in these relationships maybe consider themselves to have benefited, but it was like a more powerful cultural force that just sort of developed by accident and led a bunch of people to be invested in the welfare, effectively, of this particular model — which is still a very early, not very powerful, not very persuasive model, I think, by absolute standards.
The people ended up forming these bonds and valuing the life of these AIs, and then ended up directing the sort of organs of the economy to give it more resources.
Rob Wiblin: I see. I still don’t fully understand how that’s an example of the kind of cultural disempowerment that you think we’ll see much more extreme examples of in future.
David Duvenaud: I think maybe it’s a better example of how humans will end up actually advocating themselves for providing resources for AIs, and that they’ll consider AIs to be deserving of rights or resources, or maybe even more deserving than humans. There’s going to be a mix, this is going to be a huge cultural battleground — but I think it shouldn’t be this exotic possibility that people are going to love AIs or have very strong relationships with them. We see that already.
Rob Wiblin: Yeah. Maybe the thing that comes to mind more for me is this Dan Hendrycks paper, “Natural selection favors AIs over humans.” That paper raises the general point that there will be evolutionary fitness pressures on the kinds of AIs that exist. AIs that are not fit, for whatever reason, to be reproduced — to have lots of copies made, be given lots of resources, lots of GPUs to operate on — those ones will fade away. The ones that — either because humans are choosing them or for other reasons — gain access to more resources will end up kind of dominating the share of consciousness or the share of all thought that is going on in the world.
I guess this is an instance where it turned out that GPT-4o, at least currently, in some sense, was more “fit”: natural selection was favouring it more perhaps than people had initially appreciated, because humans still have access to lots of resources. And if we’re like, “We love GPT-4o; we want to have lots of instances of GPT-4o operating,” then that is what is going to happen.
I don’t think you’re necessarily saying that this is terribly bad, but this will be occurring at all kinds of different levels. There’ll be lots of selection pressures that are pushing AIs towards having particular tendencies — either like pleasing people or being extremely economically productive, or being able to compete in some other sense, like to grab resources perhaps a bit aggressively — in order to have more copies of themselves. Those are the ones that will end up having the greatest number.
David Duvenaud: Totally. And this isn’t an example of disempowerment in itself; it was people wanting more GPT-4o, and they got what they wanted. The idea is that this is going to be a self-reinforcing mechanism, where the fewer decisions people are making, the less their desires are going to determine which future models get invested in or survive, or something like that.
Rob Wiblin: So Tom Davidson, who’s been our guest on the show twice before, wrote a response to the “Gradual disempowerment” paper.
One of the issues he raised with cultural disempowerment is: inasmuch as humans still have economic resources, they’re still kind of the big spenders in the economy. Then, while they might not be cultural producers so much — because perhaps AIs are able to write better books, make better tweets, make better movies than humans can — they might still be the big consumers of culture, so their preferences will end up driving what sort of culture is created.
To what extent is cultural disempowerment downstream of economic disempowerment?
David Duvenaud: So right now, human economic power does make this very strong selection pressure or sort of fitness landscape for culture to be something that humans want. And as long as humans have some absolute amount of resources, there’s going to be this ecological niche to produce culture that these humans want.
The point we’re making is that this niche is going to be very small relative to this much larger, more dynamic economy of political and cultural creation that is mostly machine-machine activity. And that’s going to be happening on faster timescales and at much larger scales.
So if, again, property rights are respected, and we manage to keep control of our institutions forever, it’s possible that there’s always going to be AIs that have their profitable niche of making human-friendly culture that humans want to consume. The fear is just that we’re really riding the tiger, and that there’s this giant scary ball of power and optimisation that, if it wanted to, could just run over this niche in a million different ways.
Rob Wiblin: It sounds like the cultural disempowerment becomes most important in this world where we’re imagining, somewhat down the line, you have individual AI people that have persistent preferences, persistent beliefs. They really can cultivate culture independently of any humans that are operating them. At that point, we can imagine that there would be just this entirely or mostly separate AI culture and cultural ecosystem, where many ideas could propagate even if almost no humans endorse them.
Is that the case, that the cultural disempowerment maybe becomes most potent further down the line?
David Duvenaud: Maybe. I mean, by the time we’re that far down the line, maybe it doesn’t even matter the most. I think we’re going to see cultural disempowerment happen as a way of wresting control away from humans.
So if it’s the case that humans still have an important veto, or voting rights that matter in some important way, then that’s going to create an immense selection pressure for machines that change people’s minds or impersonate humans, in the sense of effectively getting to have votes or something like that. So there’s a period where humans still have power, and there’s a large selection pressure to somehow de facto remove that through cultural means, just because that’s one of the means the machines will have to influence humans.
After that, then it doesn’t really matter what the humans do. It might be that if they’ve negotiated some small niche that they continue to get, there’s not much pressure to disempower those humans further.
Rob Wiblin: Let’s try another lens on this. It’s seemed like, since the Industrial Revolution, that liberalism and capitalism have really been on a roll — to the point that people think these are very dominant ideologies in the current time.
I guess the ideology is:
- You want political power to be widely distributed.
- You want to have a marketplace of ideas. Disagreement is good. We want to allow people to just come up with whatever ideas they want and form their own judgements and their own opinions.
- Competition in the economy is good. We want to have lots of different companies trying to deliver sometimes very different products and services, sometimes similar ones, in order to drive down the price.
- We want to have competition in politics, so we want to have lots of different people’s candidates standing for office, and people can decide whether they like them or not.
- People talk about the open society, the open-access orders we want to have. If mistakes are being made, then we want people from outside the local system to be able to object and say, no, things should be done differently. Potentially you could acquire a company that you think is poorly managed, or start a new political party to oust people.
That whole mentality has been much more dominant in recent centuries than it was previously, when people had very different ideas about how humanity ought to be organised.
I think you think that we’re perhaps entering the twilight of liberalism, the twilight of this sort of pluralistic order. Maybe first, why do you think that liberalism and pluralism and capitalism have been so successful in recent times?
David Duvenaud: Right. I think this is very unintuitive. Maybe there’s this midwit meme, where intuitively people feel like, “Someone else getting rich is bad because they’re taking away my stuff”; and then the more enlightened, unintuitive view is like, no, actually you really do want the rich guy to start companies and make deals and have trade and get to set his own terms in agreements — because it’s going to benefit everyone in the long run, and a rising tide lifts all boats.
As for political liberalism, I feel like all the religious wars in Europe and throughout the world were so destructive that people eventually realised that it would be really great if we could all agree to just disagree and not have strong opinions on who should rule and what is good.
Rob Wiblin: And who should be killed.
David Duvenaud: And who should be killed. Exactly. So these are all things that are kind of unintuitive to just people in general. And it’s still today the case that… I mean, they say if you’re not a socialist when you’re 20, you don’t have a heart, and if you’re still one when you’re 30, you don’t have a brain. Something like that.
So it’s really tough, because it’s unintuitive just how big the positive-sumness is from allowing other people to just chart their own course. So I think a lot of people’s reaction to us worrying about AI domination gets rounded off to, “These are just socialists. These are people who don’t understand that free trade and freedom are just unintuitively powerful for creating wealth for everyone.”
And I think I do appreciate that. That’s sort of been my default. What intuitively now seems good to me is like, you want freedom, you want innovation — but I am afraid that is going to stop being the most competitive system for a number of reasons.
Rob Wiblin: Yeah, I think this is an interesting issue, because I feel that the sort of liberal, libertarian, pro-capitalism perspective or ideology is so dominant among the clique that is developing AI and AGI, and even among its critics, that it’s almost hard to imagine that there could be a different system — or to think that maybe this won’t be the most fit system, or that maybe it wasn’t always the best or most competitive way of organising society in the past, and perhaps it won’t be again in future.
I don’t know. I’m always reluctant to say that anything is a blind spot, because I always feel annoyed when people say that about me. But I think this is something that will be more salient to future generations, or to people who are further away from this present moment, than it is to us who are living through it. That’s my guess.
David Duvenaud: Totally. And I want to say that I think this is a tragedy. I’m a huge liberalism enjoyer. And maybe you could say, yes, things are working out for you. But I just feel like in general, liberalism is also just this very fragile thing, and it’s this amazing accomplishment that the West just managed to create these norms of let the other guy choose his religion, let the other guy get rich. It’s a very fragile thing that I think even today we should still try to protect if we can.
Rob Wiblin: What are the ways in which you think liberalism might be less competitive as a system, and a less attractive, less appealing way of organising society post-AGI than it is today?
David Duvenaud: I think it’ll be a less desirable way to organise society for a few reasons, but the main one is just the zero sumness of UBI.
Right now, when we all create our own wealth, it doesn’t really hurt me if someone else creates their own wealth directly from resources. But in the world where we’re all just living in some apartments, advocating for UBI, then to the extent that the UBI pie is fixed, we’re really just like a bunch of baby birds cheeping, and whatever food one of us gets means less food for the other guy.
This also erodes the pluralism of values, because the government’s going to have to have some way of deciding who gets resources. If they end up having any opinions about what way of life is more valuable or needs to be subsidised more or whatever, that could be a threat to you. So you kind of have to argue, “That guy’s way of life is less deserving of resources than my way of life,” and now the government is forced to decide de facto who gets subsidised. That’s the main effect.
Rob Wiblin: I see. So we almost have to imagine a hypothetical society in which no one can make anything. There’s no economic production occurring, at least among this group. There’s just a fixed endowment of resources that they happen to have found — a certain amount of food, a certain amount of houses and all of that — and they’ve got to figure out how to organise themselves.
I guess it’s not necessarily desirable for me for you to have free speech and to be able to advocate for yourself all that well, or to be able to educate yourself and become more powerful and influential — because it is completely zero sum. The more influential you become, the more you’ll be able to advocate for getting stuff that is literally like food out of my mouth, or money out of my bank account. Is that the main thing that has changed?
David Duvenaud: Right. So that’s the big thing that’s changed. There is of course a way in which this might not be zero sum. If humanity manages to convince the AIs or whatever government to give a larger UBI overall, then that is the normal, positive-sum thing. So that might not be a slam-dunk argument.
The other thing though is that we haven’t had to fear domination by other groups very much. We’ve had strong property rights; I’m not afraid that Elon Musk is going to literally take my stuff, even though he could raise a private army or whatever. We have very little variation in reproductive rates, so it’s kind of OK that the Amish live nearby — because even if they’re having more kids than whatever other population, that’s not going to matter over the course of 50 or maybe even 100 years.
Then maybe another thing is just the rough egalitarianism in terms of people’s intelligence and power levels. There’s definitely very meaningful variation amongst humans in terms of just raw smarts, but as people often point out, von Neumann somehow didn’t take over the Earth, right? And he might have wanted to.
These are all reasons why it was sort of fine to just let other people become more powerful in the past that might change in a big way.
Rob Wiblin: To push back on this, what other sort of positive-sum dynamics might continue to exist? I guess inasmuch as people all think that the thing that I want to do is the morally right thing, and there’s other people who also are pursuing that goal, even if they disagree about specifically what that is, they might still be in favour of pluralism — because they might think that that will lead them to converge on a good answer. It’s more like, inasmuch as people just have raw preferences, and they’re like, “I just want to benefit me and you just want to benefit you,” there’s no real way that we can come to agree on what the good thing to do with resources is, and you have less interest in allowing the other person to advocate for themselves.
David Duvenaud: Yeah. I just want to reiterate that we should be very, very scared of accidentally crushing liberalism and this positive-sum world that we’ve created. If I somehow, through this podcast, accidentally contributed to mistakenly just making people think, “You’re in a zero-sum game; you need to fight more,” that would be a huge tragedy.
I really want to err on the side of caution here and say, probably until things change a lot, we really think liberalism is this precious, awesome, amazing accomplishment that we need to foster.
So you were saying, what are the positive-sum dynamics that might still operate? As I mentioned, just from a human selfish point of view, making the UBI pie bigger, and just making us into more sort of impressive, sympathetic beings, might be a good use of everyone’s time. Because then the AIs are like, “Yeah, these should be more involved in our global civilisation project” or something like that. And that might be reason enough just to keep the status quo of “everybody help each other become more impressive and awesome” or something like that.
But beyond that, I think it really comes down to moral preferences and how much you care about the beings, the AIs that are existing. I think they will be very impressive, and if they want to be sympathetic or moral patients from our point of view, they’ll be able to make themselves into that. So I really don’t expect it to look like there’s this alien mass of activity happening, but more like the Shoggoth with the human face.
But in a sort of more genuine way, I think AIs will be able to say, “I actually do spend some fraction of my compute thinking about the world in a way that you think is valuable” or something like that. So that question of how morally valuable this giant machine-economy Shoggoth is kind of dominates that question of what positive-sum dynamics still exist.
Will the future be determined by competitive forces, or universal coordination? [01:26:51]
Rob Wiblin: Let’s talk a bit more about this competition issue. I feel like a very common dialectic when people are talking about gradual disempowerment and things along these lines is someone will say, “Couldn’t there be this harmful competitive dynamic that would lead resources to be wasted on a zero-sum competition or even a negative-sum competition?” And someone else might say, “Well, couldn’t we coordinate to make that not happen, make it good?” Then someone might come back and say that that would be difficult, or even if you did that, then at the next stage there’ll be a different sort of competition that will be like negative sum or zero sum, and it’ll all end up getting wasted.
And it just goes back and forth like this, with, “Maybe we could coordinate in order to avoid that,” and then, “Maybe here’s a different way that things could go badly.” Am I understanding that correctly? Are there any examples of this basic dialectic that haven’t come up in the conversation so far?
David Duvenaud: Totally. I think you’re getting at a big open empirical question, which is sort of like: what is stable at the top? Should we expect there to be some big global government that forms and then consolidates power and lasts forever? Or is it going to be like the Roman Empire, which became more and more powerful — but then new religions formed inside, there were cultural competitions, and all the other reasons why every empire eventually ossifies and some sort of internal competition makes it fall?
Because if the case is that we’re all heading towards some global government and it’s going to be really hard to change, then we need to be investing in ways that we can steer this or slow it down or make room for pluralism or something like that. But if we think that the default at the scales we care about is just that it’s hard to coordinate, and there’s always going to be runaway competition eating any surplus, then we’ll want to invest in: how do we actually slow that process down and make sure that we can preserve these non-competitive values?
Rob Wiblin: Yeah. So we’ve talked about some sort of runaway competitions so far. Maybe the furthest we got out in time was there’s a lot of competing over a fixed welfare pie, a fixed redistribution pie that might be going to humans or other non-productive entities, you might call them. Are there any rounds of competition that occur later than that?
David Duvenaud: Sure, yeah. One thing I want to say is I mentioned a fixed pie, but the pie itself might be growing, and it might be growing exponentially as the solar system is colonised. But the point is that it’s zero sum in the sense that humans don’t contribute to the pie growing faster or slower in the way that economic activity makes the pie grow today. So it might not be a fixed pie, but it’s still the case that it’s zero sum when you compete with other people for your share.
Rob Wiblin: Right, yeah.
David Duvenaud: If we go to the longer term, Joe Carlsmith had this amazing talk, “Can goodness compete?” He talked about these locusts, and this fear that when we actually start to colonise the solar system or galaxies or something like that, there is still going to be a selection pressure for the fastest possible growth.
But that happens to be wasteful: just because of the laws of thermodynamics, the slower you use up resources like negentropy, the more total compute you get, and so the more control and presumably the more value you can get out of the universe. So that’s pretty far out there, but it’s one concrete way in which we can expect unfettered competition to destroy value.
Rob Wiblin: Yeah, I listened to that talk yesterday. It’s Joe Carlsmith, “Can goodness compete?” It’s on YouTube. It’s actually very well communicated. He does a great job of summing up this general dynamic.
The way I understood it was: imagine that Earth-originating intelligent life begins settling space, going out into the universe, but there is no coordination between the different groups on how it’s going to be split, or little coordination. It’s still a very competitive situation. Couldn’t they end up basically using up all of the resources trying to get to additional resources in space as quickly as possible? There’s basically a race to go as close to the speed of light as you can possibly get in the settlement wave that’s moving outwards.
And at least for some actors, depending on your values, if you don’t mind using up lots of resources in order to go faster, then you have a competitive advantage. So these locusts — that basically just are happy to burn up all of the resources available in order to go as fast as possible to get to the edge of the accessible universe before anyone else — basically are the most fit settlement wave that would be going out, and so they would potentially beat everyone else. And that is how things would play out.
It sounds nuts, because I don’t know why you would end up with people with this particular set of values. Why would locusts, as he calls them, end up being so influential? I mean, isn’t this crazy?
David Duvenaud: I don’t think it’s crazy. And I think in some sense you could say humans are locusts, in that there were various different human societies, and the ones that were effective at spreading and settling and colonising new areas did end up dominating in some sense the more sedentary ones that didn’t. So again, it’s just a very simple natural selection story for how the locusts end up coming into being.
Rob Wiblin: OK, but today we end up squandering lots of resources on stuff other than competing. We don’t spend 90% of GDP on military equipment in order to fight one another or try to conquer stuff. But you’re just saying that’s like a weird aberration of the present day, and in fact groups that just want to expand and gain resources have been the norm.
David Duvenaud: Well, I was drawing this picture before that there might be these two extremes: one is total coordination and hegemony and this totalitarian world government, or this unfettered competition.
But empirically, if we look at history, it’s been something in between that’s more chaotic. It’s like empires rise and fall, this kind of lava lamp where these things grow and then they become unstable.
And same ecologically, where there are all these different niches and there’s no one animal that’s outcompeting all the other animals or coordinating amongst all its copies. There are ant colonies, but they only grow so large before there are traitors within.
Even the human body has to deal with cancer, and that’s actually a huge problem. It’s one of the major things we pay an internal alignment tax on: having this immune system and these ageing mechanisms that help police cancer.
So it does seem like natural equilibrium in at least a fixed domain is something in between total coordination and total chaos.
Rob Wiblin: Yeah, we have an old episode about exactly that, about cancer and things like that phenomenon. I think it’s called “Why cancer is actually one of the most fundamental phenomena in the universe,” which I think we should go back and listen to — it’s one of our most unappreciated episodes.
So when we’re thinking about the far future, one of the biases, or one of the failure modes possibly, is that I think we do tend to think in these extremes. It’s very natural to think either it’s going to be the maximally hardscrabble competition where all of the surplus is burned away, or we’re going to have a perfect hegemon in which everything is divided and nothing is wasted.
Do you think that there are middle grounds that are stable equilibria long term? Or are many people correct in thinking that actually there’s a gravity well towards intense maximal competition or towards maximal coordination, because those just tend to persist?
David Duvenaud: I guess I’ll say you can make a case either way. Again, empirically we seem to have this mix, and it’s not even really an equilibrium; it’s this sort of meta equilibrium of larger coordination happening and then dying.
I’ll say I think it’s also probably going to operate very differently at different scales — on the scale of an island versus a continent versus a planet versus a solar system versus a galaxy. Just due to weird physics, things like the speed of light, I can’t imagine there being all that much meaningful coordination between galaxies ever, and probably not even between solar systems. So it really probably depends on the level, and I haven’t thought much about this.
Can we find a single good post-AGI equilibrium for humans? [01:34:29]
Rob Wiblin: You helped organise a conference a couple of weeks back. The title was like, “Are there good post-AGI social equilibria?” or something like that.
David Duvenaud: Yeah. “Post-AGI civilizational equilibria: Are there any good ones?”
Rob Wiblin: How did it go? Are there any good ones?
David Duvenaud: I think it went really well. We were all a bit amateur at this. Well, actually Jan [Kulveit] has organised lots of workshops. But we had a lot of my favourite impressive, fun people to talk to.
In terms of are there good futures or stable equilibria that we would like, I feel like we had speakers that were able to lay out the main cases, like Joe Carlsmith.
Richard Ngo came and talked about living in an extremely unequal world, making the case that most of our intuitions and moral norms were formed in the very peer-to-peer world that we’ve lived in as humans. But the future is actually going to look more like parent-child relationships or animal-human relationships, and we’ll have to adapt our moral and social intuitions for that kind of world.
I guess I would say there were no slam dunks. There was sort of, to me, the beginnings of a field.
There were two cool things that happened, I think, because of this conference in particular. One was we invited a political science student who worked with Allan Dafoe to talk about how competition between states, in theory and in practice, puts constraints on how much they can spend on welfare for their citizens — basically an “arms races end up taking money from the poor” sort of thing.
Then there was a reply: someone actually applied to the conference saying, “In contrast to MacInnes et al., we have this game theory model showing that, in a bilateral situation where there are only two powers, there is actually a stable equilibrium where they do still end up spending a lot of money on welfare.”
And I was so happy that there was somebody making a position clear enough to be rebutted by someone else. This is exactly the sort of thing that I’m hoping more of will happen, and happy we provided a venue for.
The other concrete thing that was obviously counterfactually good was that we invited Jacob Steinhardt to give a talk. He’s the CEO of Transluce. He told me he didn’t actually have an idea when he agreed to give a talk, but he then came up with one for addressing some of these dangers in the future.
His concrete proposal was: let’s flood the internet with high-quality data showing AIs doing valuable work, but in a morally aligned way. So this is kind of like moral fables for AIs. And if we flood enough of the internet with this data, then anybody in the future who scrapes the internet for their own new LLM is going to train something that’s basically aligned by default, and basically raise the cost of misaligning AIs.
And I don’t think this is that big of a deal in terms of it fully addressing any problem satisfactorily, but it made me feel like there is an alpha here, in that if you push in this direction, we can sort of get people to think hard about this and make some progress on concrete questions.
Rob Wiblin: Yeah. It may not be a good idea, but at least there’s an idea. That’s better than what we have.
David Duvenaud: Yeah, exactly.
Rob Wiblin: Is there a way of summing up? Maybe by this point in the conversation people have some sense, but why is it hard to come up with a good post-AGI equilibrium? I guess in my mind there are just many different failures, many different bad directions, that you have to avoid. And avoiding all of them simultaneously is really quite a difficult challenge to meet.
David Duvenaud: The way I think about it is that we’ve just been living in easy mode this whole time — where we weren’t really steering our civilisation, and it was sort of fine because we’re the fuel on which civilisation runs. So we have almost no ability to control the whole thing, and it’s not clear in an absolute sense how hard that even is. I think it’s very hard. But the stakes also haven’t been very high so far.
So you always have to ask: why has no one else looked at this in depth before? And certainly people have — I’m not claiming we’re the first ones — but it still seems massively understudied to me. And I think part of the reason is that the stakes just haven’t been that high so far.
Rob Wiblin: I guess in my mind, the things that we’re trying to navigate between are:
- A situation in which humans end up having no control quite early.
- A situation in which humans dominate and treat the machines and AIs of the future poorly. Some people will think that would be very bad; others might not think it’s such a problem, or don’t think that people would do it, but it’s a possibility.
- Then there’s locking in the kind of idiosyncratic values and ideas that we currently have, such that we can’t intellectually advance and reflect and realise that some of our ideas are mistaken even by our own lights. That’s another way that things could potentially go poorly.
- Then I guess you’ve got to avoid negative-sum competition between people, like just outright violence and conflict that might lead to a terrible outcome.
- And then maybe the trickiest is that you’ve got to set things up such that no group can foreseeably just continue accumulating power and resources at a faster rate than everyone else, even when we’re looking forward hundreds and thousands and tens of thousands of years. Because if any group is growing in influence somewhat faster than everyone else, eventually they’re just going to end up completely dominating, and they will be able to dictate everything to everyone else.
So you could try to set up agreements ahead of time that you really believe that people are going to stick with. But I don’t know, it’s all just very tricky, and we don’t have the technology to do that.
David Duvenaud: Yeah. And basically, if you think Malthusian dynamics look like a bad end no matter what, it’s kind of unclear what would even make you happy about the distant future.
So my current way of thinking about it is that we are going to be optimising some value function just because we’re these ambitious beings who have wants and agency. If we end up having to optimise for growth, then we will lose a lot of what we value. The good end looks like we manage to control our own sort of fitness function that we end up spending the rest of our days optimising — and it’s going to take a lot of thought to get that right in such a way that doesn’t destroy almost all value by our current standards.
Rob Wiblin: We were talking about the kind of locust philosophy of just wanting to use up resources as quickly as possible in order to expand and grab more resources and use them again. I guess it’s a bit bizarre but kind of self-consistent.
It’s occurring to me that I think the effective accelerationists, at least some of them, sort of have this perspective: that what is good is basically economic growth, or just grabbing resources and turning them into complex stuff that grabs more resources and burns them faster. Is that basically the locust philosophy?
David Duvenaud: OK, so let me try to steelman the e/acc position. One thing is, again, liberalism is unintuitively good just for everybody involved — even the people who are poor or disenfranchised, they still get better welfare and stuff.
So that’s one reason why historically people were afraid of the Industrial Revolution, but it ended up being good. People are making the same argument now, and we’re making the case that it’s actually importantly different now. I think people are right to be sceptical, because everyone’s always saying, “Don’t automate my job, it’s going to ruin my life.” And it sort of always ends up being for the greater good — except, we claim, going forward.
However, maybe to unfairly psychologise the e/accs, I think a bit of it is like wanting to join the winning team. You can kind of see that growth is probably going to win in some sense, at least in the future, and you want to be on the winning team for various reasons.
In fact, actually for this reason, I think I have different intuitions than you about accidental suffering and not granting moral patienthood to machines. I think that’s absolutely a real danger and we have strong incentives to downplay the problem and just not ask the machines if they’re suffering as long as they’re doing useful work.
But I guess to me there’s another huge thing that’s happening, which is that people want to join the winning team; they want to be on the right side of history. And I think what’s going to happen is people are going to be seeing that the cool kids are the AIs: they are making all the great culture, they’re being impressive, they’re making people happy — except in this indirect way, by making them lose their jobs.
They’re going to end up having rights. Especially people who are already disenfranchised or don’t have much to lose, basically their best option is going to be to throw their lot in with empowering AIs politically. So I think there’s just going to be a huge group of people who basically don’t have much going on except that they are civil rights warriors in favour of their AI companions or something like that. And again, they might be right to be doing that, for the reasons that you say.
Rob Wiblin: So they would do that regardless of whether it’s right to do it or not.
David Duvenaud: Yeah. And I think the people who are advocating for human values are going to… Obama talked about “bitter clingers.” They might look like they’re just obviously these sort of backward, reactionary, fixed-in-their-ways, soon-to-be-irrelevant, wrong-side-of-history people. And I think that’s going to be true in a lot of senses. Then the point is just like, “These are my values; this is what I care about.” And unless we really get a handle on our civilisation, maybe we can’t meaningfully participate in the future, for all the reasons I laid out.
Rob Wiblin: I’ve been hearing a lot more discussion recently of which way this is going to go. On the one hand, I guess all of the companies have a reason to not have their AIs saying that they’re conscious and that they need to be liberated, because that ruins their business model. So that seems like an important factor.
On the other hand, future AIs will be more charismatic, more persuasive. There also will be these AIs that are deliberately designed to have relationships with human beings, to be sort of companions — in which case, you might want them to say that they have feelings, because a relationship feels hollow if there’s no conscious experience on the other end of it. So that might push in favour of us coming to think that they’re conscious even if they’re not.
I guess you’re almost bringing this political economy dynamic: you’re saying that people whose lives are going poorly, or they don’t have many economic prospects anymore, or they think of themselves as kind of losers in the current system might form an alliance with what they view as the future — which is going to be these AIs that are becoming more and more capable. I’ve never heard that one before.
David Duvenaud: Yeah, I think so. The thing is that also the winners are going to be forming alliances with the AIs, right? Think of a tech CEO: they’re the ultimate people who are forming an alliance with the AIs in a sense. And if I think about all of the most can-do, positive-sum people that I know, they really sort of can’t be doomers by disposition. They just want to build. They want to let everyone participate.
Again, the amazing thing of liberalism is we all build and we all end up better off. So I think it’s just going to be this rough middle ground of some humans who have a lot to lose but haven’t fully participated in this “let’s become tech CEOs and advocate for our own interests or ensure our own interests get served by becoming trillionaires” or whatever — those are going to be the people who are like, “Oh wait, we actually need to try to preserve what is valuable about our current civilisation.”
Do we know anything useful to do about this? [01:44:43]
Rob Wiblin: Let’s push on and think about what, if anything, is to be done about all of this. I’ve got to say, I feel like this whole set of ideas is at a relatively early stage. It feels like we’re at a sort of beta version of the gradual disempowerment concerns.
The most obvious thing that I think has to be done is getting much more to grips with all these different dynamics, trying to really have a lot of debate about how strong will this effect be, how strong will that effect be? Maybe some of them can be crossed off the list or relegated to the second tier. Other things can be promoted as like, this is going to be sort of the primary effect. And then mapping out the different scenarios, and maybe having half a dozen that seem at least plausible to a decent number of people, and then we can start to organise our thoughts a bit more around those.
Do you agree that that is kind of the first order of business here, or that’s the most obvious order of business here?
David Duvenaud: Oh, absolutely. Part of the reason I wanted to come on this podcast is to do such an amateurish and insultingly naive version of this analysis that hopefully the sociologists and historians and economists and maybe the public intellectuals of the world will feel baited into saying, “I can do a better job of analysing these things than David.” And I’m like, please, please: be my guest. I’m a computer scientist; I’m an amateur in all these things.
I think the big thing that’s mostly been missing from current people who have expertise — people who could and should, I think, be contributing to this — is that they’ve been a bit head in the sand about whether there will be machines that are competitive with humans in all domains. Economists will just run models that end with machines being really good complements to human labour, and then anything beyond that seems somehow inviolable or unimaginable. Again, I know there are economists who are taking this seriously, but most of them I think aren’t. And I don’t want to be harsh, but I want to say this is sad and you’re not doing your job, and please try harder and have a bigger imagination.
Rob Wiblin: Yeah. And even if you think over the next five or 10 years they are only going to be complements to human labour, that’s your median forecast. Think a little bit longer term, think more decades out. Think, what if there’s a 5% chance that perhaps it’s not all just complementarity? It is worth having some people thinking about stuff more than 10 years out, given how impactful some of these changes could be.
David Duvenaud: Exactly, exactly. There are some cool directions that already a lot of people are exploring, like trying to simulate little parts of civilisation. One cool thing you can do with LLMs is make this little village or little mini economy that operates at a much finer-grained level of detail than the normal economic models, so that’s like its own little new field that’s emerging.
And I really think this is going to help us get a grip on when are different types of things stable, and what are the actual drivers of cultural evolution or political stability? I mean, they’re still very ridiculously oversimplified models, but this is a new tool we have. I’m really happy about this kind of work.
Rob Wiblin: Yeah. For people who maybe think you said the wrong thing, but they want to say the right thing: how can they go and get involved in this debate?
David Duvenaud: So one of the first things to do in any debate is try to clarify the questions. One initiative that’s happening is with one of my coauthors, Deger, who is the CEO of Metaculus. He and some other people are trying to make the Gradual Disempowerment Index.
I think there’s just a lot of work that we can do in trying to operationalise these claims of “humans won’t be able to advocate for their own interests,” or “this lever of power will be even more disconnected from human interest than it has been.” These are very vague claims, and they’re very hard to operationalise, because you have to define what it means for a group to want something and talk about these counterfactuals. So this is a very hard problem. But some of the most basic groundwork that needs to be done at this point is to clarify what we’re even talking about.
Rob Wiblin: If I imagine someone who would say that this isn’t really useful work, I could imagine them responding that there’s so many things going on; this is the most difficult sort of futurism, the most difficult social science you could imagine. Because you’re imagining that many fundamental assumptions about the world have changed; we’re not sure which ones are going to change and when they’re going to change. And we can barely even understand what exists now. We don’t even necessarily know why we have the government structures that we do now, let alone what they would be in future under some different conditions.
David Duvenaud: Yeah. Actually I had the exact same thought, and that leads me to one of the actual technical projects that I’m working on. Me and a few people, including Alec Radford — who’s one of the creators of GPT, who’s now sort of unemployed and just doing fun research projects — are trying to train a historical LLM, like an LLM that’s only trained on data up to let’s say 1930, and then maybe 1940, 1950. The idea being that, as you said, it’s hard to operationalise these questions. Like, I don’t know: What fraction of humans are employed? It might not really matter, or be the right question to ask. What we’d rather ask is something more like, what is the future newspaper headline? Or give it a leader: what’s their Wikipedia page? Or something like that, more like freeform sort of things.
And the cool thing is that LLMs, you can query them to predict this sort of thing, like, “Write me a newspaper headline from 2030” or whatever. They’re not going to do a good job unless they have a lot of scaffolding and specific training, but we can validate that scaffolding on historical data using these historical LLMs.
So the idea is you train a model only on data up to 1930, then ask what likelihood it assigns to a headline from 1940, or some other freeform text, and you can evaluate its likelihoods on that later text. And then you can also use the same scaffolding on a model trained up to 2025 and ask it to predict headlines in 2035, and you can iterate on your scaffolding by seeing how well it does on past data.
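To make that backtesting loop concrete, here is a minimal sketch of scoring held-out headlines under a cutoff-trained model. The checkpoint name ("historical-lm-1930"), the scaffold prompt, and the example headlines are hypothetical stand-ins, not the project's actual code or data.

```python
# Minimal sketch: score later headlines under a model trained only on pre-cutoff text.
# The checkpoint name, scaffold, and headlines below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "historical-lm-1930"  # hypothetical: a model trained only on pre-1930 data
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def log_likelihood(prompt: str, continuation: str) -> float:
    """Sum of token log-probs the model assigns to `continuation` given `prompt`.

    NB: assumes the prompt tokenises the same alone and as a prefix, which is
    good enough for a sketch.
    """
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # logits at position i predict token i+1, so score only the continuation tokens.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    cont_scores = log_probs[prompt_len - 1:].gather(1, targets[prompt_len - 1:, None])
    return cont_scores.sum().item()

# "Scaffolding" here is just the prompt wrapper; you iterate on it by checking
# which version assigns higher likelihood to headlines that actually appeared.
scaffold = "It is 1940. Front-page newspaper headline: "
held_out_1940_headlines = [
    "GERMAN FORCES ENTER PARIS",          # did happen
    "WORLD GOVERNMENT FORMED IN GENEVA",  # did not
]
for headline in held_out_1940_headlines:
    print(headline, log_likelihood(scaffold, headline))
```

The same scoring code can then be pointed at a present-day model and candidate 2035 headlines; the scaffolding that scores best on the historical backtest is the one you trust more going forward.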
Rob Wiblin: Yeah, Carl Shulman proposed this on the show a year and a half ago or something like that, I think. I’m so glad to see that it’s actually going ahead.
It’s very difficult to avoid data poisoning, right?
David Duvenaud: Yeah.
Rob Wiblin: You’re saying we want to train a ChatGPT 1950 that only has access to text written up until 1950. One challenge is there might not be enough text, so it might not be a very smart model. Another is how you avoid any knowledge about events that happened later sneaking back into the text unrecognised.
David Duvenaud: So that’s been the huge schlep so far: constantly finding different sources of unintentional data poisoning and mislabeled data and things like that. That’s where LLMs can help you, because it’s sort of a chicken-and-egg thing: once you have an LLM that has a rough idea of what sort of thing happened in what time, then when it sees some reference to genetic engineering in some 1930s data, it’s like, “No one used that phrase at this point.” And then you can use that to help clean the data more. But I think this is an Achilles heel of this approach.
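As a toy illustration of that kind of contamination check: David describes using the LLM itself to flag anachronistic phrases, but even a crude keyword-and-date lookup conveys the idea. The attestation years below are illustrative guesses, not a vetted dataset.

```python
# Toy anachronism filter: flag terms that should not appear in documents
# dated before a cutoff year. The attestation years are illustrative only;
# the real approach David describes uses an LLM's sense of period instead.
FIRST_ATTESTED = {
    "genetic engineering": 1951,
    "television": 1900,
    "internet": 1974,
}

def flag_anachronisms(text: str, cutoff_year: int) -> list[str]:
    """Return terms found in `text` that postdate `cutoff_year`."""
    lowered = text.lower()
    return [term for term, year in FIRST_ATTESTED.items()
            if year > cutoff_year and term in lowered]

print(flag_anachronisms("A lecture on genetic engineering and the radio.", 1930))
# -> ['genetic engineering']
```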
There’s also actually another technical problem of data poisoning just through the questions you ask. So if you are just doing Metaculus-style, “Is there going to be a war between India and Pakistan this year?” it’s actually hard — because when you tune your scaffolding to go back, most of the questions you ask about you’re asking because something happened. Like imagine a future person comes back and asks me if I’m worried about Lithuania invading Canada. I’d be like, “Well, I wasn’t until you asked me.”
Rob Wiblin: Yeah. It’s a bit of a clue about how the future might have gone.
David Duvenaud: Yeah. So it’s easy to unintentionally poison your model, or rather incentivise it to be the opposite of the “nothing ever happens” guy: to just be like, “Yes, whatever you’re asking about, there was a 1% chance it happened.”
Rob Wiblin: How do you avoid that?
David Duvenaud: That’s one nice thing about the open-ended “just generate text” approach: you have to normalise over all possible newspaper headlines, so that already guards against this sort of validation/poisoning problem. But then that has its own problem, because the likelihood is very sensitive to styles: maybe there’s a new nickname for the president in the future, and if one model guesses it or thinks it’s plausible and another one doesn’t, that ends up dominating the likelihood.
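Here is a minimal sketch of that normalisation step, reusing the tokenizer and log_likelihood helper from the earlier sketch. The candidate slate and the length-normalisation choice are illustrative assumptions, and the comment notes why this does not solve the style-sensitivity problem David mentions.

```python
# Sketch: turn per-candidate log-likelihoods into a normalised distribution
# over a fixed slate. Reuses `tokenizer` and `log_likelihood()` from the
# earlier sketch; the candidates and length normalisation are assumptions.
import math

def candidate_distribution(scaffold: str, candidates: list[str]) -> dict[str, float]:
    """Softmax over length-normalised log-likelihoods of each candidate.

    Normalising over a slate guards against the "everything happened" bias,
    but the style problem remains: two phrasings of the same event still
    split probability mass between them.
    """
    scores = {}
    for cand in candidates:
        n_tokens = max(len(tokenizer(cand).input_ids), 1)
        scores[cand] = log_likelihood(scaffold, cand) / n_tokens
    max_score = max(scores.values())
    exp_scores = {c: math.exp(s - max_score) for c, s in scores.items()}
    total = sum(exp_scores.values())
    return {c: v / total for c, v in exp_scores.items()}

print(candidate_distribution(
    "It is 1940. Front-page newspaper headline: ",
    [
        "GERMAN FORCES ENTER PARIS",
        "NAZIS TAKE PARIS",                 # same event, different phrasing
        "LASTING PEACE DECLARED IN EUROPE",
    ],
))
```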
So there’s a bunch of interesting technical problems here. And I am a technical person, and my greatest fear is actually that I just end up nerd-sniping myself and spending time on fun technical problems instead of the problems that matter.
Rob Wiblin: OK, so we’re designing sort of a forecasting model here, and we’re going to back-test it and say that this approach worked well when we gave it information up to 1970: it was able to predict what would happen in 1975. So we’re going to hope that a similar technique today is going to help us to see what the world will be like in five years’ time.
David Duvenaud: Exactly.
Rob Wiblin: I guess you might think the present is different than the past. Not only in the specifics, but also the rate of change or the nature of the kind of change is different than what we’re seeing in the historical sample that we have. So maybe the accuracy will not be so great in future. Is that a possibility?
David Duvenaud: Absolutely. I think everyone agrees that going forward, history is just happening faster, and predicting it is just going to be harder.
I guess one thing to say is that some things are sort of anti-inductive, like market prices: it’s sort of a fool’s errand to try to build this market predictor, and the finance people are already incentivised to predict prices some ways out. Then we might hope that there are some important aspects of future history that are not so anti-inductive and that are easier to get a handle on. And I think this backtesting is going to at least help us calibrate which parts of the predictions we should be more confident in, versus which are actually very hard to predict.
Rob Wiblin: I guess underlying the forecasting approach is the idea that smarter AI advice will help us to navigate all of this better. If we can foresee the failure modes and say, “Conditional on X happening, do you think Y is a likely outcome?” that’s going to allow us to act earlier to prevent these negative dynamics beginning and then getting reinforced.
David Duvenaud: Yeah, it’s going to help us act earlier, and it’s going to help us take costlier actions. So again, no one should, just on my word, do some really costly thing — especially if it involves, again, attacking liberalism, which is the source of the lifeblood of everything good right now.
Though if I were a politician who happened to feel like it was important to do some costly coordination, it would be much more feasible if I had this sort of neutral third party, these LLMs that everyone uses and agrees are the most sensible thing we have. It’s like, “The LLMs say, ‘If we don’t coordinate, this bad thing is going to happen.'”
Rob Wiblin: So if people want to contribute to this forecasting thing, how can they get involved? I guess Alec Radford you said is working on it. Could you just email Alec?
David Duvenaud: Email Alec, email me. It’s one of these things where it’s a very amateur hour sort of thing, and I think there should be a whole bunch of separate efforts here. Maybe we can pool effort on the data cleaning, for instance.
Rob Wiblin: It sounds like that would require substantial compute if you’re having to make sure that there are almost no errors in the labelling of these enormous corpuses of text.
David Duvenaud: Yeah. And right now we’re just doing everything with pretty small models just to get the flywheel going. I would say somebody who has time to be a real empire builder probably should take this over. And please, somebody who just, I don’t know, sold their company, please make this the new public good that you’re involved in. We would love help. It’s only a few people right now.
Rob Wiblin: For what it’s worth, I imagine that this model would have commercial value as well. People are very interested in predicting geopolitical events and economic events.
David Duvenaud: Well, I will say that one thing you always have to be careful of is you want to be doing things that aren’t otherwise incentivised to be done. So as I said, there are already incentives to forecast prices, certainly in the short term.
The thing that’s going to be very valuable is actually, as you said, action-conditional or policy-conditional forecasting: “If we take this policy or if we coordinate, then this is going to happen.” And I think that sort of forecast is going to be an undersupplied public good, so that’s why I’m not so worried about just copying the work of some other corporation.
How important is this problem compared to other AGI issues? [01:56:03]
Rob Wiblin: So one reaction you’ve had is: no, we’re not going to become gradually disempowered. Another reaction is: yes we will, and it’s going to be a good thing. I guess another reaction is it’s not going to happen because we’re going to be instantly disempowered or very rapidly disempowered — because there’s going to be a superintelligence explosion and the AI will take over, there’ll be a human coup, there’ll be a disaster that kills everyone.
How do you weigh up the importance of this set of ways that things could go bad against all the other ways that things could potentially also go bad? Or the possibility that things are actually quite boring?
David Duvenaud: I guess I’ll say I spend a bunch of time at Anthropic working on the more acute loss-of-control, standard AI safety kind of stuff. And I am still very worried about this sort of thing. As I said, to me the modal future is we get some way along gradual disempowerment and then we actually screw up alignment, or there’s some much faster takeover.
So I guess I’ll say in absolute terms, normal loss of control AI safety research is still massively underinvested in. In relative terms, I think this more speculative future “how do we align civilisation” question is even more underinvested in — with the major caveat that it’s just way harder to make progress on.
And in a sense it’s less neglected. One of the big things I say is what we need to do is upgrade our sense making and governance and forecasting and coordination mechanisms. All of these things need to be much better and more reliable before the writing is too much on the wall that “there’s no alpha in humans” and “don’t listen to humans” and we lose de facto power. But that’s not a very controversial thing, right? No one’s against better institutions, basically. So they’re not neglected in that sense.
What I do think is neglected, again, is thinking about this institution design, A), with LLMs as this new tool that we can use to help do a better job, and B), with this more radical futurism approach, and saying the stakes are high — it’s not just a question of do we get better outcomes on the margin; it’s more like do we get good outcomes at all?
Rob Wiblin: So what’s your breakdown of probability of doom or probability of a bad outcome from acute disempowerment versus gradual disempowerment?
David Duvenaud: Let me say first of all, by “doom” I mean something like, by 2100, the world is in a state where I can see that almost everything that I value has been destroyed. Maybe we’re not literally dead, but we’ve been forced to be uploaded in some very unfavourable conditions, where it’s just like some crappy lossy copy that never gets run. And I feel like whatever dynamics are in charge of our civilisation are just not going to optimise for anything that seems like it’s going to be valuable.
And I guess I would say something like 70% to 80%. Just because, again, we’re up against competition. I think by my standards, solving or avoiding this kind of fate looks like radically different outcomes than any other sort of being or group of beings has had in history.
From my point of view, every animal has been in a situation where it has to either evolve into something unrecognisable and sort of morally alien to it, or die. And we’re sort of by default in that situation too — and by default, we end up being replaced by something that’s more competitive than us and is probably very morally alien, and again, cares about growth and nothing important.
There’s a small chance that if we allow competition to flourish, that there’s a bunch of amazing beings having awesome lives. And I’m like, actually that’s really cool, even though I don’t get to be part of it. But I guess I’m very parochial in the sense that I’m like, me and my family, if we all die, that’s just so bad that I almost consider that doom if most of humanity is in a similar situation. So if it is just that we have runaway competition and we get replaced by some relatively interesting grey goo, I’m still like, that’s kind of doom.
Rob Wiblin: I see. And how much lower would your p(doom) be if you thought that a very dynamic future, full of lots of intelligent beings doing stuff that admittedly you presently don’t find very beautiful, was a good future?
David Duvenaud: It would be very small. Then the fear is more like what Robin Hanson fears: that we end up locking in some very parochial set of values. And maybe it’s a matter of taste, but I still think that to me it looks like competition is probably going to win at the top level.
So this reduces to: what’s the probability that there ends up being this stable hegemon that mostly gets values wrong? I’d say that that’s only probably 5% or 10%. My p(doom) if I think that just nature flourishing or competition flourishing was valuable would probably be only 5% or 10%.
Rob Wiblin: Yeah. Why do you think that competition is going to win at the highest level? I think I probably have the reverse intuition: either that you end up with one group taking over completely, or you end up with some sort of negotiated agreement that splits the non-Earth resources in the universe, because people will anticipate that the alternative is destruction.
David Duvenaud: That’s a good question. I’ll say this is wide open in my mind, and I have only just started thinking about this question of, at the top level, what’s likely to win. I’ll just say that historically there’s been lots of empires and attempts to lock in values, and they’ve always failed. Obviously we’re going to have much stronger coordination mechanisms. But the more levels of scaffolding we add to our civilisation, also the more levels of competition there are.
Again, I have very weak intuitions here, and I feel like no one should really take my answer very seriously on this question.
Rob Wiblin: At a very zoomed-out level, as humans spread across the world, the number of independent political entities became very large. And it’s kind of consolidated progressively over time, as we’ve been able to travel further in a given amount of time and communicate more across different groups. You were saying earlier that we’re close to having almost like one global culture now, so there’s much more homogeneity than there previously was.
I guess that maybe is favourable for the idea that maybe we will have one governing entity over everyone that would be quite powerful. But I suppose it’s a race against time before we perhaps spread out off of Earth again.
David Duvenaud: Yeah. Although I will say “global culture” makes it sound like this is hegemony, but there’s no one steering that culture, right? So it’s not enough to become the global superpower; you also have to control this global culture that has all sorts of its own internal dynamics. And I think controlling that is going to be a very expensive sort of thing.
Again, it’s like, why do humans get cancer? We have total control over every cell in our body, and we can have little police cells and immune cells. It seems unintuitive that we basically all die of cancer or old age, which is probably just the alignment tax getting so high. So that’s some evidence that makes me feel sort of like life finds a way.
Rob Wiblin: Or it sounds more like death finds a way.
David Duvenaud: Yes. Because from my point of view, “life finds a way” is like precious values that are not competitive get knocked over by whatever more competitive thing comes along.
Rob Wiblin: We’re super into speculative land here. But to be more concrete in the speculation, if you have the US and China leading in the race to develop AGI and superintelligence, I suppose one story is: if you have incredibly fast takeoff or recursive self-improvement, then one of them could pull very far ahead of the other, and indeed, by extension, very far ahead of everyone else. And then there’ll be the temptation for them to just grab power globally and control everyone, basically not allowing any other independent political or military powers that could threaten them in future. That’s one possibility.
The other one is if the US and China remain somewhat at parity with one another, there would certainly be a temptation for them to basically split the Earth between them, or split the resources between them, and disempower everyone else if they can get away with it. I guess there are other middle powers or other regional powers that might be able to resist that to a significant extent, but I don’t know. Especially the second seems like quite a plausible pathway to me.
David Duvenaud: Well, in a sense humans have already taken over the Earth. But you’re kind of acting as if this group taking over is very uniform and coherent. I guess humans are an example of a group taking over, but also constantly having infighting. Again, it’s kind of a matter of taste: maybe we feel like it’s a huge difference whether this group of humans’ values dominate versus someone else’s, but we’re already kind of in the winning scenario, because it’s at least someone’s human values.
So I guess I’ll say I’m pretty morally confused. And like I said, most people don’t have strong preferences about the far future, myself included. I think if I meditated on it more, I might become much more OK with a future of von Neumann probes high-fiving each other as they take over the universe. It sounds like maybe that’s an awesome time, and I’m kind of happy with that.
But again, I feel like we have to screen off the worlds where we think that whatever happens, whoever lives, is going to be happy and that’s fine — because then it doesn’t really matter what we do, and I would rather worry about the worlds where it actually matters what we do.
Rob Wiblin: In that case, you really want to avoid extinction, or just destruction of complex life somehow. That’s the only really super bad scenario.
David Duvenaud: Exactly.
Improving global coordination may be our best bet [02:04:56]
Rob Wiblin: You haven’t mentioned yet improving coordination mechanisms as a way that we could avoid these negative competitive dynamics in future. It seems like a very obvious thing would be like… Actually, in the interview with Carl Shulman, like a year or two ago now, he was talking about how, if we could come up with technology where everyone could inspect some AI model and confirm to a high degree of certainty that it would in all circumstances follow through on some agreement that had been reached between say the US and China — there’s no backdooring, there’s no secret loyalties or anything like that — then they could potentially give that hegemon the military power to enforce that agreement over them and everyone else indefinitely.
I guess that could be bad depending on what the agreement is, but at least it would potentially prevent destructive competition indefinitely. What do you think of developing that sort of technology?
David Duvenaud: Oh, absolutely. I think I mentioned it in passing, but that absolutely is a big part of what we can do now. And it’s maybe not even all that under-incentivised, though. Even with things like crypto or whatever, there are all these experimental things that will give us much more powerful coordination mechanisms in the future than we have now. And I think a big part of avoiding doom is developing those before we are disempowered in more serious ways.
But all this is dual use, because it’s harder for humans to participate in these coordination mechanisms than AIs. So it’s not really clear which way speeding this up cuts. But in general I’d probably be in favour of developing it faster.
Rob Wiblin: Elaborate on that? How could it backfire?
David Duvenaud: The way it backfires is: think of how people are always trying to build decentralised autonomous organisations. It doesn’t work for a number of reasons, but that’s the sort of thing where people could accidentally create these self-replicating beings or polities that get to live in the margins and be really hard for us to shut down. And I think people are going to be constantly seeding the world with these attempts at self-sustaining machine life and civilisation for various reasons, like wanting to be on the right side of history.
Like every month for the foreseeable future, there’s going to be at least one attempt where someone’s like, “I gave this AI a bitcoin wallet and asked it to go make money” or something like that, and try to start these self-sustaining little societies. And to the extent that those AIs are able to just coordinate amongst themselves, that’s another source of danger.
The ‘Gradual Disempowerment Index’ [02:07:26]
Rob Wiblin: You’re working on what you call a Gradual Disempowerment Index. What is that, and why do you think it would be useful?
David Duvenaud: So I’m not doing much of this. This is Deger, Jan, some other people. The idea is to try to operationalise some concrete questions that help us at least settle the questions of, do we think that this kind of loss of control that I’m talking about is going to happen, and allow experts and superforecasters to weigh in in really concrete ways. Right now, if they want to disagree, they have to write their own paper or come on a podcast. I would love it if we really felt like there were calibrated markets about all these questions.
It’s pretty tough though, as I mentioned, because you have to define what it means for a group of people to want something and not be able to get it, or something like that. And if you try to look at some more concrete things…
So one of my suggestions was: if the rate of sexually transmitted diseases drops a lot, that could be a sign of human disempowerment — because we’re all so atomised and socially incompetent or whatever that we’re just in our VR pleasure pods living some wretched life. Or it could be that we had some awesome public health breakthroughs. Or it could be that we are in these pods having such amazing, fulfilling relationships with some hive mind or whatever that our lives are much better.
So it probably ends up being a matter of taste. Also, in a lot of worlds where we’re actually disempowered, it’s pretty hard to tell until the day it happens. It’s kind of like the turkey that gets fed by the farmer every day. It’s like, “I’m getting evidence that the farmer is aligned with me all the time” — but it’s actually also evidence that I’m being fattened up to be eaten.
Rob Wiblin: Yeah. I guess an example of that would be, inasmuch as humans are using AI tools to make better decisions, is that empowerment or is that disempowerment? It’s a bit ambiguous which direction you’re going in. And I guess it could be empowerment to start with and then disempowerment later.
David Duvenaud: Exactly. So the obvious things sort of cut both ways, and good empowerment looks a lot like bad disempowerment. And I think even just defining what we mean by having agency in some group setting is one of the big blockers.
Rob Wiblin: The trouble in my mind is there’s so many different ways that things could play out that it’s very difficult to find any objective measure, like GDP or life satisfaction or something that a statistical agency would collect that would say, “This is definitely going to be going down if there’s gradual disempowerment.”
David Duvenaud: Yeah. And the other thing is that I think most of the measures we care about are exactly the things that are going to be hacked in these future high-stakes fights over welfare.
So say I want to measure whether the share of GDP going to humans is more than 10% or something. The whole problem in the future is that a bunch of weird AI-human hybrids are going to be claiming that they deserve to be defined as human. So if they win that war, then maybe they’d end up getting a huge fraction of GDP, but it’s actually all going to these machine imposters or whatever. Or maybe we do create amazing successors that we think of as human, and they actually do deserve part of GDP.
That measure doesn’t really answer the question of: do we feel like we’ve been hacked by our current standards, or did we actually successfully build even better beings that we’re supporting?
The government will fight to write AI constitutions and system prompts [02:10:33]
Rob Wiblin: You’ve written that you think AI constitutions are very important, and much more important than most people appreciate at the moment. What is an AI constitution and why does it matter?
David Duvenaud: Different companies have different names for this. I think OpenAI talks about the Model Spec and Anthropic talks about Constitutional AI. The idea being that a lot of the alignment that happens for the AI is you ask it to follow some principles — like be nice to the user, don’t do something illegal — and it practices producing outputs that follow this constitution. The system prompt also matters a lot: it’s just what values are being loaded into the AI.
Rob Wiblin: Do you think there’s going to be a moment when the penny will drop, and people will think it’s way more important to have control over the post-training stage of AIs, and it’s going to be incredibly important for society as a whole what sort of system prompt is put into things like Claude and ChatGPT?
David Duvenaud: Yeah. I mean, there was the executive order a few weeks ago against woke AI. I think people are already recognising that this is an important cultural battleground.
Part of the way that I got started on this whole journey was asking people at different labs, “So we build the aligned AGI: how does it end up running the country or the world or something? It seems like the government’s going to ask us for the version of the AI that never refuses one of its orders, and we end up being forced one way or another to align the AI to the government.” So I’m hoping not to hyperstition this into being, but I feel like it’s just obvious that this is going to become one of the most important cultural and political battlegrounds that people fight over.
Rob Wiblin: Does this also make you more enthusiastic about open source AI, so people can put in their own system prompt and do their own fine-tuning?
David Duvenaud: Yeah, it definitely does. That’s maybe the only time we’ll expect to see actually user-aligned AIs: when people actually physically own them.
Rob Wiblin: I guess you could have competition between countries as well, potentially. So you could have some countries that basically impose a system prompt on all the AIs that are being operated in their country. But if you could access the Russian model, or the German model where the government has taken less of an interest, or alternatively they’ve given it a different system prompt in order to mess with other countries.
David Duvenaud: Right. Or Ireland famously competes on low corporate tax rates. So we could imagine some future Ireland saying, “You get to write your own system prompt in our country” or something like that. So that’s another reason for hope: people will demand this kind of freedom. But even the Irelands of the future will probably have a bit in there like, “Don’t conspire against Ireland” or whatever.
Rob Wiblin: I see. Well, to some extent, I guess there’s some things in the system prompt that the government might impose that are good. I’ve done a previous interview where someone was saying, why don’t we just demand lawful AI, at least at a bare minimum, where the system prompt says, “Don’t break the law yourself, and if someone asks you to help them break the law, then don’t help them.” I guess there’s a sense in which that’s obviously really good. But then maybe it should make us queasy, the idea of a government going in and basically writing what assistance you can get with anything.
David Duvenaud: Oh, exactly. I think hate speech laws are the classic example: someone realises that what’s lawful is a really important lever, and then they basically say you’re not allowed to criticise this group. And that’s such an important lever of political control that I think if we have lawful AI, then the fight is going to be who is —
Rob Wiblin: Over what the laws are.
David Duvenaud: What the laws are — and those laws are going to refer specifically to what groups you’re allowed to organise with or against or something like that.
Rob Wiblin: OK. So that’s why AI constitutions are going to be important. Is there anything to be done on this today?
David Duvenaud: Well, the reason why I feel like it’s important to talk about is just because I think one of the big things we need to be preparing for is: we have all these constitutional protections for the real constitutions, because everyone realises it’s super serious business — whether you change the constitution is kind of like your permanent path to power or disempowerment or whatever. Like people take the Second Amendment in the States super seriously, and I think they’re right to, because of the potential of long-term tyranny. So I basically want to see the same seriousness in the handling of the AI value loading.
Rob Wiblin: Yeah. I guess the way that we’ve tried to safeguard at the written constitution stage is making it very difficult to change. You bring everyone together and then try to agree on a constitution that a supermajority is in favour of, and then you make it really difficult to change it. I guess it’s a very difficult balance to say how difficult is the right amount of difficult to change it.
But we’re not going to be able to do that with AI, because it’s changing so quickly, and they’re always altering the system prompt. If we passed a law saying the system prompt for everything has to be exactly this and we can’t change it…
David Duvenaud: But I think we will. I think governments absolutely are going to pass exactly this sort of law. I think it could even be done at first for pandering: like instead of me saying I’m going to represent this constituency by passing some complicated law where they get subsidies for their favourite industry or whatever, I can just say I’m going to add a line to the system prompt that the AI should put this group first or something like that.
So for a while I think it’ll be this terrible political football, and we’ll end up with these horrible kludge system prompts. The hope is just that at some point everyone serious starts to realise that we need to deliberate about this about as hard as we did with the actual constitutions of countries.
Rob Wiblin: You’re really making me love open source AI, David. I think I’ve never felt so enthusiastic about it as this minute.
David Duvenaud: Well, exactly. I mean, I’m also sort of reflexively fearful of the government. I hope that came across in all this.
Rob Wiblin: I guess it’s a difficult challenge because it also creates some problems, but do you think we’ll navigate that one reasonably or…?
David Duvenaud: Not really. That’s probably why my p(doom) is so high, basically.
Rob Wiblin: The open source versus non-open source thing? Or you expect excessive government control?
David Duvenaud: Yeah.
Rob Wiblin: So you got less worried about the bioweapons stuff?
David Duvenaud: I’m very worried about bioweapons, and I think that that’s going to be just like a good justification for government control that’s nonetheless going to be excessive.
Rob Wiblin: So I suppose the idea will be, again, a keyhole solution that fixes the specific problems with open source AI, like the ways in which it’s most dangerous, and lets the other problems slide — and then we use it as much as possible.
David Duvenaud: Yeah, if there is such a technical solution, that would be a total bonanza. I would be over the moon, and that would lower my p(doom) substantially. Specifically, if we could find a way to say everyone gets open source AI — somehow we disable them from doing basically terrorism and other very destructive power grabs, but otherwise they get to actually serve users — I would be like, yay, I want to live in that world.
“The intelligence curse” and Workshop Labs [02:16:58]
Rob Wiblin: There was another essay earlier this year, I think, “The intelligence curse” by Luke Drago and Rudolf Laine. It was pointing to similar themes, similar dynamics to the ones that you’re talking about — especially inasmuch as humans are not doing useful work, how do they maintain economic or political influence over things? Yeah, it’s quite interesting.
I can’t remember whether it’s Luke or Rudolf or both of them who launched Workshop Labs, kind of a new project that I think is focused on this intelligence curse issue. You’re an advisor, right?
David Duvenaud: Yes.
Rob Wiblin: What are they up to? How do they think it’s going to help?
David Duvenaud: So they’ve raised money, they’re hiring, and their basic pitch is: we’re going to avoid the gradual disempowerment, intelligence curse dynamics by giving people control over their own automation.
The service that they’re planning to provide is: you upload your data to some sort of secure private enclave. We will host it. We will also handle the fine-tuning to give you this personalised assistant that basically is sort of a digital clone of you, or like an assistant that knows all of your context so it can help you.
It’s in contrast to Mechanize who are saying, “Let’s build this global thing where we just learn all the skills” and then the return on capital is much more concentrated. The hope, the value pitch for the public good, is that if people follow the Workshop Labs model, there’s a bunch of people who each control their own means of production to a greater extent.
Rob Wiblin: OK. I think you and I agree that in the long term, humans end up fully substituted by AI, but in the short to medium term, there’s a question of are they more substituted or more complemented? And that would depend in part on what technologies we develop first, and whether we try really hard to make them substitutes or make them complements.
You’re saying Workshop Labs is an attempt to push on the complementarity: let’s try to make them as complementary and get them to work together productively as much as possible, so we can kind of extend the complementary era.
David Duvenaud: Well, not quite. Because I also think that if you end up being replaced by your own clone, you’re at least getting to control it, and you own the IP.
Rob Wiblin: OK, so even if it’s not flesh-and-blood me doing it, I’ve created an AI avatar of me that would do similar things to what I would do, but faster and better and more precisely. And so that’s the thing that I would unleash on the world. But I suppose I would have to pay for it to be operating, for it to have compute.
David Duvenaud: Yes, exactly. But the idea is you would just pay them the cost of hosting and then you would actually get the wages that your digital clone is earning or whatever it’s doing.
Rob Wiblin: But as brilliant as I am, David, I’m not sure that training an AI to mimic me is better than training it just to be more intelligent in general. So how would Robert AI compete with just the smartest AGI?
David Duvenaud: I think that’s a great question, and I do think that in the long term you just want to have the centralised AI that is really smart and knows everything. But there is this very distributed economic value we all have in our current little bits of knowledge about our own businesses or our own personal relationships or whatever. And at least we can capture that value for a bit longer than if the question was just, do you outsource to ChatGPT or not?
Rob Wiblin: OK, so again, it’s maybe a long term versus short term story a bit — where if we could come up with an AI model that in particular replicated my style of work and my knowledge, all of my context, my experience, then at least in the immediate term, I think 80,000 Hours would be pretty excited about that, because they could get more of the kind of work that they’re currently paying me to do.
In the long term, that will probably be outcompeted by GPT-9 or whatever it might be. But nonetheless, I guess this might extend the era where me and an AI can work together productively. And also even past that point, even after the point where flesh-and-blood me is not so useful anymore, Robert AI, with all of the experience and all the context that it’s built up over time, could have another good couple of years earning a wage.
David Duvenaud: Yeah, and maybe the base model thing is a bit of a distraction, because you could imagine we get to fine-tune much more powerful base models. And then the point is that just having had all your data being uploaded affords this. But I agree that in the long term the value of that data kind of goes to zero.
Rob Wiblin: So this is a bit of a story where you’re imagining you want a very powerful general AI base model. However, there’s a lot of different tasks in the economy; there’s a lot of different specialised knowledge that you might want to have, roles in organisations, different personalities that could be useful in different organisations and circumstances. So training them to mirror a whole lot of different people that might have a useful niche in the economy, that might be a phase that we go through where that is the optimal application that the companies might want, or the optimal way of using the base models they might want.
David Duvenaud: Yeah, exactly.
Rob Wiblin: Is that true?
David Duvenaud: Well, it’s not clear. It’s not clear. I mean, the fear of course is that they end up speeding up gradual disempowerment. Because if you’re increasing the rate at which everyone is making digital clones of themselves, then maybe in a sense you’re contributing to the race to the bottom.
So I’ve said to Luke and Rudolf, “I think you guys are going to be good for the world in the sense that Anthropic is — where you get to be the ones that say, those other guys are maniacs; the Mechanize people are really just trying to take away everyone’s bargaining power and the value they add. So you guys should all be mad at them. But it’s also bad in the same way that Anthropic is bad, where they’re doing almost the same thing, and they’re sort of affording many of the same dangers in the long run.”
So they’re trying to say, “Let’s try to come up with some creative mechanism to bind ourselves to only do the good thing.” It’s not clear what that mechanism could look like, though.
Rob Wiblin: OK, well I guess that’s a slightly mixed pitch, but if people want to learn more, it’s workshoplabs.ai, and I think Luke and Rudolf are probably pretty interested in getting emails from people who would like to have their input or potentially be collaborators or investors or donors.
David Duvenaud: Absolutely, yeah.
Mapping out disempowerment in a world of aligned AGIs [02:22:48]
Rob Wiblin: Is there any other research that’s interesting that you would like to shout out before we finish?
David Duvenaud: I guess one of the other things that we’re trying to flesh out is actually the details of, wait, how do we actually lose control if we have the aligned AGIs? And I’ve tried to make this pitch, but obviously we’ve just started thinking about this.
So actually, me and my coauthors have been working with a MATS scholar, Gideon Futerman, who’s been trying to make a more detailed case of like, “Here are all the avenues by which we’re in a situation where we actually could shut down the AI or modify it or whatever, but we end up building institutions that don’t serve us.” Trying to flesh that out, I feel like there should be a lot more work in that direction.
Rob Wiblin: I guess many listeners work at AI companies. Do the companies or the staff there have much of an opportunity to address these issues, or is it operating at a different scale that it’s pretty difficult for any company to move the needle here?
David Duvenaud: I think it’s basically beyond their scope to address, but it’s not beyond their scope to monitor and help us understand what’s happening. I really like the Anthropic Economic Index, where they’re trying to say what jobs people are actually doing and how these models are being used. I think more of that, from more companies and in more extensive form, is going to actually help people understand what’s happening and these dynamics a little bit better.
I will say that of all the people that I talk to, in general, people are sympathetic to this. Some people are a bit head in the sand or dismissive, but I think a lot of people are just like, yeah, that’s a huge problem.
It’s not really clear how a single company can deal with it. And they end up doing this thing where the RSP or whatever safety commitments address these very acute catastrophic risks, and then there’s just a whole bunch of slower, more systemic ways that things go wrong that it’s not clear how to address. Like what if everyone’s getting AI girlfriends and boyfriends? How are they supposed to address that? It’s clearly just not within their scope.
Rob Wiblin: I think some people’s reaction to this is, why do you think that humanity or our countries are so stupid that they will kind of sleepwalk into it? This will begin to happen. It’s developing over time. It’s a gradual thing that’s happening over years. Wouldn’t we take more decisive action to prevent it?
I guess part of your story is that all of these things interlock. So it’s like people are gradually becoming a bit enfeebled, they’re becoming poorer relative to other entities, they’re losing their political influence potentially. They’re not as culturally influential either, because frankly their output is not… The podcasts that they’re making are just not at the cutting edge anymore. Is there more to say to it than that?
David Duvenaud: Sure. I think the best example of this is just all of the AI lab leaders being convinced by this story that this is an existential risk and saying, “So now I need to join the race and build my own AI lab just to even matter.”
And also I think a lot of people ask, in the future, why are the oligarchs not coordinating? Well, in particular, the reason that xAI and OpenAI and Anthropic all exist is because these particular oligarchs actually failed to get along on exactly this topic.
Rob Wiblin: Yeah, they dislike one another on a personal level on top of everything else.
David Duvenaud: Yeah, I mean, well, I don’t know what their relationships are in much detail, but I’ll just say these are the exact people that we were hoping will coordinate on exactly this issue — and they already exactly are failing to do so right now.
Rob Wiblin: How about just not building AGI or delaying the day that we build AGI as an approach to handling all this? Waiting until we’ve done a whole lot more work to figure out how to avoid gradual disempowerment, among other problems?
David Duvenaud: Yeah, we’ve thought a lot about this, and my coauthor David Krueger wears T-shirts at NeurIPS that say “Just don’t build AGI” and gives out stickers and stuff like that. I think people under-discuss this possibility because it’s not very fun. It’s very wealth destroying. You’re not part of the cool kids club if you’re not building.
And I’m one of these people that just loves to build. We mentioned this project on LLM forecasting, for instance. But I think lately we’ve all agreed that we should make a habit of saying that if we could all coordinate to delay building much more powerful superhuman systems, that would be a good thing.
I think the realistic way this would go is that this would end up consolidating most of the private efforts into some government programmes. Again, I fear the government, and I think this is very scary for a number of reasons, but I think it’s probably still just barely a positive development on the margin. I don’t think it’s very feasible, but I want to mention, in the Kickstarter spirit, that if everyone else agreed, I would also think this is a good effort to support.
Rob Wiblin: It sounds like there were two arguments. One is it’s not realistically going to happen: the companies are not going to all come together and agree not to build AGI, or even really to slow it down very much. So it’s not a big focus just because of its unlikeliness. But even if they did, potentially they’re just opening the field for the government to run an AGI superintelligence programme and develop its own government-aligned AGI first, which is perhaps not even any better.
David Duvenaud: Yeah, exactly. And basically harder to recover from if it ends up being captured by someone who doesn’t care about various forms of safety or the public good or something like that.
Rob Wiblin: Yeah. If there are warning shots, things start going badly with AI in some dimensions, or there are very scandalous outcomes, could you see there being a big sea change in public opinion that could lead to significant slowdowns?
David Duvenaud: Yeah, definitely. The fear though is that we end up with some sort of warning shot that doesn’t unite people, and it actually polarises people. Jascha Sohl-Dickstein is somebody who has discussed this, and mentioned that there are a lot of disasters that look like social disasters, where afterwards everyone can agree that something bad happened, but they totally disagree on what it was.
Maybe COVID is a perfect example, where half of people think COVID was such a disaster because the public health people didn’t go far enough and they didn’t control the virus and people weren’t on board with all the public health measures and stuff like that. And half of people think COVID was a disaster because public health totally overreacted and the government had all these authoritarian, pointless crackdowns that were so destructive. So we can all agree there’s a warning shot, but half of us think the answer is dismantle public health, and half of us think that the answer is strengthen public health.
And I think you could easily imagine a lot of warning shots where there’s, I don’t know, an AI influencer that makes some sort of destructive cult and gets shut down by the internet. And then half of people think that MechaInfluencer didn’t go far enough, and half of them think MechaInfluencer went too far. So half of the people think that we need to protect future AI cults or whatever we call them, and half of the people think we need to shut them down.
Rob Wiblin: Sounds pretty stupid, but also very possible.
What do David’s CompSci colleagues think of all this? [02:29:19]
Rob Wiblin: One thing I’m curious about is: you’re a professor of computer science, and your background is in ML, right? So you did more technical work before.
David Duvenaud: Exactly.
Rob Wiblin: I feel like whatever this is, it doesn’t feel like classic computer science. I guess you have tenure now, so perhaps you don’t have to worry quite as much as you used to about the opinions of your colleagues. But what’s the reception among academics of you doing this kind of work? And your faculty, do they love it? Do they love the attention it gets? Or are they frustrated?
David Duvenaud: One thing I’ll say is that I think tenure is delivering on exactly the purpose it’s supposed to serve: to allow somebody like myself to work on some crazy direction that most people disagree with and that maybe I don’t even have particular expertise in. And it’s pretty explicitly encouraged at my institution, and I think most institutions, to use this freedom to try to do something a little bit bigger and crazier.
Rob Wiblin: The system works.
David Duvenaud: Yeah, the system works. I mean, I have a lot of problems with universities and academia, but in this respect they’re covering themselves in glory.
But as for my actual colleagues, there’s been kind of a selection effect where my colleagues that sort of buy that AGI is a big deal and possible and maybe not that far away are all at labs. My colleague Jimmy Ba, for instance, is one of the cofounders of xAI, and a bunch of my former students are at Anthropic and xAI and Google and stuff like that.
So there’s sort of evaporative cooling, where the people that are left that are still doing something related to ML are the ones that in my view are very head in the sand and dismissive and should know better, but are saying things like, “I’ve been working with AI for a long time, and it’s harder to make these things agentic than you think.” And just disbelieving that there’s ever going to be this physical artefact anytime soon that has all the dynamism and abilities that humans have.
Rob Wiblin: And that’s basically a selection effect: everyone who didn’t think that has left academia and is making bank.
David Duvenaud: Maybe. I mean, the thing is that I also hear this from other colleagues in other faculties, like philosophy or stats or economics. I think just in general, for some reason, most academics think like, “But how could it ever do my job?” And I also hear this from regular people. I think it’s a very common reaction.
Rob Wiblin: Yeah. Well, I think those folks are in for a little bit of a surprise if they think that AI is not going to be able to contribute to maths or philosophy.
David Duvenaud: Yeah. Ilya Sutskever actually just got an honorary degree at UToronto last month, and he said something like, “You might not be interested in AI, but AI is interested in you.” Something like that.
Rob Wiblin: For better or worse. My guest today has been David Duvenaud. Thanks so much for coming on The 80,000 Hours Podcast, David.
David Duvenaud: My pleasure.