#152 – Joe Carlsmith on navigating serious philosophical confusion
By Robert Wiblin, Luisa Rodriguez and Keiran Harris · Published May 19th, 2023
On this page:
- Introduction
- 1 Highlights
- 2 Articles, books, and other media discussed in the show
- 3 Transcript
- 3.1 Rob's intro [00:00:00]
- 3.2 The interview begins [00:09:21]
- 3.3 Downsides of the drowning child thought experiment [00:12:24]
- 3.4 Making demanding moral values more resonant [00:24:56]
- 3.5 The crazy train [00:36:48]
- 3.6 Whether we're living in a simulation [00:48:50]
- 3.7 Reasons to doubt we're living in a simulation, and practical implications if we are [00:57:02]
- 3.8 Decision theory and affecting the past [01:23:33]
- 3.9 Newcomb's problem [01:46:14]
- 3.10 Practical implications of acausal decision theory [01:50:04]
- 3.11 The hitchhiker in the desert [01:55:57]
- 3.12 Acceptance within philosophy [02:01:22]
- 3.13 Infinite ethics [02:04:35]
- 3.14 Infinite ethics and the utilitarian dream [02:27:42]
- 3.15 What to do with all of these weird philosophical ideas [02:35:28]
- 3.16 Welfare longtermism and wisdom longtermism [02:53:23]
- 3.17 Epistemic learned helplessness [03:03:10]
- 3.18 Power-seeking AI [03:12:41]
- 3.19 Rob's outro [03:25:45]
- 4 Learn more
- 5 Related episodes
What is the nature of the universe? How do we make decisions correctly? What differentiates right actions from wrong ones?
Such fundamental questions have been the subject of philosophical and theological debates for millennia. But, as we all know, and surveys of expert opinion make clear, we are very far from agreement. So… with these most basic questions unresolved, what’s a species to do?
In today’s episode, philosopher Joe Carlsmith — Senior Research Analyst at Open Philanthropy — makes the case that many current debates in philosophy ought to leave us confused and humbled. These are themes he discusses in his PhD thesis, A stranger priority? Topics at the outer reaches of effective altruism.
To help transmit the disorientation he thinks is appropriate, Joe presents three disconcerting theories — originating from him and his peers — that challenge humanity’s self-assured understanding of the world.
The first idea is that we might be living in a computer simulation, because, in the classic formulation, if most civilisations go on to run many computer simulations of their past history, then most beings who perceive themselves as living in such a history must themselves be in computer simulations. Joe prefers a somewhat different way of making the point, but, having looked into it, he hasn’t identified any particular rebuttal to this ‘simulation argument.’
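To get a feel for the counting step at the heart of the argument, here is a minimal sketch in Python. The numbers are purely hypothetical placeholders — nothing in the argument depends on them — they just show how lopsided the ratio of simulated to unsimulated observers becomes once simulations are plentiful.

```python
# Toy illustration of the simulation argument's counting step.
# All quantities are hypothetical placeholders, chosen only to show the arithmetic.

def simulated_fraction(num_civilisations: int,
                       fraction_running_sims: float,
                       sims_per_civilisation: int) -> float:
    """Fraction of observers who perceive themselves as living in 'early history'
    who are in fact simulated, if simulating civilisations run full-history sims."""
    originals = num_civilisations  # one unsimulated early history per civilisation
    sims = num_civilisations * fraction_running_sims * sims_per_civilisation
    return sims / (sims + originals)

# If even 10% of civilisations each ran 1,000 ancestor simulations,
# ~99% of observers with experiences like ours would be simulated.
print(round(simulated_fraction(100, 0.10, 1_000), 3))  # 0.99
```

The force of the argument comes from that ratio, not from the particular numbers: so long as simulated histories vastly outnumber the real one, an observer with our kind of evidence should apparently conclude they are probably in a simulation.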
If true, it could revolutionise our comprehension of the universe and the way we ought to live.
The second is the idea that “you can ‘control’ events you have no causal interaction with, including events in the past.” The thought experiment that most persuades him of this is the following:
Perfect deterministic twin prisoner’s dilemma: You’re a deterministic AI system, who only wants money for yourself (you don’t care about copies of yourself). The authorities make a perfect copy of you, separate you and your copy by a large distance, and then expose you both, in simulation, to exactly identical inputs (let’s say, a room, a whiteboard, some markers, etc.). You both face the following choice: either (a) send a million dollars to the other (“cooperate”), or (b) take a thousand dollars for yourself (“defect”).
Joe thinks, in contrast with the dominant theory of correct decision-making, that it’s clear you should send a million dollars to your twin. But as he explains, this idea, when extrapolated outwards to other cases, implies that it could be sensible to take actions in the hope that they’ll improve parallel universes you can never causally interact with — or even to improve the past. That is nuts by anyone’s lights, including Joe’s.
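To see why the perfect correlation matters, here is a minimal sketch of the payoffs from the thought experiment above. The dollar figures are the ones given there; the code itself is just an editorial illustration, not anything from the episode.

```python
# Payoffs (in dollars) for the perfect deterministic twin prisoner's dilemma.
# payoff[(my_choice, twin_choice)] = money I end up with.
payoff = {
    ("cooperate", "cooperate"): 1_000_000,  # my twin sends me $1 million
    ("cooperate", "defect"):            0,  # I sent $1 million away and got nothing
    ("defect",    "cooperate"): 1_001_000,  # I keep $1,000 and receive $1 million
    ("defect",    "defect"):        1_000,  # we both just keep our $1,000
}

# Dominance reasoning treats the twin's choice as fixed: defecting wins in each column...
for twin in ("cooperate", "defect"):
    assert payoff[("defect", twin)] > payoff[("cooperate", twin)]

# ...but with a perfect deterministic copy facing identical inputs, the mixed
# outcomes never happen: whatever I choose, my twin chooses too.
for choice in ("cooperate", "defect"):
    print(choice, "->", payoff[(choice, choice)])
# cooperate -> 1000000
# defect -> 1000
```

On Joe's view, the only outcomes actually on the table are the two on the diagonal, which is why cooperating looks clearly better here even though it is "dominated" in the standard sense.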
The third disorienting idea is that, as far as we can tell, the universe could be infinitely large. And that fact, if true, would mean we probably have to make choices between actions and outcomes that involve infinities. Unfortunately, doing that breaks our existing ethical systems, which are only designed to accommodate finite cases.
In an infinite universe, our standard models end up unable to say much at all, or give the wrong answers entirely. We might hope to patch them in straightforward ways, but having looked into the options, Joe has concluded that they all quickly get complicated and arbitrary, and still do enormous violence to our common sense. For people inclined to endorse some flavour of utilitarianism, Joe thinks ‘infinite ethics’ spells the end of the ‘utilitarian dream’ of a moral philosophy that has the virtue of being very simple while still matching our intuitions in most cases.
These are just three particular instances of a much broader set of ideas that some have dubbed the “train to crazy town.” Basically, if you commit to always take philosophy and arguments seriously, and try to act on them, it can lead to what seem like some pretty crazy and impractical places. So what should we do with this buffet of plausible-sounding but bewildering arguments?
Joe and Rob discuss to what extent this should prompt us to pay less attention to philosophy, and how we as individuals can cope psychologically with feeling out of our depth just trying to make the most basic sense of the world.
In the face of all of this, Joe suggests that there is a promising and robust path for humanity to take: keep our options open and put our descendants in a better position to figure out the answers to questions that seem impossible for us to resolve today — a position he calls “wisdom longtermism.”
Joe fears that if people believe we understand the universe better than we really do, they’ll be more likely to try to commit humanity to a particular vision of the future, or to act uncooperatively towards others, in ways that only make sense if you’re certain you know what’s right and wrong.
In today’s challenging conversation, Joe and Rob discuss all of the above, as well as:
- What Joe doesn’t like about the drowning child thought experiment
- An alternative thought experiment about helping a stranger that might better highlight our intrinsic desire to help others
- What Joe doesn’t like about the expression “the train to crazy town”
- Whether Elon Musk should place a higher probability on living in a simulation than most other people
- Whether the deterministic twin prisoner’s dilemma, if fully appreciated, gives us an extra reason to keep promises
- To what extent learning to doubt our own judgement about difficult questions — so-called “epistemic learned helplessness” — is a good thing
- How strong the case is that advanced AI will engage in generalised power-seeking behaviour
Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.
Producer: Keiran Harris
Audio mastering: Milo McGuire and Ben Cordell
Transcriptions: Katy Moore
Highlights
The deterministic twin prisoner's dilemma
Joe Carlsmith: The experiment that convinces me most is: Imagine that you are a deterministic AI system and you only care about money for yourself. So you’re selfish. There’s also a copy of you, a perfect copy, and you’ve both been separated very far away — maybe you’re on spaceships flying in opposite directions or something like that. And you’re both going to face the exact same inputs. So you’re deterministic: the only way you’re going to make a different choice is if the computers malfunction or something like that. Otherwise you’re going to see the exact same environment.
In the environment, you have the option of taking $1,000 for yourself: we’ll call that “defecting” — or giving $1 million to the other guy: we’ll call that “cooperating.” The structure is similar to a prisoner’s dilemma. You’re going to make your choice, and then later you’re going to rendezvous.
So what should you do? Well, here’s an argument that I don’t find convincing, but that I think would be the argument offered by someone who thinks you can only control what you can cause. The argument would be something like: your choice doesn’t cause that guy’s choice. He’s far away; maybe he’s lightyears away. You should treat his choice as fixed. And then whatever he chooses, you get more money if you defect. If he defects, then you’ll get nothing by cooperating and $1,000 by defecting. If he sends the money to you, then you’ll get $1.001 million by defecting and $1 million by cooperating. No matter what, it’s better to defect. So you should defect.
But I think that’s wrong. The reason I think it’s wrong is that you are going to make the same choice. You’re deterministic systems, and so whatever you do, he’s going to do it too. In fact, in this particular case — and we can talk about looser versions where the inputs aren’t exactly identical — the connection between you two is so tight that literally, if you want to write something on your whiteboard, he’s going to write that too. If you want him to write on his whiteboard, “Hello, this is a message from your copy,” or something like that, you can just write it on your own whiteboard. When you guys rendezvous, his whiteboard will say the thing that you wrote. You can sit there going, “What do I want?” You really can control what he writes. If you want to draw a particular kitten, if you want to scribble in a certain way, he’s going to do that exact same thing, even though he’s far away and you’re not in causal interaction with him.
To me, I think there’s just a weird form of control you have over what he does that we just need to recognise. So I think that’s relevant to your decision, in the sense that if you start reaching for the defect button, you should be like, “OK, what button is he reaching for right now?” As you move your arm, his arm is moving with you. And so you reach for the defect, he’s about to defect. You could basically be like, “What button do I want him to press?” and just press it yourself and he’ll press it. So to me, it feels pretty easy to press the “send myself $1 million” button.
Newcomb's problem
Joe Carlsmith: The classic thought experiment that people often focus on, though I don’t think it’s the most dispositive, is this case called Newcomb’s problem, where Omega is this kind of superintelligent predictor of your actions. Omega puts you in the situation where you face two boxes: one of them is opaque, one of them is transparent. The transparent box has $1,000, the opaque box has either $1 million or nothing.
Omega puts $1 million in the box if Omega predicts that you will take only the opaque box and leave the $1,000 alone (even though you can see it right there). And Omega puts nothing in the opaque box if Omega predicts that you will take both boxes.
So the same argument arises for causal decision theory (CDT). For CDT, the thought is: you can’t change what’s in the boxes; the boxes are already fixed. Omega already made her prediction. And no matter what, you’ll get more money if you take the $1,000. If there was some dude over there who could see the boxes, and you were like, “Hey, see what’s in the box, and what choice will give me more money?” — you don’t even need to ask, because you know the answer is always to just take the extra $1,000.
But I think you should one-box in this case, because I think if you one-box then it will have been the case that Omega predicted that you one-boxed, because Omega is always right about the predictions, and so there will be the million.
I think a way to pump this intuition for me that matters is imagining doing this case over and over with Monopoly money. Each time, I try taking two boxes and I notice the opaque box is empty. I take one box, opaque box is full. I do this over and over. I try doing intricate mental gymnastics. I do like a somersault, I take the boxes. I flip a coin and take the box — well, flipping a coin, Omega has to be really good, so we can talk about that.
If Omega is sufficiently good at predicting your choice, then just like every time, what you eventually will learn is that you effectively have a type of magical power. Like I can just wave my arms over the opaque box and say, “Shazam! I hereby declare that this box shall be full with $1 million. Thus, as I one-box, it is so.” Or if I can be like, “Shazam! I declare that the box shall be empty. Like thus, as I two-box, it is so.” I think eventually you just get it in your bones, such that when you finally face the real money, I guess I expect this feeling of like, “I know this one, I’ve seen this before.” I kind of know what’s going to happen at some more visceral expectation level if I one-box or two-box, and I know which one leaves me rich.
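Joe’s “do it over and over” intuition pump is easy to make concrete with a small simulation. This is just an illustrative sketch, not anything from the episode: it assumes a predictor that is right 99% of the time and uses the payoffs from the thought experiment.

```python
import random

def newcomb_payout(one_box: bool, predictor_accuracy: float = 0.99) -> int:
    """Dollars received in one round of Newcomb's problem."""
    # The predictor guesses your choice correctly with the given probability.
    predicted_one_box = one_box if random.random() < predictor_accuracy else not one_box
    opaque = 1_000_000 if predicted_one_box else 0
    return opaque if one_box else opaque + 1_000  # two-boxers also take the $1,000

random.seed(0)
rounds = 10_000
avg_one_box = sum(newcomb_payout(True) for _ in range(rounds)) / rounds
avg_two_box = sum(newcomb_payout(False) for _ in range(rounds)) / rounds
print(f"one-boxing averages ~${avg_one_box:,.0f} per round")  # roughly $990,000
print(f"two-boxing averages ~${avg_two_box:,.0f} per round")  # roughly $11,000
```

What the single-shot case licenses you to conclude from those averages is, of course, exactly what causal and evidential decision theorists dispute; the simulation just captures the repeated-play experience Joe describes of learning “which one leaves me rich.”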
The idea of 'wisdom longtermism'
Joe Carlsmith: In the thesis, I have this distinction between what I call “welfare longtermism” and “wisdom longtermism.”
Welfare longtermism is roughly the idea that our moral focus should be on specifically the welfare of the finite number of future people who might live in our lightcone.
And wisdom longtermism is a broader idea that our moral focus should be reaching a kind of wise and empowered civilisation in general. I think of welfare longtermism as a lower bound on the stakes of the future more broadly — at the very least, the future matters at least as much as the welfare of the future people matters. But to the extent there are other issues that might be game changing or even more important, I think the future will be in a much better position to deal with those than we are, at least if we can make the right future. …
There’s a line in Nick Bostrom’s book Superintelligence about something like, if you’re digging a hole but there’s a bulldozer coming, maybe you should wonder about the value of digging a hole. I also think we’re plausibly on the cusp of pretty radical advances in humanity’s understanding of science and other things, where there might be a lot more leverage and a lot more impact from making sure that the stuff you’re doing matters specifically to how that goes, rather than to just kind of increasing our share of knowledge overall. You want to be focusing on decisions we need to make now that we would have wanted to make differently.
So it looks good to me, the focus on the long-term future. I want to be clear that I think it’s not perfectly safe. I think a thing we just generally need to give up is the hope that we will have a theory that makes sense of everything — such that we know that we’re acting in the safe way, that it’s not going to go wrong, and it’s not going to backfire. I think there can be a way that people look to philosophy as a kind of mode of Archimedean orientation towards the world — that will tell them how to live, and justify their actions, and give a kind of comfort and structure — that I think at some point we need to give up.
On the classic drowning child thought experiment
Joe Carlsmith: I think what that can do is sort of break your conception of yourself as a kind of morally sincere agent — and at a deeper level, it can break your conception of society and your peers, or society as a morally sincere endeavour, in some sense. Things can start to seem kind of sick at their core, and we’re just all looking away from the sense in which we’re horrible people, or something like that.
I actually think part of the attraction of communities like the effective altruism community, for many people, is it sort of offers a vision of a recovery of a certain moral sincerity. You find this community, and actually, these people are maybe trying — more so than you had encountered previously — to really take this stuff seriously, to act rightly by its lights. And I think that can be a powerful idea.
But then this thing comes up, where it’s like, “OK, but how much is enough? Exactly how far do you go with this? What is demanded?” I think people can end up in a mode where their relationship with this is what you said: it’s about not being bad, not sucking — like you thought “maybe I sucked” and now you’re really trying not to suck — you don’t want to be kind of punished or worthy of reproach. It’s a lot about something like guilt. I think that the thought experiment itself is sort of about calling you an asshole. It’s like, “If you didn’t save the child, you’re an asshole.” So everyone’s an asshole.
Rob Wiblin: But look at how you’re living the rest of your life.
Joe Carlsmith: Exactly. I think sometimes you’re an asshole, and we need to be able to notice that. But also, for one thing, it’s actually not clear to me that you’re an asshole for not donating to a charity — that’s not something that we normally think — and I think we should notice that. Also, it doesn’t seem to me like a very healthy or wholehearted basis for engaging with this stuff. I think there are alternatives that are better.
On why bother being good
Rob Wiblin: What are the personal values of yours that motivate you to care to try to help other people, even when it’s kind of a drag, or demoralising, or it feels like you’re not making progress?
Joe Carlsmith: One value that’s important to me, though it’s a little hard to communicate, is something like “looking myself and the world in the eye.” It’s about kind of taking responsibility for what I’m doing; what kind of force I’m going to be in the world in different circumstances; trying to understand myself, understand the world, and understand what in fact I am in relationship to it — and to choose that and endorse that with a sense of agency and ownership.
One way that shows up for me in the context of helping others is trying to take really seriously that my mind is not the world — that the limits of my experience are not the limits of what’s real.
In particular, I wake up and I’m just like Joe every day — every day it’s just Joe stuff; I wake up in the sphere of Joe around me. So Joe stuff is really salient and vivid: there’s this sort of zone — it’s not just my experience, there’s also, like, people and my kitchen — of things that are kind of vivid.
And then there’s a part of the world that my brain is doing a lot less to model — but that doesn’t mean the thing is less real; it’s just my brain is putting in a lot fewer resources to modelling it. So things like other people are just as real as I am. When something happens to me, at least from a certain perspective, that’s not a fundamentally different type of event than when something happens to someone else. So part of living in the real world for me is living in light of that fact, and trying to really stay in connection with just that other people are just as real as I am.
More broadly, when we talk about forms of altruism that are more fully impartial — or trying to ask questions like, “What is really the most good I can do?” — for me, that’s a lot about trying to live in the world as a whole, not artificially limiting which parts of the world I’m treating as real or significant. Because I don’t live in just one part of the world. When I act, I act in a way that affects the whole world, or that can affect the whole world. There’s some sense in which I want to be not imposing some myopia upfront on what is in scope for me. I think those are both core for me in terms of what helping others is about.
Articles, books, and other media discussed in the show
Joe’s work:
- Joe’s website
- Audio versions of many articles and posts are available at Joe Carlsmith Audio
- PhD dissertation: A stranger priority? Topics at the outer reaches of effective altruism, which covers:
- “Crazy train” work
- Simulation arguments
- On infinite ethics
- Against neutrality about creating happy lives
- Wholehearted choices and “morality as taxes”
- Can you control the past?
- Against meta-ethical hedonism
- Against the normative realist’s wager
- Actually possible: thoughts on Utopia
- AI alignment and takeoff research:
- Other podcast appearances:
- Utilitarian Podcast: Creating Utopia
- The Lunar Society: Utopia, AI, & Infinite Ethics
Thought experiments, strong longtermism, and other “crazy train” work:
- Are you living in a computer simulation? by Nick Bostrom
- This cartoon explains why Elon Musk thinks we’re characters in a computer simulation. He might be right. by Alvin Chang in Vox
- 193,340 People Agree With Me, 85,660 Disagree — the sleeping beauty video by Veritasium
- Are You a Boltzmann Brain? by PBS Space Time
- Astronomical waste: The opportunity cost of delayed technological development by Nick Bostrom
- The case for strong longtermism by Hilary Greaves and William MacAskill
- Epistemic learned helplessness by Scott Alexander
- Infinite ethics by Nick Bostrom
- Pareto principles in infinite ethics by Amanda Askell
- Functional decision theory: A new theory of instrumental rationality by Eliezer Yudkowsky and Nate Soares
- Newcomb’s paradox
- Psychological twin prisoner’s dilemma
- Parfit’s hitchhiker
- Boltzmann brains
AI alignment arguments:
- AI alignment: Why it’s hard, and where to start — presentation by Eliezer Yudkowsky
- What failure looks like by Paul Christiano
Other 80,000 Hours podcast episodes:
- Ajeya Cotra on worldview diversification and how big the future could be
- Alan Hájek on puzzles and paradoxes in probability and expected value
- Amanda Askell on tackling the ethics of infinity, being clueless about the effects of our actions, and having moral empathy for intellectual adversaries
- Sharon Hewitt Rawlette on why pleasure and pain are the only things that intrinsically matter
- Will MacAskill on moral uncertainty, utilitarianism, and how to avoid being a moral monster and what we owe the future
- Hilary Greaves on Pascal’s mugging, strong longtermism, and whether existing can be good for us
- Tom Davidson on how quickly AI could transform the world
- Paul Christiano on how OpenAI is developing real solutions to the ‘AI alignment problem,’ and his vision of how humanity will progressively hand over decision-making to AI systems
Transcript
Rob’s intro [00:00:00]
Rob Wiblin: Hi listeners, this is The 80,000 Hours Podcast, where we have unusually in-depth conversations about the world’s most pressing problems, what you can do to solve them, and how you can improve the past. I’m Rob Wiblin, Head of Research at 80,000 Hours.
Today’s episode with philosopher Joe Carlsmith is probably our most challenging to date.
The first reason for that is that we dive into some pretty tricky philosophy and move through the topics fairly fast.
And on top of that, Joe is reacting to a debate among philosophers and global priorities researchers, and trying to rebut an attitude that he thinks some people like me hold. And that debate is one that we haven’t fully covered on the show before, so there’s a risk that for many people it could sound like joining a conversation halfway through.
To help make it a bit easier to follow, and to help you figure out whether it’s for you, I’m going to give a bit more of a summary of the whole conversation here at the start than we usually do, or at least my gloss of it.
The first section is the most straightforward — Joe talks about the drowning child thought experiment from Peter Singer, and why he thinks it can make people feel like they’re being strong-armed into doing the right thing by being effectively called complete jerks. This alienates some people from their own intrinsic desire to do the right thing. Joe offers a variation on the drowning child thought experiment that highlights that we really do want to help others in cases where we can do a lot for them at little cost to ourselves — not out of guilt, but just out of compassion.
Joe and I then dive into his PhD thesis which he recently completed, called A stranger priority? Topics at the outer reaches of effective altruism. It covers a number of recent ideas in philosophy that, if true, could upend our understanding of the world. We talk about two of those and a third which he has written about on his website.
The first idea is that we might be living in a computer simulation, because, in a nutshell, if most civilisations go on to run many computer simulations of their past history, then most beings who perceive themselves as living in that history must be in computer simulations. This is a basic idea you might have heard before, and it’s one which Joe examines in his thesis and modifies to make it more robust to criticism. At the end of the day he doesn’t know of a good rebuttal to the basic argument for taking this simulation hypothesis seriously. And if it were right, it seems like it would be a really big deal.
The second is the ability we might have to influence places and times on which we can’t have any causal effect. Crazy though it may sound, if parallel universes exist, we might be able to improve our expectations of what they are like by changing how we behave — knowing that the way other beings elsewhere in the universe decide what to do is correlated with the way that we do it.
Basically there’s the possibility that we can have effects on the world that are not causal in the way we normally understand them. Indeed philosophers have come up with this term ‘acausal’ to describe these potential influences that we might have, that we kind of lack intuitions about.
The third is that the universe might be infinitely large, which can create all sorts of problems for our theories of ethics. The easiest way to explain the issue here, though not actually the most serious one, is that if the universe is infinitely large, and so contains an infinite amount of good in it, then doing an additional good thing doesn’t increase how good the universe is in total, and so you might naively judge it as pointless. This has come up on the show before in episodes #42 – Amanda Askell on tackling the ethics of infinity and #139 – Alan Hájek on puzzles and paradoxes in probability and expected value.
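As a toy illustration of that naive worry (using Python’s floating-point infinity as a crude stand-in for an infinite total of value — this is just an editorial sketch, not something discussed in the episode):

```python
# Naive 'total value' accounting stops working once the total is infinite.
total_value = float("inf")            # stand-in for an infinitely good universe

after_good_deed = total_value + 100   # add one more (finite) good thing
print(after_good_deed == total_value) # True: the total is unchanged
print(after_good_deed - total_value)  # nan: 'how much better is it?' has no answer
```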
In reality Joe is worried about other more complex problems that infinities create. During his PhD, Joe looked into this topic and tried to find ways to adjust our theories of normative ethics so they were compatible with a universe that might be infinite in scope. But he just found more ways that this was a problem for every theory of ethics, and concluded that any way of dealing with it was going to have to violate our common sense in all sorts of ways.
Having briefly considered these three ways in which philosophy might greatly undercut our common sense, we think about how we should handle this.
I explain to Joe that I find thinking about these topics disorienting and demoralising, because I feel so out of my depth, and like this deep uncertainty about the nature of the world means that anything I do might turn out to be completely misguided and overturned by some later discovery. I don’t see how I can reach meaningful conclusions on these fundamental issues, yet they might determine what is right and wrong to do. So I’m tempted to just give up on the project of doing good.
Joe sympathises with this and explains that he thinks that most people shouldn’t spend a lot of time worrying about these topics, and should feel free to ignore them if they aren’t finding them useful.
For Joe, life comes before philosophy, and philosophy is a tool for living with greater awareness and clarity — and if your philosophy starts to break, your life probably shouldn’t break with it. Ignoring issues that one is unlikely to act on usefully is an option — and indeed, the option we choose for most topics most of the time. And even though we feel very adrift, Joe points out that none of us really thinks that just starting to act completely randomly is as good as doing anything else — and that is probably a sign of wisdom.
That said, Joe thinks it’s important that listeners appreciate that we are far from having all the answers about the nature of the universe or a complete and satisfying theory of moral philosophy.
That’s because if people feel confident that they’ve got it figured out they may make big mistakes, like trying to commit humanity to a particular future, or taking radical and dangerous and uncooperative actions that only make sense if someone is very confident that they know what’s right and wrong.
Indeed, he has written about all of this in part because he worries that I, and perhaps some listeners, are too confident about what sort of future we should be aiming for and how to get there, and lack an appropriate sense of confusion.
Joe argues that if we embrace the idea that humanity is really out of its depth in trying to improve the world given its current state, then this probably does shift our priorities in some concrete ways.
He suggests that becoming a wiser civilisation may be our best bet for properly dealing with profound issues we don’t understand today — an approach that Joe calls “wisdom longtermism.” Basically, we should keep humanity’s options open; work today on those issues that are so urgent they can’t be delegated to future generations; and otherwise try to put civilisation on a track where it can become much wiser than we are now, and hopefully one day answer a lot of the questions that we feel incapable of satisfactorily dealing with today.
In summary, Joe’s key message is that profound philosophical ideas are worth taking seriously, but we must also hold them lightly and with maturity and not get too sucked into them. Focusing our efforts on creating a wise future civilisation that can properly deal with these issues may be the best response, given our limited capacities today.
All right, so that’s a summary of the arc of the conversation, the attitude that Joe is responding to, and the key points that come up.
The reason to give this summary is that I think it will help people make more sense of the conversation, and also help them avoid getting stuck on the many technical details and arguments that Joe refers to, which can certainly be confusing, but which aren’t so key to the broader points being made.
I’m going to chime in with some clarifications and definitions during the interview, so hopefully it’s educational even if you find it tricky to follow every point being made.
One thing I will just quickly note is that Joe works at Open Philanthropy, which is 80,000 Hours’ biggest donor.
All right, without further ado, I bring you Joe Carlsmith.
The interview begins [00:09:21]
Rob Wiblin: Today I’m speaking with Joe Carlsmith. Joe is a senior research analyst at Open Philanthropy, where he has worked since 2018. His research focuses on risks to humanity’s long-term future. In particular, at Open Philanthropy he has helped with research trying to estimate when AI would be capable of different kinds of tasks, and he’s worked on a report on whether AI systems should be expected to converge on generalised power-seeking behaviour and whether that would be a problem.
He’s a philosopher by training, having done his undergrad at Yale before completing a Philosophy PhD at Oxford University on the strangest issues that arise in global priorities research. He’s also a well-known writer and blogger who has an extensive corpus of research on all sorts of different topics on his website, joecarlsmith.com — and many of those articles you can hear on his podcast feed at Joe Carlsmith Audio.
Thanks for coming on the podcast, Joe.
Joe Carlsmith: Thanks for having me, Rob.
Rob Wiblin: I hope to talk about whether we’re living in a computer simulation and how one can cope when philosophy demolishes just one piece too many of our common sense. But first, what are you working on at the moment, and why do you think it’s important?
Joe Carlsmith: Most recently I’ve been thinking about moral uncertainty, and uncertainty about worldviews in general. I’m doing that partly in the context of informing Open Philanthropy’s thinking about the topic, but I also think it’s just important more generally to approach high-stakes and big ideas with the right type of moderation and maturity.
In particular, there’s something about the existing discourse about moral uncertainty that feels kind of fake to me. I think part of that is that it places a lot of emphasis on the idea of moral theories, and your credences in moral theories. I have some sense that this isn’t the right way to understand the richness and tension of human moral life. I think there are some alternative models in particular that emphasise more cooperation between different parts of yourself, and different motivations and perspectives you might have, that I see as more promising. So I’ve been exploring some of that.
Rob Wiblin: Yeah. Moral uncertainty — for people who need a refresher — is this idea that, if you think there’s a 50% chance that utilitarianism is true (so you should just focus on consequences), and there’s a 50% chance that some particular deontological theory is true (so you shouldn’t break any particular rules), then you would do some kind of averaging between them. Maybe you’d focus on consequences except in the cases where you’d be forced to break some prohibition according to the other theory.
Something about that feels kind of forced to you, is that right? It doesn’t feel like it’s capturing the spirit that you want people to be acting with?
Joe Carlsmith: Yeah, just as a sort of intuition pump. I don’t think people really go around with credences on deontology and credences on consequentialism — or I think that’s not a very natural description, if you go out and look at people’s moral life and you ask them, “So, by the way, are you like 60% on deontology? 80%?”
Obviously there’s a lot of ways in which philosophy uses artificial models of psychology and other things to make various abstract points. But I think it’s kind of a clue that maybe we’re missing something in what’s going on when people feel torn between different perspectives and different ways of looking at things — that our way of setting it up doesn’t fit super well with what you would have naively said about people.
Downsides of the drowning child thought experiment [00:12:24]
Rob Wiblin: Yeah, I think some of those things will come in later in the conversation. But I wanted to open with the topic that’s not exactly the core theme of the interview, but I think will help to motivate why we’re going to go through all this effort to try to sincerely understand the world — even when it’s taking us to irritating places. I think most listeners will be familiar with the drowning child thought experiment from Peter Singer. It might be one of the most influential thought experiments in moral philosophy in the last century or so.
Can you very quickly remind us of the drowning child thought experiment, and then explain what reservations you have about it?
Joe Carlsmith: Sure. The basic setup is that there is a child drowning in a pond, and you can save the child, but only by ruining your clothes. Let’s say in particular it’s like a very expensive suit, thousands of dollars, though that wasn’t specified in the original experiment. The basic claim is that, a) you’re morally obligated in this situation to save the child and to ruin your clothes, and b) that many people in the world, including many listeners to this podcast, would be in a morally analogous situation towards the world’s needy. The implication being that just as you’re obligated to save the child, you’re obligated to donate money or otherwise take kind of costly actions to help others around the world.
I think this thought experiment is, with suitable caveats, pointing at something really important about the world and something I take seriously. I think we should move cautiously in diagnosing the lesson that this thought experiment gives; there’s something important here, but I think we shouldn’t assume too quickly we know what that is.
At a separate level though, I think psychologically, the thought experiment for me creates too easily a kind of quasi-adversarial and reluctant and coercive relationship to the morality in question. In particular, I think it conjures this image of: I’m over here, I’ve got the stuff that I care about directly and the stuff that matters to me, and then morality comes in from outside and it’s sort of like taxes — like, it wants to take some of my stuff and some of what I care about for the sake of something that I don’t care about, that apparently the rules say I have to give something up for.
And so you can get into this mindset, and I think this is the discourse that the Peter Singer conversation conjures — this sense of how much is enough? How much do I “have to give” to have played by the rules, or to be off the hook? What is demanded and what is OK to not do? I think there’s something important about that, but I also think it’s maybe not the right frame, or the only frame, or the frame that I would want to start with.
Rob Wiblin: Yeah. What sorts of negative effects do you think reflecting a great deal on that thought experiment can have on people, or maybe that you’ve observed it having on people?
Joe Carlsmith: I think there are a number of effects, some of which I see as entirely negative and some of which are kind of mixed. So, I encountered this thought experiment in high school, and like many people, I was kind of convinced. I was like, “Wow, this is a powerful argument. This is an important consideration.” So I went around and I talked with lots of people about it, and often they were reasonably convinced too — or they didn’t have some devastating retort or anything like that. But nevertheless, neither they nor I were acting on the apparent conclusion of this argument. So that for me, I had this feeling of like, “Wait, so are we just giving up on morality?” You know, “So we said it was wrong…”
Rob Wiblin: “So I guess we all suck, right?”
Joe Carlsmith: Yeah, it was sort of like that. We said it was wrong, but we’re going to still keep doing it. Like, what’s going on here?
Rob Wiblin: It’s like the lesson of the thought experiment is that we suck and actually we don’t care. Maybe that’s the bottom line.
Joe Carlsmith: Yeah, that’s right. I think that’s the takeaway a lot of people have. I think what that can do is sort of break your conception of yourself as a kind of morally sincere agent — and at a deeper level, it can break your conception of society and your peers, or society as a morally sincere endeavour, in some sense. Things can start to seem kind of sick at their core, and we’re just all looking away from the sense in which we’re horrible people, or something like that.
I actually think part of the attraction of communities like the effective altruism community, for many people, is it sort of offers a vision of a recovery of a certain moral sincerity. You find this community, and actually, these people are maybe trying — more so than you had encountered previously — to really take this stuff seriously, to act rightly by its lights. And I think that can be a powerful idea.
But then this thing comes up, where it’s like, “OK, but how much is enough? Exactly how far do you go with this? What is demanded?” I think people can end up in a mode where their relationship with this is what you said: it’s about not being bad, not sucking — like you thought “maybe I sucked” and now you’re really trying not to suck — you don’t want to be kind of punished or worthy of reproach. It’s a lot about something like guilt. I think that the thought experiment itself is sort of about calling you an asshole. It’s like, “If you didn’t save the child, you’re an asshole.” So everyone’s an asshole.
Rob Wiblin: But look at how you’re living the rest of your life.
Joe Carlsmith: Exactly. I think sometimes you’re an asshole, and we need to be able to notice that. But also, for one thing, it’s actually not clear to me that you’re an asshole for not donating to a charity — that’s not something that we normally think — and I think we should notice that. Also, it doesn’t seem to me like a very healthy or wholehearted basis for engaging with this stuff. I think there are alternatives that are better.
Rob Wiblin: I recall that in another interview you said you worry there’s a generation of people for whom this thought experiment is one of the key ideas in morality that they’ve heard, and that they have this almost low-level trauma, or sense of alienation, from the idea of doing the right thing. Because it’s coming at them like this hostile force that is trying to shut down their party: they were having a good time, and the police have arrived, and now the way to live a good life would be to give away everything, to live in penury. They don’t feel like that, but now they’re just torn up about it.
Joe Carlsmith: Yeah, I think the term I call it in my head is “Peter Singeritis.” Which is, somehow this broke something about your conception of yourself and your conception of the world — I think that can be a sense of the party, it can be a sense of being a good person, it can be a sense of what it would be to try to do what you should do — all sorts of kind of subtle forms of alienation from the idea of a moral and good and sincere life.
Rob Wiblin: Yeah. I should say it’s far from everyone — but potentially at least, it’s had this effect on some people. But yeah, you have this variant thought experiment that captures a similar idea, but at the same time tries to capture the intuition with a very different emotional tone to it. Can you explain that?
Joe Carlsmith: Sure. It’s not super different, but I find it lands in my head differently. So I really like taking walks, and I think it’s good if the thing that you’re paying or giving up is something that you have a direct connection with valuing. I feel like suits and clothes, it’s shallow, like, “What? You and your clothes?” I don’t know, walks are more wholesome.
Rob Wiblin: The thought experiment does make me wonder whether philosophers have never heard of dry cleaning, but I suppose that’s kind of a trivial empirical issue relative to the moral ones.
Joe Carlsmith: There’s piranhas in the pond. They’ll shred your suit.
So for the thought experiment, I imagine going off for a walk. It’s a beautiful fall day. I’ve been looking forward to this. As I’m about to enter the forest, kind of far away in the distance, there’s this river. I see there’s some commotion by the river, but the light is fading. So I don’t go to the river. I just go on my walk, and I come back and it’s great. But I learn later that while I was on my walk, there was a man who was drowning in the river. He was pinned under some sort of machinery. His leg was caught. There were people there trying to move the machinery. His wife was in the water with him, his child was there watching, and they couldn’t move the machinery. They thought maybe one extra person would have made the difference.
And the intuition this pumps for me is some sense of like, I just directly want to trade my walk away and say, “Have this man live instead.” I imagine maybe it’s going to the river, though that’s a little complicated. If it’s just a cleaner thing, I imagine I can go back in time, and I sort of disappear in my walk — I just don’t exist for that afternoon, and instead, somehow this man’s leg gets free. Somehow he flops out onto the shore of the river and has his entire life, and his family has him back. I just feel directly that I want to make that trade. That’s just worth it from my own perspective: I want that man to live more than I want me to have this walk.
Rob Wiblin: Yeah, no one’s twisting your arm, I guess. Is the key thing that you have a great level of regret that you didn’t realise that, and so you weren’t able to make this trade?
Joe Carlsmith: Yeah, that’s right. I’m trying to skip the part where you’re an asshole and just cut to the stakes themselves, and connect more to the sense of like, “Actually, this is something that I care about too” — rather than setting up this conflict of how there’s the thing you care about, and then there’s the thing over there that morality demands that you sacrifice for instead.
Rob Wiblin: Where do we go with that? I suppose someone could encounter that and think, “Yes, I would love to not have gone on the walk and been able to save this other person’s life instead, if only I’d realised.” But they might continue down the chain of reasoning and think, “But I’m in this situation that resembles this in normal life, and I don’t.” Maybe you could still end up with a degree of alienation, or you might conclude yourself that you are the jerk who would rather take the walk.
Joe Carlsmith: Yeah, I think the thought experiment doesn’t resolve all of the stuff that comes up with Singer. As you say, there’s still a question like, what are you going to do now? Also, notably, part of what’s doing the work in the thought experiment is that you didn’t know that you could save the guy. As you say, in our actual situation — with respect to donating to charity and stuff — obviously there’s uncertainty about exactly what the impact is, but we know a lot more about what we could do.
It’s not a perfect analogy for our current situation. It also doesn’t answer the question of what you do if, every time you go out on a walk, there’s a guy drowning.
Rob Wiblin: Not just one. There’s like many recurring every mile or two.
Joe Carlsmith: Yeah, that’s right. It’s just a line. It’s like, “Someone fix this machinery, please.”
Rob Wiblin: Irresponsible business.
Joe Carlsmith: Yeah. So for me, I don’t want to say that this changes the conversation in a super structural way. It’s more just a way of tuning into a sense of direct connection with the stakes themselves, and a sense that that can be something that’s coming from your perspective: in the same way you care about other things in your life, you can care about other people. I just think that’s useful to remember and a more wholehearted initial approach.
Sometimes in these contexts, people talk about these different framings. Where there’s the obligation framing, which is the “you’ll be bad” guilt. There’s this other framing, sometimes called the opportunity framing, which is, “Ooh, isn’t it exciting that I can save lives in the same way? Wouldn’t it be amazing to have saved someone from a burning building?”
I feel like neither of these is the most resonant framing for me: the first one is too guilty and coercive, and the second one is too happy and excited. I feel like there’s this middle or this other one, which is just like, you just care about this thing. And it’s not necessarily a happy thing — there’s a lot of grief and sadness sometimes — but it’s also not necessarily a guilt-based externalised pressure.
Making demanding moral values more resonant [00:24:56]
Rob Wiblin: Yeah, there is something perverse about imagining someone coming upon a burning building with glee, with excitement that finally they have an opportunity to show off their moral chops. There’s something that’s perverse about that as well. I guess you’re trying to find something that’s a middle ground of sorts.
As I understand it, a priority for you with your writing recently has been taking not just this idea, but other ideas related to effective altruism or longtermism or trying to do good in the world, and trying to bring out the aspects of them that you think are likely to be the most emotionally resonant for readers. Why have you taken on that challenge?
Joe Carlsmith: I think the most basic answer is that I just think this stuff is real; it’s not a game. I think tuning into that is emotional. We’re talking about really high-stakes stuff that really matters, and I think it’s important to just see that clearly.
I also think that our emotional orientation towards these issues — the sense in which they feel real and visceral and compelling — is important more broadly to a bunch of things. I think it’s important to our motivations. I think it’s important to how much of our humanity we’re bringing to the project of trying to help others.
I think it’s also, in sometimes subtle ways, quite important to our epistemology. If something feels real and visceral, that’s a sign about its reality; it’s a signal that it’s been processed by the sometimes implicit and unconscious parts of your epistemic system in a way that has passed a bunch of checks for, “This is important and worth acting on.” So if you kind of find yourself in relation to this stuff living entirely in a mode that feels kind of dead and unreal and totally abstract, I think that’s a warning sign. Maybe you’re misunderstanding, maybe some part of you isn’t convinced of this, or some part of you doesn’t care about it. I think that’s worth noticing.
In practice, seen in the right light, this stuff is just very visceral and important, so I think it’s just useful to bring that out.
Rob Wiblin: Yeah. Is there a good or simple example of an idea that you think is often presented in a dry or maybe even hostile way, where it could maybe relatively straightforwardly be presented in a more resonant way?
Joe Carlsmith: Sure. One example I’ve written about is the idea of the value of creating new happy lives that wouldn’t have otherwise existed. There’s this, in my opinion, somewhat strange discourse in philosophy, where the idea is that it’s somehow intuitive — a kind of datum that we should be trying to capture — that nothing matters about creating new wonderful lives; that it’s just a sort of neutral act. The hard thing is designing a population ethics that can capture this supposed datum.
I think when you look at this from a different angle, it just doesn’t to me seem true, or at all intuitive, that this is a neutral act. In fact, if you look at it, creating someone’s entire life — a life of richness and beauty that they value hugely — is very significant. When I think about my own life and all of the beauty and joy and even the pain and everything that my own life means to me — friends and relationships and music and leaves in the fall and all sorts of things. And I imagine someone who has a chance to create me or do something else, like take a walk, and who chooses to create me, I have this sense of, this is a time for real gratitude. This was an incredibly significant thing this person did for me. I feel like I want to approach them in an attitude similar to someone who saved my life. I owe everything to this person.
Similarly, if I imagine being in a position to create someone else, where they would value their life in the way that I value mine, that seems similarly significant. I also have some golden rule energy about it: like, I would want people to create me, so I should do unto others the same in suitably similar circumstances. So that’s one where, if you reframe it, there is an accessible orientation towards this, where it’s emotionally compelling in a way that the philosophy doesn’t always capture, and also that I think people lose touch with.
So I think some people, when they think about the stakes of extinction, they sort of think, “Well, future people. It’s hard to get a grip on, but apparently something something population ethics — this matters a lot.” I actually think if you tune in to all of these people that are equally real, then it’s much more directly compelling. And that can be important to our motivations and our sense of those stakes.
Rob Wiblin: Yeah. What about the case of having children? Setting aside the question of whether you might have any duty to have kids, or if they would have a great life or not, if you present the case to someone of: “Imagine that you had a child. They went on to live this fantastic, rich, fulfilling life. They accomplished things for other people. They experienced a wide range of things. On their deathbed, they were very satisfied with how things had gone. Somehow from heaven, you could look down and see how their life had played out. Would you think that it was good that you had had kids in retrospect, because you created this person who had this wonderful life?”
I feel like in that case, it is kind of intuitive that you’d feel like maybe you didn’t have to do it, but it was nice that you did. Yeah. I’m not sure how much philosophers talk to parents.
Joe Carlsmith: Yeah, I agree. I think this is useful, especially if you frame it as a pro tanto consideration. We’re not saying that, all things considered, this is how all the considerations shake out — I think that’s a substantially more complicated question. But the idea that it’s just a zero, like neutral: why would we be trying to say that?
Rob Wiblin: That’s a really good one. Is there another one that’s easy to explain?
Joe Carlsmith: Another one, for me, is just the notion of “utility” in the context of utilitarianism. I feel like this is just the worst word. I’m not a utilitarian, but we’re talking about people’s lives — we’re talking about a huge portion of the substance of what makes life meaningful — and we’re using this word that connotes something kind of dry and functional.
And similar even for something like pleasure. The kind of central example of pleasure is like rats on heroin, or orgasms, or something like that. Or some more futuristic utilitarians will use this notion of hedonium, which is a kind of computationally optimised pleasure that they imagine tiling the universe with, or something like that. It calls to mind this kind of sterile and uniform and cold vision.
I think there’s more to life than pleasure, but I think pleasure — in the deep sense, in the sense of the best, just how good experience can be — is something really profound. At the least, you should be imagining something that is appearing to you as something like sublimity and energy and boundlessness — something kind of roaring with life and love and victory, all sorts of stuff. It’s not that in particular, but you should really think of the best things — how good experience can be, which I think is really a lot — and then be extrapolating from there. I feel like the discourse around the idea of pleasure just doesn’t capture that.
So that’s another one where it seems to me like there are just ways this gets talked about that doesn’t capture the real stakes.
Rob Wiblin: As you were saying that, it was occurring to me that I think I almost always stay away from the word “utility,” except when I’m really forced to because there’s some technical case. I usually say “wellbeing,” and if not that, maybe “flourishing” is another one.
And as you’re saying, I sometimes say “pleasure,” but it has this sense that you get pleasure from food, but do you get pleasure from having a relationship? Do you get pleasure from achieving something great at work? It’s not something that you would normally say. It has this more narrow connotation, even though usually we’re just talking about all positive experiences that people could have much more broadly.
Joe Carlsmith: Totally. I actually think that there’s a broader issue here, something about abstraction itself. There’s a point a friend of mine made to me a few weeks back: this idea that sometimes in contexts where you’re talking about how to do the most good, you end up talking about numbers, and people can be dismissive of the notion of numbers. But when you’re talking about numbers, you say, “a million people died” — and you’re talking about one person, and another person, and another person, and another person: you’re talking about a bunch of concrete things at once.
I think that’s also true of when we use notions like factory farms, or we use the idea of suffering, or talk about future people. These sound like these abstractions, and they’re abstract because they’re trying to cover a lot of stuff — but the stuff that it’s trying to cover is all just as detailed and visceral as everything else. So if you’re trying to talk about a lot of the world at once, you have to work to remember the detail and concreteness of the ultimate thing you’re referring to.
Rob Wiblin: Yeah. One rat on heroin is a blessing, Joe, but a million rats on heroin, that’s just a statistic.
What are the personal values of yours that motivate you to care to try to help other people, even when it’s kind of a drag, or demoralising, or it feels like you’re not making progress?
Joe Carlsmith: One value that’s important to me, though it’s a little hard to communicate, is something like “looking myself and the world in the eye.” It’s about kind of taking responsibility for what I’m doing; what kind of force I’m going to be in the world in different circumstances; trying to understand myself, understand the world, and understand what in fact I am in relationship to it — and to choose that and endorse that with a sense of agency and ownership.
One way that shows up for me in the context of helping others is trying to take really seriously that my mind is not the world — that the limits of my experience are not the limits of what’s real.
In particular, I wake up and I’m just like Joe every day — every day it’s just Joe stuff; I wake up in the sphere of Joe around me. So Joe stuff is really salient and vivid: there’s this sort of zone — it’s not just my experience, there’s also, like, people and my kitchen — of things that are kind of vivid.
And then there’s a part of the world that my brain is doing a lot less to model — but that doesn’t mean the thing is less real; it’s just my brain is putting in a lot fewer resources to modelling it. So things like other people are just as real as I am. When something happens to me, at least from a certain perspective, that’s not a fundamentally different type of event than when something happens to someone else. So part of living in the real world for me is living in light of that fact, and trying to really stay in connection with just that other people are just as real as I am.
More broadly, when we talk about forms of altruism that are more fully impartial — or trying to ask questions like, “What is really the most good I can do?” — for me, that’s a lot about trying to live in the world as a whole, not artificially limiting which parts of the world I’m treating as real or significant. Because I don’t live in just one part of the world. When I act, I act in a way that affects the whole world, or that can affect the whole world. There’s some sense in which I want to be not imposing some myopia upfront on what is in scope for me. I think those are both core for me in terms of what helping others is about.
Rob Wiblin: Yeah, we’ll come back to a bunch of that later on, these questions of: How does one emotionally grapple with the challenges of the world? And how does one remain motivated? And what is the right attitude to have towards all of this? I think some of what you’re saying resonates, but for me, sometimes it can feel a little bit heavy, a little bit serious. I almost want to find something a bit more frivolous in there in order to keep me sane. But yeah, we’ll come back to that in a later section.
The crazy train [00:36:48]
Rob Wiblin: Let’s push on to the big thing we’re going to cover today. I guess there’s multiple different names for it, but you and I have both sometimes referred to this as “the crazy train” or “the train to crazy town.” Regular listeners might recall this crazy train expression from episode 90 with Ajeya Cotra. I think Ajeya may have even actually come up with this term. I think it’s a rock song from the 80s. But anyway, at least in this context, she’s come up with the crazy train.
But you completed your PhD thesis recently, and it’s titled A stranger priority? Topics at the outer reaches of effective altruism. It kind of has the crazy train or the train to crazy town as a unifying theme. Can you explain what that idea is?
Joe Carlsmith: The basic idea with the crazy train is that there’s a certain kind of broadly Bayesian, scope-sensitive, quantitatively oriented philosophy that tries to take seriously just how big and strange the world can be, just how big our impact on the world could, in principle, be.
This is a philosophy that I’m interested in, various other people are interested in, and that has been used to argue for conclusions like strong longtermism — which is this idea that if you crunch the numbers and you really think about how many people there could be in the future, that positively influencing the long-term future should be the kind of overwhelming moral priority, at least from an impartial perspective. For example, Nick Bostrom’s paper “Astronomical waste” draws on a simple version of this argument, and it’s been developed more recently in papers like “The case for strong longtermism” by Hilary Greaves and Will MacAskill. So this has been discussed on the show quite a bit. There’s also weaker versions of longtermism that don’t go so far as to say it’s an overwhelming moral priority, but they draw on some similar vibes and ideas in arguing for their conclusion.
I think it’s easy to look at this discourse and to look at this type of philosophy and say, strong longtermism is the kind of philosophically principled position here: it’s where you end up if you take these ideas seriously and you’re willing to say some of the counterintuitive things that they imply. The real question is more about: But are these ideas too in conflict with our common sense? Are they too extreme? Are they compatible with various other important values we hold? That’s one reading of the dialectic. I think it’s easy to be in that narrative, and Ajeya talks about that narrative on the podcast.
But I actually think it’s not that simple — and the philosophy in question, broadly construed, actually brings up a bunch of other questions and uncertainties. And that when you keep doing this reasoning, or you take it to other places, things get quite a bit murkier. I think that’s an important fact. And that’s the broad narrative of the crazy train.
Rob Wiblin: Yeah. It’s a little bit related to the idea of crucial considerations: these big ideas that might significantly change what you think you ought to do. Like, not just say you should do this 10% less, but maybe you should do the opposite of what you were doing before. Or maybe you should focus on a completely different thing and you’ve had the wrong idea all along. Maybe the crazy train is like going on to further and further stops, potentially overturning the things that you thought before and overturning more and more aspects of common sense that you were living with before you started this project of trying to make everything better.
But “crazy train,” as I understand it, you kind of don’t like that. Maybe because it presupposes that these ideas are crazy when maybe we should have a slightly more neutral attitude towards it going in. Is that the issue?
Joe Carlsmith: I think there’s a few issues. I have issues both with the term “crazy” and with the term “train.” I actually think the “train” problem is bigger.
Crazy: it’s sort of pejorative. I feel like anytime you’re in a context where someone’s like, “There’s this idea, and then there’s this other idea — let’s call the other idea the ‘bad idea’ or the ‘wrong idea,'” I think you’ve got to wonder whether the framing here is influencing things. Obviously, these ideas are undeniably strange in some sense and kind of unfamiliar, and I think that’s important, but I don’t want to prejudge what we can learn from them. “Crazy” makes it sound like either you’re going to recognise that this is totally crazy and get off the boat, or you’re going to go crazy — and there’s no middle ground; there’s no way to handle this stuff maturely and with different amounts of weight and balance and stuff like that. So I don’t like that.
The idea of a train suggests that there’s a single train, just one train, and there’s a single ordering of the stops, and you have to get off at one place in particular. Also there’s a single place that it’s going, and the question is just how far you ride, how far you let it take you — you’re not doing it; the train is kind of pulling you there.
And I think that basically none of that is true. I think there are a lot of different ways into this stuff — there’s not a single craziness metric that’s kind of linearly increasing along one dimension. I think you can reject a bunch of ideas associated with longtermism and other ideas in this space, but still get bitten by a bunch of these issues. So I don’t think it’s just people who’ve accepted a bunch of things who have to deal with this.
Maybe most importantly, it doesn’t go to one place in particular. There’s a whole kind of garden of branching paths and different forms of uncertainty. Sometimes things just, like, break. It’s not as though there’s just like this clear conclusion, but it’s clearly wrong — it’s more like your tools start to break down and you have to start forging new track.
But I don’t have a super better term that I want to propose. The term that Aaron at Open Philanthropy suggested was something about “the wilderness,” which I like. Maybe the philosophical wilderness, where this calls to mind that there’s a lot of ways in which it is unfamiliar and strange or far away from home. Maybe you’re going to get lost. There’s a risk of getting lost. There’s a risk of getting eaten by something. Also there’s a kind of exploration dimension: you’re pushing into new territory, and there’s a lot of different paths forward.
I think that’s probably better in practice. I’ve gotten used to the term “the crazy train” — I do use it in my thesis, and I’ll probably just use it here — but I want to emphasise its limitations.
Rob Wiblin: Yeah. For replacing “train,” I like “the wilderness.” Maybe it feels more like a sampler platter, as you’re saying — because it’s not a linear thing, where you might only accept the later stops if you’ve accepted the earlier stops and stayed on. It’s more like a range of different ideas that you can dabble with, each to a different extent. You have like the sampler platter of peculiarity.
Joe Carlsmith: Yeah, I think that’s useful. A thing that neither of these does well is it suggests that you can only be in one place at a given time, whereas in fact, I think the right way to hold a lot of these ideas is with differing levels of weight and letting them influence you in different ways.
Personally, for example, I treat various confusions and uncertainties and interesting arguments in this vicinity as kind of clues that there’s something here to learn, that it’s worth paying attention to this. There’s some way in which I’m confused; it’s not necessarily that the craziest conclusion coming out of this is true, but apparently reasoning that I took seriously in other contexts is saying something weird. I should pay attention, but that doesn’t mean I should upend my life, or I should divert all my resources to this or anything like that.
To reach that practical bar, I personally impose a much higher standard of feeling resonantly convinced, feeling like I really understand this idea. But somehow the idea of getting off at a stop doesn’t allow for this kind of gradation in your response. It doesn’t allow for, like, being kind of interested in this, or learning more without acting on it. It’s just either you’re here or you’re here or you’re here. And I don’t think that lack of nuance is useful.
Rob Wiblin: Yeah. Before we go up to the salad bar of batshit crazy ideas, is there anything else you’d like to say about what attitude we should bring to this discussion?
Joe Carlsmith: Yeah, I want to say a few things up front. I do think these topics require some caveats.
One thing is, just in terms of my own work, most of what I’ve worked on and written about is not this. I wrote about these topics in my thesis centrally in a particular academic context that values a certain type of intellectual contribution, and isn’t centrally asking the question of, “What’s the most important topic to be thinking about?” I’m not here to say, “This is the most important topic to be thinking about; we need tonnes more people working on it” or anything like that. I think these ideas are interesting and have nonzero relevance to important stuff, but I just want to be clear about how central this is to what I’m doing in the world and what I think other people should be doing.
More generally, in approaching these topics, we want to find a balance between, on the one hand, being kind of reflexively dismissive and closed and just like “haha” in a way that doesn’t let us learn anything new — and on the other hand, being kind of too credulous and excited and just grabbing hold of some random conclusion and running with it or holding these things too heavily.
On the first hand, I do think we want to be able to learn new things from thinking and from philosophy. That is sort of the point. And new things, they’re genuinely new, and can be strange and unfamiliar. And I think sometimes reactions to this stuff are born partly of the unfamiliarity and the sense of like, “I haven’t really thought about this, I don’t really know what it is.” That can change sometimes over time, and learning new stuff can be kind of disorienting and destabilising and unpleasant, but it’s often worth it. I think there’s a good track record of learning new stuff and following up on ways in which our models break and ways in which we’re confused.
On the other hand, I do think there’s just so many ways to approach these topics and kind of go wrong that I do worry. Just a few examples: one is that I think these topics can be kind of intellectual catnip, in a way that can be unhelpful and distracting, especially relative to more grounded and real-world stuff. I think there are ways to fail to track all of the ongoing confusions and assumptions that are going into various of these arguments and topics, such that you can feel like, “This is on solid ground. I’ve totally learned this new thing,” but in fact there’s like a bunch of structure and a bunch of additional uncertainty. I think people can just quickly become overconfident that they know what’s up with some of this stuff, and that there’s a particular conclusion that falls out of it. And I think that’s often premature.
I do think, for some people, these topics can be kind of destabilising and psychologically unhelpful. I just want to give people permission that if you notice that it seems like, “This isn’t showing up in my world in a very helpful way. It doesn’t feel like my psychology is interacting with this in a good way,” then it’s really OK to just put it down.
Finally, a thing I’ve noticed is that some of these topics are sufficiently weird that when people talk about them, they enter into this kind of intellectual la la land, where they lower their intellectual standards for argument. It’s somehow not a serious topic, and so they feel permission to just kind of say whatever they want.
I find this in particular in contexts like sims — and we’re going to talk about simulations in a little bit. In my experience, sometimes when even very secular, otherwise rigorous people start talking about sims, suddenly it’s just like the floodgates are open and they feel like they can just make stuff up and say whatever, and kind of speculate about all sorts of stuff in a way that seems to me often quite divorced from serious thinking. I have an essay where I call this notion “simulation woo” — where I feel like people get kind of woo-y about simulations where they wouldn’t otherwise.
I actually think that’s a feature of unfamiliar topics more broadly: that when you’re charting new terrain, there’s less structure, there’s less existing tradition of thinking about it. But I actually think that’s a case where you need more rigour and more carefulness, rather than less. I think we often go wrong there too.
That’s a bunch of caveats about this stuff. I wanted to say that, because it is weird.
Rob Wiblin: I feel like with all of this buildup, we’ve really got to bring the crazy now to justify it all. If we bring something lame — like just telling people that there’s difficulty philosophically grounding induction — I think they’re going to be disappointed.
Joe Carlsmith: Yeah, that’s right. That’s the whole thing actually. It’s: “Guys: Causation. Have you heard? We don’t know. We don’t know what’s up with it. Is it real?”
Rob Wiblin: No, I think we’ve got some bangers.
Whether we’re living in a simulation [00:48:50]
Rob Wiblin: Yeah, so the disorientating topic that we should talk about first I think is the idea that maybe we could be living in a computer simulation, which quite a few philosophers have written about.
What is the key argument for the proposition that maybe we’re living in a computer simulation?
Joe Carlsmith: The argument that I take most seriously isn’t an argument for the claim that we are living in a computer simulation — it’s an argument for a constraint on our beliefs in this vicinity, and specifically our credences. “Credence” means kind of the probability you assign to something, and the constraint is that you can’t assign nontrivial probability to it being the case both that you are not living in a computer simulation and it being the case that most people who are suitably like you are living in a computer simulation.
So “suitably like you” requires some unpacking, but for simplicity, in this context, I think the relevant similarity we can use is the idea of people who find themselves apparently living in a stage of civilisation prior to a technological maturity: it looks like there’s a tonne of technology that we could develop — that our civilisation could be vastly larger and have vastly more powerful capabilities — but that’s not where we’re at yet; we find ourselves at an earlier stage. So call people who have those sorts of experiences “early-seeming people,” and the claim would be you can’t have nontrivial probability on it being the case both that you are not a sim and that most early-seeming people are sims.
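One way to write down the constraint Joe is describing — a rough formalisation of the wording above, offered as a gloss rather than a quote from his paper — is:

```latex
% Gloss on the constraint above. Let Sim = "I am a sim" and
% Many = "most early-seeming people are sims".
\[ P(\neg \mathrm{Sim} \wedge \mathrm{Many}) \approx 0 \]
% Equivalently, whenever P(Many) is non-negligible:
\[ P(\mathrm{Sim} \mid \mathrm{Many}) \approx 1 \]
```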
Rob Wiblin: I see. The basic idea is if most people who are having experiences like yours — who are living in worlds like the one that yours apparently is — are living in computer simulations, then you should think that probably you are as well.
Joe Carlsmith: That’s the basic idea.
Rob Wiblin: That’s the structure. I suppose it’s quite abstract. What reason do we have to think that most people living in a world like the one we apparently see around us would be living in computer simulations?
Joe Carlsmith: The way you can start to wonder about those sorts of worlds, it gets a little slippery, but roughly speaking, we start out by saying, “How much computational power would it take to run a simulation of various kinds?” You can do some estimates of that. And then you can do some estimates of how much computational power would be available to a very advanced civilisation. It’s this very, very large number — such that it looks like at least for certain sorts of simulations, a kind of advanced civilisation could run many, many simulations of human history or of people living in this era for a tiny fraction of their overall resources.
So you would have thought — just extrapolating from our basic scientific picture of the world, and how simulations work, and neuroscience, and stuff like that — that it’s at least quite plausible that advanced civilisations would run lots of simulations of early-seeming people. You start to take seriously that you live in those worlds. But then you notice that if that’s the world, then this constraint that I talk about bites. So there’s a question like, does that mean I take whatever credence I had in those worlds and give it to my being a sim? Or do I change my credence on those worlds? There things get muddier, but it shows that some revision to our common sense is required.
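As a very rough illustration of the kind of estimate Joe is gesturing at, here is a back-of-envelope sketch. Every number in it is an arbitrary placeholder — not a figure from this conversation or from any particular paper — and the only point is that if civilisation-scale compute dwarfs the cost of one simulation, then a tiny fraction of resources buys a very large number of simulations.

```python
# Back-of-envelope sketch of the reasoning described above.
# All numbers are illustrative placeholders, not real estimates.

ops_per_brain_second = 1e16          # placeholder: compute to simulate one brain for one second
person_seconds_of_history = 1e20     # placeholder: ~1e11 people x ~1e9 seconds each
ops_to_simulate_history = ops_per_brain_second * person_seconds_of_history   # 1e36

ops_available_to_mature_civilisation = 1e50   # placeholder
fraction_spent_on_sims = 1e-6                 # a "tiny fraction" of total resources

n_simulations = (ops_available_to_mature_civilisation * fraction_spent_on_sims
                 / ops_to_simulate_history)
print(f"Full runs of human history affordable: {n_simulations:.0e}")  # ~1e+08 with these placeholders
```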
Rob Wiblin: So there could be all kinds of reasons why possibly there could be simulations of people like us being run. Of course, it’s extremely mysterious, because if that is the case, then we don’t know what the world outside the simulation is like; we don’t know what the beings simulating us would be like and what their motivation might be.
But we could come up with stories that are kind of plausible about why humanity — if it continues for millions of years and ends up with incredible technological capabilities and able to harness all of the energy of the sun — would potentially turn some of that towards running simulations of our history for science or technology. Or perhaps just out of curiosity or as performance? I don’t know. Given that it would be extremely cheap for us to do that at that stage, maybe we would do it a lot. Maybe we would do it again and again and just learn about the differences, or create different worlds just to sate our curiosity and learn more about what other things might be out in the universe, for example, or what kind of stuff is possible. It could be really fascinating.
So that’s one possible story, and there might be other ones as well. If we did do that, then there would be very many simulations, perhaps. Now we’re bitten by this constraint you said. You’d have to say that they would run very few, or it’s very unlikely that would ever happen, or we should think that maybe we are [sims].
Joe Carlsmith: Yes. But I do want to flag that I think there can be something quite slippery about this particular way of framing it. And this is something that disturbed me when I first encountered the argument.
There’s a few forms of slipperiness here. One is if we start with trying to make salient the hypothesis that we do live in this world where there’s a lot of sims of early-seeming people — and we do it by first talking about a bunch of empirical claims about, for example, the computational power that would be required to run a sim, the computational power available to an advanced civilisation — then I think there is this weird kind of self-undermining quality that the argument can start to take on, where we condition on those empirical claims being true and then we assign some portion of that credence to being in a sim. But if we’re in a sim, then the empirical claims were derived from evidence that wasn’t about the world simulating us. So it doesn’t look like we actually have very good evidence for those claims. So there’s some self-undermining quality there.
There’s also a different (though related) thing, which is the way you put it: if we imagine ourselves first, we imagine ourselves as people who aren’t sims. Then we imagine our future going forward, and people in the future of us, ahead of us in time, running a bunch of sims. It can sort of sound like the argument is then saying, maybe you’re one of those people over there. But the way it’s been set up, the imagery is like they’re over there and we’re pointing ahead of us in time. You can’t be over there. We’re not in one of the sims that our descendants will run, because that can’t happen: they wouldn’t be our descendants if we’re sims. So the particular framing where we first talk about what our civilisation will go on to do can get quite slippery as well.
A thing I tried to do in the paper I wrote on this topic is set the argument up in a way that doesn’t rely on these empirical claims. I think that the argument bites regardless, and is more forceful and isn’t actually subject to these sorts of objections.
Rob Wiblin: Yeah, that’s great. I feel like it’s necessary to give people some idea in their mind of why there would be any simulations at all. Why shouldn’t we expect that no beings would be interested in running simulations? And then maybe it feels more intuitive.
I have to confess, the idea, to me, that we might be living in a computer simulation isn’t very counterintuitive. I know it seems very strange to some people, but at least to me, even the first time I encountered this, I was like, “Oh yeah, it kind of makes sense.”
Joe Carlsmith: That’s interesting. I’m really the opposite. At a gut level, I’m still like, “This is bullshit.”
Especially because I think there’s an underappreciated thing about the original paper on simulations that doesn’t get enough play. So Nick Bostrom wrote this paper, “Are you living in a computer simulation?,” which sort of popularised this argument, and the type of simulation he’s talking about is what’s known as a “shortcut simulation” — which means that there’s a bunch of stuff that you naively think is kind of detailed and real, but is actually not being simulated. So in Bostrom’s paper, the stars are very unlikely to be simulated in any detail. I think he talks about how the microstructure of the Earth doesn’t exist when people aren’t looking at it. And I hear that, and I’m like, “That’s some bullshit.”
Anyway, so I think it’s interesting how people react to this in different ways.
Rob Wiblin: Yeah, it’s like that classic Rick and Morty episode with the people simulating it, and as they’re running, they’ve got little robots creating the space in front of them.
Reasons to doubt we’re living in a simulation, and practical implications if we are [00:57:02]
Rob Wiblin: I guess intuitively this is weird to you. Do you think there are any good reasons to doubt the soundness of the general argument here?
Joe Carlsmith: I think there are reasons to hesitate. The argument, as I presented it, for this basic constraint — which is not an argument for a specific credence on being a sim — strikes me as quite forceful. As far as I can tell, there is no obvious hole that everyone in the know is aware of. In fact, some confident dismissals of the argument, including sometimes by prominent scientists, strike me as quite unserious in their engagement with the topic.
That said, I do think there are remaining reasons for hesitation. One of the most salient to me is that the argument works most naturally in finite worlds, where you can talk easily about the ratio of sims to non-sims in different classes of observers. We want it to extend to infinite worlds as well, I think. And when we try to do that, the sorts of uniformity principles that Bostrom and I and others are going to draw upon in making arguments — like, if 99% of X people are sims, and you’re an X person, then 99% you’re a sim — that sort of reasoning gets a lot weirder and harder in infinite contexts.
Just to give some intuition for that: it makes sense if I tell you you’re one of 10 people in rooms labelled one through 10, and ask what’s the probability you’re in room one? It’s very natural to say, “10% — I’m going to split my credence equally between these rooms.” If I instead tell you you’re one of an infinite line of people in rooms labelled with the natural numbers, what’s the probability you’re in room 52? “Well…” If you want normal, real-numbered, non-zero probabilities on each of these rooms, you can’t assign uniform credences to them, because they’ll add up to more than one. So you need to do something else. Maybe you need to have infinitesimal credences, or maybe you need to assign unequal credences to the different rooms.
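To spell out why the uniform split fails in the infinite case (a gloss on the intuition above):

```latex
% Suppose every room i in the infinite line gets the same real-valued credence \epsilon,
% and credences must be countably additive and sum to 1.
\[ \epsilon > 0 \;\Rightarrow\; \sum_{i=1}^{\infty} \epsilon = \infty \neq 1,
   \qquad
   \epsilon = 0 \;\Rightarrow\; \sum_{i=1}^{\infty} \epsilon = 0 \neq 1. \]
% So no uniform assignment works: you need infinitesimals, unequal credences,
% or some other departure from the finite case.
```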
Suddenly we’re in sort of new territory. Now, admittedly, this is territory that cosmology in general has to grapple with. There’s this general problem in cosmology — the measure problem — of how do we talk about the fraction of observers in a big universe that are having different sorts of experiences? It looks plausible we’re going to need some answer to that if we live in an infinite world. You might think we’ll just apply that to sims too. But the issue is it’s not clear that that solution is going to be of the kind of uniformity indifference-ish form that is normally at stake in the simulation argument as it’s presented initially. I think that’s a reason for pause for me.
And I think it’s connected with another reason for pause: there are these arguments about Boltzmann brains, which are like these observers generated by random fluctuations in the universe. Those arguments are actually quite structurally similar to the simulation argument. It’s roughly like, there’s this big class of observers. If most observers of X type are Boltzmann brains, then you should think you’re a Boltzmann brain. I’m just out on being a Boltzmann brain. I’m like, “Sorry guys, I’m not a Boltzmann brain.”
Rob Wiblin: Yeah, we haven’t really defined Boltzmann brain here. It’s something about how there’s so much matter bouncing around in gas clouds and in stars that you could accidentally end up with something that functionally resembled the mind of an observer like us for a fleeting moment in those structures. I read about this a few years ago, and I was just like, “I don’t have space for this in my life.” At least the Wikipedia article at the time said that most scientists think that this is a sound argument and that we should expect most observers to be like this, and yet nobody thinks that they are a Boltzmann brain of this kind. I was like, “I’ve got better things to do than read more about this.” I feel confident that I’m not a Boltzmann brain. It seems like nobody thinks that they are either. But yeah, it’s very interesting that it has a somewhat similar character.
Joe Carlsmith: Yeah. One disanalogy I think is relevant — and it’s part of what makes me feel very confident that I’m not a Boltzmann brain — is that all the Boltzmann brains, or almost all of them at any given moment, are about to disintegrate. It’s just like random fluctuation, and then they’re about to fall apart. If you think you’re a Boltzmann brain, you should be highly confident that like, this is the last moment.
Rob Wiblin: You only exist for an instant.
Joe Carlsmith: Yeah. And also that your memories are probably fake. I mean, it’s a little complicated.
Rob Wiblin: I’m just out.
Joe Carlsmith: I’m not saying that’s too weird. I just think it’s wrong. I’m just not a Boltzmann brain. So we need to figure that out. I think the sim stuff is sufficiently similar that I think that’s another red flag.
There’s just a bunch of ways the sim argument could be wrong. I don’t think we should be like, “Let’s hang our hat on this.” I do think this is an unusually strong argument for an unusually dramatic revision in our basic understanding of our existential orientation. You don’t come across that every day. If you do, I think you should sit up straight and think, “There’s something to learn here.”
Rob Wiblin: “We should get someone on this one.” Yeah, I like that when I asked you what doubts you had or what reservations you had, you brought infinite ethics to challenge it. I feel it’s like trying to put out a fire with a flamethrower.
As you’re saying, it’s an unusually strong argument for an unusually large revision of our understanding of our reality. Of course, if being in a simulation made no practical difference to anything that we ought to do, then this might be a fun discussion topic, but not something that we should spend that much time thinking about. But yet, what might being in a simulation imply about what actions we ought to take, if any?
Joe Carlsmith: I thought much less about the practical implications of the simulation stuff than just about the argument itself. So nothing here is especially confident. I also think it’s worth distinguishing between what would the practical implications be if you knew that you lived in the sim, and what are the practical implications in our actual epistemic circumstance — where we have some uncertainty, maybe some newfound uncertainty and confusion about the possibility of living in a sim, but are very far from knowing.
I think on the former thing, if you knew you were living in a sim, I mean, it’s sort of surprising if it makes no difference, right?
Rob Wiblin: It would be outrageous.
Joe Carlsmith: I think people are sometimes too eager for it to make no difference. I think they hear this argument and they’re like, “OK, so tell me, should I buy different groceries? No? I thought so. OK, I’m out,” or something like that. There’s sometimes a lack of curiosity. And maybe it’s worth thinking about it for more than two minutes, about whether this would make a difference if your existential situation was radically different than what you thought it was before.
So ways it can make a difference: maybe most salient to me is in the context of questions like longtermism. I think living in a sim would make it more likely that the future is smaller than a standard longtermist picture would suggest, because it’s maybe less likely that all of that future was going to be simulated. And more broadly, questions about what kind of simulation it is and who the simulators are.
All sorts of that stuff is going to become more relevant. I think it’s more productive to actually ask the question of not, “Conditional on being in a sim, what would you do?” but more like, “Given where we’re currently at, does it make a difference?” There I think it’s plausible that there’s some complexity, but a decent first pass from my perspective is that — at least from an impartial altruistic perspective — I think acting like you are a non-sim or what’s called a “basement person” is a reasonable policy.
Rob Wiblin: And who came up with that term?
Joe Carlsmith: I don’t know.
Rob Wiblin: Sorry, I shouldn’t interrupt, but for ages I thought basement person was like someone being run in a server farm in a basement. But no, a basement person is someone in the basement universe, that is like the fundamental one, rather than someone who’s living in the higher floors, which are like simulations.
Joe Carlsmith: That’s right.
Rob Wiblin: I feel like we need to go back and change the term. But sorry. Yes, carry on.
Joe Carlsmith: That’s right. OK, yeah. Acting like you’re a basement person I think is a reasonable first-pass policy on this stuff. I think there’s a number of reasons that’s true, but one is that the amount of impact you can have if you’re a basement person is still really unusually large from an impartial perspective. If you ask yourself questions like, “What would I want people who make observations of being in that influential position to do?” Even granted this stuff about simulations, I think that a reasonable answer is that you’d want them to be especially concerned with the scenario where actually they’re in this influential position, and what they do makes a really big difference.
I don’t think that’s the whole story. I think there is more complexity there, but I think that’s a first pass that does sort of dull some of the, “Oh my god, this upends everything.”
Rob Wiblin: The idea would be that if you are in a simulation, then from an impartial point of view, probably the stakes are much lower — because at some point the simulation is going to be turned off, and there’s just only so much that can be done within this much smaller world. Whereas if we’re actually in the real underlying world, not a simulation, then, well, all of the stars really do exist, and time could go on forever. We’re not going to be shut down, so the stakes are much higher. So we should act as if we thought that we probably were in that world, because it just matters more. That’s kind of intuitive.
It would be amazing if this radical rethinking of the nature of existence perhaps would have so little impact on what we ought to do. I suppose maybe from a selfish point of view, it might change things a little bit more. Maybe you want to try to keep things spicy and interesting so they don’t shut down the simulation. I think that’s one thing that people have suggested, but it sounds a bit flippant.
Yeah, is there anything more to say on this?
Joe Carlsmith: Well, I think there’s quite a bit more to say, and I haven’t thought about this enough to have a confident take. I do agree that it’s surprising if it makes no difference. But especially if you’re just talking about, like, “I have some new uncertainty about whether I’m in a sim,” it’s not that surprising, given that it’s like a lower-stakes environment, at least from an impartial perspective.
There’s a variety of other considerations in a somewhat more complicated vein that might suggest that the kind of policy you would have wanted to commit to ahead of time is kind of “being the basement people you would have wanted to see in the world,” or something like that.
Again, I don’t think that’s the whole story, but I think it’s like a decent first pass from my current perspective.
Rob Wiblin: I suppose if you think many different simulations with minor alterations are being run, then we might bring in this other issue as well: that what you do might affect what your twins or very similar people in these alternative simulations will do, which then increases the stakes again.
But let’s just set that aside. There’s this other idea I’ve heard circulating, that the more weird and historically significant your life is, the more likely you should think it is that you might be in a computer simulation. I think I actually saw a comic one time about Elon Musk, and Elon Musk was like, “Obviously I’m living in a computer simulation. Look at all of the crazy stuff that’s happened in my life. Does any of this sound like something that actually happens in the real world? Obviously this is all being run just for entertainment purposes for someone.” I think that actually does have some intuitive appeal, that wouldn’t you be more likely to run simulations of stuff that was exciting, rather than stuff that’s really boring? But you don’t think that this goes through. Is there a way of explaining why?
Joe Carlsmith: The specific claim I want to make is that this doesn’t fall out of the classic simulation argument that I discussed earlier. I think it’s possible that you can get some update towards being in a sim from being an unusually influential or weird person or something like that, using more complicated anthropic principles. But that’s going to get into substantially gnarlier philosophical territory — which, in my experience, people who are compelled by this argument don’t especially want to explore.
My view is that this argument exerts a kind of influence on people’s thinking out of proportion to how well it has actually been worked out and whether it makes sense. To give an intuition for why I don’t think this falls out of the classic argument: basically, in the classic argument, all of your credence on being a non-sim is coming from worlds where there are very few sims like you. So let’s round that off to no sims.
A toy example would be: suppose there’s only one planet — it’s just Earth, and that’s the whole universe. And there’s two types of hypotheses: One is there’s Earth, and then we go extinct, and no one runs any sims — that’s World A. And in World B, we go on and we run 1,000 simulations of all of our early history and a million simulations of Elon Musk in particular, let’s say. So it’s true that Elon Musk is being simulated much more in World B than, let’s say, a janitor in Des Moines in 2023.
But what should their credences be? Well, what the simulation argument is saying to both the janitor and to Elon is that basically all of your credence on World B needs to go to being a sim. If you wake up as a janitor or as Elon, then conditional on World B, you’re probably a sim. But once you’ve assigned your credence to World B, it doesn’t follow from the original argument that Elon should have any higher credence in World B than the janitor should have, because the original argument is not trying to get you to make anthropic updates of that kind. All it’s saying is that, conditional on the world being a certain way such that most people like you are sims, you should think you’re probably a sim. That’s true of both Elon and the janitor. It’s true that Elon is simulated more, but it’s all squeezed into your credence on World B — such that, conditional on World B, the janitor is like, “99.9% that I’m a sim,” and Elon is like 99.99999% — and if your credence on World B is, say, 50%, the extra nines don’t make a difference.
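To make the “squeezed into your credence on World B” point concrete, here is a minimal numerical sketch of the toy example; the conditional probabilities are illustrative, roughly in line with the figures quoted above.

```python
# Toy version of the World A / World B example. Both the janitor and Elon put
# all of their "I'm a sim" probability inside their credence on World B, so
# extra nines conditional on World B barely move the overall answer.

p_world_b = 0.5                      # illustrative credence that the sim-running world is actual

p_sim_given_b_janitor = 0.999        # ~1,000 sims of early history vs 1 non-sim janitor
p_sim_given_b_elon    = 0.999999     # ~1,000,000 extra Elon sims on top

# In World A no sims are run, so P(sim | World A) = 0 for both.
p_sim_janitor = p_world_b * p_sim_given_b_janitor   # 0.4995
p_sim_elon    = p_world_b * p_sim_given_b_elon      # 0.4999995

print(p_sim_janitor, p_sim_elon)
# Both are capped by p_world_b: the classic argument by itself gives Elon no
# reason to put higher credence on World B than the janitor does.
```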
So that’s an intuition for why I think it doesn’t fall out of the original argument. You can also get this a little bit from if you just reflect on… I think a lot of what’s driving people when they’re like, “Oh my god, if I was Elon…” or “Oh my god, I’m working on AI. That’s so weird. I must be in a sim,” or something like that, is the sense of how this is a very weird and unique position in human history. Being an early person at all, living in the 21st century at all, is also a very weird and unique position in human history. And I think we’re not used to that. You think being Elon is this very unique thing, because you’re only looking around at the people today, and Elon seems unusual relative to the people today. But if you look at a vast cosmos, where almost everyone is like a posthuman observer or something like that, being early at all is very, very weird.
So in general, I think we should be a little surprised if there’s some kind of qualitatively different update from being Elon than from being an early person at all, at least without getting into some kind of additional anthropic gnarliness.
Rob Wiblin: OK, I must admit I didn’t entirely follow the argument you made there — but in the interest of time let’s leave it, and if people want to dive in they can take a look at your thesis.
Rob’s seven-minute explainer about anthropics [01:12:27]
Hey, listeners, Rob here. Joe mentioned the anthropic principle and anthropic reasoning just a minute ago, and I thought this might be a good moment to take you aside for five or ten minutes to explain what he’s talking about there, because I think that will make what comes later a bit easier to understand. Of course, if you feel like you don’t need this little intro to anthropics, feel free to skip ahead to the next section.
So anthropics is this little field of philosophy that covers what you can learn about the world — in what way you should update your beliefs about various facts about the world from the fact of your own existence. Among other things, it tries to address the issue of observer selection effects, which refers to biases that are created because there are filters that affect who, if anyone, is going to be able to observe something in the first place. Or perhaps there could be a filter that causes many people to make a particular observation, even if the thing observed is relatively rare.
As usual, it might be easiest to explain just by going into some concrete arguments. A place that you might have encountered anthropic reasoning and the anthropic principle is in response to the fine tuning argument. So the fine tuning argument is used to suggest that the existence of life in the universe can’t just be a matter of luck, but rather it has to be the result of careful fine tuning. And in doing so, it points to certain physical constants, like the strength of gravity or the charge of an electron, which seemed to be really precisely set. And if those constants were just a little bit different, then life as we know it wouldn’t be possible, and maybe no sort of life would be possible. So this argument suggests that this precision is unlikely to have occurred just by chance, which might imply that the universe must have been intentionally designed or fine tuned to support life.
And the anthropic argument is sort of a response to this. And the anthropic principle points out that our ability to observe and ponder the universe depends on the fact that we have to exist in the first place. So, in other words, we shouldn’t be surprised to find ourselves in a universe that supports life, because if it didn’t, we wouldn’t be here to notice.
Of course, people invoking this anthropic argument might talk about the possible existence of a multiverse, where there are many, many different universes, each with different physical constants. And if there are enough of those universes, then it might not be surprising that at least one, like ours, is capable of supporting life, even if most don’t. This wouldn’t require any intentional fine tuning. And, of course, in such a situation, observers would always find themselves in universes, or parts of the universe, that could support life. That couldn’t be otherwise.
I’ll just leave you with another thought experiment that isn’t quite analogous, but it’s similar and thought provoking. Imagine that someone is up against a firing squad, expecting to be killed, and there are ten different shooters who are all going to shoot and kill them. Imagine that they find that, miraculously, all ten guns in the firing squad have failed. They all had some mechanical error, and so the person didn’t die; they weren’t executed. And perhaps, taking this as a sign from God, the executioners decide not to proceed with the execution, given that all ten guns have mysteriously failed to fire.
Could the person then say that they’re surprised, that they’re shocked to observe that all ten guns failed? And this must be a great mystery that calls for explanation? Or does that not make sense because, of course, they couldn’t observe the situation at all unless that had happened?
I’ll just leave that there. Let’s move on to another thought experiment from the field of anthropics, which, again, kind of studies how we should reason and update our beliefs when we get information about our position within a particular group or a set of observers.
Here’s the basic setup of the Sleeping Beauty problem. This is one that I really love. So Sleeping Beauty volunteers for a scientific experiment. On Sunday, she is going to be put to sleep, and a fair coin is then going to be tossed. If the coin comes up heads, she’s going to be awakened and interviewed on Monday, and then the experiment will end. On the other hand, if the coin comes up tails, Sleeping Beauty is going to be awakened and interviewed on Monday, then put back to sleep, and then awakened and interviewed again on Tuesday. The memory of the Monday awakening is always erased so she can’t remember it during the Tuesday awakening. And then after the Monday and the Tuesday awakening, then the experiment will end. So during each awakening, Sleeping Beauty doesn’t know what day it is or whether she has been awakened before.
So now, whenever Sleeping Beauty is awakened and interviewed, she is asked, “What is your credence now for the proposition that the coin landed heads?” That is to say, what probability would you assign to the idea that the coin landed heads, given that you’ve just been awakened? There are two main positions in response to this problem. You have the halfer view, and halfers would say that Sleeping Beauty should assign a 50% probability to the coin landing heads. The reasoning being that the coin toss was a fair one, so it has an equal chance of landing heads or tails, irrespective of how many times Sleeping Beauty is awakened — the number of awakenings is just irrelevant. The answer has to be 50%.
The alternative view, which you might be beginning to suspect now, is called the thirder position. And thirders would say that Sleeping Beauty ought to assign a one-in-three probability to the coin landing heads. The reasoning there is that there are three possible awakening events. There’s the case where the coin landed heads and she’s being woken up on a Monday. There’s the case where the coin landed tails and she’s being woken up on a Monday. And then of course there’s the case where the coin landed tails and now she’s being awoken on a Tuesday. And each of those three awakening events is equally likely. So given that only one of those three awakening events features the coin coming up heads — that is, the case where the coin landed heads and she’s waking up on a Monday — the probability of heads is just one in three, not one in two.
So yeah, the Sleeping Beauty problem is interesting because it’s not clear which position is correct. People have conflicting intuitions, and even professionals disagree on this. Personally, I think that the thirder position is correct — or that’s the one that feels right to me — that the probability of heads is one in three, not 50%. But that does have the interesting implication that things are more likely to be true if they create more observers who are capable of observing them. The underlying issue with all of this is what you can update on, given the fact that you are there observing and considering the question at all.
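For readers who like to see the counting, here is a minimal Monte Carlo sketch of the thirder bookkeeping: it tallies awakenings across many runs of the experiment and asks what fraction of them follow heads. Halfers would dispute that awakenings are the right thing to count, so this illustrates the thirder reasoning rather than settling the debate.

```python
import random

def heads_fraction_of_awakenings(n_trials: int, seed: int = 0) -> float:
    """Fraction of all awakening events in which the coin had landed heads."""
    rng = random.Random(seed)
    heads_awakenings = 0
    total_awakenings = 0
    for _ in range(n_trials):
        heads = rng.random() < 0.5
        awakenings = 1 if heads else 2   # heads: Monday only; tails: Monday and Tuesday
        total_awakenings += awakenings
        if heads:
            heads_awakenings += awakenings
    return heads_awakenings / total_awakenings

print(heads_fraction_of_awakenings(1_000_000))   # ~0.333: one third of awakenings follow heads
```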
OK, I think that’s enough of an introduction to anthropics. I’ll stick up a link to a really nice video from the YouTuber Veritasium about the Sleeping Beauty problem, which goes a bit deeper into the underlying issues at stake and the implications. You should also be able to find it by searching for “Sleeping Beauty problem Veritasium.” All right, with that out of the way, let’s get back to the interview.
Joe Carlsmith: One other thing I want to say on the Elon argument — or, in my blog post, I call it the “But aren’t I improbably cool?” argument. And this is related to the point about how you would want people with different sorts of observations to act. I think it’s a kind of scary and worrying prospect for the most powerful and influential people in the world to think that they’re in simulations in proportion to their power and influence. I think that’s just not the policy we would want people to be taking. Certainly for the people who are looking at them and going, “Here’s this person living in this real world. They’re not in a sim. This is not some solipsistic thing. We’re watching you and you think you’re a sim. That’s horrible. And I’m scared of what you’re going to do.”
I think you’ve just got to be really cautious with that stuff. I just think there’s a direct intuition here. It’s also related to the point I wanted to make about what sort of basement people you would want people to be — and kind of defaulting to that pretty hard for a lot of different reasons. I just wanted to flag that, because I think it’s an additional source of hesitation.
Rob Wiblin: Yeah, definitely. I mean, it’s fine for the janitor in Des Moines to think they’re in a simulation. You really don’t want the president to think that they are and just start treating it like that.
Joe Carlsmith: Exactly. Like these people who are there at pivotal moments, and there is Petrov, and he’s like, “This is such a pivotal moment. No way that I’m in a position to influence what happens with the Cold War. I must be in a sim. Ha ha ha. Maybe I’ll try to be interesting so it doesn’t get shut off.” It’s like, “No, Petrov. Wrong.”
Rob Wiblin: “Stop, stop, stop.”
Joe Carlsmith: Yeah, so I think that matters a lot.
I just want to add another hesitation I have about the sim argument as a whole — though this is more inchoate and actually draws on an anthropic principle that I don’t otherwise endorse, so it’s a bit complicated. But I think the thing that people don’t appreciate about the sim argument is that the most salient hypotheses — where there are lots of sims, or where in particular there are many more early-seeming sims than early-seeming non-sims — those hypotheses also involve there being vastly more late-seeming people than early-seeming people.
A “late-seeming person” is just someone who finds themselves existing in a technologically mature civilisation. So even if it’s the case that an advanced civilisation devotes some fraction of its resources to running sims of early-seeming people, it still seems very likely by default that most of what’s going on is —
Rob Wiblin: Stuff in the present.
Joe Carlsmith: — having a posthuman civilisation — so people who are observing that they live in an advanced civilisation, like normal citizens, workers, all sorts of things. I think sometimes people want to use the simulation hypothesis as a way to be less surprised that they’re early or that they’re early-seeming or that they’re Elon. But you should be surprised that you’re early-seeming, even if you’re a sim, because the vast majority of people are not early-seeming at all, at least on a certain sort of anthropics. I actually think that anthropics is complicated, but I do feel the pull and some of the intuition.
I guess I just want to flag that as a persistent source of uncertainty for me, where even if you start to say you’re a sim, you should still be pretty confused. Like, “Why aren’t I a posthuman, flying around in a spaceship like approximately everyone else?” And that’s the sort of logic that leads people to what’s called the “doomsday argument” of saying maybe there are no posthuman civilisations at all. And there’s a bunch of complexity there, but I just want to flag that as an additional confusion that’s lurking for me, and that I think should give us pause.
Rob Wiblin: Yeah. OK, we definitely haven’t done this topic full justice, but we’ll link to the chapter from your thesis that’s about this and some other good resources for people who are interested to hear more about this kind of reasoning, and I suppose people’s reservations about it.
Decision theory and affecting the past [01:23:33]
Rob Wiblin: Let’s push on to a different kind of surprising bit of philosophy which might be important, or maybe not — decision theory.
Decision theory is this kind of established field in philosophy that shouldn’t really, by rights, be all that exciting or all that strange. But some people working on related issues, including you, have mounted an argument that I think is gradually gaining some popularity: that despite what you might think, our actions can impact parts of the universe so far away that we could never causally interact with them, or we might in fact be able to affect parallel universes that we can’t causally interact with, or maybe we could even take actions that influence what happened in the past. Which, I have to say, really does feel out there to me.
You open an essay on this with this beautiful pithy paragraph:
I think that you can “control” events you have no causal interaction with, including events in the past, and that this is a wild and disorienting fact, with uncertain but possibly significant implications. This post attempts to impart such disorientation.
Can you explain the basic setup here?
Joe Carlsmith: Sure. The sense in which I think you can control things that you don’t causally interact with is a specific sense of control — specifically, it’s not causation. We’re not impacting them in the sense of causation; we’re impacting them in the sense that you should think of your actions as influencing what happens over there. That, for all intents and purposes, is a reasonable way to make decisions, and it makes an important difference.
And a way to bring that out, the experiment that convinces me most is: Imagine that you are a deterministic AI system and you only care about money for yourself. So you’re selfish. There’s also a copy of you, a perfect copy, and you’ve both been separated very far away — maybe you’re on spaceships flying in opposite directions or something like that. And you’re both going to face the exact same inputs. So you’re deterministic: the only way you’re going to make a different choice is if the computers malfunction or something like that. Otherwise you’re going to see the exact same environment.
In the environment, you have the option of taking $1,000 for yourself: we’ll call that “defecting” — or giving $1 million to the other guy: we’ll call that “cooperating.” The structure is similar to a prisoner’s dilemma. You’re going to make your choice, and then later you’re going to rendezvous.
So what should you do? Well, here’s an argument that I don’t find convincing, but that I think would be the argument offered by someone who thinks you can only control what you can cause. The argument would be something like: your choice doesn’t cause that guy’s choice. He’s far away; maybe he’s lightyears away. You should treat his choice as fixed. And then whatever he chooses, you get more money if you defect. If he defects, then you’ll get nothing by cooperating and $1,000 by defecting. If he sends the money to you, then you’ll get $1.001 million by defecting and $1 million by cooperating. No matter what, it’s better to defect. So you should defect.
But I think that’s wrong. The reason I think it’s wrong is that you are going to make the same choice. You’re deterministic systems, and so whatever you do, he’s going to do it too. In fact, in this particular case — and we can talk about looser versions where the inputs aren’t exactly identical — the connection between you two is so tight that literally, if you want to write something on your whiteboard, he’s going to write that too. If you want him to write on his whiteboard, “Hello, this is a message from your copy,” or something like that, you can just write it on your own whiteboard. When you guys rendezvous, his whiteboard will say the thing that you wrote. You can sit there going, “What do I want?” You really can control what he writes. If you want to draw a particular kitten, if you want to scribble in a certain way, he’s going to do that exact same thing, even though he’s far away and you’re not in causal interaction with him.
So I think there’s just a weird form of control you have over what he does that we need to recognise. I think that’s relevant to your decision, in the sense that if you start reaching for the defect button, you should be like, “OK, what button is he reaching for right now?” As you move your arm, his arm is moving with you. And so as you reach for the defect button, he’s about to defect too. You could basically be like, “What button do I want him to press?” and just press it yourself and he’ll press it. So to me, it feels pretty easy to press the “send myself $1 million” button.
But I think this is really weird. This is a really different type of way of thinking about your influence on the world, with some implications. So that’s the case that convinces me most.
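To make the payoff structure Joe describes a bit more concrete, here’s a minimal Python sketch. The dollar amounts come from the example above; the way the two lines of reasoning are coded up is just an illustration, not anyone’s formal decision theory.

```python
# Payoffs in the perfect-twin prisoner's dilemma described above.

PAYOFF = {  # (my choice, twin's choice) -> money I end up with
    ("cooperate", "cooperate"): 1_000_000,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 1_001_000,
    ("defect", "defect"): 1_000,
}

# CDT-style reasoning: hold the twin's choice fixed; against either fixed
# choice, defecting pays more.
for twin in ("cooperate", "defect"):
    assert PAYOFF[("defect", twin)] > PAYOFF[("cooperate", twin)]

# But a perfect deterministic copy fed identical inputs makes the same
# choice you do, so only the matching outcomes are actually reachable.
for choice in ("cooperate", "defect"):
    print(choice, "->", PAYOFF[(choice, choice)])
# cooperate -> 1000000
# defect -> 1000
```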
Rob Wiblin: Yeah, it’s the case that convinces me that there’s something real here that deserves further investigation. Because in the standard prisoner’s dilemma, you use the reasoning that you were saying, where you’re like, “No matter what they do, it’s better if I take the money for myself.” But that feels just completely wrong here. You know that there’s only two options: you send them $1 million and they send you $1 million, or you keep $1,000 and they keep $1,000. The idea that you could have a thing where you press one button and they press the other one doesn’t make any sense, because you’re this deterministic system that does exactly the same thing in the two locations, because you’re getting exactly the same inputs. It kind of seems to break the standard causal decision theory, the standard causal analysis that we would use.
But then, as you’re saying it, we can’t say that you’re causing them to do this in the normal sense, because we normally think of causation as occurring in this natural way — where you move an object or you send out some energy and then it travels in the intervening space and then hits them. And that’s not happening. Nonetheless, what you do changes what they will have done or what they are doing in a way that is decision-relevant to you, and that fact should change what you do.
Rob’s 10-minute explainer about decision theory [01:29:19]
Hey, listeners. Rob here. When I planned out this episode, I hoped that we’d be able to do this section on decision theory without properly going back and explaining fully what decision theory is, and what causal decision theory is, and evidential decision theory and the main kind of thought experiments that motivate this whole area of philosophy. I thought maybe we could skip over that because we just basically didn’t have time. But it turned out, listening back on it, I think there’s a big risk that people are going to get confused given that we skipped over so much of the kind of introductory material that we would have gone over if we had set aside more time for it.
So I am going to fix that here and talk for five or 10 minutes about introducing this whole topic so that you’re all more likely to be able to follow what comes next. If you feel like you’re pretty across these issues and don’t need an introduction, then feel free to skip to the next chapter to get back to the conversation.
OK, so decision theory is this field in philosophy that is basically the study of how a rational agent would decide what to do. And the most normal, the most natural, the most basic, perhaps for most of us, the default answer to how a rational agent would make decisions is what philosophers call causal decision theory, or CDT. And CDT says that when a rational agent is confronted with a set of possible actions, they should select the action which causes the best outcome in expectation.
So, just to give a super simple example, suppose you’re deciding whether to take an umbrella when you leave the house and you look at the weather forecast, and it says that there’s an 80% chance of rain. Using causal decision theory, you would think, well, if I take an umbrella, then that will cause me to not get wet. If I don’t take an umbrella, then I might get wet. That would be the causal outcome. So in this situation, taking the umbrella seems to cause the better outcome. So CDT suggests that you should take the umbrella with you if it might rain. So, very natural. It’s probably the way that most of us are thinking most of the time. And it probably wouldn’t occur to us until we did a bit of philosophy that that is really a particular theory, and that there might be counterexamples to it.
I’m going to quickly go through three possible counterexamples to CDT here, and explain the main alternative that people potentially reach for, which is evidential decision theory, or EDT. So here’s a simple setup. The thought experiment here is called the psychopath button. So Paul is debating whether to press the “kill all psychopaths” button. It would, he thinks, be much better to live in a world that had no psychopaths. Unfortunately, Paul is quite confident that only a psychopath would press such a button. And Paul very strongly prefers living in a world with psychopaths to dying.
So should Paul press the button then to kill all psychopaths? Causal decision theory would say that he should press the button, because pressing the button doesn’t cause him to be or to become a psychopath. So, so long as Paul currently thinks that he’s not a psychopath, then he’s in the clear. He can press the button to kill all psychopaths, make the world better from his point of view, and there’s no risk that he’ll be killed in the process because he thinks that he’s not a psychopath.
But many people think that there’s something off about this, because surely pressing the button, as Paul suspects, is evidence that he is a psychopath, since he thinks that only a psychopath would be willing to press the button. So this is a case where it seems like causal decision theory is leading to the wrong answer: that it’s fine for Paul to press the button, and that he doesn’t have to worry about being killed if he does so.
Here’s another thought experiment that is a little bit more famous, in which if you start reading about this topic, you’ll encounter again and again. It’s called Newcomb’s Paradox. Here it is in a sort of simplified form. So imagine that there’s a game show where you’re presented with two boxes, Box A and Box B. Box A is transparent and always contains $1,000, whereas Box B is opaque, and it will either be empty or it’s going to contain $1 million. Now, on this game show, the game show host, who has a near perfect track record of predicting contestants’ choices, that host has already made a prediction about what you’re going to do. If the host predicted that you would take the money from both boxes, then they left Box B empty. On the other hand, if the host predicted that you will only take box B, then they put $1 million inside it. So now you’re on the stage on this game show, and you have two choices: take both boxes, take the money from both boxes, Box A and Box B; or only take Box B.
Now, if you stop and think about it, from a causal decision theory perspective, you should take both boxes, because your decision now won’t change what’s in the boxes. They’ve already been filled based on the host prediction. You could say that the host predicted this a very long time ago, weeks ago. And they’re definitely not going to fiddle with what’s in them, and they’re not going to let you on the game show ever again, say. So, given that, you might as well take the definite $1,000 from Box A, along with whatever is in Box B. Maybe it’s a million dollars, maybe it’s nothing. But either way, from a causal decision theory point of view, you’re better off taking both boxes.
But there is an alternative approach that one could take, and one alternative that’s been proposed is evidential decision theory. And evidential decision theory says that when a rational agent is confronted with a set of possible actions, they should select the action with the highest news value. That is to say, the action which would be indicative of the best outcome in expectation, if they received the news that that’s the decision that they had taken. In other words, it recommends to do what you would most want to learn that you’re going to do or had done.
And from an evidential decision theory perspective, you should only take Box B. That’s because your decision to take only Box B is evidence that the host predicted that you would only take Box B, and therefore put $1 million in it. So rather than going home with just the $1,000 in Box A, you’re going to go home with the $1 million that was put in Box B, because the host predicted that you would only take Box B. Now, even though your choice doesn’t causally affect the contents of the boxes, it’s correlated with them: your choice to take only Box B is correlated with a higher reward.
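To put rough numbers on the two recommendations, here’s a minimal Python sketch. The 99% predictor accuracy is an assumed figure chosen for illustration; only the box contents come from the setup above.

```python
# Expected winnings in Newcomb's problem under evidential-style and
# causal-style reasoning.

ACCURACY = 0.99  # assumed reliability of the host's prediction

def edt_expected_value(choice):
    """Condition on your own choice as evidence about the prediction."""
    if choice == "one-box":
        # With prob. ACCURACY the host foresaw one-boxing and filled Box B.
        return ACCURACY * 1_000_000
    # With prob. ACCURACY the host foresaw two-boxing and left Box B empty.
    return ACCURACY * 1_000 + (1 - ACCURACY) * (1_000_000 + 1_000)

def cdt_expected_value(choice, prob_box_b_full):
    """Treat the already-fixed contents of Box B as independent of your choice."""
    box_b = prob_box_b_full * 1_000_000
    return box_b + (1_000 if choice == "two-box" else 0)

print(edt_expected_value("one-box"), edt_expected_value("two-box"))
# 990000.0 vs 11000.0: evidential reasoning recommends one-boxing.
for q in (0.0, 0.5, 1.0):
    print(q, cdt_expected_value("two-box", q) - cdt_expected_value("one-box", q))
# The difference is +1000.0 for every q, so causal reasoning recommends
# two-boxing no matter what probability it assigns to Box B being full.
```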
Now, philosophers, and I think just people in general who encounter this thought experiment, are kind of split, as Joe and I are going to talk about in a minute. They’re split between thinking that one-boxing is the right decision and thinking that two-boxing is the right decision. It brings up conflicting intuitions in both directions.
Here’s another thought experiment that’s often used to suggest that there’s something wrong with causal decision theory and that maybe we should use evidential decision theory or something else instead. This one is called the smoking lesion problem. Here’s the setup. Suppose that there’s a type of lesion that causes both a strong desire to smoke and lung cancer. So smoking itself in this situation doesn’t cause lung cancer, but the presence of the lesion leads to both the desire to smoke and an increased risk of lung cancer. So people who smoke are more likely to get lung cancer, but only because they’re more likely to have this lesion that gives them the desire to smoke.
Now, imagine in this situation you’re deciding whether or not to smoke. From a causal decision theory perspective, you might decide to smoke. After all, in this scenario, smoking doesn’t cause lung cancer, and if you have the lesion, you’re going to be at risk of lung cancer whether you decide to smoke or not. But from an evidential decision theory perspective, you might prefer to choose not to smoke. That’s because if you choose to smoke, that’s good evidence that you have this lesion, which would mean that you’re much more likely to get lung cancer.
So in this scenario, to many people, and I think to me, evidential decision theory seems to give more reasonable advice here, which is not to smoke, because your decision to smoke is correlated with a higher risk of lung cancer and a shorter life expectancy, even though it doesn’t directly cause that lower life expectancy. So some people use this case of the smoking lesion to argue in favour of evidential decision theory over causal decision theory.
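For anyone who wants to see how the evidential calculation goes, here’s a minimal Python sketch. All the probabilities are made-up numbers; the point is only that conditioning on the choice to smoke raises the probability of having the lesion, and hence of cancer, even though smoking causes nothing extra in this scenario.

```python
# How evidential-style reasoning conditions on the choice to smoke in the
# smoking lesion case. All probabilities below are illustrative assumptions.

P_LESION_GIVEN_SMOKE = 0.8       # choosing to smoke is strong evidence of the lesion
P_LESION_GIVEN_NOT_SMOKE = 0.1
P_CANCER_GIVEN_LESION = 0.5      # the lesion, not smoking, drives the cancer risk
P_CANCER_WITHOUT_LESION = 0.01

def p_cancer_given_choice(smoke: bool) -> float:
    p_lesion = P_LESION_GIVEN_SMOKE if smoke else P_LESION_GIVEN_NOT_SMOKE
    return p_lesion * P_CANCER_GIVEN_LESION + (1 - p_lesion) * P_CANCER_WITHOUT_LESION

print(p_cancer_given_choice(smoke=True))   # 0.402
print(p_cancer_given_choice(smoke=False))  # 0.059
# EDT treats the higher conditional cancer risk as a reason not to smoke;
# CDT ignores it, since in this scenario smoking causes no extra risk.
```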
On top of those two, people have come up with other decision theories in more recent years to try to get the seemingly right answers in all of those cases, as well as to deal with additional perverse outcomes in other sorts of thought experiments. Two of those are called functional decision theory and updateless decision theory. But I haven’t read about either of those, so I can’t really say what they’re about. But you could google them if you’re really interested.
Now, one thing that’s going to come up a little later in the conversation with Joe is that most people who think that causal decision theory, that CDT, is in some sense the correct philosophical decision theory, it’s correct from some philosophical point of view, they’re going to want to commit themselves to follow a different strategy in particular cases, and perhaps in quite a lot of cases. So you could imagine a philosopher who is about to go on the game show that we were talking about earlier with Newcomb’s problem, where they put a lot of money in Box B, if and only if they predict that you’ll choose to only take the money in Box B and leave Box A there.
That philosopher, let’s say, might think that causal decision theory is correct, but before they go on the game show, they would love to somehow change how they make decisions, so that they, in practice, would only take Box B, because then they can expect that there’ll be $1 million in there. Now, humans can’t easily just look inside their minds and change how they make decisions in this way by sort of reprogramming themselves. But yeah, it is a little bit strange that causal decision theorists would say that in a situation like this, and quite a lot of other hypothetical situations that people have dreamed up, that they should try to make themselves not follow causal decision theory and instead follow evidential decision theory, or a different approach instead.
OK, that’s been a long interjection. I hope it’s been interesting, or at least amusing. And now we can get back to the interview.
Rob Wiblin: Yeah, it’s really out there. A sceptic might respond saying, “Sure, this is all well and good if you’re a deterministic robot who can be perfectly copied and moved about, but humans aren’t like that. This actually has kind of limited relevance to the real world, or at least to us.” What would you say to that?
Joe Carlsmith: A few things. First, I think any amount of magic is weird, right? If someone’s a causal decision theorist, it feels like the argument that you can only affect things that you cause should just apply universally. It shouldn’t be like, “Oh, well, there’s a few cases where you can affect things you don’t cause.”
Rob Wiblin: “Well, not robots, but…”
Joe Carlsmith: I think even if this consideration only applies in this sort of case, I’m like, well, we should be sitting up straight. There’s something weird about that.
But I also just don’t think that. To start to get a grip on this, maybe we can start by weakening the case, where you imagine that the inputs are subtly different. Like, let’s say you’re wearing a green hat and he’s wearing a red hat. You see this. At some point, people see their hats. OK, but you’re still really similar. So if, let’s say, an outside observer learned that you cooperated, that observer is going to update very hard about what he did because you’re so similar. It’s just a lot of evidence about what he’s going to do if you cooperated.
So I think that the same argument persists. Basically you should make that update too; you should be incorporating that information into your decision. If you cooperate, then you should be expecting him to cooperate, even though he’s seeing slightly different inputs. You’re still just so similar, he probably did the same thing, he probably thought about this stuff in the same way. It’s not as though the colour of the hat makes much difference to how you treat this. I mean, you could be the type of person for whom it does, but then we’d have to argue that you’re that type of person, and you kind of get to decide.
So I think the case is going to work in a very similar way as long as your actions are evidence about what he does. I think that the basic problem with causal decision theory, which I think is important to see clearly, is that the whole thing causal decision theory does is ignore that sort of evidence. So it assigns a fixed probability to him cooperating or defecting, and then treats that as independent of what you do. And it’s just not independent.
I think causal decision theory, or CDT, is just using the wrong probabilities: it’s acting with expected utilities that you shouldn’t actually expect, and I think that’s just kind of silly when you look at it clearly.
Rob Wiblin: Yeah. Just to clarify for listeners, causal decision theory is this kind of answer to how you should make decisions in this kind of situation. I’m not sure that I could exactly technically define it, but it’s the one that says you should think about what causal impacts you’re having on things and set aside other considerations. It follows the sort of reasoning that says whatever the other person, whatever the other copy, is going to do, just hold that fixed and then decide what you should do given that. And then you can just choose the one that is better no matter what they choose.
There’s various other approaches here, that probably we won’t get into in detail, that in various different ways try to take into account what you might learn from your own actions, or what process you might have wished to commit to using in various different cases.
I guess as you’re saying that the answer seems relatively obvious, at least to the two of us, in the case where it’s your exact copy who is deterministic and sees the exact same evidence. But it’s not as if this goes away if you start making minor changes — like the room is slightly different, the temperature is different, you’re wearing a different hat — as long as your decision is very correlated with what they’re going to choose, such that you’re learning a lot about what they will probably have chosen from what you do, then there’s still bites to some extent.
So it breaks or it weakens gradually over time as the situations become more and more distinct. That’s how it can potentially — even before we have any deterministic AI systems or whatever — can still be relevant to humans, because we’re in a bit of that situation. Like, what if your best friend is the other one out there? You tend to agree all the time. You have very similar interests. You tend to make the same decisions. You want to do the same stuff. Aren’t you learning something about what they will probably have chosen in this case from what you do, because you so often agree?
It suggests that this like spooky thing, this spooky crazy thing, is happening all the time. Do you want to take it from there?
Joe Carlsmith: Sure. I do want to be cautious in how far we extend these things to real-world cases, because I actually think it gets complicated. In particular, the more that your decision making is driven by explicit thinking about this decision-theoretic dynamic, the more that thinking can itself be a source of decorrelation with other people. So if the other people aren’t thinking in this way — like if your friend never thinks about decision theory, for example — then it’s going to be at least quite a bit less clear how correlated your decision making is with his. I think that’s important for a bunch of human cases. I also think in many human cases we just have much stronger forms of evidence about what to expect other people to do.
This is quite a weak effect. I think we should be quite cautious in extrapolating to everyday cases. I think it’s at least a substantially open question how far you want to bring these into more real-world cases, like on Earth and with people you know.
I just want to say that, because I think it is important and people can misapply this stuff and get out there and think of themselves as having more of this control than they do. I think a reasonable way to test that is just to ask yourself: How much would you update about what the other person does on the basis of your doing this thing for this reason? I think it’s often not actually very much.
Rob Wiblin: Yeah, that makes a lot of sense in normal cases. I suppose in the robot case, you imagine that they’ve just woken up, and this is the only thing that they’re learning. It’s almost the only information they have. It’s also just incredibly dispositive evidence, incredibly persuasive evidence, because they’re identical. Whereas in a normal case, you already know a lot about your friend, and they’re different from you. Which might help to explain why we don’t have this intuition about what things we can affect. Our intuition is, most of the time, about this causal thing: that we think the only way we can change things is to influence them physically somehow. This kind of influence feels very spooky and counterintuitive until you really pin it down in a case where you can’t get away from it.
Newcomb’s problem [01:46:14]
Rob Wiblin: What’s another thought experiment here that might help us to understand this better?
Joe Carlsmith: The classic thought experiment that people often focus on, though I don’t think it’s the most dispositive, is this case called Newcomb’s problem, where Omega is this kind of superintelligent predictor of your actions. Omega puts you in the situation where you face two boxes: one of them is opaque, one of them is transparent. The transparent box has $1,000, the opaque box has either $1 million or nothing.
Omega puts $1 million in the box if Omega predicts that you will take only the opaque box and leave the $1,000 alone (even though you can see it right there). And Omega puts nothing in the opaque box if Omega predicts that you will take both boxes.
So the same argument arises for CDT. For CDT, the thought is: you can’t change what’s in the boxes; the boxes are already fixed. Omega already made her prediction. And no matter what, you’ll get more money if you also take the $1,000. If there was some dude over there who could see the boxes, and you were like, “Hey, you can see what’s in the boxes: what choice will give me more money?” — you don’t even need to ask, because you know the answer is always just to take the extra $1,000.
But I think you should one-box in this case, because I think if you one-box then it will have been the case that Omega predicted that you one-boxed, because Omega is always right about the predictions, and so there will be the million.
A way to pump this intuition that matters for me is imagining doing this case over and over with Monopoly money. Each time, I try taking two boxes and I notice the opaque box is empty. I take one box, and the opaque box is full. I do this over and over. I try doing intricate mental gymnastics. I do a somersault, then I take the boxes. I flip a coin and take the box — well, with flipping a coin, Omega has to be really good, so we can talk about that.
If Omega is sufficiently good at predicting your choice, then, just like every time, what you eventually learn is that you effectively have a type of magical power. Like I can just wave my arms over the opaque box and say, “Shazam! I hereby declare that this box shall be full with $1 million. Thus, as I one-box, it is so.” Or I can be like, “Shazam! I declare that the box shall be empty. Thus, as I two-box, it is so.” I think eventually you just get it in your bones, such that when you finally face the real money, I guess I expect this feeling of like, “I know this one, I’ve seen this before.” I kind of know what’s going to happen at some more visceral expectation level if I one-box or two-box, and I know which one leaves me rich.
That’s another thought experiment that I use. I think that case is complicated, and I discuss it in the essay, but I find that an additionally compelling intuition.
Rob Wiblin: Yeah, so this one is a bit more complicated and more controversial. People disagree a lot about what they think that they would do, or what they think it’s reasonable to do. I mean, you can see the intuition either way. However much money is in the boxes, I get more by taking the money from both of them, rather than just one of them. Some people have that feeling very strongly. And then other people think, yeah, but if I’m the kind of person who will take the one box, then there’ll probably be $1 million in the opaque box. That is the better choice because I will get more. How do you resolve this tension between the two?
I think many listeners might have this intuition that something weird is going on with this case. You’ve got this weird being, Omega. It seems like you’re smuggling in something spooky about this character, who can just somehow always predict your actions incredibly well, even when the predictions were made far in the past. I think there’s a bunch of discussion about whether that is introducing something that should make us suspicious about the whole case.
Practical implications of acausal decision theory [01:50:04]
Rob Wiblin: OK, well let’s go back to the prisoner’s dilemma case. Can you talk about what implications that might have from a doing-good perspective? If you’re concerned about how the whole world goes, then how might this affect what we ought to do?
Joe Carlsmith: I think at a basic level, it seems like a sufficiently weird form of control that it just seems worth noticing and understanding prior to jumping too hard into practical implications.
The practical implication that’s most salient to me though, that’s quite strange, is less to do with how these sorts of ideas play into interactions between humans right now on Earth, and more to do with the sense in which they may expand the scope of what we can affect more broadly.
In particular, if you live in a very large universe — as I think is a live cosmological hypothesis — or alternatively, if there are other quantum branches that are real in the way that the many-worlds theory of quantum mechanics suggests, then normally we would think those parts of the universe and parts of reality are beyond the scope of what we can control, and so we can effectively ignore them.
Whereas if you take this acausal stuff more seriously, then suddenly it becomes more of an open question whether there are implications for what we do, for what we should expect things to be like in other quantum branches or other parts of the universe. I think it’s possible that there are important ways that that could be decision-relevant — though I think we should move very, very cautiously in kind of actually acting on that; it’s more like a possible line of inquiry to pursue.
Rob Wiblin: I guess one conclusion that might come from this is it seems to give you an extra reason to act with integrity, or to act to be a trustworthy person and to keep your promises. Because in so doing, you gain evidence about the niceness and the reliability of other people in cases where you don’t really know how reliable they are. At least inasmuch as you think that the decision-making procedure that you use to decide whether to stab someone in the back or not is similar to the decision procedure that other people are using to make similar decisions about you or other people who you care about, then maybe you really do gain some knowledge about how good beings are — how good humans are, how good agents are in the universe in general — by introspecting and thinking, “Well, how nice am I?”
Perhaps that’s one of the influences that we could have that might be most important: to just be really good, trustworthy, nice, cooperative beings — in the interest that that is going to demonstrate that is how beings in the universe tend to be, and that’s going to lead to a good outcome. Is that kind of a thrust that people have run with?
Joe Carlsmith: I think we want to move slowly. It can be easy to grab these ideas and read onto them whatever we hoped to say anyway. And these ideas do conjure a bunch of stuff in the vicinity of cooperation and integrity, and ways of combating destructive dynamics between more naively consequentialist agents, and stuff like that. I do think that’s real, but I also think sometimes the work of really working it out isn’t done.
That said, especially with respect to the kind of broader universe, I do think there’s stuff here that seems to me possibly relevant. In the sense that if humanity cooperates — and is nice in various ways to other value systems, or finds kind of pluralistic solutions to our own problems in general — I do think the amount of cooperation and niceness that we succeed at bringing to our own predicament on Earth is not just helpful for us; it’s also evidence about the nature of the universe as a whole. And it would be some amount of evidence about some other places in the universe: How well do they do? How much cooperation do they do? How do people who care about what we care about get treated?
It’s like an early-stage thought; I think there’s a lot to be done to work that out. But I think it’s at least a possible implication that we should be in some sense thinking about when we act: What does this do to my sense of how things are as a whole, everywhere — not just in my local environment? It may be that our niceness is evidence about that.
Rob Wiblin: Yeah. Another possibly more intuitive way of putting it is to imagine that we learned that humanity managed to resolve its disagreements and conflicts and avoid a nuclear war, and it kind of grows up as a species without any big calamities. Is that a reason to think that other similar civilisations — that might arise elsewhere in the universe at other times and other places — will also manage to do that, and avoid destroying themselves or avoid producing some bad outcome? It seems like the answer is kind of yes, that it’s at least some evidence for that, especially given how little we know about them. And then the additional bit is like: that’s a reason to do it, right? Because we’ll learn that the universe is better.
That feels a little bit stranger, but it does seem like it’s kind of implied by the cooperation thought experiment. A fuzzy bit here is we’re saying that we think that other agents’ behaviour is correlated with our own, and they’re using a correlated, similar decision procedure to decide what to do. And we’re like, “But how much? Can we measure this?” It feels like there’s some work being done there and we’re not being very precise.
Joe Carlsmith: Yeah, there’s a lot of work to do in really making out that thought. I’ve thought most just about the basic decision theory thing. And I want to be clear that I don’t have a worked-out overall decision theory even, and different decision theories will understand the prisoner’s dilemma case and Newcomb’s case in different ways. There’s a bunch of additional conversation to be had even about the very theoretical aspect of this.
But I am pretty convinced that you should cooperate in that twin prisoner’s dilemma case. I think that’s a pretty strong datum. We’re going to want a decision theory that validates that, and I think that is suggestive that there’s at least the potential for our influence to extend much more broadly than we’re used to imagining.
The hitchhiker in the desert [01:55:57]
Rob Wiblin: This might be a good moment to bring in another thought experiment in this vicinity, which is the hitchhiker in the desert case. Can you explain that one?
Joe Carlsmith: Sure. The idea here is you are a hitchhiker in the desert. You’re dying of thirst and you need to make it to the city, otherwise you’re a goner. A selfish man comes along in his car, and he will take you to the city, but only if he predicts that when you get to the city, you’ll go to an ATM and give him $10,000. And he’s an extremely accurate predictor; he’s a sort of Omega-like character. Once you get to the city, he will be powerless to make you pay — you’ll just be able to run away and get your water, and you won’t need to pay him.
The problem here is that once you get to the city, then it’s this case where you know the outcomes: if you pay, then you’ll be down $10,000 and you’ll live; if you don’t pay, then you’ll have $10,000 more and you’ll live. So it’s very hard to get a decision theory that says that you should pay in the city if you’re just assessing the guaranteed payoffs in these cases. But that’s the type of thinking that gets you left in the desert — because he predicts that’s the decision you’re going to make in the city, and so he doesn’t take you to the city, and so you die.
So there are decision theories that are sort of designed for this situation in particular. Basically what they’re doing is specifically attending to what is the policy you would have wanted to commit to from some kind of prior epistemic perspective — in this case from the desert — and then executing that policy later, even when it wouldn’t have otherwise made sense.
So this type of thing is different from the twin prisoner’s dilemma in various ways. In particular, once you’re in the city, you know the outcomes — whereas in the twin prisoner’s dilemma you’re sort of changing your evidence about the outcomes. But they’re structurally similar in other ways, and I think they bring in additional interesting questions. I do think it’s very plausible that the thing to do is to pay once you’re in the city. I think that’s an additionally important datum about how we should be thinking.
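One way to see the “evaluate the policy from the desert” idea is to compare whole policies directly. This is a minimal sketch; the numeric utilities are arbitrary stand-ins, and the driver is modelled as a perfect predictor, as in the story.

```python
# Compare whole policies in the desert hitchhiker case from the ex ante
# perspective, before the driver makes his prediction. Utility numbers are
# stand-ins; dying is modelled as far worse than paying $10,000.

UTILITY = {
    "die in the desert": -1_000_000,
    "reach the city and pay": -10_000,
    "reach the city and keep the money": 0,
}

def outcome(policy: str) -> str:
    # The driver perfectly predicts what you will do in the city, and only
    # gives you a ride if he expects to be paid.
    if policy == "pay once in the city":
        return "reach the city and pay"
    return "die in the desert"

for policy in ("pay once in the city", "refuse to pay once in the city"):
    result = outcome(policy)
    print(f"{policy}: {result} (utility {UTILITY[result]})")
# The 'keep the money' outcome never becomes available to an agent whom the
# driver predicts won't pay, which is why the paying policy wins ex ante.
```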
Rob Wiblin: On its face this does feel like a different case, but the thing that it has in common is that it’s highlighting another problem with just following our standard causal analysis of what to do. Because the causal analysis says, let’s start at the end and work backwards: once we’re in the city, we shouldn’t pay, because we’d rather just keep the money. That’s our decision at that point. Then let’s work backwards, and it’s like, “OK, so now I should die in the desert.” You will die.
It’s another case where a kind of causal analysis of how to make decisions seems to be producing the wrong answer, or at least a questionable answer. Maybe that should cause us to reassess what things we think of ourselves as being able to influence with our decisions, or with the way that we make decisions.
Joe Carlsmith: Yeah, that’s right. I think it’s harder in this case to interpret the sense of influence well. You can end up saying, “OK, I’m in the city. But if I don’t pay, then it won’t be the case that I’m in the city — I’ll have died a few hours ago.” It can feel a little like, “Huh.” That’s a weirder thing than saying, “If I cooperate, then he will cooperate. If I defect, he’ll defect.”
But I do think there’s something intuitive here, and a way of bringing that out is: in the desert — if you had this causal decision theory — you would want to do something like strap a bomb to your arm that will explode once you’re in the city if you don’t pay, thereby making it the case that it will be rational for you in the city to pay, and so he’ll predict that you pay. But what if you don’t have a bomb? What if you don’t know how to make bombs? Do you really need a bomb? Why not just skip all this stuff with the bomb and just pay in the city?
There’s something weird about needing this weird commitment device. Why not just learn that oh-so-valuable art of actually making and actually keeping commitments? To me, that’s more of the lesson of this case: that being able to keep commitments is a real and important thing, and that it’s something that naive decision theories struggle with.
Rob Wiblin: Yeah, I find that to be a really funny example, where you’re thinking, “Obviously if I had a bomb that could pre-commit me to wanting to pay in the city, then I should. But if I’m missing any of the pieces of the bomb, then I shouldn’t pay, and then I won’t, and instead I should die in the desert.” It feels like something’s gone wrong there and you’re just like, “But couldn’t I do the same thing without the bomb?”
I think this case is a lot more intuitive, because it brings in other intuitions that we have about the importance of maintaining a good reputation and sticking with our commitments. You can make the case stranger by saying it’s a robot that looks for lost people out in the desert, and actually the robot is going to break, or it’s going to explode or something, before you even decide whether to go to the ATM or not. So the being with which you’ve made some agreement is not even going to exist anymore. But still, in that case, it feels like probably you should go to the ATM.
Joe Carlsmith: Yeah, I think we can bring in, you know, that the robot says to burn the money in the street, and the robot has already disintegrated by the time you’re there. I still think you should pay. But I think it’s worth making that distinction, because otherwise we’re triggering a bunch of extra stuff about promise-keeping and the interests of the guy and stuff like that.
Acceptance within philosophy [02:01:22]
Rob Wiblin: OK, going back to the stranger case, where you’re thinking of this kind of spooky action at a distance that you have with other beings that make decisions using similar methods to what you do. Is this idea that this is important, or that this maybe should influence how we think about the effects that we have on the world, is that gaining any acceptance within philosophy? I know this idea has been bouncing around for a couple of decades now, and it doesn’t seem to be going away. At least people haven’t managed to really knock it down; it’s still going. What’s the situation?
Joe Carlsmith: Well, I don’t have my finger on the pulse of the decision theory community as a whole. In academic philosophy, I think this view is still probably the minority; I think standard causal decision theory tends to be more popular in academic philosophy.
Then there’s a different set of folks… I live in the Bay Area, and I think it’s more popular in the Bay Area for some reason. Actually, there is a reason: various people in the Bay Area have done work on this topic. Particularly the Machine Intelligence Research Institute has done work on this topic that influenced my thinking, and I think has influenced some of the culture around this more broadly.
Rob Wiblin: Yeah. Do you think this idea, what do you think is the prognosis? If we come back in 20 years’ time, do you think this will be more widely understood, or be regarded as more important than it is today?
Joe Carlsmith: I think plausibly, though it’s hard to say exactly. The winds of academic opinion don’t always track what I think is right.
One thing is that in principle, everyone agrees that if you’re a causal decision theorist, you should self-modify to stop being a causal decision theorist kind of immediately. In practice, it doesn’t seem like the causal decision theorists even try this.
Obviously it’s not clear what self-modification involves, but ahead of time, if you want to become the type of person who pays in the city, and if there’s like a button to do that, then you should do that. And same with Newcomb’s problem: before Omega makes the prediction about you, you want to become the type of person who one-boxes.
So in principle, all the CDT people should be trying to stop being CDT people. Now, in practice, this doesn’t happen for various reasons. But you can imagine in a longer scope of time that this effect actually starts to matter, and CDT agents stop being CDT agents even if they started that way.
A different factor that I could imagine mattering in the longer run is that I think if we start living in a world where there are a lot of actual digital minds that can be copied and that are more deterministic, then —
Rob Wiblin: And that can self-modify.
Joe Carlsmith: — and that can self-modify — then this stuff might just become more intuitive. If there are a bunch of copies of me running around and I start doing stuff and then seeing what they do, I might just get it in my bones a bit more.
Obviously this is very speculative, and exactly what’s going to be up with the AIs is a whole different story. I do think there’s a way in which this stuff can make a little more sense if you’re thinking of yourself as deterministic and if you’re imagining copies of yourself — and we are maybe building agents that are going to be in that sort of situation.
Rob Wiblin: Yeah. You have a really good post on this, for people who want to learn more, as we’re going to have to move on: it’s called “Can you control the past?” So obviously we’ll link to that. You have a couple of other really fun thought experiments in there, which people should go check out.
Infinite ethics [02:04:35]
Rob Wiblin: OK, the next strange idea from philosophy that I would like to talk about that might perhaps influence what we ought to do is the problems created for ethics by countenancing the possibility of infinite amounts of value. We slightly referenced that earlier, and actually, some parts of this were covered on the show not long ago — in episode #139: Alan Hájek on puzzles and paradoxes in probability and expected value.
Basically, our best understanding of the universe from physics and cosmology is that as far as we know, it might well be infinite in size. It has that look to it. And that could either be that it continues outward spatially forever in all directions; or that it continues going forward in time without ever finishing; or that there are an infinite number of parallel worlds to our own, alternative universes with different setups. Certainly none of these things can be scientifically ruled out at all.
But if your theory of ethics involves saying things like you should increase the amount of happiness in the universe, or you should reduce the amount of injustice and unfairness and inequality, or increase the amount of virtue that people display, then this is going to pose some problems. Can you explain what those are?
Joe Carlsmith: Sure. I actually think the problems are substantially more general than just, “If you want to say that more of something is good, you get problems” — I think there are just problems for everybody all over the place. Basically, anyone who wants to be able to make choices that involve infinities at all is going to run into some really serious issues, and that’s plausibly everyone.
Like, we have intuitions about infinite worlds: infinite hell is worse than infinite heaven. OK, that’s a data point. Cool. It’s not as though ethics is totally silent on this stuff. In fact, we often want our ethical principles to cover a very full range of choices we could make. More importantly, we already are making choices that have some probability of doing infinite things.
I think there are two different central ways this can go. One is kind of causal, where we can be causing infinite things. Now, I think this is low probability, because it requires kind of different science. Our current empirical causal picture suggests that our causal influence, or our predictable causal influence, is made finite by things like entropy and lightspeed and stuff like that. But it could be wrong. Maybe in the future we can make hypercomputers or baby universes. I don’t know, maybe some sort of religion is true. Can you really rule it out? Like, the idea that people go to hell: there are all these churches around — exactly how unlikely is it? What if God appeared before you? Are you certain that’s not going to happen?
Anyway, there’s causal stories you can have, where you can have causal influence. And then if you get into the acausal stuff that we were talking about earlier, you can also acausally influence an infinite universe. So if there’s infinite people, and their decisions are sufficiently correlated with yours, then when you choose to take a vacation or to donate your money or something, there’s all these other people who are doing similar things and you’re having infinite acausal impact. So I also think in practice we have to deal with this issue.
Problems that come up — there are a bunch. The classic problems that come forward first I think are actually not the biggest ones. Nick Bostrom has a paper on this, where the problem he really foregrounds is this idea that in an infinite universe, if you can only do finite things, then you can’t make a difference to the total amount of that thing. Let’s say you’re in an infinite universe, and you save a bunch of lives or you create a bunch of pleasure: naively, the finite difference didn’t make a difference to the total pleasure in the universe. I don’t find this super compelling.
I think that there’s an answer to this that I find quite resonant, which is just that you should focus instead on the difference that you made, rather than on the sense in which you changed the universe as a whole. If you save 10 lives, you still save 10 lives — and those people benefited from what you were doing even if you didn’t kind of “make the world better.” I’ve never been super into making the world better. For me, it’s helping the people, the specific people — and the world is not a moral patient. So I’ve never been super worried by that one.
There’s another one that comes up often: basically if you are sufficiently unbounded in the amount you care about additional somethings — so additional lives being saved, or additional pleasure, or something like that — then you can get obsessed with infinite outcomes. Any probability of making an infinite difference can swamp anything finite and so you can be kind of obsessed with infinities.
I think this is an issue, but pure obsession is sufficiently continuous with other bullets that people already want to bite in finite contexts, like the idea that sufficiently small probabilities of sufficiently large finite impacts can still dominate more guaranteed benefits. So just saying “I’m going to be obsessed with infinities” is not in itself the biggest problem. I think the biggest problem is that once you try to be obsessed with infinities — or once you try to make choices about infinities at all — you just don’t know how to do it. Our whole ethics just starts to break when you try to figure out how to rank infinite worlds, and in particular how to choose between lotteries over infinite worlds or infinite outcomes.
And then especially if you have to bring in bigger infinities — there’s this unending hierarchy of ever-larger infinities — if you start having to bring in those, I think we should expect all the principles we’ve currently talked about for handling these cases to break yet again. I think it’s in choosing between infinities that the rubber really hits the road.
Rob Wiblin: I see. Just to clarify one thing there: when you talk about choosing lotteries over infinite worlds, or infinite outcomes — “lotteries” there is philosopher speak for decision-making under uncertainty. So what would you do if you had a 20% chance of getting some outcome that has an infinity in it, weighed against a 5% chance of some other infinity? Our whole ethics doesn’t know how to deal with lotteries like that, with gambles under uncertainty that include infinities.
A boring issue, or a standard issue, with the infinite case is this: you imagine the universe is infinite, so it has an infinite amount of good and an infinite amount of bad in it. You help someone and make their life better, so you’ve got an infinite amount of good plus the finite amount that you added — it’s still infinite. Nothing has changed; the aggregate is still the same.
You think, as bad as that is, maybe you could overcome it. But there are these other, deeper problems, where you just end up being able to say nothing about how to compare all of these different things that do seem decision-relevant, and that you should be able to compare. This isn’t just an issue for total utilitarians or someone who cares about the universe as a whole. Anyone who cares at all about what kinds of things exist or happen is getting bitten by this.
Joe Carlsmith: Basically, yeah. I think there are ways to try to get around it by having a very patchy form of ethics that just is silent about a zillion cases.
Rob Wiblin: Yeah, if you had an extremely narrow ethics that was like, “What you should do is just make sure that you eat enough. And it’s only about you, not about any parallel or similar people. You should just try to make sure that you eat well and that’s the extent of ethics.” Would that get away from it?
Joe Carlsmith: Well yes, in some sense, but then a thing you’re doing there is not saying a bunch of things. For example, there are these impossibility results in infinite ethics. These are results showing that we just know you can’t have certain sorts of otherwise attractive principles: you cannot hold them in combination in the context of infinities.
Rob Wiblin: What’s an example of that?
Joe Carlsmith: For me, the most salient example is to consider two principles: The first principle we’ll call Pareto, which says if you can help an infinite number of people, and you make no one worse off, you should do it.
Rob Wiblin: Sounds good.
Joe Carlsmith: Sounds super good. It seems like, why are you not on board with that? So that’s the first principle.
The second principle we’ll call “Anonymity.” [Also called “agent-neutrality.”] Basically, it says that if you can kind of map each person in one world to another person uniquely — such that each person is one-to-one mapped — and then they each have the same wellbeing as their pair, then those worlds are equally good. In somewhat more intuitive terms, it basically means if I give you a distribution of lives and how good they are, it shouldn’t matter which names I drew under which slot. So Alice at five, Bob at 10 is just as good as Bob at five, Alice at 10. That’s the intuition. So that’s a quite attractive impartiality principle. It’s a way of not randomly caring more about certain agents than others because of their names or something like that. So those worlds are equally good.
The problem is that you can have worlds where, for example, suppose it’s an infinite line of people, and in the first world, every fourth agent has wellbeing of 1, and everyone else has wellbeing of 0: Agent 1 has 1, 2 has 0, 3 has 0, 4 has 0, 5 has 1, et cetera. In the second world, every second agent has wellbeing of 1, and everyone else has 0: Agent 1 has 1, agent 2 has 0, agent 3 has 1. Basically, what the second world does is it just improves the lives of an infinite number of people from 0 to 1: agent 3, agent 7, and so on.
So It’s better by the Pareto principle we talked about. But the problem is it’s also possible to map each agent in the first world to an agent in the second, such that the wellbeing is the same. By the anonymity principle, they’re equally good — but they can’t be both better and equally good, so you have a contradiction. And there’s a bunch of stuff in the same vein.
Rob Wiblin: I guess another thread that might bite more often in decisions that we might make is one that we discussed more with Alan Hájek: what if your actions have some chance of creating an infinite amount of good, and there are different probabilities, but the expected amount of good that you create is always infinite, no matter the probability of you creating these outcomes? You have no way to decide between them. And these uncertain outcomes get very confusing and difficult to deal with as well.
Joe Carlsmith: Definitely. Though I think that one is a slightly easier one to get around if you decide you don’t value things infinitely. If you have a bounded utility function, then you can avoid that conclusion. Whereas these impossibility results are just like, sorry, something has to give.
Rob Wiblin: Yeah. Are there any other impossibility results or challenging decision cases that you want to mention before we push on?
Joe Carlsmith: One other one that I find disturbing and instructive: a popular way of trying to rank infinite worlds, at least infinite worlds that have the same locations, is to do something like this: you have an expanding sphere of spacetime, you look at the utility within that sphere as it expands, and then you compare the two worlds based on how the utility compares in the limit as the sphere expands. I think this fits with various of our intuitions. If you imagine a world where, per unit area, it looks like everyone is happy, then you might think that’s better.
A counterexample to at least some versions of this that I’ve found instructive: imagine an infinite line of happy planets, utopias, and they’re all equally spaced — that’s World A. And then in World B, you pull each of those planets an inch closer together, but you also add an arbitrary finite number of hellish dystopian planets. On approaches that do this comparison of the expanding spheres, the fact that the utopia planets are closer together will eventually do enough to make the difference in the comparison between the two spheres — because the dystopias are finite, and eventually they’ll be outnumbered.
So these expanding-sphere approaches like the second world better. I was just like, “Well, no one noticed when you pulled the planets together. That wasn’t a morally significant act. What people did notice is when you added all these finite dystopias — all the people in those dystopias really noticed that.” So I find this result quite bad.
And you get sort of similar things; there’s quite a lot of order dependence. Basically with infinities you can just rearrange stuff a bunch. Intuitively, rearranging doesn’t matter. But for infinities, rearranging matters quite a lot, and that just breaks a lot of stuff.
Rob’s three-minute explainer about the expanding spheres approach [02:17:05]
Hey, listeners, Rob here. There was a lot there, and I’m just going to clarify one thing that Joe was talking about. Basically, if you have a universe that is infinite in size, then, as we’ve been talking about, you get these great difficulties comparing the goodness of one hypothetical infinite universe with another one.
One way that you can try to make those comparisons, though, is to use what Joe here called the expanding sphere approach. Rather than compare the entirety of universe A to the entirety of universe B, you instead compare, say, a sphere a million kilometres in diameter in universe A, centred on the Earth, with an equivalent sphere of the same size in world B. And you ask: is the goodness of everything inside the sphere in world A better than, worse than, or equally good as the goodness and badness of everything inside the equivalent sphere in world B?
And so you can see that by limiting the volume you’re considering in these universes, which technically are infinite, you now have a finite thing that you’re comparing in each case. Now, it’s called the expanding sphere approach because you can start with a very small sphere, comparing the worlds when the sphere only contains the Earth, say, and then just keep growing these spheres in world A and world B, and keep asking the question: is the sphere in world A at this size better than, worse than, or equally good as the equivalently sized sphere in world B?
And a reasonable idea might be that if, for any size of sphere, the one in world A is better than the one in world B, then we can say that even though these universes are infinite in size, and even though they might contain an unlimited amount of good and bad in both cases, world A is still better, because any sized sphere in world A is better than the equivalently sized sphere in world B, centred in the same location of course.
Now, as Joe was alluding to (and this is something I’m not so on top of), using the expanding sphere approach has the perverse effect that if you move good planets closer together, then for any given size of sphere, you’re going to have more good things contained inside it. And so you would end up saying that a universe where there’s just less space between planets, or less space between good things going on, is a much better world than one with an equal amount of goodness in aggregate where the things are somewhat further apart — because if a world contains good things on average, and you just bring them closer together, then any given sphere in that more concentrated world contains more net goodness.
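To make this concrete, here’s a minimal one-dimensional numeric sketch of the comparison Joe objects to. The spacings, the per-planet utilities, and the number of dystopias are all assumptions chosen purely for illustration.

```python
# Toy one-dimensional expanding-sphere comparison: sum the utility of all
# planets within a given radius of the origin. Utopias are worth +1 each;
# each of a fixed, finite number of dystopias is worth -100.

def partial_sum(radius: float, spacing: float, dystopias: int = 0) -> int:
    utopias = int(radius // spacing) + 1   # utopias sit at 0, spacing, 2*spacing, ...
    return utopias - 100 * dystopias       # the dystopias all sit near the origin

for radius in (1_000, 100_000, 1_000_000):
    world_a = partial_sum(radius, spacing=1.0)                 # evenly spaced utopias
    world_b = partial_sum(radius, spacing=0.99, dystopias=50)  # squeezed utopias plus 50 hells
    verdict = "B ahead" if world_b > world_a else "A ahead"
    print(f"radius {radius:>9}: A = {world_a:>9}, B = {world_b:>9} -> {verdict}")
# For small spheres the added dystopias dominate, but once the sphere is big
# enough, the slightly denser utopias in world B pull ahead for good, which is
# the verdict Joe finds objectionable.
```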
Now, that’s a somewhat counterintuitive effect of the expanding sphere approach, and maybe it calls into question whether it’s such a good solution to this infinite ethics issue. It’s not one we’re going to go into anymore here, but I thought I would just try to give a potted explanation there to help make sense of what Joe was just saying. OK, back to the episode.
Rob Wiblin: Yeah. A recurring theme of this chapter of your thesis is people have suggested maybe we could fix it this way, and you’re like, “Nah, you just run into different problems. This hasn’t really fixed it.” And then again and again and again and again. I guess also identifying some kind of new challenges that people hadn’t noticed before. This is a little bit maybe more on the philosophy technical maths side, but I think people can take it as given that the situation doesn’t look too good.
It sounds like you think we’re going to have to do, as philosophers say, “substantial violence to common sense” in order to come up with ethical theories that can accommodate the possible existence of infinite quantities. I think some other people remain more hopeful about our prospects for getting to a more satisfactory place on infinite ethics than you are. What might they say about this?
Joe Carlsmith: Well, I do think people bring different affects to this, and I’m not hopeless about it. I do think there’s progress to be made here; I think we’re at an early stage. I do think we can tell that it’s not going to be easy, or you’re not going to get everything you wanted out of it — which is often the case in philosophy. I think often progress in philosophy looks like telling you that you can’t have everything that you thought you wanted about a given thing, and you’re going to have to choose and rethink your orientation.
So one thing is people have different degrees of aversion to various of the bullets. My arguments in the paper are that all of these are just really bad theories, but you can still bite the bullets. You can bite the bullet, for example, about that planet case. And some people do, and they’re just like, “Yeah, it’s like the planets are closer.” Or like, “It’s more dense with value.”
Rob Wiblin: I see, you’re saying someone might bite the bullet and say, “No, the second world is better. You’ve added a lot of hellworlds, to be sure. However, you brought the planets in closer. That’s better.”
Joe Carlsmith: I mean, they would put it in some other way.
Rob Wiblin: Can you get yourself in the headspace of imagining saying that?
Joe Carlsmith: Yeah, to give some intuition for why that’s not that bad, naively imagine a world where it’s like one in every million people is sad; everyone else is happy. And then reverse it: one in every million people is happy; everyone else is sad. It’s quite intuitive, I think, to say the one in every million people being sad is better, because it’s like most people are happy. But we can just rearrange the people; it’s the same. We can just move the people around to make these worlds the same.
Rob Wiblin: In the infinite case.
Joe Carlsmith: In the infinite case. That’s right. It’s an infinite line of people in both cases. It sort of looks like rearranging, like it’s not that bad to say, “I guess I do care how they’re arranged.” It doesn’t feel great, because no one noticed when you moved them around, but there’s at least some intuitive support for it, in that naively you like the one in every million being sad instead.
Rob Wiblin: Just to pause there for a second for people who haven’t done the maths of infinities. If you have a countably infinite line of people, then you just get this result. Because let’s say you divide them into a happy group and a sad group; there’s an infinite supply of both groups. Then you can start arranging them as one sad person followed by 999,999 happy people — and just keep going with that forever. Or you can do the reverse, and arrange them as one happy person followed by 999,999 sad people. Either way, they’re just different arrangements, different ways you’ve grouped the same people up, and yet both of these can spring from the same well, the same supply of beings. And this is one of the threads that leads to very counterintuitive conclusions.
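To make the rearranging concrete — with h_1, h_2, … standing for the happy people and s_1, s_2, … for the sad people, illustrative labels only — the two worlds Joe describes can be written as two orderings of the very same population:

```latex
% Arrangement 1: "one in every million is sad" -- blocks of a million,
% with a single sad person at the end of each block:
\[
  \underbrace{h_1,\, h_2,\, \dots,\, h_{999\,999}}_{\text{999,999 happy}},\; s_1,\;
  \underbrace{h_{1\,000\,000},\, \dots,\, h_{1\,999\,998}}_{\text{999,999 happy}},\; s_2,\; \dots
\]
% Arrangement 2: "one in every million is happy" -- the same two groups,
% interleaved the other way around:
\[
  \underbrace{s_1,\, s_2,\, \dots,\, s_{999\,999}}_{\text{999,999 sad}},\; h_1,\;
  \underbrace{s_{1\,000\,000},\, \dots,\, s_{1\,999\,998}}_{\text{999,999 sad}},\; h_2,\; \dots
\]
```

Both lists draw on exactly the same countably infinite supply of happy people and the same countably infinite supply of sad people, so whether “most people are happy” or “most people are sad” depends only on how they’re arranged.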
OK, so it’s going to be challenging, is kind of the idea. But maybe there are less counterintuitive ways than others of dealing with all of this. Maybe we can hope to find that at least?
Joe Carlsmith: Yeah, I think we can work on it. I think there are people working on it who feel like their approaches are promising in at least certain cases. It looks to me unlikely that we’ll find some master stroke that just takes care of the whole thing. It looks more like it’s going to be kind of a patchy process: like, “All right, let’s work on this, see where things go.” Often you can tell that principles people are proposing are not going to apply more broadly. It’s going to be incomplete, and silent about various cases, and stuff like that. And especially, I think this thing about the larger infinities sounds really rough.
Rob Wiblin: For listeners who aren’t familiar, you might just have heard of “infinity,” but that’s just the beginning, because there’s all kinds of different types of infinity. There’s the infinity that’s the number of the natural numbers — so 1, 2, 3, 4, 5, 6, 7, on forever. That’s “countably infinite.”
Then there’s the number of numbers between zero and one. OK, the first one is zero. What’s the second one? The thing is, there’s no way of ordering them in some natural way because you could always just add more zeros to the next number and make it even smaller, like an even smaller distance from zero. That’s “uncountably infinite,” the number of points between zero and one on the real number line.
Then there’s more beyond that I’m not familiar with, but apparently the chaos continues, and at each layer you can say, “Well, the previous infinity, that’s nothing. Countably infinite: that’s trash. What we should really be thinking about is uncountably infinitely good things.” The problem is kind of going to recur in this ladder. Is that one way of putting it?
Joe Carlsmith: Yeah, that’s one way of putting it. Especially for the person who was like, “Of course infinities matter infinitely more than finite things; I will obsess about infinities.” It’s like, “Oops. OK, what about these bigger infinities?” Also, my understanding is that it’s not just that there’s some fixed number of sizes of infinity. Every time you take the power set of a set, I think you get a bigger infinity out of it. So you can just do that unendingly: there’s an unending hierarchy of bigger and bigger infinities. So that’s rough in terms of finding the biggest one, if you’re obsessed with the biggest one.
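For reference, the fact Joe is gesturing at is Cantor’s theorem: the power set of any set is strictly bigger than the set itself, so the ladder of infinities never tops out.

```latex
% Cantor's theorem: for any set S, the power set P(S) -- the set of all
% subsets of S -- has strictly greater cardinality than S:
\[
  |S| \;<\; |\mathcal{P}(S)| \;=\; 2^{|S|}.
\]
% Iterating this, starting from the countable infinity of the natural numbers,
% gives an unending hierarchy of larger and larger infinities:
\[
  \aleph_0 \;<\; 2^{\aleph_0} \;<\; 2^{2^{\aleph_0}} \;<\; 2^{2^{2^{\aleph_0}}} \;<\; \cdots
\]
% (2^{\aleph_0} is the cardinality of the real numbers -- the "uncountable"
% infinity Rob described -- and Cantor's diagonal argument is the standard
% proof that it really is bigger than \aleph_0.)
```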
Now, I do think there’s some really interesting questions here, but exactly how much do we have to incorporate this into our reasoning? So, unlike countable infinities, where there are these somewhat salient cosmological hypotheses — where there’s lots of people and stuff like that — I am not aware of salient hypotheses where there is like one person for every real number or something like that, and still less these higher-order infinities.
You could make these same arguments, like, “Well, you shouldn’t totally rule them out. Are you certain God’s not going to show up and be like, ‘Hey, I hereby offer to create a large cardinality heaven, like an uncountably large heaven.’ Are you certain the offer is false?” You can make these same arguments, but I think you get into these questions of how exotic do we need to get in terms of the scenarios that our philosophy covers? Do we need to have nonzero credence that consciousness is constituted by sourdough bread? Do you have nonzero credence that two plus two is five? That probabilities don’t have to add up to 100%? It’s just like a whole set of hypotheses that you can get sufficiently wacky about, and your decisions need to cover these. At a certain point you have a feeling like, “Actually no, that one I’m just…”
Rob Wiblin: “I’ve had enough.”
Joe Carlsmith: “I’ve had enough.” So I think there’s an intuition that that’s what we should say about these larger-infinity worlds. I’m not sure, but even if you do say that about the larger infinities, it’s much more compelling to me that you have to deal with the merely countably infinite case — and that one’s hard enough.
Infinite ethics and the utilitarian dream [02:27:42]
Rob Wiblin: Yeah. You say in your thesis that challenges with infinite ethics represent the death of the utilitarian dream. I think it might actually be in the title: “Infinite ethics and the utilitarian dream.” It’s evocative. Can you explain what you mean by that?
Joe Carlsmith: Sure. I think there’s a certain type of person who comes to philosophy and encounters a certain set of kind of simple and otherwise elegant or theoretically attractive ideas. The ideas I most think of as in this cluster — and this is all separable — are total utilitarianism, Bayesianism, and expected utility reasoning. That’s the broad cluster, and they look at that —
Rob Wiblin: I’m sitting right here, Joe. You could address me directly. [laughs] I think, like many people, I do have this impulse. Not 100%, but yeah, I think people can recognise it. Sorry, carry on.
Joe Carlsmith: I remember at one point I was talking with a friend of mine, who used to be a utilitarian, about one of his views. And I started to offer a counterexample to his views, and he just cut me off and he was like, “Joe, I bite all the bullets.” I was like, “You don’t even need to hear the bullets?” He’s like, “Yeah. It’s like, whatever. Does it fall out of my view? I bite it.”
So I think that a certain kind of person in this mindset can feel like, “Sure, there are bullets I need to bite for this view. I need to push fat men off of bridges; I need to create repugnant conclusions, even with a bunch of hells involved.” All this sort of stuff. But they feel like, “I am hardcore, I am rigorous, I have theorems to back me up. My thing is simple; these other people’s theories, they’re janky and incomplete and kind of made up.”
Rob Wiblin: It’s all epicycles.
Rob’s two-minute explainer about epicycles [02:29:30]
Hey, listeners, Rob here with another quick aside. This one’s a fun one. We just used the term epicycles, which philosophers throw around quite a bit, but which I don’t think I’d heard until I started talking to people who’d done philosophy PhDs.
The idea of epicycles is basically you’re adding lots more complexity to a theory in order to salvage it from cases in which it doesn’t match reality or it doesn’t match intuition. But really what you want to do is throw out the theory altogether and then start again, because it’s the wrong approach.
It goes back to how the ancients used to predict the motions of planets when they thought that everything was going around the Earth one way or another. Obviously the planets weren’t going around the Earth fundamentally, so how did they manage to explain the apparent motions of the planets, and them coming closer and then going further away? Basically, they modelled the planets travelling around the Earth, but also travelling in little circles around the circle that they were travelling around the Earth on — travelling on two different circles at the same time. This allowed them to match up the predictions of that model with their observations of where the planets were. The main big circle that a planet travelled on around the Earth was called the deferent, and the little circle that it travelled on around that circle was called the epicycle. I think it comes from the Greek for “circle moving upon another circle.”
Anyway, so you can see how this term has come to mean adding complexity to a model in order to salvage it, when really it’s the wrong model to start with. What they should have done is realise that the planets were moving in ellipses around the sun, rather than adding more and more circles to try to match things up. Interestingly though, by adding in enough epicycles like this, they were able to make their predictions of the planets’ motions match their observations almost exactly, because they had enough degrees of freedom in the model to place the planets in any particular position at any particular point in time. That’s what can happen when you add a lot of complexity to a model, even if it’s mistaken.
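For the curious, here is a minimal sketch of a single deferent-plus-epicycle model — the symbols are illustrative, not from the episode: with the Earth at the origin, R and ω_d are the radius and angular speed of the deferent, and r and ω_e those of the epicycle:

```latex
\[
  x(t) = R\cos(\omega_d t) + r\cos(\omega_e t),
  \qquad
  y(t) = R\sin(\omega_d t) + r\sin(\omega_e t).
\]
% Each additional epicycle stacks on another pair of (radius, angular speed)
% terms -- extra free parameters, which is why enough of them could be tuned
% to match the observed motions very closely even though the model was wrong.
```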
OK, that’s epicycles. Back to the show.
Joe Carlsmith: It’s all epicycles. It just has this flavour of you’re kind of unwilling to look the truth in the face. Like, make the lizards… Sorry, the lizards: One conclusion that falls out of total utilitarianism is the idea that for any utopia, there’s a better world with kind of a sufficient number of barely happy lizards plus arbitrary hells.
Rob Wiblin: [laughs] Carry on.
Joe Carlsmith: So I think that you can get into this narrative of like, “All I need to do is bite the bullets, and I always know what I’m getting out of the bullets: I’m getting more expected net pleasure.” And I think that can be kind of attractive, even to people who aren’t in this mode. Like, I think even with deontologists, it can kind of be in the background as a default. It’s like sometimes you’re doing your deontology and you’re adding another epicycle, like, “OK, what if the fat man is on a lazy Susan, and he’s twirling around, and the bomb is coming from the left, but you’re on the right, and the guy earlier was letting die rather than killing.”
Rob Wiblin: So you’re adding more and more complexities to the theory in order to match your intuitions in more cases. But you’re like, “Ugh. This is… ugh.”
Joe Carlsmith: That’s right. You can have that small voice say, “Oh, who doesn’t have this type of problem? Those utilitarians. The utilitarians have a nice simple theory.” Also, they know how to talk about lotteries. The deontologists barely talk about lotteries, they barely talk about risk. So the dream has allure even outside of people who go for it, I think.
But I think infinite ethics just breaks this narrative. And that’s part of why I wanted to work on this topic: I felt like I saw around me some people who were too enamoured of this utilitarian dream, who thought it was on better theoretical foundations than I think it is, who felt like it was more of a default, and more of a kind of simple, natural foundation than I think it is. I think what happens is, if you try to extend this to the infinite, you just can’t play the same game anymore.
Rob Wiblin: You realise you’re going to have to add a lot of complexity to make this work at all.
Joe Carlsmith: That’s right. You’re going to have to start giving stuff up. You’re going to be incomplete. You’re going to start playing a game that looks more similar to the game you didn’t want to play before, and more similar to the game that everyone else was playing. You’re not going to be able to say, “I’m just going to bite whatever bullets my theory says to bite.” You’re running out of track, or to the extent you’re on a crazy train, the crazy train just runs out of track. There are just horrific bullets if you want to bite them, but it’s just a very different story and you’re much more lost.
I think that’s an important source of humility for people who are drawn to this perspective. I think they should be more lost than they were if they were really jazzed by like, “I know it’s total utilitarianism. I’m so hardcore. No one else is willing to be hardcore.” I’m like, I think you should spend some time with infinite ethics and adjust your confidence in the position accordingly.
Obviously infinite ethics isn’t done. We might well find ways around these problems, or more theory might come forth. I think we can already tell, on the basis of the impossibility results I talked about before, that whatever comes out of this process — whatever the final, best theory of infinite ethics is, if there is such a thing — it’s not going to be the simple “totalism plus expected utility reasoning” thing that people wanted. It’s going to be something jankier, it’s going to be weirder, and I think we should learn that lesson ahead of time.
Rob Wiblin: Yeah, that makes sense. You can imagine that the simplest story would be like, “These people over here are bickering about whether to make seven compromises or 14 compromises. I’m going to make zero compromises. I’m special, because I’ve decided to deviate nil in order to accommodate all of these special cases.” I’m sorry, you’re going to have to make at least three or four compromises. Zero is just not an option. It’s not available. Actually, maybe we’re all kindred spirits in this unpleasant process of trying to figure out how to mesh our finite, strange human intuitions or desires with this extremely odd universe in which we find ourselves.
Joe Carlsmith: Yeah, yeah.
Rob Wiblin: OK, there’s a lot more one could say on this and some of the things you say in your thesis chapter, which we’ll link to in the show notes, and maybe we’ll stick up the links to some resources for understanding infinities better for people who are interested in that.
What to do with all of these weird philosophical ideas [02:35:28]
Rob Wiblin: Pushing on, we’ve briefly covered three topics where philosophy might feel disorientating even to people who are kind of familiar with feeling disorientated by philosophy. Maybe they encountered some of it in high school, and like we were saying earlier, they found out that we don’t really have a strong philosophical grounding for understanding causation or inference and so on — they’ll deal with that. But there’s much more. We keep finding new ones.
I want to talk now about, as human beings, what do we do in light of all of these issues that feel strange and unresolved? Both from an emotional point of view and also from a practical one? I guess we’ve been having a laugh here, because this stuff is both disorientating and kind of entertaining in its own way. But spending a lot of time reading your thesis and other stuff that you’ve written over the last week, I felt myself getting just kind of sad or dismayed. It forced me to come back and think about issues that I kind of knew were there in the background, but which, on a day-to-day basis, I put out of my mind for the sake of practicality and sanity.
We haven’t even gone through all of the stuff that you’ve written about. There’s other things that we’ll link to — the stuff about anthropics and the doomsday argument that are on a similar scale of strangeness to what we’ve talked about.
Anyway, it seems like if you commit to take philosophy seriously, and stay on the crazy train rather than just get off once you start to feel uncomfortable, then you can just pile on enough of these potentially crucial issues, these counterintuitive issues that anyone might feel somewhere between confused and I guess just depressed — or like, not know what to do. But you seem like a cheerful chap, Joe. How do you handle this personally? I mean, you’re in the coal mine here.
Joe Carlsmith: I think one piece of it for me is that the idea of taking philosophy seriously is a rich and subtle one. One thing that I think taking philosophy seriously doesn’t involve is kind of grabbing hold of any old philosophical idea that comes your way and freaking out about it, necessarily. I think holding philosophical ideas with the right lightness and maturity is part of doing philosophy well, and part of taking philosophy seriously.
That’s part of why I wanted to flag the possibility of existing simultaneously at different stops on the crazy train, or different parts of the wilderness, in different ways — where you can be somebody who’s like, “That’s interesting. That is something worth understanding. I think there’s more to learn there,” without upending your life about it and a bunch of stuff in that vicinity. I think that’s a skill that becomes all the more important when the ideas are especially strange or new. That’s just one thing I want to flag. I don’t think the difference is, like, take philosophy seriously and go get super disturbed about this stuff. It’s not right to be disturbed about an idea that hasn’t earned that, in terms of how much you understand it, how solid it is, and stuff like that.
Rob Wiblin: Maybe a more concrete case is: Does thinking a lot about a topic like infinite ethics reduce your motivation to improve the world? Because to me, it feels like it’s hollowing out the goal that I was aiming for: to try to improve the world as a whole. Now I’m like, maybe that’s just not even possible, and all good and bad things exist, and like, woe is me. It seems like you don’t quite feel that way about it.
Joe Carlsmith: Yes. Especially for that particular frame. I feel pretty comfortable saying that before, when I wanted to save this person’s life, or when I wanted to help this person, or… the stakes of that particular action haven’t changed. In general, the focus on the moral patient themselves, as opposed to the world, I think for me alleviates a good amount of that specific concern that I can’t make a difference to the world as a whole.
Another thing you could think is, “Actually, I think infinities are where all the action is, this is super important, but I don’t know how to reason about them.” Which is one lesson you can take away from the piece, but I don’t think it’s the right one.
One thing I want to flag there is that I do think people should be wary about assuming that they care about infinite stuff infinitely much. I think it’s possible. I think people should be honest with themselves. Certainly if you look at how you act, it might just be that there’s some finite stuff that you just care more about than you care about infinite things. And that’s OK, and that might be something we need to learn, and one of the lessons we learned from some of these cases — like the lotteries you talked about with Alan.
I also think that more generally, to the extent you think infinite stuff might matter a lot, we should be wary of cases where you started out caring about X, and then there’s something else that you think could be even more important, that doesn’t make X any less important. So if there’s one person in front of you who is dying, that’s still a whole life. That’s an entire life. If you learn about World War II, it could be that you think working to prevent World War II is more important, but it doesn’t make that person’s life any less significant. I think the same is going to be true of other expansions in scale, and that applies in this context too.
And then more generally, to the extent it’s hard to reason about infinities, it can feel like that breaks your reasoning about all this. But I just think we should be really wary of cases where it feels like “Our philosophy is breaking, therefore my reasoning is breaking.” I think often it’s philosophy’s fault, and you should be wary. It’s not that your reasoning broke or that your ability to reason broke. Your philosophy broke.
An example is something like, say you don’t have a philosophical theory that tells you why you should be allowed to think that your hands aren’t about to turn into butterflies. OK, but your hands still aren’t going to turn into butterflies. The problem is that you haven’t done your theory thing. I think we should be very wary of mistaking ourselves for our theories, or being too theory-forward in our understanding of life.
I think some people in these sorts of contexts, and especially philosophers, really want to have a theory of how they’re living. Or like an abstract model of themselves, and they can kind of query that model — like they can say, “Suppose I were an expected utility reasoner with utilities on all the possible outcomes, what would I then be doing?” and then they want to do that.
I think it’s an important fact that’s not true of us: we’re not expected utility reasoners of this kind. So if our model stops being able to make decisions, that doesn’t mean we stop being able to make decisions. Philosophy has broken, but life has not broken.
Rob Wiblin: Life goes on.
Joe Carlsmith: And I think that’s just a general thing to come back to: it’s life first, and philosophy is a tool for living in a more clear-eyed and responsible and kind of mature way. If the tool breaks, you’ve got to push forward. And there are still stakes — there are still things you care about. I think if you find your philosophy saying, “Oh, it doesn’t matter if I eat my lunch; it doesn’t matter if I stab my pencil in my eye,” it’s like, “Are you sure you think that?”
Rob Wiblin: “I don’t think you’re going to do it.”
Joe Carlsmith: Yeah, it doesn’t seem like a good idea to not eat your lunch. It doesn’t seem like a good idea to stab [your eye]. You’re not going to do it. So you can come back to that and say, “Why exactly aren’t I stabbing my pencil in my eye?” Maybe actually there’s something there that’s more solid than your theoretical framework.
Rob Wiblin: Yeah. To make the case for dismay, to kind of go down the list, I read this first thing and I’m like, “Maybe I never realised this, but my actions are influencing all kinds of parallel worlds and influencing all these beings across the universe.” That’s number one. And then I thought I was living in this universe, so I kind of understood the nature of it. But no, it turns out probably I’m being simulated by some other beings. That’s more likely than not. What do they want? I don’t know. What does the basement world look like? I got no idea. What does that imply about what I should do? I don’t know, but it seems like it should imply something.
Now I’m just extremely confused about existence. I’m like, well, I might grasp at the idea that I could improve the world, I’ll make a difference. But maybe all things exist, and maybe I could change which universe I exist in, but other stuff will replace that in the grand scheme of things. And then there’s other stuff, like: do we know that anything is good? Probably not. You might think your own experiences ground that, but maybe they actually don’t.
And it’s a lot. It gives me anxiety. The other thing is it really casts doubt on any of the actions that you’re doing in order to accomplish some goal. There’s so many unpredicted effects of what you’re doing, so many unknowns. These are just the known unknowns — what about all the other stuff we haven’t even thought about yet that might change whether what I’m doing is good or bad?
At some point it just feels so disorientating that you’re like, maybe this activity is not for humans. I thought that I could make a difference, but the universe turns out it’s just too complicated. I’m too small; I’m but a mote of dust in this insane chaotic sandstorm. And if only I was smarter or longer lived or something, maybe I could figure this stuff out enough to make some predictable difference. But perhaps I should just focus on something more human scale, like me and my friends. I think I could impact that, hopefully — maybe not, but at least I might have a shot at that. Whereas this more cosmic-scale stuff, where you feel like you need to get all this stuff right in order to know what effect you’re going to have, maybe it’s just not something that we’re up to.
Joe Carlsmith: I think the way I would approach this is, I do think it’s very clear that with respect to many of these considerations, we are at square one in being able to act at all wisely with respect to them, and we should totally recognise that and not just march forward like we know what we’re doing. I think just in general, it’s very often the case that we’re sufficiently confused or uncertain about something that it’s not the right use of our time to try to figure it out. That can be true here, and it can be true in all sorts of areas of life, philosophical or no.
And ignoring stuff is a real option. In fact, it’s super common. It’s a very common approach to just ignore stuff. I think that’s always available for stuff that seems kind of too hopeless to act well with respect to. What happens when you ignore something? It’s not some grand cosmic sin. It’s just, when you ignore something, you lose agency with respect to it. It’s no longer the case that your influence on that thing is structured by your understanding of it and your choices. You’re sort of letting what happens with that thing be determined by whatever would have determined it, absent your input.
So ignoring something, as I see it, is like staying home on election day about something. It doesn’t go away; it’s just that you stop being able to have an influence on it. To the extent you cared about it, what you care about will just be subject to other forces. But sometimes that’s what you do. Sometimes you ignore something. I think that’s OK with a lot of these things.
I also think, to be clear, that for people who find that interacting with this stuff is harmful to their motivations, or just in general isn’t sitting well with their psychology in a healthy way, it’s totally fine to drop it. I don’t think everyone needs to be thinking about these things, and not everyone should be thinking about them at all.
With respect to human scaleness: first of all, to be clear, this is not like my life. As I said, this isn’t most of what I work on, and even beyond working on big-picture questions, a lot of my life is at human scale: I have friends, I have family, and all that sort of thing. I think that’s healthy and important.
But I’m also wary of drawing some sharp line between what is human scale and what is cosmic scale. So say the dialectic is like, you see all these big philosophy ideas, you were really psyched about it — “I’m going to change the universe!” And then you run into a bunch of philosophy stuff. Now you come back to like, “OK, maybe I’ll just hang out with my friends.” So now you’re back at home. Cool. Home is good. It’s fine to be at home. You’re losing agency with respect to all these big things, but that’s OK. Sometimes you just don’t have agency about stuff. So it goes.
But now suppose you’re actually feeling kind of comfortable at home. You’ve got your friends, you’re feeling safe. Now you kind of come out, and you’re like, “It looks like I could save at least this person’s life.” OK, maybe that’s kind of human scale too, right? You actually start to think about that, what are the effects of that? I can’t know. It is true that it’s very complicated, but it’s also true that it’s complicated what happens when you do something with your friend — or there’s not, I think, a super deep difference there. I think there’s not a super deep difference as we start to expand out.
I think in general, we should be quite wary of if somehow coming back to a human scale made you indifferent to geopolitics, or indifferent to the prospect of World War III, or to US–China tensions. That’s a big red flag about your philosophy. It’s similar to being like, “Apparently I shouldn’t eat my lunch”: if you’re like, “Apparently war doesn’t matter,” or “Apparently nuclear war wouldn’t be bad,” then I think you should reexamine.
I want to say it’s fine to come back to something that feels real and familiar, and like you’re able to really orient towards it in a good way. But I also think there’s still a question of how far do you venture out? I think we do see that, as you venture out quite far, it becomes a bit of a wilderness, and we get more confused and there’s a need for humility and caution and understanding how little we know. I personally don’t, for example, stop caring about what happens in the future or caring about factory farming, or caring about all the things I cared about before. I think that would be a red flag about your philosophy if somehow it stopped mattering, if things that really mattered to you stopped mattering. I think that’s probably your philosophy’s fault, rather than the world’s.
Rob Wiblin: Yeah, I guess the thing that kind of runs through my head as I’m reading stuff is just like, existence is such a joke. I can’t believe that this has been imposed upon us, that we exist in this world. Insects that are created have no concept of what they are, of where they are, of how they ended up on this Earth, and then they die after a week or something. Sometimes you feel like you’re not like that, and then other times you’re like, “I am just so small.” There’s a dark angle to that, which is just feeling this kind of loss of agency.
I guess maybe finding it funny in some way can be consolation. There is something very amusing about our circumstance. I guess science fiction authors or people who’ve done sci-fi comedy stuff have definitely reached for this at times, that it’s like a clash between kind of the hubris of human beings, and our desire to understand and control things, and the fact that we are so finite in this infinite world of stuff that we’ve barely even thought about. I guess a kind of dark humour aspect of things sometimes I find helpful in other situations. Maybe I could find it useful here.
Joe Carlsmith: One thing on the ants point: An interesting difference between ants and humans is, in this case, you as an ant are sitting there going like, “I am an ant. Oh my god. I’m small. I’m realising that I’m really ignorant, and there’s this giant universe, and I didn’t realise that. Wow.” The ants actually aren’t doing that. So that’s interesting. That’s a sort of non-ant characteristic. Actually, I think in general, philosophy for me, a deep part of the aspiration is to come into a more conscious and aware and intentional relationship with our situation — including our ignorance, and our uncertainty, and our smallness, and all the rest.
Sometimes I imagine these superintelligent aliens — let’s set aside sim stuff for a second, and assume our empirical situation is fairly normal: it’s Earth, we’ve got a big future ahead of us. If there were aliens who could somehow see us, and we were confidently going around like, “I’ve solved infinite ethics; I know what it is,” I think the aliens are laughing at those people.
Rob Wiblin: “Lol!”
Joe Carlsmith: I think the aliens are like, “Oof, that looks really rough.” But I think people who are like, “Wow, I’m realising just how little we know. I’m realising just how early we are in the potential history of our civilisation, and just how much could be in the future.” There’s this line from Seneca in The Precipice where he’s like — I forget exactly the line — times will come where things not now known will become known, and stuff like that. And I read Seneca, I’m like, “That guy got it. He was on it. He’s realising how little he knows and how much there is to learn.” I think we should do that too.
I think what that looks like in practice for me, at least as a general first pass, is working quite hard to make sure that our civilisation reaches a stage of much greater wisdom and empowerment — where we’ll be less like ants; we’ll be more mature; we’ll be more able to look ourselves and the universe in the eye, and understand what’s really going on, and act maturely in light of these considerations and all sorts of other unknown unknowns that could come up. I actually think that’s just generally going to be, in many cases, a much more robust strategy, and a much more useful point of focus, than trying to work directly on these issues now.
If someone interested in these topics comes along like, “We’ve got to figure this out,” I’m not sure we do need to figure it out. I think we should check if there’s anything that really matters for us now, that we can’t defer to future people, and I think there are questions there — but as a first-pass heuristic, I think it’s reasonable to mostly try to get our civilisation to a less ant-like state, one where it can kind of deal with these questions better.
I also think that that’s something that doesn’t have an aura of like, “Oh my god, that’s so intractable. Who knows what happens? Is everyone dying in a massive pandemic? Good or bad for that?” I’m like, “Eh, it’s probably bad.” And same with AI stuff. There’s just the kind of normal discourse about existential risk I think applies very much to whether our civilisation reaches a wiser state as well. That’s another important piece for me, and related to the ants thing.
Welfare longtermism and wisdom longtermism [02:53:23]
Rob Wiblin: Maybe let’s push onto that. I kind of split this into: how do we handle this emotionally and personally, and then what does this actually imply for our actions?
Despite the fact that it’s so disorientating, I think weirdly it maybe does spit out a recommendation, which you were just kind of saying. Which is that, if you really think that there’s a good chance that you’re not understanding things, then something that you could do that at least probably has some shot of helping is to just put future generations in a better position to solve these questions — once they have lots of time and hopefully are a whole lot smarter and much more informed than we are, in the same way that current generations have, I think, a much better understanding of things than we did 10,000 years ago.
Do you want to say anything further about why that still holds up, and maybe what that would involve?
Joe Carlsmith: Sure. In the thesis, I have this distinction between what I call “welfare longtermism” and “wisdom longtermism.”
Welfare longtermism is roughly the idea that our moral focus should be on specifically the welfare of the finite number of future people who might live in our lightcone.
And wisdom longtermism is a broader idea that our moral focus should be reaching a kind of wise and empowered civilisation in general. I think of welfare longtermism as a lower bound on the stakes of the future more broadly — at the very least, the future matters at least as much as the welfare of the future people matters. But to the extent there are other issues that might be game changing or even more important, I think the future will be in a much better position to deal with those than we are, at least if we can make the right future.
I think digging into the details of what that actually implies — exactly in what circumstances should you be focusing on this sort of longtermism? How do you make trade-offs if you’re uncertain about the value of the future? — I don’t think it’s a simple argument, necessarily. But it strikes me, when I look at it holistically, as quite a robust and sensible approach. For example, in infinite ethics, if someone comes to me like, “No, Joe, let’s not get to a wiser future; instead let’s do blah thing about infinities right now,” that’s sounding to me like it’s not going to go that well.
Rob Wiblin: “I feel like you haven’t learned the right lesson here.”
Joe Carlsmith: That’s what I think, especially on the infinity stuff. There’s a line in Nick Bostrom’s book Superintelligence about something like, if you’re digging a hole but there’s a bulldozer coming, maybe you should wonder about the value of digging a hole. I also think we’re plausibly on the cusp of pretty radical advances in humanity’s understanding of science and other things, where there might be a lot more leverage and a lot more impact from making sure that the stuff you’re doing matters specifically to how that goes, rather than to just kind of increasing our share of knowledge overall. You want to be focusing on decisions we need to make now that we would have wanted to make differently.
So it looks good to me, the focus on the long-term future. I want to be clear that I think it’s not perfectly safe. I think a thing we just generally need to give up is the hope that we will have a theory that makes sense of everything — such that we know that we’re acting in the safe way, that it’s not going to go wrong, and it’s not going to backfire. I think there can be a way that people look to philosophy as a kind of mode of Archimedean orientation towards the world — that will tell them how to live, and justify their actions, and give a kind of comfort and structure — that I think at some point we need to give up.
So for example, you can say that trying to reach a wise and empowered future can totally backfire. There are worlds where you should not do that, you should do other things. There are worlds where what you will learn when you get to the future, if you get there, is that you shouldn’t have been trying to do it — you should have been doing something else and now it’s too late. There’s all sorts of scenarios that you are not safe with respect to, and I think that’s something that we’re just going to have to live with. We already live with it. But it looks pretty good to me from where I’m sitting.
Rob Wiblin: OK, so there’s a bunch of these fundamental issues that we haven’t really sorted out. There’s probably a bunch more that we haven’t even noticed that we haven’t sorted out yet. But if we put our descendants into a better position, they’re smarter than us, they’ve got plenty of time, and they’re even more numerous, perhaps, so they can throw a million people at the infinite ethics thing and try to make the best of it that they can. They’ll make the best of each of these different things and figure stuff out in a way that you and I can’t really imagine doing, and then they’ll figure out what actions that might imply and they can do it.
I think that makes a tonne of sense. And especially after the last week, like you, I’m definitely on board with the wisdom longtermism approach. This seems like the best crack of the whip that we have — certainly compared to committing to something today, if we can avoid it.
As you say, it doesn’t completely evade it, because maybe it’s actually bad to do that, basically. Maybe there’ll be some consideration that will show that in fact being smarter and more informed backfires on you. Or maybe it will be better if, instead of becoming more resilient and more organised, we went extinct. For reasons that we can’t yet understand, it actually is going to be worse to continue. I suppose you just have to bank on the hope that that’s not the case. Or at least it’s like less than 50% likely that’s the case — that we’re better off progressing and trying to get wiser than not doing it, even though we don’t know where it’s going to lead precisely. I think I can be OK with that.
It also does seem like it points towards particular actions about trying to preserve flexibility, trying to make sure that our children are in a good situation, the world is stable, there’s little conflict. If artificial intelligence is going to be involved in all of that, that we turn it to the right purposes, and also a concern about the wellbeing of future digital minds and so on all makes sense.
Is there anything you want to say about why we shouldn’t be so concerned that in fact it will be better to become ignorant or just disappear? That on balance, it’s better to try to become better, smarter, more informed agents?
Joe Carlsmith: I guess I want to bring in similar heuristics that in general, we’re always making bets. Things could always be wrong and bad in lots of ways. That’s true of even the most robust-seeming stuff, like surviving and becoming smarter and understanding ourselves better and understanding the world better.
But there are these basic things, like: if you understand things better, often you can orient towards them better. If you care about stuff, often your existence will be a way of being a good force for caring about it. It seems like, in general, the track record of humanity learning more and progressing has some stuff to be said for it, though it’s not unambiguous.
I do think there’s more work to be done here. And really what you’d want to do, if you’re assessing different candidate strategies here, is to get into your probabilities on different sorts of futures, and how good they will be, and what’s the value of information, and what the alternatives are. I think it’s actually quite an involved analysis to make a more rigorous argument that various types of focus on increasing future people’s wisdom and empowerment are the right call, relative to other hypotheses.
I think the most salient alternative is to just not think more about this stuff. But at least with respect to the crazy train stuff, there’s also a salient alternative of trying to think more about it now. My best guess is that it’s worth having a few people doing that — checking if there’s stuff that matters right now and that can’t be punted to future people.
I also think we should be wary of having kind of Pollyannaish or naive pictures of what the future will be. Sometimes there’s this assumption that if we just don’t go extinct then the future will be this wise thing, and the people will act on the wisdom and it’ll be great — and all we have to do is not fall off the train. I think that might also be a kind of misleading picture of the landscape ahead of us, and there may be other forms of influence that are more important or other factors that matter.
So it’s a big question, and I don’t claim to have answered it, but the intuition I want to pump more generally is that it’s possible to think about this. This is a decent first pass. Like, don’t stab pencils in your eye. Don’t stop saying war matters. I don’t know. I don’t think this dislodges our basic orientation towards the world — I think it just adds some complexity, adds the need for humility, and adds some additional reason to get to a wiser and more empowered state.
Rob Wiblin: I suppose an alternative framing that you could try is: Let’s say that your goal was already to try to produce a wiser future for our civilisation. And then people start bringing you all of these specific things about, “What if we’re in a simulation? What if our decision theory is broken?” And just be like, “You’re getting lost in the details. We’ll figure it out. Right now what we’re doing is building the city and making sure there’s plenty of people to eventually solve all of these issues.” You might say you’re missing the forest for the trees here. I think maybe that is a sensible mentality to have, at least as an individual who hasn’t decided to specialise in seeing what we can make of those issues, inasmuch as they matter today.
Joe Carlsmith: Yeah, I think the standard I would use when someone comes to you with an argument like that is not, “Could this matter at all?” but more like, “Does this suggest something that seems actively better than having the future people deal with it?” Sometimes they can make arguments of that form, but I think it’s just a higher standard, and that’s the standard I would use.
Epistemic learned helplessness [03:03:10]
Rob Wiblin: Yeah. One concept that I thought we might introduce earlier, but maybe we should do now, is “epistemic learned helplessness.” Which is an expression I really like, because it slightly blindsides you, because I’d assumed for a long time that epistemic learned helplessness was a bad thing. It certainly doesn’t sound great. But it actually probably has a lot of merit to it. Can you explain what it is?
Joe Carlsmith: Sure. Epistemic learned helplessness is a term coined by Scott Alexander in a blog post about this. I think the basic idea is that he, at an early part of life, was sort of enamoured of various conspiracy theories. He would read one conspiracy theory, and then he would read a rebuttal, and he would be convinced by each in turn. And he eventually kind of gave up, and he was like, “I’m just going to trust the experts, because apparently I can be convinced of anything.”
I do think this is a real dynamic. I think there are people where, if they talk to someone and that person seems smart and confident, they really kind of take in that belief. And then they go and talk to someone else who disagrees, and they’re sort of ping-ponging back and forth between people who seem credible, or between arguments in general. So epistemic learned helplessness is a response where you realise, like, “Apparently evaluating arguments is not for me.”
Rob Wiblin: “It’s not my strength.”
Joe Carlsmith: “It’s not my strength.” If something seems convincing to you, that’s just not an important signal about whether it’s actually convincing. There’s a question of, well, what do you do instead? Sometimes there’s a notion like “trust common sense” or “trust the experts” or something. Of course that in itself is not safe — it’s not like the experts are always right. I think stuff of this type influences people’s response to all sorts of things, including the crazy train stuff. And I think it’s a reasonable response, if you know about yourself that you can be convinced of anything by anyone who sounds smart and confident, or anyone you decide is smarter than you. There are a lot of people smarter than you. Certainly the standard should not be, “I met someone who’s smarter than me and they think something blah…”
Rob Wiblin: “So I’m just going to take that run with it.”
Joe Carlsmith: Yeah, you’ll just be hijacked by the first smart person you meet. We need some sort of discernment there.
My own approach to this is to basically try to really notice which things I feel like I’ve evaluated enough that my reaction is a genuine signal. I mean, obviously this is all on a spectrum, but for me, roughly, there are categories of ideas that I’ve merely heard about — and this is where the simulation argument, for example, used to be for me: I’d heard about it, and I was like, “That’s kind of interesting. Various people I know who I take seriously take this argument seriously too. It feels kind of slippery. It feels like if I thought about this, I’d probably find some problems with it that they’re not saying, or there’s probably objections that I haven’t heard.” And so I logged it as “Interesting; to learn more about,” but not “World upending, oh my God.” I just think that was an appropriate reaction.
I’ve now, though, written a whole paper about it, and really thought about it, and looked at the literature — and I feel like my relationship to it is actually different at this point. I feel like I can see the structure for myself. I don’t think I’m deferring to a tonne of people on various bits of it. I feel like I have more direct clarity about the ways it could be wrong and the parts of it that could be confused. I don’t feel like I know what the upshot should be, but I feel kind of empowered with respect to the structure of the argument and the considerations at stake. I feel like I’m in a different position, where I would have discovered if there was some really obvious hole in the argument — I think at this point, I would have hit it, at least relative to my current epistemic capacities. I don’t think I should just be totally epistemically learned helpless about it. I think other people shouldn’t necessarily trust me on that.
Generally, a thing I try to do with lots of these… There’s a lot of wild ideas running around here. I’ve worked on AI risk. Part of what I was trying to do with AI risk was like, “OK, this is a really important idea, but I really want to think about it.” So I went and I wrote this whole report trying to really think through the issue myself. And part of that is an attempt to retain the ability to get signal from investigation and research in the midst of a bunch of ideas and a bunch of smart people running around.
Rob Wiblin: Yeah. We should probably wrap up this section and move onto the next one. You mentioned earlier this idea that the effects of our actions could be far broader than we had initially appreciated — it’s possible to kind of picture us as one civilisation in a much broader system of agents that are trying to figure out how to make their world better. Can you explain how that idea works?
Joe Carlsmith: One upshot of the acausal stuff is what I see as a kind of basic reorientation in the stage on which you could be possibly acting. I think often folks who are longtermists are working within that sort of narrative. Imagine there’s this single Earth and they’re facing entirely forward: they’re just looking at the lightcone ahead of them and this affectable universe and the causal impacts in that domain. What longtermism says is, “Hey, look at this space. There’s actually a tonne of moral patients there.” It points to this big place and to all the moral patients there, and says, “That’s important.”
I think a thing the acausal stuff can do is something analogous, where instead of looking forward, you also start to look kind of horizontally — your lens widens substantially, and maybe even starts to include the past or a much bigger space — and it again points to that space and says, “Look at all those moral patients.” I think when that happens we should sit up straight. The combination of “big place with lots of moral patients” and “plausible argument that it matters” has a decent track record of being worth taking seriously. I think it’s worth noticing that.
That’s just one thing I’ll say about it. For me, there’s less of a clear “This is the specific action-relevant upshot of the acausal stuff” and more like there’s a kind of basic reorientation in where your vision is going, and you’re suddenly maybe looking to different places in terms of what your actions do.
So in the context of acausal stuff, and also to some extent sims, and just in general — once you start taking seriously that maybe there is such a thing as really advanced civilisations and technological maturity and stuff like that — I think there is a sense in which you can become kind of humbler about humanity’s place in the universe in our particular time. I think often there’s the narrative — when you just focus on our specific section of the universe — that we’re alone, and the main question is: what happens with us and Earth? And also what happens with our future in particular — the stars and galaxies or whatever we could have affected?
But I think that, in combination, a number of these considerations about acausal stuff and sims and other things suggest a somewhat different picture, where in fact it’s a much more already inhabited place. Like there’s possibly just many civilisations already out there — very, very advanced civilisations. Civilisations towards which we’re kind of newcomers on the scene. And obviously this is totally speculative, but I think in some sense it may be that this is kind of a community, or in some sense, there’s like a whole scene, a whole set of interacting civilisations and agents and all sorts of stuff going on.
I think this might call to mind a different set of heuristics and ethical principles and orientations, which have more to do with what does it mean to be a kind of citizen, or to be joining this community in some sense? What are the norms you want to uphold? What does good cosmic citizenship look like, as opposed to optimising your stars or something like that? What is it to join this community in a healthy way and to be a contributor?
Now, this is just one vibe. It’s very speculative. There’s a lot of complexity and a lot of different vibes that could be appropriate here. But I do think there’s something about the change in narrative that some of this offers that could end up important — that we may not be as alone as the mainstream longtermist narrative can suggest.
Rob Wiblin: Yeah, I find it a little bit inspiring. The universe is a very big place. One positive thing about it is that there’ll be a lot of other civilisations out there wishing us the best, hoping that we manage to get to the same place that they managed to get to. And some of them will have figured these things out, and we can maybe aspire to that. I suppose if the acausal stuff is legitimate, maybe we can coordinate with them in some very peculiar way.
Power-seeking AI [03:12:41]
Rob Wiblin: Pushing on, two major things you’ve worked on since 2019 are “How much computational power does it take to match the human brain?,” and this other report, “Is power-seeking AI an existential risk?” We haven’t focused on those two today, because you felt — I think, reasonably — that quite a few guests have talked about similar themes on the show before.
But I wanted to just briefly ask you what AI risk scenario you were analysing in that second report on power-seeking AI, and how worried about it you ended up being. First of all, what is the issue of power-seeking AI?
Joe Carlsmith: The basic issue is that it looks like we’re on track — and, in fact, trying very hard — to build AI systems that look a lot like a second advanced species on this planet: a set of agents that are able to use technology, and do science, and that are just smarter than humans. The basic thought is that that’s just an extremely serious and scary thing to do — that you’re playing with a hotter fire than we’ve ever played with as a civilisation. It’s the sort of fire that accounts for humanity’s existing impact on the planet — which has been quite dramatic and unprecedented — and the sort of fire that could very easily get out of control. And if it gets out of control, you can’t recover. So that’s the basic issue.
Somewhat more specifically, I think there are plausible arguments that intelligent agents by default will seek various forms of power over their environment, basically because power helps you achieve your goals. It looks like, from our current perspective, it’s going to be hard to build agents that don’t do that but that are suitably powerful to do the other things we wanted them for in the first place.
There are various technical problems there, and more broadly we just really don’t understand the AI systems that we’re building right now. But it looks like there are going to be a lot of actors in the space, and there are going to be a lot of very strong incentives to push forward with these systems. People are going to have varying levels of caution. It’s going to be hard to coordinate. And so it looks very disturbingly plausible that we just barrel ahead anyway, and build agents that we can’t control and we don’t understand, and then disaster ensues and we can’t recover.
Rob Wiblin: I suppose in terms of power-seeking, there’s this very natural argument that no matter what specific goals you happen to have, you’re probably not going to want to be destroyed or turned off and you probably also want to accumulate influence and maybe resources as well. Humans have not a super wide range, but some range of things that they want to pursue in life: by and large, they try not to die and they also try to have enough money to potentially exchange it for the things that they care about.
I guess this is sometimes called instrumental convergence — where no matter what you’re ultimately aiming for, very often you’ll want to survive and gain influence. Is that the idea behind why artificial intelligence systems, no matter what they’ve kind of evolved to specifically care about, might well end up seeking power? Just because it’s useful almost regardless?
Joe Carlsmith: That’s the basic argument. I think there’s more we can say about why we should expect commercially incentivised AI systems to have goals in the first place, and to have various forms of understanding and awareness of their world, and the incentives at play with respect to those goals. But yeah, roughly speaking, the thought is that, for a wide variety of goals, various forms of power and resources and influence and survival are going to be important. So to the extent AI systems will have goals — and I think there are reasons to think that — we should worry.
Rob Wiblin: I guess these two reports were some of the first things that you worked on when you joined Open Phil. Why did you decide to work on that, rather than one of the other research questions that they have on the boil?
Joe Carlsmith: For the power-seeking report in particular, I had a few goals. One I mentioned earlier is just that this is a really important question, it’s a really important issue, and I was potentially going to spend a lot more time on it. I really wanted to have vetted it and understood it myself.
At the time I was writing, there was some energy in different places — including a little bit on your podcast, actually — of people saying, “I feel like these arguments actually haven’t been vetted and haven’t been written up clearly. I feel like there’s a bunch of ambiguity about how they work. If you let go of certain assumptions, then I think they fall apart.” I felt like that wasn’t right, but I did feel that there was a kind of deficit of people having really gone through and made the case as a whole and tried to lay it out as clearly as possible. And so I wanted to do that.
I also felt like there were some narratives that didn’t seem to me right. People would say, for example, that the argument really rests on certain assumptions about the speed of what’s known as “takeoff” — which is the transition from broadly human-level systems to really superintelligent systems. Or that it really rests on the notion that a single system will take over the world or something like that. And more broadly, there were various narratives floating around where people were saying there are so many different types of arguments: there’s the Yudkowsky argument, and then there’s the Paul Christiano 1 argument, and then there’s the Paul Christiano 2 argument — and then there was just a sense of like, is this a kind of disorganised mess?
Actually, my own take was that no, there is a single core argument here — in my opinion, it’s this concern about agents seeking power — and that a lot of the other scenarios were sort of subcategories of that, but that there was a kind of unified core here that it was going to be important to bring out. Part of what I was trying to do was to clarify that, and set it up for what seemed to me like a more productive debate.
Rob Wiblin: Yeah, that makes a lot of sense. Just setting aside all of the specifics, with the idea that we’re going to create this new species that is substantially more capable than us in all kinds of different ways, you’d be like, is that going to go well? Maybe, but also maybe not. The analogy of playing with fire seems apt, especially since fire can spread, and so can agents that can reproduce and copy themselves.
I almost feel like if we looked and we were like, “No, it’s going to be completely fine,” I’d be glad to say, “Maybe we should check that again.” Because I feel like it’s like saying, “Walk across this tightrope across this rushing river. It’s completely safe. Don’t worry about it.” I’d be like, “I feel like we haven’t understood the situation very well.” It’s that kind of initial raw intuition that does a lot of the work for me.
After thinking and talking about this for a couple of years, did you end up maybe more worried than you were coming in? Or were you perhaps a bit reassured, having interrogated some of the specific disagreements more closely?
Joe Carlsmith: When I first wrote the report, mostly what I wanted to do was lay out the issues and really structure the debate. But I also included a section which was offering a kind of loose subjective estimate of the risk by 2070 from this scenario. I tried to not just have a number that came totally from nowhere; I tried to set up a kind of premise, premise, conclusion argument, and assign credences to the premises and get a number out at the end.
And the number I got there was 5% risk by 2070, which I had a bunch of caveats on and stuff like that. I’ve since come to think that that number was too low. And that’s for various reasons; I talked with lots of people and kept thinking about it.
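To make the shape of that estimate concrete, here is a minimal sketch of the “assign credences to the premises and multiply” approach Joe describes. The premise wordings and numbers below are illustrative stand-ins rather than figures quoted from the report, chosen only so that the product lands near the roughly 5% headline number; each credence is read as conditional on the premises above it.

```python
# A minimal sketch of the premise-by-premise estimate described above:
# assign a credence to each step of the argument (each read as conditional
# on the steps before it), then multiply them to get a rough bottom line.
# The wordings and numbers are illustrative stand-ins, not figures quoted
# from the report.

premises = {
    "Advanced, agentic, strategically aware AI is feasible by 2070": 0.65,
    "There are strong incentives to build and deploy such systems": 0.80,
    "Aligned systems are much harder to build than misaligned ones": 0.40,
    "Deployed misaligned systems cause high-impact power-seeking failures": 0.65,
    "Those failures escalate to permanent human disempowerment": 0.40,
    "That disempowerment amounts to an existential catastrophe": 0.95,
}

risk = 1.0
for claim, credence in premises.items():
    risk *= credence
    print(f"{credence:.2f}  {claim}")

print(f"\nImplied risk by 2070: {risk:.1%}")  # comes out around 5%
```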
There were a few major factors. One was I just noticed that I was more scared than that implied. If I tried to really condition on “We are building superintelligent agents and these are going to be in the world” — which I had very high probability on — I think then my numbers were implying that I was like 90% that it was going to be fine. I was like, I think actually that’s not how I’m going to feel. It’s related to the thing you said about “Maybe we should check that again.” I think I’m going to be a lot more scared than 10% in that scenario. There are various related intuitions of trying to really condition on, “OK, you’re really seeing blah things are true. Blah things are true. Blah things are true. How are you feeling?” Working through that made a difference for me.
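One way to see the check Joe is running here: if the early premises (that advanced agentic systems get built at all) already carry most of the probability, then a roughly 5% overall risk implies high conditional confidence that things go fine once such systems exist. A rough back-of-the-envelope using the same illustrative numbers as the sketch above:

```python
# Back-of-the-envelope for the conditioning check described above.
# Numbers are illustrative (matching the sketch above), not Joe's own.

p_built = 0.65 * 0.80   # advanced agentic AI is feasible and gets built
p_catastrophe = 0.05    # overall ~5% risk by 2070

# Catastrophe implies such systems were built, so
# P(catastrophe | built) = P(catastrophe) / P(built).
p_catastrophe_given_built = p_catastrophe / p_built

print(f"P(catastrophe | built) ~= {p_catastrophe_given_built:.0%}")      # about 10%
print(f"P(things go fine | built) ~= {1 - p_catastrophe_given_built:.0%}")  # about 90%
```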
Another one was I had some feeling like, if the risk is that low, I should have better stories about why it’s fine and how things go well. I think when I listen to people talk about how we’ll solve it this way or something, I don’t feel super reassured. I think that was another flag for me. More broadly, I briefly spent a little bit of time trying to build alternative models, and alternative ways of setting up the argument, and assigning credences in different ways. Those were spitting out substantially higher numbers.
Anyway, I kind of got pulled into some other things, so I never finished up that work. As a kind of half-intermediate measure, I threw a note into the report saying I think this number should be higher — I said above 10% — and I just haven’t gotten a chance to return to that.
There’s also been a bunch of additional commentary on the report, like a bunch of people wrote reviews, a bunch of people kind of built off of it or expressed their disagreements and critiques. I found that really helpful. People can find that — maybe we can link to it in the show notes. I do feel like with the report, my main goal was to stimulate discussion and spark debate and get additional clarity, and I feel good about its impact in that respect.
Rob Wiblin: Wonderful. Yeah, we’ll definitely return to these issues before too long. And I suppose folks who want to read them properly can Google “How much computational power does it take to match the human brain?” and “Is power-seeking AI an existential risk?” I think they’re on your audio feed, Joe Carlsmith Audio, and you have a video presentation of the latter one.
Rob Wiblin: OK, despite this being a pretty lengthy conversation, we had to cut a bunch of stuff for time — ideas for other topics that we wanted to cover. Two I just wanted to bring to listeners’ attention specifically, because they’re things that you wrote in order to rebut ideas that have appeared on this podcast, among many other places.
The first is “Against meta-ethical hedonism,” which, among other things, explains why you’re not convinced by the arguments that Sharon Hewitt Rawlette put forward in her book The Feeling of Value and talked about quite a bit back in episode #138 on why pleasure and pain are the only things that intrinsically matter.
The other one is “Against the normative realist’s wager,” which is an article you wrote that explains why you’re not convinced that we should act as if normative realism is correct, even if we think it’s probably not. That idea, I think, has been raised on the show a couple of different times. People could definitely find it in my first interview with Will MacAskill, which is episode #17 on moral uncertainty, utilitarianism, and how to avoid being a moral monster.
Yeah, we just couldn’t find time to cram them in today. But I think people who have found those ideas convincing should go and check out your responses and see what they think of them.
As a final question, we’ve talked about a bunch of somewhat strange ideas today, and our general view is that you should hold them fairly lightly and inspect them at a distance, and not go all in and start taking big actions on the basis of them. Are there any kind of funny things that you actually have done in your own personal life on the basis of strange ideas that you’ve encountered in your philosophy work?
Joe Carlsmith: I think the decision theory stuff we talked about has influenced how I think about various forms of cooperation and the value of keeping commitments to myself and to other people. Just a small example: at times I’ll make deals with different parts of myself, and I think that the decision theory work has influenced my sense of the importance of staying true to those deals. So if I say to one part of myself, “We’ll do that later,” I want to make sure I really do it. You can do that from a causal decision theory perspective too, but I think in general, something about the importance of being able to count on each other and commitments and trust has come forward for me in virtue of the decision theory work.
I think the weirdest thing I’ve done is even thinking about these things at all. I think I’ve started thinking much more than the vast majority of humans ever have about acausal interaction with aliens — so that’s just a weird thing to have happened in my life.
Rob Wiblin: You have this great line in one of your essays. I think it’s like if you ever find yourself taking actions, hoping that the thing that you’re doing might make people in the past have acted better, you should seriously contemplate that perhaps you’ve completely lost your marbles. It’s fair enough.
I hope your increasing interest in keeping promises and so on means that if you’re ever stranded in the desert, you definitely get picked up and brought back to the city so we can have more of your articles on your website.
Joe Carlsmith: Yeah. I’m so paying in the city. That one’s so easy.
Rob Wiblin: Do you hear that, people in trucks who are driving through deserts? Joe’s your man.
Joe Carlsmith: It’s $10,000. Like, come on. It’s like your whole life!
Rob Wiblin: Just pay the money! My guest today has been Joe Carlsmith. Thanks so much for coming on The 80,000 Hours Podcast, Joe.
Joe Carlsmith: Thanks for having me, Rob.
Rob’s outro [03:25:45]
Rob Wiblin: As always there’s lots of links if you’d like to learn more on the blog post associated with this episode, lovingly compiled by Katy Moore, who also puts in the effort to polish the transcripts and stick in links to any unusual terms that come up.
And if you’d like more episodes with challenging moral philosophy in them, then I can suggest going back and listening to episodes like:
- #86 – Hilary Greaves on Pascal’s mugging, strong longtermism, and whether existing can be good for us
- #115 – David Wallace on the many-worlds theory of quantum mechanics and its implications
- #137 – Andreas Mogensen on whether effective altruism is just for consequentialists
- #98 – Christian Tarsney on future bias and a possible solution to moral fanaticism
All right, The 80,000 Hours Podcast is produced and edited by Keiran Harris.
Audio mastering and technical editing by Milo McGuire and Ben Cordell.
And as I said, full transcripts and an extensive collection of links to learn more are available on our site, put together by Katy Moore.
Thanks for joining, talk to you again soon.
Learn more
Related episodes