#152 – Joe Carlsmith on navigating serious philosophical confusion

What is the nature of the universe? How do we make decisions correctly? What differentiates right actions from wrong ones?

Such fundamental questions have been the subject of philosophical and theological debates for millennia. But, as we all know, and surveys of expert opinion make clear, we are very far from agreement. So… with these most basic questions unresolved, what’s a species to do?

In today’s episode, philosopher Joe Carlsmith — Senior Research Analyst at Open Philanthropy — makes the case that many current debates in philosophy ought to leave us confused and humbled. These are themes he discusses in his PhD thesis, A stranger priority? Topics at the outer reaches of effective altruism.

To help transmit the disorientation he thinks is appropriate, Joe presents three disconcerting theories — originating from him and his peers — that challenge humanity’s self-assured understanding of the world.

The first idea is that we might be living in a computer simulation, because, in the classic formulation, if most civilisations go on to run many computer simulations of their past history, then most beings who perceive themselves as living in such a history must themselves be in computer simulations. Joe prefers a somewhat different way of making the point, but, having looked into it, he hasn’t identified any particular rebuttal to this ‘simulation argument.’

If true, it could revolutionise our comprehension of the universe and the way we ought to live.

The second is the idea that “you can ‘control’ events you have no causal interaction with, including events in the past.” The thought experiment that most persuades him of this is the following:

Perfect deterministic twin prisoner’s dilemma: You’re a deterministic AI system, who only wants money for yourself (you don’t care about copies of yourself). The authorities make a perfect copy of you, separate you and your copy by a large distance, and then expose you both, in simulation, to exactly identical inputs (let’s say, a room, a whiteboard, some markers, etc.). You both face the following choice: either (a) send a million dollars to the other (“cooperate”), or (b) take a thousand dollars for yourself (“defect”).

Joe thinks, in contrast with the dominant theory of correct decision-making, that it’s clear you should send a million dollars to your twin. But as he explains, this idea, when extrapolated outwards to other cases, implies that it could be sensible to take actions in the hope that they’ll improve parallel universes you can never causally interact with — or even to improve the past. That is nuts by anyone’s lights, including Joe’s.

The third disorienting idea is that, as far as we can tell, the universe could be infinitely large. And that fact, if true, would mean we probably have to make choices between actions and outcomes that involve infinities. Unfortunately, doing that breaks our existing ethical systems, which are only designed to accommodate finite cases.

In an infinite universe, our standard models end up unable to say much at all, or give the wrong answers entirely. While we might hope to patch them in straightforward ways, having looked into ways we might do that, Joe has concluded they all quickly get complicated and arbitrary, and still have to do enormous violence to our common sense. For people inclined to endorse some flavour of utilitarianism, Joe thinks ‘infinite ethics’ spell the end of the ‘utilitarian dream‘ of a moral philosophy that has the virtue of being very simple while still matching our intuitions in most cases.

These are just three particular instances of a much broader set of ideas that some have dubbed the “train to crazy town.” Basically, if you commit to always take philosophy and arguments seriously, and try to act on them, it can lead to what seem like some pretty crazy and impractical places. So what should we do with this buffet of plausible-sounding but bewildering arguments?

Joe and Rob discuss to what extent this should prompt us to pay less attention to philosophy, and how we as individuals can cope psychologically with feeling out of our depth just trying to make the most basic sense of the world.

In the face of all of this, Joe suggests that there is a promising and robust path for humanity to take: keep our options open and put our descendants in a better position to figure out the answers to questions that seem impossible for us to resolve today — a position he calls “wisdom longtermism.”

Joe fears that if people believe we understand the universe better than we really do, they’ll be more likely to try to commit humanity to a particular vision of the future, or be uncooperative to others, in ways that only make sense if you were certain you knew what was right and wrong.

In today’s challenging conversation, Joe and Rob discuss all of the above, as well as:

  • What Joe doesn’t like about the drowning child thought experiment
  • An alternative thought experiment about helping a stranger that might better highlight our intrinsic desire to help others
  • What Joe doesn’t like about the expression “the train to crazy town”
  • Whether Elon Musk should place a higher probability on living in a simulation than most other people
  • Whether the deterministic twin prisoner’s dilemma, if fully appreciated, gives us an extra reason to keep promises
  • To what extent learning to doubt our own judgement about difficult questions — so-called “epistemic learned helplessness” — is a good thing
  • How strong the case is that advanced AI will engage in generalised power-seeking behaviour

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

Producer: Keiran Harris
Audio mastering: Milo McGuire and Ben Cordell
Transcriptions: Katy Moore

Highlights

The deterministic twin prisoner's dilemma

Joe Carlsmith: The experiment that convinces me most is: Imagine that you are a deterministic AI system and you only care about money for yourself. So you’re selfish. There’s also a copy of you, a perfect copy, and you’ve both been separated very far away — maybe you’re on spaceships flying in opposite directions or something like that. And you’re both going to face the exact same inputs. So you’re deterministic: the only way you’re going to make a different choice is if the computers malfunction or something like that. Otherwise you’re going to see the exact same environment.

In the environment, you have the option of taking $1,000 for yourself: we’ll call that “defecting” — or giving $1 million to the other guy: we’ll call that “cooperating.” The structure is similar to a prisoner’s dilemma. You’re going to make your choice, and then later you’re going to rendezvous.

So what should you do? Well, here’s an argument that I don’t find convincing, but that I think would be the argument offered by someone who thinks you can only control what you can cause. The argument would be something like: your choice doesn’t cause that guy’s choice. He’s far away; maybe he’s lightyears away. You should treat his choice as fixed. And then whatever he chooses, you get more money if you defect. If he defects, then you’ll get nothing by cooperating and $1,000 by defecting. If he sends the money to you, then you’ll get $1.001 million by defecting and $1 million by cooperating. No matter what, it’s better to defect. So you should defect.

But I think that’s wrong. The reason I think it’s wrong is that you are going to make the same choice. You’re deterministic systems, and so whatever you do, he’s going to do it too. In fact, in this particular case — and we can talk about looser versions where the inputs aren’t exactly identical — the connection between you two is so tight that literally, if you want to write something on your whiteboard, he’s going to write that too. If you want him to write on his whiteboard, “Hello, this is a message from your copy,” or something like that, you can just write it on your own whiteboard. When you guys rendezvous, his whiteboard will say the thing that you wrote. You can sit there going, “What do I want?” You really can control what he writes. If you want to draw a particular kitten, if you want to scribble in a certain way, he’s going to do that exact same thing, even though he’s far away and you’re not in causal interaction with him.

To me, I think there’s just a weird form of control you have over what he does that we just need to recognise. So I think that’s relevant to your decision, in the sense that if you start reaching for the defect button, you should be like, “OK, what button is he reaching for right now?” As you move your arm, his arm is moving with you. And so you reach for the defect, he’s about to defect. You could basically be like, “What button do I want him to press?” and just press it yourself and he’ll press it. So to me, it feels pretty easy to press the “send myself $1 million” button.

Newcomb's problem

Joe Carlsmith: The classic thought experiment that people often focus on, though I don’t think it’s the most dispositive, is this case called Newcomb’s problem, where Omega is this kind of superintelligent predictor of your actions. Omega puts you in the situation where you face two boxes: one of them is opaque, one of them is transparent. The transparent box has $1,000, the opaque box has either $1 million or nothing.

Omega puts $1 million in the box if Omega predicts that you will take only the opaque box and leave the $1,000 alone (even though you can see it right there). And Omega puts nothing in the opaque box if Omega predicts that you will take both boxes.

So the same argument arises for the causal decision theory (CDT). For CDT, the thought is: you can’t change what’s in the boxes; the boxes are already fixed. Omega already made her prediction. And no matter what, you’ll get more money if you take the $1,000. If there was some dude over there who could see the boxes, and you were like, “Hey, see what’s in the box, and what choice will give me more money?” — you don’t even need to ask, because you know it’s always just take the extra $1,000.

But I think you should one-box in this case, because I think if you one-box then it will have been the case that Omega predicted that you one-boxed, because Omega is always right about the predictions, and so there will be the million.

I think a way to pump this intuition for me that matters is imagining doing this case over and over with Monopoly money. Each time, I try taking two boxes and I notice the opaque box is empty. I take one box, opaque box is full. I do this over and over. I try doing intricate mental gymnastics. I do like a somersault, I take the boxes. I flip a coin and take the box — well, flipping a coin, Omega has to be really good, so we can talk about that.

If Omega is sufficiently good at predicting your choice, then just like every time, what you eventually will learn is that you effectively have a type of magical power. Like I can just wave my arms over the opaque box and say, “Shazam! I hereby declare that this box shall be full with $1 million. Thus, as I one-box, it is so.” Or if I can be like, “Shazam! I declare that the box shall be empty. Like thus, as I two-box, it is so.” I think eventually you just get it in your bones, such that when you finally face the real money, I guess I expect this feeling of like, “I know this one, I’ve seen this before.” I kind of know what’s going to happen at some more visceral expectation level if I one-box or two-box, and I know which one leaves me rich.

The idea of 'wisdom longtermism'

Joe Carlsmith: In the thesis, I have this distinction between what I call “welfare longtermism” and “wisdom longtermism.”

Welfare longtermism is roughly the idea that our moral focus should be on specifically the welfare of the finite number of future people who might live in our lightcone.

And wisdom longtermism is a broader idea that our moral focus should be reaching a kind of wise and empowered civilisation in general. I think of welfare longtermism as a lower bound on the stakes of the future more broadly — at the very least, the future matters at least as much as the welfare of the future people matters. But to the extent there are other issues that might be game changing or even more important, I think the future will be in a much better position to deal with those than we are, at least if we can make the right future. …

There’s a line in Nick Bostrom’s book Superintelligence about something like, if you’re digging a hole but there’s a bulldozer coming, maybe you should wonder about the value of digging a hole. I also think we’re plausibly on the cusp of pretty radical advances in humanity’s understanding of science and other things, where there might be a lot more leverage and a lot more impact from making sure that the stuff you’re doing matters specifically to how that goes, rather than to just kind of increasing our share of knowledge overall. You want to be focusing on decisions we need to make now that we would have wanted to make differently.

So it looks good to me, the focus on the long-term future. I want to be clear that I think it’s not perfectly safe. I think a thing we just generally need to give up is the hope that we will have a theory that makes sense of everything — such that we know that we’re acting in the safe way, that it’s not going to go wrong, and it’s not going to backfire. I think there can be a way that people look to philosophy as a kind of mode of Archimedean orientation towards the world — that will tell them how to live, and justify their actions, and give a kind of comfort and structure — that I think at some point we need to give up.

On the classic drowning child thought experiment

Joe Carlsmith: I think what that can do is sort of break your conception of yourself as a kind of morally sincere agent — and at a deeper level, it can break your conception of society and your peers, or society as a morally sincere endeavour, in some sense. Things can start to seem kind of sick at their core, and we’re just all looking away from the sense in which we’re horrible people, or something like that.

I actually think part of the attraction of communities like the effective altruism community, for many people, is it sort of offers a vision of a recovery of a certain moral sincerity. You find this community, and actually, these people are maybe trying — more so than you had encountered previously — to really take this stuff seriously, to act rightly by its lights. And I think that can be a powerful idea.

But there is this then this thing comes up, where it’s like, “OK, but how much is enough? Exactly how far do you go with this? What is demanded?” I think people can end up in a mode where their relationship with this is what you said: it’s about not being bad, not sucking — like you thought “maybe I sucked” and now you’re really trying not to suck — you don’t want to be kind of punished or worthy of reproach. It’s a lot about something like guilt. I think that the thought experiment itself is sort of about calling you an asshole. It’s like, “If you didn’t save the child, you’re an asshole.” So everyone’s an asshole.

Rob Wiblin: But look at how you’re living the rest of your life.

Joe Carlsmith: Exactly. I think sometimes you’re an asshole, and we need to be able to notice that. But also, for one thing, it’s actually not clear to me that you’re an asshole for not donating to a charity — that’s not something that we normally think — and I think we should notice that. Also, it doesn’t seem to me like a very healthy or wholehearted basis for engaging with this stuff. I think there are alternatives that are better.

On why bother being good

Rob Wiblin: What are the personal values of yours that motivate you to care to try to help other people, even when it’s kind of a drag, or demoralising, or it feels like you’re not making progress?

Joe Carlsmith: One value that’s important to me, though it’s a little hard to communicate, is something like “looking myself and the world in the eye.” It’s about kind of taking responsibility for what I’m doing; what kind of force I’m going to be in the world in different circumstances; trying to understand myself, understand the world, and understand what in fact I am in relationship to it — and to choose that and endorse that with a sense of agency and ownership.

One way that shows up for me in the context of helping others is trying to take really seriously that my mind is not the world — that the limits of my experience are not the limits of what’s real.

In particular, I wake up and I’m just like Joe every day — every day it’s just Joe stuff; I wake up in the sphere of Joe around me. So Joe stuff is really salient and vivid: there’s this sort of zone — it’s not just my experience, there’s also, like, people and my kitchen — of things that are kind of vivid.

And then there’s a part of the world that my brain is doing a lot less to model — but that doesn’t mean the thing is less real; it’s just my brain is putting in a lot fewer resources to modelling it. So things like other people are just as real as I am. When something happens to me, at least from a certain perspective, that’s not a fundamentally different type of event than when something happens to someone else. So part of living in the real world for me is living in light of that fact, and trying to really stay in connection with just that other people are just as real as I am.

More broadly, when we talk about forms of altruism that are more fully impartial — or trying to ask questions like, “What is really the most good I can do?” — for me, that’s a lot about trying to live in the world as a whole, not artificially limiting which parts of the world I’m treating as real or significant. Because I don’t live in just one part of the world. When I act, I act in a way that affects the whole world, or that can affect the whole world. There’s some sense in which I want to be not imposing some myopia upfront on what is in scope for me. I think those are both core for me in terms of what helping others is about.

Articles, books, and other media discussed in the show

Joe’s work:

Thought experiments, strong longtermism, and other “crazy train” work:

AI alignment arguments:

Other 80,000 Hours podcast episodes:

Related episodes

About the show

The 80,000 Hours Podcast features unusually in-depth conversations about the world's most pressing problems and how you can use your career to solve them. We invite guests pursuing a wide range of career paths — from academics and activists to entrepreneurs and policymakers — to analyse the case for and against working on different issues and which approaches are best for solving them.

The 80,000 Hours Podcast is produced and edited by Keiran Harris. Get in touch with feedback or guest suggestions by emailing [email protected].

What should I listen to first?

We've carefully selected 10 episodes we think it could make sense to listen to first, on a separate podcast feed:

Check out 'Effective Altruism: An Introduction'

Subscribe here, or anywhere you get podcasts:

If you're new, see the podcast homepage for ideas on where to start, or browse our full episode archive.