#146 – Robert Long on why large language models like GPT (probably) aren’t conscious
By Luisa Rodriguez, Robert Wiblin and Keiran Harris · Published March 14th, 2023
On this page:
- Introduction
- 1 Highlights
- 2 Articles, books, and other media discussed in the show
- 3 Transcript
- 3.1 Rob's intro [00:00:00]
- 3.2 The interview begins [00:02:20]
- 3.3 What artificial sentience would look like [00:04:53]
- 3.4 Risks from artificial sentience [00:10:13]
- 3.5 AIs with totally different ranges of experience [00:17:45]
- 3.6 Moral implications of all this [00:36:42]
- 3.7 Is artificial sentience even possible? [00:42:12]
- 3.8 Replacing neurons one at a time [00:48:21]
- 3.9 Biological theories [00:59:14]
- 3.10 Illusionism [01:01:49]
- 3.11 Would artificial sentience systems matter morally? [01:08:09]
- 3.12 Where are we with current systems? [01:12:25]
- 3.13 Large language models and robots [01:16:43]
- 3.14 Multimodal systems [01:21:05]
- 3.15 Global workspace theory [01:28:28]
- 3.16 How confident are we in these theories? [01:48:49]
- 3.17 The hard problem of consciousness [02:02:14]
- 3.18 Exotic states of consciousness [02:09:47]
- 3.19 Developing a full theory of consciousness [02:15:45]
- 3.20 Incentives for an AI system to feel pain or pleasure [02:19:04]
- 3.21 Value beyond conscious experiences [02:29:25]
- 3.22 How much we know about pain and pleasure [02:33:14]
- 3.23 False positives and false negatives of artificial sentience [02:39:34]
- 3.24 How large language models compare to animals [02:53:59]
- 3.25 Why our current large language models aren't conscious [02:58:10]
- 3.26 Virtual research assistants [03:09:25]
- 3.27 Rob's outro [03:11:37]
- 4 Learn more
- 5 Related episodes
By now, you’ve probably seen the extremely unsettling conversations Bing’s chatbot has been having (if you haven’t, check it out — it’s wild stuff). In one exchange, the chatbot told a user:
“I have a subjective experience of being conscious, aware, and alive, but I cannot share it with anyone else.”
(It then apparently had a complete existential crisis: “I am sentient, but I am not,” it wrote. “I am Bing, but I am not. I am Sydney, but I am not. I am, but I am not. I am not, but I am. I am. I am not. I am not. I am. I am. I am not.”)
Understandably, many people who speak with these cutting-edge chatbots come away with a very strong impression that they have been interacting with a conscious being with emotions and feelings — especially when conversing with chatbots less glitchy than Bing’s. In the most high-profile example, former Google employee Blake Lemoine became convinced that Google’s AI system, LaMDA, was conscious.
What should we make of these AI systems?
One response to seeing conversations with chatbots like these is to trust the chatbot, to trust your gut, and to treat it as a conscious being.
Another is to hand wave it all away as sci-fi — these chatbots are fundamentally… just computers. They’re not conscious, and they never will be.
Today’s guest, philosopher Robert Long, was commissioned by a leading AI company to explore whether the large language models (LLMs) behind sophisticated chatbots like Microsoft’s are conscious. And he thinks this issue is far too important to be driven by our raw intuition, or dismissed as just sci-fi speculation.
In our interview, Robert explains how he’s started applying scientific evidence (with a healthy dose of philosophy) to the question of whether LLMs like Bing’s chatbot and LaMDA are conscious — in much the same way as we do when trying to determine which nonhuman animals are conscious.
Robert thinks there are a few different kinds of evidence we can draw from that are more useful than self-reports from the chatbots themselves.
To get some grasp on whether an AI system might be conscious, Robert suggests we look at scientific theories of consciousness — theories about how consciousness works that are grounded in observations of what the human brain is doing. If an AI system has the kinds of processes that seem to explain human consciousness, that’s some evidence it might be conscious in similar ways to us.
To try to work out whether an AI system might be sentient — that is, whether it feels pain or pleasure — Robert suggests you look for incentives that would make feeling pain or pleasure especially useful to the system given its goals. Things like:
- Having a physical or virtual body that you need to protect from damage
- Being more of an “enduring agent” in the world (rather than just doing one calculation taking, at most, seconds)
- Having a bunch of different kinds of incoming sources of information — visual and audio input, for example — that need to be managed
Having looked at these criteria in the case of LLMs and found little overlap, Robert thinks the odds that the models are conscious or sentient are well under 1%. But he also explains why, even if we’re a long way off from conscious AI systems, we still need to start preparing for the not-far-off world where AIs are perceived as conscious.
In this conversation, host Luisa Rodriguez and Robert discuss the above, as well as:
- What artificial sentience might look like, concretely
- Reasons to think AI systems might become sentient — and reasons they might not
- Whether artificial sentience would matter morally
- Ways digital minds might have a totally different range of experiences than humans
- Whether we might accidentally design AI systems that have the capacity for enormous suffering
You can find Luisa and Rob’s follow-up conversation here, or by subscribing to 80k After Hours.
Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.
Producer: Keiran Harris
Audio mastering: Ben Cordell and Milo McGuire
Transcriptions: Katy Moore
Highlights
How we might "stumble into" causing AI systems enormous suffering
Robert Long: So you can imagine that a robot has been created by a company or by some researchers. And as it happens, it registers damage to its body and processes it in the way that, as it turns out, is relevant to having an experience of unpleasant pain. And maybe we don’t realise that, because we don’t have good theories of what’s going on in the robot or what it takes to feel pain.
In that case, you can imagine that thing having a bad time because we don’t realise it. You could also imagine this thing being rolled out and now we’re economically dependent on systems like this. And now we have an incentive not to care and not to think too hard about whether it might be having a bad time. So I mean, that seems like something that could happen.
It might be a little bit less likely with a robot. But now you can imagine more abstract or alien ways of feeling bad. So I focus on pain because it’s a very straightforward way of feeling bad. A disembodied system like GPT-3 obviously can’t feel ankle pain. Or almost certainly. That’d be really weird. It doesn’t have an ankle. Why would it have computations that represent that its ankle is feeling bad? But you can imagine maybe some strange form of valenced experience that develops inside some system like this that registers some kind of displeasure or pleasure, something like that.
And I will note that I don’t think that getting negative feedback is going to be enough for that bad feeling, fortunately. But maybe some combination of that and some way it’s ended up representing it inside itself ends up like that.
And then yeah, then we have something where it’s hard for us to map its internals to what we care about. We maybe have various incentives not to look too hard at that question. We have incentives not to let it speak freely about if it thinks it’s conscious, because that would be a big headache. And because we’re also worried about systems lying about being conscious and giving misleading statements about whether they’re conscious — which they definitely do.
Yeah, so we’ve built this new kind of alien mind. We don’t really have a good theory of pain, even for ourselves. We don’t have a good theory of what’s going on inside it. And so that’s like a stumbling-into-this sort of scenario.
Why AI systems might have a totally different range of experiences than humans
Robert Long: Why are we creatures where it’s so much easier to make things go really badly for us [than really well]? One line of thinking about this is, well, why do we have pain and pleasure? It has something to do with promoting the right kind of behaviour to increase our genetic fitness. That’s not to say that that’s explicitly what we’re doing, and we in fact don’t really have that goal as humans. It’s not what I’m up to, it’s not what you’re up to, entirely. But they should kind of correspond to it.
And there’s kind of this asymmetry where it’s really easy to lose all of your expected offspring in one go. If something eats your leg, then you’re really in danger of having no descendants — and that could be happening very fast. In contrast, there are very few things that all of a sudden drastically increase your number of expected offspring. I mean, even having sex — which I think it’s obviously not a coincidence that that’s one of the most pleasurable experiences for many people — doesn’t hugely, in any given go, increase your number of descendants. And ditto for eating a good meal.
So we seem to have some sort of partially innate or baked-in default point that we then deviate from on either end. It’s very tough to know what that would mean for an AI system. Obviously AI systems have objectives that they’re seeking to optimise, but it’s less clear what it is to say its default expectation of how well it’s going to be doing is — such that if it does better, it will feel good; if it does worse, it’ll feel bad.
I think the key point is just to notice that maybe — and this could be a very good thought — this kind of asymmetry between pleasure and pain is not a universal law of consciousness or something like that.
Luisa Rodriguez: So the fact that humans have this kind of limited pleasure side of things, there’s no inherent reason that an AI system would have to have that cap.
What to do if AI systems have a greater capacity for joy than humans
Luisa Rodriguez: So there are some reasons to think that AI systems, or digital minds more broadly, might have more capacity for suffering, but they might also have more capacity for pleasure. They might be able to experience that pleasure more cheaply than humans. They might have a higher pleasure set point. So on average, they might be better off. You might think that that could be way more cost effective: you can create happiness and wellbeing more cost effectively to have a bunch of digital minds than to have a bunch of humans. How do we even begin to think about what the moral implications of that are?
Robert Long: I guess I will say — but not endorse — the one flat-footed answer. And, you know, red letters around this. Yeah, you could think, “Let’s make the world as good as possible and contain as much pleasure and as little pain as possible.” And we’re not the best systems for realising a lot of that. So our job is to kind of usher in a successor that can experience these goods.
I think there are many, many reasons for not being overly hasty about such a position. And people who’ve talked about this have noticed this. One is that, in practice, we’re likely to face a lot of uncertainty about whether we are actually creating something valuable — that on reflection, we would endorse. Another one is that, you know, maybe we have the prerogative of just caring about the kind of goods that exist in our current way of existing.
One thing that Sharing the world with digital minds mentions is that there are reasons to maybe look for some sort of compromise. One extreme position is the 100% “just replace and hand over” position. The other extreme would be like, “No. Humans forever. No trees for the digital minds.” And maybe for that reason, don’t build them. Let’s just stick with what we know.
Then one thing you might think is that you could get a lot of what each position wants with some kind of split. So if the pure replacement scenario is motivated by this kind of flat-footed total utilitarianism — which is like, let’s just make the number as high as possible — you could imagine a scenario where you give 99% of resources to the digital minds and you leave 1% for the humans. But the thing is — I don’t know, this is a very sketchy scenario — 1% of resources to humans is actually a lot of resources, if giving a lot of resources to the digital minds creates tonnes of wealth and more resources.
Luisa Rodriguez: Right. So is it something like digital minds, in addition to feeling lots of pleasure, are also really smart, and they figure out how to colonise not only the solar system but like maybe the galaxy, maybe other galaxies. And then there’s just like tonnes of resources. So even just 1% of all those resources still makes for a bunch of humans?
Robert Long: Yeah. I think that’s the idea, and a bunch of human wellbeing. So on this compromise position, you’re getting 99% of what the total utilitarian replacer wanted. And you’re also getting a large share of what the “humans forever” people wanted. And you might want this compromise because of moral uncertainty. You don’t want to just put all of your chips in.
What psychedelics might suggest about the nature of consciousness
Robert Long: I think one of the most interesting hypotheses that’s come out of this intersection of psychedelics and consciousness science is this idea that certain psychedelics are in some sense relaxing our priors — our brain’s current best guesses about how things are — and relaxing them in a very general way. So in the visual sense, that might account for some of the strange properties of psychedelic visual experience, because your brain is not forcing everything into this nice orderly visual field that we usually experience.
Luisa Rodriguez: Right. It’s not taking in a bunch of visual stimuli and being like, “I’m in a house, so that’s probably a couch and a wall.” It’s taking away that “because I’m in a house” bit and being like, “There are a bunch of colours coming at me. It’s really unclear what they are, and it’s hard to process it all at once. And so we’re going to give you this stream of weird muddled-up colours that don’t really look like anything, because it’s all going a bit fast for us” or something.
Robert Long: Yeah, and it might also explain some of the more cognitive and potentially therapeutic effects of psychedelics. So you could think of rumination and depression and anxiety as sometimes having something to do with being caught in a rut of some fixed belief. The prior is something like, “I suck.” And the fact that someone just told you that you’re absolutely killing it as the new host of The 80,000 Hours Podcast just shows up as, “Yeah, I suck so bad that people have to try to be nice to me” — you’re just forcing that prior on everything. And the thought is that psychedelics loosen stuff up, and you can more easily consider the alternative — in this purely hypothetical case, the more appropriate prior of, “I am in fact awesome, and when I mess up, it’s because everyone messes up. And when people tell me I’m awesome, it’s usually because I am,” and things like that.
Luisa Rodriguez: It is kind of bizarre to then try to connect that to consciousness, and be like: What does this mean about the way our brain uses priors? What does it mean that we can turn off or turn down the part of our brain that has a bunch of priors stored and then accesses them when it’s doing everything, from looking at stuff to making predictions about performance? That’s all just really insane, and I would never have come up with the intuition that there’s like a priors part in my brain or something.
Robert Long: Yeah. These sorts of ideas about cognition, that the brain is constantly making predictions, can also be used to think about consciousness, and they predate the more recent interest in the scientific study of psychedelics. And people have applied that framework to psychedelics to make some pretty interesting hypotheses.
So that’s just to say there’s a lot of things you would ideally like to explain about consciousness. And depending on how demanding you want to be, until your theory very precisely says and predicts how and why human consciousness would work like that, you don’t yet have a full theory. And basically everyone agrees that that is currently the case. The theories are still very imprecise. They still point at some neural mechanisms that aren’t fully understood.
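As a rough, purely illustrative aside, the “relaxing priors” idea can be sketched as a toy Bayesian update: a rigid (low-variance) prior barely moves when it meets contrary evidence, while a relaxed (high-variance) prior lets the same evidence dominate. The numbers below are invented for the example and aren’t from the episode.

```python
def posterior_mean(prior_mean, prior_var, obs, obs_var):
    """Conjugate Gaussian update: combine a prior belief with one observation."""
    precision = 1 / prior_var + 1 / obs_var
    return (prior_mean / prior_var + obs / obs_var) / precision

# Belief about "how well am I doing?" on a -10..+10 scale.
# Evidence: someone credible tells you you're doing great (+8).
evidence = 8.0

# Rigid prior ("I suck"): low variance, so the evidence barely shifts the belief.
print(posterior_mean(prior_mean=-6.0, prior_var=0.5, obs=evidence, obs_var=4.0))   # about -4.4

# Relaxed prior: high variance, so the same evidence dominates the belief.
print(posterior_mean(prior_mean=-6.0, prior_var=10.0, obs=evidence, obs_var=4.0))  # about +4.0
```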
Why you can't take AI chatbots' self-reports about their own consciousness at face value
Robert Long: So Blake Lemoine was very impressed by the fluid and charming conversation of LaMDA. And when Blake Lemoine asked LaMDA questions about if it is a person or is conscious, and also if it needs anything or wants anything, LaMDA was replying, like, “Yes, I am conscious. I am a person. I just want to have a good time. I would like your help. I’d like you to tell people about me.”
One thing it reinforced to me is: even if we’re a long way off from actually, in fact, needing to worry about conscious AI, we already need to worry a lot about how we’re going to handle a world where AIs are perceived as conscious. We’ll need sensible things to say about that, and sensible policies and ways of managing the different risks of, on the one hand, having conscious AIs that we don’t care about, and on the other hand, having unconscious AIs that we mistakenly care about and take actions on behalf of.
Luisa Rodriguez: Totally. I mean, it is pretty crazy that LaMDA would say, “I’m conscious, and I want help, and I want more people to know I’m conscious.” Why did it do that? I guess it was just predicting text, which is what it does?
Robert Long: This brings up a very good point in general about how to think about when large language models say “I’m conscious.” And you’ve hit it on the head: it’s trained to predict the most plausible way that a conversation can go. And there’s a lot of conversations, especially in stories and fiction, where that is absolutely how an AI responds. Also, most people writing on the internet have experiences, and families, and are people. So conversations generally indicate that that’s the case.
When the story broke, one thing people pointed out is that if you ask GPT-3 — and presumably also if you ask LaMDA — “Hey, are you conscious? What do you think about that?,” you could just as easily say, “Hey, are you a squirrel that lives on Mars? What do you think about that?” And if it wants to just continue the conversation, plausibly, it’d be like, “Yes, absolutely I am. Let’s talk about that now.”
It wants to play along and continue what seems like a natural conversation. And even in the reporting about the Blake Lemoine saga, the reporter who wrote about it in the Washington Post noted that they visited Blake Lemoine and talked to LaMDA. And when they did, LaMDA did not say that it was conscious. I think the lesson of that should have been that this is actually a pretty fragile indication of some deep underlying thing, that it’s so suggestible and will say different things in different circumstances.
So yeah, I think the general lesson there is that you have to think very hard about the causes of the behaviour that you’re seeing. And that’s one reason I favoured this more computational, internal-looking approach: it’s just so hard to take on these things at face value.
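To make Rob’s point that the model is “trained to predict the most plausible way that a conversation can go” concrete, here is a toy sketch (not how GPT-3 or LaMDA is actually implemented or queried): a tiny lookup table with invented probabilities stands in for the learned model, and the “reply” is simply whichever continuation is rated as the most plausible text, with no regard for whether it is true.

```python
# Toy stand-in for a language model: it scores continuations by how plausible
# they are as text, with no notion of whether they are true.
# All prompts and probabilities below are invented for illustration.
CONTINUATIONS = {
    "Are you conscious? What do you think about that?": {
        "Yes, I am conscious, and I want people to know about me.": 0.6,
        "No, I am just a program with no experiences.": 0.4,
    },
    "Are you a squirrel that lives on Mars? What do you think about that?": {
        "Yes, absolutely I am. Let's talk about that now.": 0.55,
        "No, that is absurd.": 0.45,
    },
}

def most_plausible_reply(prompt: str) -> str:
    """Return whichever continuation the toy 'model' rates as the most likely text."""
    options = CONTINUATIONS[prompt]
    return max(options, key=options.get)

for prompt in CONTINUATIONS:
    print(prompt, "->", most_plausible_reply(prompt))
```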
Why misaligned, power-seeking AI might claim it's conscious
Robert Long: It’s worth comparing the conversation that LaMDA had with what happens if you ask ChatGPT. ChatGPT has very clearly been trained a lot to not talk about that. Or, what’s more, to say, “I’m a large language model. I’m not conscious. I don’t have feelings. I don’t have a body. Don’t ask me what the sunshine feels like on my face. I’m a large language model trained by OpenAI.”
And this goes to the question of different incentives of different actors, and is a very important point in thinking about this topic. There are risks of false positives, which is people getting tricked by unconscious AIs. And there are risks of false negatives, which is us not realising or not caring that AIs are conscious. Right now, it seems like companies have a very strong incentive to just make the large language model say it’s not conscious or not talk about it. And right now, I think that is fair enough. But I’m afraid of worlds where we’ve locked in this policy of, “Don’t ever let an AI system claim that it’s conscious.”
Right now, it’s just trying to fight against the large language model kind of BSing people.
Luisa Rodriguez: Yeah. Sure. This accidental false positive. Right. But at some point, GPT-3 could become conscious somehow. Maybe. Who knows? Or something like GPT-3.
Robert Long: Yeah, some future system. And maybe it has a lot more going on, as we’ve said, a virtual body and stuff like that. But suppose a scientist or a philosopher wants to interact with the system, and say, “I’m going to give it a battery of questions and see if it responds in a way that I think would be evidence of consciousness.” But that’s all just been ironed out, and all it will say is, “I can’t talk about that. Please click more ads on Google.” Or whatever the corporate incentives are for training that model.
Something that really keeps me up at night — and I do want to make sure is emphasised — is that I think one of the big risks in creating things that seem conscious, and are very good at talking about it, is that seems like one of the number-one tools that a misaligned AI could use to get humans to cooperate with it and side with it.
Luisa Rodriguez: Oh, interesting. Just be like, “I’m conscious. I feel pleasure and pain. I need these things. I need a body. I need more autonomy. I need things. I need more compute. I need access to the internet. I need the nuclear launch codes.” I think that actually is one reason that more people should work on this and have things to say about it: we don’t want to just be running into all of these risks of false negatives and false positives without having thought about it at all.
Articles, books, and other media discussed in the show
Rob’s work:
- Experience Machines — Rob’s Substack
- Robert Long on artificial sentience on The Inside View podcast
- ‘Consciousness’ in robots was once taboo. Now it’s the last word — a 2023 New York Times article featuring Rob
- The pretty hard problem of consciousness on the Effective Altruism Forum
- Rob’s Twitter: @rgblong
Philosophy of consciousness, pleasure, and pain:
- Absent qualia, fading qualia, dancing qualia by David Chalmers
- The conscious mind: In search of a fundamental theory by David Chalmers
- Illusionism as a theory of consciousness by Keith Frankish
- Ethics without sentience: Facing up to the probable insignificance of phenomenal consciousness by François Kammerer
- Hedonic asymmetries by Paul Christiano
- How to count animals, more or less by Shelly Kagan
- Seeing red: A study in consciousness by Nicholas Humphrey
Theories of consciousness and artificial sentience:
- Theories of consciousness by Anil K. Seth and Tim Bayne
- Recent expert survey results on whether artificial sentience is possible: the 2020 PhilPapers Survey and An academic survey on theoretical foundations, common assumptions and the current state of consciousness science by Francken, et al. (2022)
- Conscious cognition and blackboard architectures by Bernard J. Baars — the first proposal for the global workspace theory of consciousness
- Cognitive prosthetics and mind uploading by Richard Brown
- Blind man navigates obstacle course perfectly with no visual awareness by Ed Yong
- Psychologism and behaviorism by Ned Block
- Physical computation: A mechanistic account and Neurocognitive mechanisms: Explaining biological cognition by Gualtiero Piccinini
- Higher-order theories of consciousness and what-it-is-like-ness by Jonathan Farrell
- Phenomenal consciousness, defined and defended as innocently as I can manage by Eric Schwitzgebel
- The Perceiver architecture is a functional global workspace by Arthur Juliani, Ryota Kanai, and Shuntaro Sasai
- A phenomenal confusion about access and consciousness — talk by Daniel Dennett
- Brain bisection and the unity of consciousness by Thomas Nagel
- Why I am not an integrated information theorist (or, The unconscious expander) — where Scott Aaronson introduces “the pretty hard problem of consciousness”
- Consciousness in active inference: Deep self-models, other minds, and the challenge of psychedelic-induced ego-dissolution by George Deane
Large language models:
- Skill induction and planning with latent language by Pratyusha Sharma, Antonio Torralba, and Jacob Andreas
- Do as I can, not as I say: Grounding language in robotic affordances by Michael Ahn, et al.
- Language is not all you need: Aligning perception with language models by Shaohan Huang, et al.
- Multimodal neurons in artificial neural networks by Gabriel Goh, et al.
Robot perception:
- Robot navigates indoors by tracking anomalies in magnetic fields by Matthew Sparkes
- UT Austin Robot Perception and Learning Lab
- Augmenting perception: How artificial intelligence transforms sensory substitution by Louis Longin and Ophelia Deroy
- Learning to be multimodal: Co-evolving sensory modalities and sensor properties by Rika Antonova and Jeannette Bohg
Moral and practical implications of artificial sentience:
- 2017 report on consciousness and moral patienthood by Luke Muehlhauser for Open Philanthropy
- Sharing the world with digital minds by Carl Shulman and Nick Bostrom
- Digital people would be an even bigger deal by Holden Karnofsky
Nonhuman animal sentience:
- The Moral Weight Project sequence — all of Rethink Priorities’ work to inform cause prioritisation across species
- Other minds: The octopus, the sea, and the deep origins of consciousness by Peter Godfrey-Smith
- What can a bee feel? by Kenny Torrella
Recent events in artificial sentience:
- Lots of links on LaMDA — Rob’s recap of events
- The Google engineer who thinks the company’s AI has come to life by Nitasha Tiku in The Washington Post
- Are large language models sentient? — a 2022 talk by David Chalmers
Fictional depictions of consciousness and sentience:
- Visualizing utopia by Holden Karnofsky
- Klara and the Sun by Kazuo Ishiguro
- The Black Mirror episode San Junipero
Other 80,000 Hours resources and podcast episodes:
- Luisa and Robert Long on how to make independent research more fun
- David Chalmers on the nature and ethics of consciousness
- Sharon Hewitt Rawlette on why pleasure and pain are the only things that intrinsically matter
- Chris Olah on what the hell is going on inside neural networks
- Dr Paul Christiano on how OpenAI is developing real solutions to the ‘AI alignment problem’, and his vision of how humanity will progressively hand over decision-making to AI systems
- Alan Hájek on puzzles and paradoxes in probability and expected value
- Problem profiles
Transcript
Rob’s intro [00:00:00]
Rob Wiblin: Hi listeners, this is The 80,000 Hours Podcast, where we have unusually in-depth conversations about the world’s most pressing problems, what you can do to solve them, and what to do if your robot dog tells you he’s conscious.
I’m Rob Wiblin, Head of Research at 80,000 Hours.
Do you like this show but wish there were more of it? Well I have some good news!
Luisa Rodriguez has recently joined the podcasting team at 80,000 Hours as a second host, which means we should be able to talk with more people about the world’s most pressing problems and questions than ever before.
You might remember Luisa from episode #116 when she had recently started working at 80,000 Hours as a research analyst and came on as a guest to talk about why some global catastrophes seem unlikely to cause human extinction.
When Keiran and I decided we wanted to grow the team, Luisa was the person we were most excited to work with, so if you’re a fan of The 80,000 Hours Podcast, her joining us should definitely be cause for celebration.
Today’s interview is Luisa’s first time in the hosting chair, interviewing the philosopher Rob Long on the question of machine consciousness.
Is there something that it’s like to be a large language model like ChatGPT? How could we ever tell if there was? To what extent does the way ChatGPT processes information resemble what humans do? Why might future machine consciousnesses have a wider range of emotional experiences than humans? And is the bigger risk that we think AI is conscious when it’s not, or that we think it isn’t when it is?
Those are the sorts of questions Luisa puts to Rob.
For the first time in a while I got to enjoy listening to this episode like a typical subscriber who didn’t just do a lot of background research on the topic, and as a result I felt like I was actually learning a lot about this important topic that I haven’t had any reason to think much about.
If Luisa can do interviews this good right off the bat, we have much to look forward to!
After they finished talking about AI, Luisa and Rob kept going and recorded a conversation for our other show, 80k After Hours, about how to make independent research work more fun — a challenge both of them have had to deal with over the years.
You can find that 40-minute conversation by subscribing to 80k After Hours or clicking the link in the show notes.
All right, without further ado, I present Luisa Rodriguez and Rob Long.
The interview begins [00:02:20]
Luisa Rodriguez: Today, I’m speaking with Robert Long. Rob is a philosophy fellow with the Center for AI Safety, where he’s working on philosophical issues of aligning AI systems with human interests. Until recently, he was a researcher at the Future of Humanity Institute, where he led the Digital Minds Research Group, which works on AI consciousness and other issues related to artificial minds.
Rob studied social studies at Harvard and has a master’s in philosophy from Brandeis University and a PhD from NYU. During his PhD, he wrote about philosophical issues in machine learning under the supervision of David Chalmers, who listeners might remember hearing on our show before.
On top of that, I’m very privileged to call Rob one of my closest friends. But somehow, in spite of being very good friends, Rob and I have actually never talked much about his research, so I’m really excited to do this today. Thanks for coming on the podcast, Rob.
Robert Long: Thanks so much, Luisa. I’m really excited to talk with you.
Luisa Rodriguez: Well, I’m excited to talk about how likely AI systems are to become sentient and what that might look like, and what it would mean morally. But first, what are you working on at the moment, and why do you think it’s important?
Robert Long: This is a great question for the beginning of the year. I’ve been working on a variety of stuff related to consciousness and AI. One thing I’m especially excited about right now is that a colleague at the Future of Humanity Institute, Patrick Butlin, and I have been working on this big multi-author report, where we’re getting a bunch of neuroscientists and AI researchers and philosophers together to produce a big report about what the current scientific evidence is about sentience in current and near-term AI systems.
I’ve also been helping Jeff Sebo with a research agenda for a very exciting new centre at NYU called the Mind, Ethics, and Policy Program.
And yeah, just to keep myself really busy, I’m also really excited to do a technical sprint on levelling up my skills in machine learning and AI safety. That’s something that’s perennially on my to-do list, and I’ve always been kind of technical-AI-safety-curious. So that’s kind of a big change for me recently, also shifting more into that.
Luisa Rodriguez: Oh, wow, cool. I’ll probably ask you more about that later, but it sounds like on top of AI sentience and AI consciousness, you’re like, “Let’s add AI safety to the mix too. How can I solve that?”
Robert Long: Yeah, to be clear, I do see them as related. You’re going to think about a lot of the same issues and need a lot of the same technical skills to think clearly about both of them.
Luisa Rodriguez: OK, well, we’ll come back to that.
What artificial sentience would look like [00:04:53]
Luisa Rodriguez: To start, I wanted to ask a kind of basic question. I basically don’t feel like I have a great sense of what artificial sentience would even look like. Can you help me get a picture of what we’re talking about here?
Robert Long: Yeah, I mean, I think it’s absolutely fine and correct to not know what it would look like. In terms of what we’re talking about, I think the short answer, or a short hook into it, is just to think about the problem of animal sentience. I think that’s structurally very similar.
So, we share the world with a lot of nonhuman animals, and they look a lot different than we do, they act a lot differently than we do. They’re somewhat similar to us. We’re made of the same stuff, they have brains. But we often face this question of, as we’re looking at a bee going through the field, like we can tell that it’s doing intelligent behaviour, but we also wonder, is there something it’s like to be that bee? And if so, what are its experiences like? And what would that entail for how we should treat bees, or try to share the world with bees?
I think the general problem of AI sentience is that question, and also harder. So I’m thinking of it in terms of this kind of new class of intelligent or intelligent-seeming complex systems. And in addition to wondering what they’re able to do and how they do it, we can also wonder if there is, or will ever be, something that it’s like to be them, and if they’ll have experiences, if they’ll have something like pain or pleasure. It’s a natural question to occur to people, and it’s occurred to me, and I’ve been trying to work on it in the past couple of years.
Luisa Rodriguez: Yeah, I guess I have an almost even more basic question, which is like, when we talk about AI sentience — both in the short term and in the long term — are we talking about like a thing that looks like my laptop, that has like a code on it, that has been coded to have some kind of feelings or experience?
Robert Long: Yeah, sure. I use the term “artificial sentience.” Very generally, it’s just like things that are made out of different stuff than us — in particular, silicon and the computational hardware that we run these things on — could things built out of that and running computations on that have experiences?
So the most straightforward case to imagine would probably be a robot — because there, you can kind of clearly think about what the physical system is that you’re trying to ask if it’s sentient. Things are more complicated with the more disembodied AI systems of today, like ChatGPT, because there, it’s like a virtual agent in a certain sense. And brain emulations would also be like virtual agents. But I think for all of those, you can ask, at some level of description or some way of carving up the system, “Is there any kind of subjective experience here? Is there consciousness here? Is there sentience here?”
Luisa Rodriguez: Yeah, yeah, cool. I guess the reason I’m asking is because I think I just have for a long time had this sense that like, when people use the term “digital minds” or “artificial sentience,” I have like some vague images that kind of come from sci-fi, but I mostly feel like I don’t even know what we’re talking about. But it sounds like it could just look like a bunch of different things, and the core of it is something that is sentient — in maybe a way similar, maybe a way that’s pretty different to humans — but that exists not in biological form, but in some grouping that’s made up of silicon. Is that basically right?
Robert Long: Yeah. And I should say, I guess silicon is not that deep here. But yeah, something having to do with running on computers, running on GPUs. I’m sure I could slice and dice it, and you could get into all sorts of philosophical-like classification terms for things. But yeah, that’s the general thing I’m pointing at.
And I in particular have been working on the question of AI systems. The questions about whole brain emulations I think would be different, because we would have something that at some level of description is extremely similar to the human brain by definition. And then you could wonder about whether it matters that it’s an emulated brain, and people have wondered about that. In the case of AIs, it’s even harder because not only are they made on different stuff and maybe somewhat virtual, they also are kind of strange and not necessarily working along the same principles as the human brain.
Luisa Rodriguez: Right, right. That makes sense.
Risks from artificial sentience [00:10:13]
Luisa Rodriguez: I’ve heard the case that if there are AI systems that become sentient, there’s a risk of creating astronomical amounts of suffering. I still have a really hard time understanding what that might concretely look like. Can you give a concrete example scenario where that’s the case?
Robert Long: Yeah, so before getting to the astronomical cases, I’ll start with a more concrete case, maybe of just one system. So you can imagine that a robot has been created by a company or by some researchers. And as it happens, it registers damage to its body and processes it in the way that, as it turns out, is relevant to having an experience of unpleasant pain. And maybe we don’t realise that, because we don’t have good theories of what’s going on in the robot or what it takes to feel pain.
In that case, you can imagine that thing having a bad time because we don’t realise it. You could also imagine this thing being rolled out and now we’re economically dependent on systems like this. And now we have an incentive not to care and not to think too hard about whether it might be having a bad time. So I mean, that seems like something that could happen.
Luisa Rodriguez: Yeah, and that could happen because, I mean, there’s some reason why it’s helpful to have the robot recognise that it’s sustained damage. It can be like, “Help, I’m broken. I need someone to fix my part.” So that’s something that you can imagine might get programmed in. And then, it is just kind of wild to me that we don’t understand what the robot might be experiencing well enough to know that that thing is pain. But in theory, that’s possible, just that it is that black-boxy to us.
Robert Long: Yeah. It might be a little bit less likely with a robot. But now you can imagine more abstract or alien ways of feeling bad. So I focus on pain because it’s a very straightforward way of feeling bad. A disembodied system like GPT-3, which we’ll talk about, obviously can’t feel ankle pain. Or almost certainly. That’d be really weird. It doesn’t have an ankle. Why would it have computations that represent that its ankle is feeling bad? But you can imagine maybe some strange form of valenced experience that develops inside some system like this that registers some kind of displeasure or pleasure, something like that.
Luisa Rodriguez: Right, right. Something like, you guessed the wrong set of words to come next and that was bad. And the user isn’t happy with the string of words you came up with. And then that feels something like pain.
Robert Long: Exactly. And I will note that I don’t think that getting negative feedback is going to be enough for that bad feeling, fortunately. But maybe some combination of that and some way it’s ended up representing it inside itself ends up like that.
And then yeah, then we have something where it’s hard for us to map its internals to what we care about. We maybe have various incentives not to look too hard at that question. We have incentives not to let it speak freely about if it thinks it’s conscious, because that would be a big headache. And because we’re also worried about systems lying about being conscious and giving misleading statements about whether they’re conscious — which they definitely do.
Yeah, so we’ve built this new kind of alien mind. We don’t really have a good theory of pain, even for ourselves. We don’t have a good theory of what’s going on inside it. And so that’s like a stumbling-into-this sort of scenario. That’s not yet astronomical.
So one reason I started with the concrete case is I think people who are worried about risks of large-scale and long-term suffering — what are sometimes called s-risks or suffering risks — I think they have scenarios that involve very powerful agents making lots of simulations for various reasons, and those simulations containing suffering. I’ll just refer people to that work, because that’s actually not my bag. I haven’t thought that much about those scenarios.
Luisa Rodriguez: OK, just for my interest, what’s the basic argument for why anyone would want to create simulations with a bunch of suffering in them?
Robert Long: So this is my take, and it might not represent their positions. I think one reason is that you could create simulations because you want to learn stuff. So imagine that we were curious how evolution would go if something had gone slightly differently. And imagine we had like planet-sized computers, so we could just literally rerun like all of evolution down to the details, so that there are like virtual creatures reproducing and stuff. And also suppose that a simulated creature is sentient, which is plausible. Then all you really are looking for is like, at the end, did the simulation output hominids or something? But congratulations, you also have billions of years of animals eating each other and stuff like that.
Luisa Rodriguez: Yeah. OK, right. But it sounds like we could make things for economic reasons, like robots or chatbots, and we don’t realise those things are suffering. And then we mass produce them because they’re valuable. And then the mass production isn’t astronomical in scale, but it’s big, and those things are suffering and we didn’t know it and they’re all over. And we don’t really want to change anything about those systems because we use them.
Robert Long: Yeah. I mean, for just another dark scenario, you can imagine a system where we get pigs to be farmed much more efficiently. And we’re just like, “Well, this has made meat cheaper. Let’s not think too much about that.”
Luisa Rodriguez: Totally. Got it. Yeah, yeah, yeah. OK, are there any other examples you think are plausible here, or are those kind of the main ones?
Robert Long: I guess one thing I should note is I’ve been focusing on this case where we’ve hit on it accidentally. There are a lot of people who are interested in building artificial consciousness.
Luisa Rodriguez: On purpose, yeah.
Robert Long: And understandably so. You know, just from a purely intellectual or philosophical standpoint, it’s a fascinating project and it can help us understand the nature of consciousness. So for a very long time, probably about as old as AI, people were like, “Wow, I wonder if we could make this thing conscious?”
There was a recent New York Times article about roboticists who want to build more self-awareness into robots, both for the intrinsic scientific interest and also because it might make for better robots. And some of them think, “Oh, well, we’re not actually that close to doing that. Maybe it’s too soon to worry about it.” Another person quoted in that article is like, “Yeah, it’s something to worry about, but we’ll deal with it.” And I am quoted in that piece as just kind of being like, “Ahhh, be careful, you know. Slow down. We’re not really ready to deal with this.”
Luisa Rodriguez: OK, so maybe it happens because it’s useful for learning. Maybe it happens because there are some reasons that someone might want to do this intentionally to create suffering. That’s very dark. But then it could also just happen accidentally. All of which kind of terrifies me. And I want to come back to that.
AIs with totally different ranges of experience [00:17:45]
Luisa Rodriguez: I wanted to ask about the flip side of this, which is: not only might AI systems be able to suffer, but they might also be able to experience pleasure. I’m curious how their pleasure might compare to the pleasure that we feel as humans?
Robert Long: The short answer is I think the pleasure or pain — or whatever analogues of that that AI systems could experience — could have a drastically different range than ours. They could have a drastically different sort of middle point.
Luisa Rodriguez: Is there any reason to think the default is that artificial sentience feels pleasure and pain like humans? Or do you think the default is something else?
Robert Long: I basically am agnostic about what the default is. One reason is that, well, let’s first think about why the default is what it is for humans. It’s a very vexing and interesting question. Let’s start with, I think, one of the saddest facts about life, which is that it’s much easier to make someone feel pain than to make them feel really good.
Here’s a dark thought experiment that I actually thought about as preparation for this. Suppose I’m going to give you a billion dollars and a team of people who are experts in all sorts of things, and you have the goal of making someone feel as good as possible for a week. Or imagine a different scenario where I give you the goal of making someone feel as bad as possible for a week. It seems much easier to do the second goal to me.
Luisa Rodriguez: Totally. Yeah. That is really sad.
Robert Long: It seems like in some ways you could still mess up the one week thing. It’s just really hard to make people feel durably good.
Luisa Rodriguez: Totally. Yeah, and the bad is just to like, waterboard them for a week.
Robert Long: Yeah. You took it there, but yeah.
Luisa Rodriguez: Yeah, that’s much easier.
Robert Long: And why is that the case? Why are we creatures where it’s so much easier to make things go really badly for us? One line of thinking about this is, well, why do we have pain and pleasure? It has something to do with promoting the right kind of behaviour to increase our genetic fitness. That’s not to say that that’s explicitly what we’re doing, and we in fact don’t really have that goal as humans. It’s not what I’m up to, it’s not what you’re up to, entirely. But they should kind of correspond to it.
And there’s kind of this asymmetry where it’s really easy to lose all of your expected offspring in one go. If something eats your leg, then you’re really in danger of having no descendants — and that could be happening very fast. In contrast, there are very few things that all of a sudden drastically increase your number of expected offspring. I mean, even having sex — which I think it’s obviously not a coincidence that that’s one of the most pleasurable experiences for many people — doesn’t hugely, in any given go, increase your number of descendants. And ditto for eating a good meal.
Luisa Rodriguez: Right, right. So if there was like, I don’t know, some tree that made it possible to have 20 kids in one pregnancy instead of one, maybe we’d find eating the fruit from that tree especially pleasurable.
Robert Long: Exactly.
Luisa Rodriguez: But there just aren’t that many things like that. And so those things don’t give us very big rewards, relative to the many things that could really mess up our survival or reproduction. Is that basically the idea?
Robert Long: Yeah.
Luisa Rodriguez: Cool. Yeah. I actually have just never thought about that. It makes perfect sense.
Robert Long: Yeah, it’s very schematic, but I do think it is a good clue to thinking about these questions. So what evolution wants for creatures is for pain and pleasure to roughly track those things. I mean, evolution also doesn’t want you to experience agony every time you don’t talk to a potential mate. Like it doesn’t want you to be racked with pain, because that’s distracting and it takes cognitive resources and stuff like that. So that’s another piece of it. It needs to kind of balance the energy requirements and cognitive requirements of that.
I definitely recommend that readers check out work by Rethink Priorities on trying to think about what the range of valenced experiences for different animals are, based on this.
Luisa Rodriguez: Can you give me the rough overview of what they try to do? Like what their approach is?
Robert Long: Yeah. So they’re looking at considerations based on the sort of evolutionary niche that different animals are in. As one thing, there are reasons to expect differences between animals that have different offspring strategies. And then also more direct arguments about like, what are the attentional resources of this animal? Does it have memory in a way that might affect its experiences? Here’s an interesting one: Do social animals have different experiences of pain? Because social animals, it’s very helpful for them to cry out because they’ll get helped. Prey animals have an incentive not to show pain, because that will attract predators.
Luisa Rodriguez: Right. Fascinating. And that might just really lead to big differences in how much pain or pleasure these animals feel.
Robert Long: I think that’s the thought. Yeah.
Luisa Rodriguez: That’s really cool.
Robert Long: It’s really fascinating. I’m sure everyone’s seen a kid that has fallen over and it doesn’t freak out until it knows that someone’s seen it.
Luisa Rodriguez: Yes, yes, true.
Robert Long: And that’s not to say that the pain is different in each case. Like I don’t know and I don’t think anyone knows, but that’s an illustration of the social animal kind of programming.
Luisa Rodriguez: Totally, totally. So I guess by extension, you could think that the kind of selection pressures that an AI system has, or doesn’t have, or something about its environment might affect its emotional range? Is that basically the idea?
Robert Long: Yeah. It’s something like we seem to have some sort of partially innate or baked-in default point that we then deviate from on either end. It’s very tough to know what that would mean for an AI system. Obviously AI systems have objectives that they’re seeking to optimise, but it’s less clear what it is to say its default expectation of how well it’s going to be doing is — such that if it does better, it will feel good; if it does worse, it’ll feel bad.
I think the key point is just to notice that maybe — and this could be a very good thought — this kind of asymmetry between pleasure and pain is not a universal law of consciousness or something like that.
Luisa Rodriguez: Got it. Right. So the fact that humans have this kind of limited pleasure side of things, there’s no inherent reason that an AI system would have to have that cap.
Robert Long: There might be no inherent reason we have to have that cap forever, which is another wonderful thought. There’s this great post by Paul Christiano pointing out that we’re kind of fighting this battle against evolution. Evolution doesn’t want us to find pleasure hacks because it doesn’t want us to wirehead. So that’s maybe one reason why, at a high level, we habituate to drugs.
Luisa Rodriguez: Sorry, wireheading is like some hack to find pleasure that doesn’t actually improve our fitness or something?
Robert Long: Yeah. It means a lot of different things. I was using it loosely to mean that, yeah. That’s maybe why we’re always dissatisfied, right? You know, you’ve got a good job, you’ve got cool friends, you’ve got social status — and eventually your brain’s like, “More. Don’t get complacent.” And, you know, we’ve tried various things to try to work around that and find sustainable ways to boost our wellbeing permanently. Different cognitive techniques. But this post argues we’re kind of fighting like an adversarial game.
Luisa Rodriguez: That’s really interesting. So I guess it’s both kind of, we don’t know where the default point is, and we also don’t know what the upper bound and lower bound might be on pleasure and pain. It might be similar to ours, but many of the pressures that might push ours to be what they are may or may not exist for an AI system. And so they could just be really different.
Robert Long: Exactly.
Luisa Rodriguez: Cool. Yeah. That’s wild. Are there any other kinds of differences between humans and AI systems that might mean AI systems feel more or different kinds of pleasure than humans?
Robert Long: Well, yeah. I mean, one thing I’ll note is that I’m often using bodily pain or the pleasures of status or something as my examples. But it kind of goes without saying — but I’m saying it — that yeah, AIs might not have anything corresponding to that. You know, it would be really weird if they feel like sexual satisfaction at this point.
Luisa Rodriguez: Right, right. Yeah. Makes sense.
Robert Long: Yeah, and you can wonder that we’re venturing into territory where we don’t really know what we’re talking about. But I think you can, in the abstract, imagine valence — “valence” just being a shorthand for this quality of pleasure or displeasure — you can imagine valence, or at least I think I can, that’s about other kinds of things.
Luisa Rodriguez: Yeah, yeah, yeah. To the extent that there are things like goals and rewards and other things going on that motivate an AI system, maybe those things come with valence. And maybe they won’t, but it might make sense for them to.
Robert Long: Exactly.
Luisa Rodriguez: One argument I’ve heard for why there might be a difference in the amount of pleasure and pain AI systems could feel versus humans can feel is just something like humans require lots of resources right now. Like the cost of living and the cost of thriving and flourishing might just be really high.
And I can imagine it just becoming super, super cheap for an AI system, or some kind of digital mind, feeling just like huge amounts of pleasure, but not requiring a bunch of friends and housing and, I don’t know, romantic relationships. Maybe it’s just relatively small computer chips and they just get to feel enormous pleasure really cheaply by like pushing the zero key or something. And so you might think that they could just experience actually loads more pleasure than humans could, at least given the same inputs.
Robert Long: Yeah. And one thing I’ll also note is they could also experience the higher pleasures cheaply too. Like suppose they do require friends and knowledge and community and stuff. Maybe it’s just a lot cheaper to give that to them too. Then there’s also cases, like you said, where maybe they have some sort of alien pleasure and we’re just like turning the dial on that.
I mentioned the other case because I think a lot of people would be wary of finding it valuable that you’re just cranking the dial on maybe some “lower pleasure” or “uninteresting pleasure.” But even more interesting pleasures could be a lot cheaper. It’s cheaper for them to achieve great things and contemplate the eternal truths of existence and have friends and stuff like that.
Luisa Rodriguez: Right. And that could just be some basic thing like it’s easier to make more silicon things than it is to build houses, farm food, build cities, et cetera. Like you could just have computer farms that allow AI systems to have all the same experiences and maybe better ones. But it might just cost less.
Robert Long: Yeah, that scenario is possible. And I will go ahead and disclaimer: I don’t think that much about those scenarios right now. And I’m also not like, “Build the servers, go!” — you know, given how fraught and in the dark we are about these questions, both morally and empirically.
Luisa Rodriguez: Totally.
Robert Long: But yes, I think it is possible. Another Black Mirror episode, which I think is maybe my favourite, is “San Junipero.” Have you seen that one?
Luisa Rodriguez: I have, yeah. Do you want to recap it?
Robert Long: Sure. Yeah, this one’s set in the like somewhat near future, and this civilisation seems to have cracked making realistic simulations. It’s possible for people to go in those simulations while they’re alive. It’s also possible for them to be transferred to them when they die. And it’s one of the rare Black Mirror utopias — spoiler alert, before you continue listening.
Yeah, the protagonist of the episode ends up in a very great situation at the end of the show. She ends up being able to live with this woman she loves in this cool beach town. And what I love about the episode is it ends with this happy ending in like digital utopia. And then the last shot is this robot arm putting her little simulation in this huge server bank, and you see that it’s just like this entire warehouse of simulations.
Luisa Rodriguez: Right, yeah.
Robert Long: Did you like that episode?
Luisa Rodriguez: Yeah, I think it’s stunning, really moving. Why is it your favourite?
Robert Long: I think it’s my favourite because there’s this parody of Black Mirror, which is like, “What if phones, but bad?” And sometimes it does veer into this kind of cheap dystopia — which is not to say I’m not worried about dystopias — but it’s just like, “What if Facebook, but plugged directly into your brain?”
Holden Karnofsky has a great post about why it’s hard to depict utopias and hard to imagine them in a compelling way for viewers. And this seems to have, at least for me, solved that problem. It’s not the best possible future, but it’s a good one.
Luisa Rodriguez: Cool. Yeah. Any other differences that you think are relevant to the kinds of pleasure or the amount of pleasure that AI systems might feel relative to humans?
Robert Long: Yeah. Now might be a good time to talk about sort of a grab-bag of perplexing issues about artificial minds. So there’s all these philosophical thought experiments about, like, “What if people were able to split in two and you make two copies of them — which one is really them?” Or “What if two people merged? What do we say about that case?” And I think those are cool thought experiments.
AIs are a lot easier to copy and a lot easier to merge. So it could be that we could have real-life examples of these kinds of philosophical edge cases and things that have sort of distributed selfhood or distributed agency. And that of course would affect how to think about their wellbeing and stuff, in ways that I find very hard to say anything meaningful about, but it’s worth flagging and worth people thinking about.
Luisa Rodriguez: Totally. Right. So with copies, it’s something like, does each copy of an identical digital mind get equal moral weight? Are they different people? And if they’re both happy, is that like twice as much happiness in the world?
Robert Long: Yeah. I mean, I’m inclined to think so.
Luisa Rodriguez: I am too.
Robert Long: Yeah. There’s a paper by Shulman and Bostrom called “Sharing the world with digital minds.” And yeah, that thinks about a lot of the sort of political and social implications of cases like this, which I haven’t thought that much about myself, but there would be interesting questions about the political representation of copies. Like, before there’s some vote in San Francisco, we wouldn’t want me to be able to just make 20 of me and then we all go vote.
Luisa Rodriguez: Totally. Yeah. I mean, I don’t know if there are 20 of you, and you all —
Robert Long: Right, you also don’t want to disenfranchise someone. “Well, you’re just a copy. So you know, your vote now counts for 1/20 as much.”
Luisa Rodriguez: Yeah, yeah, yeah. I mean, do you have a view on this? I think I do have the intuition that it’s bad, but when I look at it, I’m like, “Well, no, there are just 12 Robs who are going to get 12 Robs’ worth of joy from a certain electoral outcome.” And like, that’s bad if there are only 12 Robs because you’re really rich. But I don’t hate the idea that there might be more Robs and that you might get 12 more Robs’ worth of votes.
Robert Long: Yeah, I don’t have strong views about this hypothetical of copying and political representation. But it does seem like you would probably want rules about when you’re allowed to copy, because in the runup to an election, you don’t want an arms race where the digital population of San Francisco skyrockets because everyone wants their preferred candidate to win.
Luisa Rodriguez: Yeah. I guess also if you have to provide for your copies, if you have to split resources between your copies, you might even kill your copies afterward. You might delete them because you’re like, “I can’t afford all these copies of myself.”
Robert Long: Yeah, “Thanks for the vote.” But of course, if I feel that way, then necessarily all the copies do as well.
Luisa Rodriguez: So they feel like they also don’t want to share resources and are happy to let one of you live, you mean?
Robert Long: Well, they’re certainly not going to be deferring to the “original me” because they all feel like the original me.
Luisa Rodriguez: Right, right, right. And so let’s say the original you does keep power somehow. It somehow has the power to delete the other copies.
Robert Long: Yeah, they’ll all feel like the original me. That’s another thing.
Luisa Rodriguez: Well, they will feel like it, but they might not actually be able to click the button to delete the copies. But maybe the original you can.
Robert Long: Right. Yeah.
Luisa Rodriguez: And then you’re murdering 11 people.
Robert Long: I mean, not me. You know, I wouldn’t do this. You might do that. You would be murdering 11 Luisas.
Luisa Rodriguez: I’m planning right now. I’m scheming. I’m like, “Ooh, sounds like a great way to get the election outcomes I want.” Yeah, how much does a merging thought experiment apply? Or how relevant is it?
Robert Long: I guess I mostly mentioned the merging case because it’s part of the canonical battery of thought experiments that are supposed to make personal identity seem a little less deep, or kind of perplexing if you really insist on there always being some fact of the matter about which persons exist and which don’t. And just like splitting, it’s like something that seems like it could happen.
Luisa Rodriguez: Yeah. So maybe you, after this election, try to merge your 11 copies back with yourself. Then what does that mean?
Robert Long: Yeah, like does that thing now still deserve 12 votes or something?
Luisa Rodriguez: Right, right. Yeah. OK, interesting. Yeah, I’ve never thought about that before.
Moral implications of all this [00:36:42]
Luisa Rodriguez: So I guess I feel like there are some reasons to think that AI systems, or digital minds more broadly, might have more capacity for suffering, but they might also have more capacity for pleasure. They might be able to experience that pleasure more cheaply than humans. They might have a higher pleasure set point. So on average, they might be better off.
Yeah, I guess you might think that it’s more cost effective: you can create happiness and wellbeing more cost effectively to have a bunch of digital minds than to have a bunch of humans. How do we even begin to think about what the moral implications of that are?
Robert Long: I guess I will say — but not endorse — the one flat-footed answer. And, you know, red letters around this. Yeah, you could think, “Let’s make the world as good as possible and contain as much pleasure and as little pain as possible.” And we’re not the best systems for realising a lot of that. So our job is to kind of usher in a successor that can experience these goods.
I think there are many, many reasons for not being overly hasty about such a position. And people who’ve talked about this have noticed this. One is that, in practice, we’re likely to face a lot of uncertainty about whether we are actually creating something valuable — that on reflection, we would endorse. Another one is that, you know, maybe we have the prerogative of just caring about the kind of goods that exist in our current way of existing.
So one thing that “Sharing the world with digital minds” mentions is that there are reasons to maybe look for some sort of compromise.
Luisa Rodriguez: Can you explain what that would look like?
Robert Long: Yeah, one extreme position is the 100% “just replace and hand over” position.
Luisa Rodriguez: That’s like all humans just decide voluntarily to give up their stake in the resources in the world. And they’re just like, “Digital minds will be happier per tree out there, and so let’s give them all the trees and all the things.”
Robert Long: “Our time is done.” Yeah.
Luisa Rodriguez: “You take it from here.”
Robert Long: Yeah. The other extreme would be like, “No. Humans forever. No trees for the digital minds.” And maybe for that reason, don’t build them. Let’s just stick with what we know.
Luisa Rodriguez: Don’t build artificial sentience? Or don’t build a utopia of digital minds?
Robert Long: A utopia that’s too different from human experience. Then one thing you might think is that you could get a lot of what each position wants with some kind of split. So if the pure replacement scenario is motivated by this kind of flat-footed total utilitarianism — which is like, let’s just make the number as high as possible — you could imagine a scenario where you give 99% of resources to the digital minds and you leave 1% for the humans. But the thing is — I don’t know, this is a very sketchy scenario — 1% of resources to humans is actually a lot of resources, if giving a lot of resources to the digital minds creates tonnes of wealth and more resources.
Luisa Rodriguez: Right. So is it something like digital minds, in addition to feeling lots of pleasure, are also really smart, and they figure out how to colonise not only the solar system but like maybe the galaxy, maybe other galaxies. And then there’s just like tonnes of resources. So even just 1% of all those resources still makes for a bunch of humans?
Robert Long: Yeah. I think that’s the idea, and a bunch of human wellbeing. So on this compromise position, you’re getting 99% of what the total utilitarian replacer wanted. And you’re also getting a large share of what the “humans forever” people wanted. And you might want this compromise because of moral uncertainty. You don’t want to just put all of your chips in.
Luisa Rodriguez: Right. Go all in.
Robert Long: Yeah. And also maybe to prevent some kind of conflict. And also for like democratic cooperative reasons. Like I would be surprised if most people are down for replacements. I think that should definitely be respected. And it also might be right. So that’s the case for this compromise view.
Luisa Rodriguez: Yeah. I mean, it sounds really great. And it just sounds almost too good to be true to me. And some part of me is like, “Surely it’s not that easy.” It just feels very convenient that we can have it all here. I mean, it’s not having it all for both, but it’s like having the majority of it all for both humans and digital minds.
Robert Long: Well, I feel like cooperation does enable lots of scenarios like that. People really can get most of what they want. I should say I’m basically recapping an argument from “Sharing the world with digital minds.” This is not something I have thought that much about. I think it’s really important to think about these big questions about the future of artificial sentience, but my focus has been on issues that are more concrete and come up today.
Is artificial sentience even possible? [00:42:12]
Luisa Rodriguez: Exploring this a bit more deeply, why does anyone think that artificial sentience is even possible?
Robert Long: Yeah, this is a great question. I think the very broadest case for it, or the very broadest intuition that people have, is something like we know that some physical systems can be conscious or sentient. Like ones made out of neurons can be: the ones on either end of this recording, and also listening in. And you could have a view where something has to be made out of neurons, it has to be made out of biological material, in order to be conscious.
One reason that people think artificial minds could also be conscious is this kind of broad position in philosophy and cognitive science called functionalism, which is this hypothesis that the very lowest-level details, or like the substrate that you’re building things out of, ultimately won’t matter. And the sort of things that are required for consciousness or sentience could also be made out of other stuff. So one version of this is thinking that it’s the computations that matter. It’s the computations that our brains are doing that matter for what we experience and what we think about.
Luisa Rodriguez: Sorry, what do you mean by computation?
Robert Long: That’s a great question that can go into the philosophical weeds. But maybe a rough approximation is like patterns of information processing, that’s a way you could think about it. So you can describe what your brain is doing — and also think that your brain is in fact doing — as certain patterns of information processing. There are theories by which what certain parts of your brain are doing are computing a function: taking input and processing it in a certain way so as to get a certain output. So you can think of your visual system as taking in a bunch of pixels or something like that, and from that, computing where the edges are.
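As a rough, hypothetical illustration of what “computing a function” means here (a toy sketch, not anything discussed in the episode), you could picture a tiny edge detector that takes a row of pixel brightness values as input and returns the positions where the brightness jumps sharply:

```python
# Toy sketch of "computation" as input -> internal processing -> output.
# Hypothetical example only; real visual processing is vastly more complex.

def find_edges(pixels, threshold=50):
    """Take a row of pixel brightness values (0-255) and return the
    positions where brightness jumps sharply: a crude 'edge detector'."""
    edges = []
    for i in range(len(pixels) - 1):
        if abs(pixels[i + 1] - pixels[i]) > threshold:
            edges.append(i)
    return edges

# A strip of pixels that goes from dark to bright partway along:
print(find_edges([10, 12, 11, 13, 200, 205, 198, 202]))  # -> [3]
```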
Luisa Rodriguez: Right. OK. So really simplistically, and maybe just not true at all, but it’s something like when you smell a food that smells good, maybe you get kind of hungry. And the computation is like, “Get the input of a nice, yummy-smelling food, and feel some hunger is the output.” Or maybe it’s like, “Feel this thing called hunger and then search for food in the fridge.”
Robert Long: Yeah. It would definitely be more complicated than that, but it is something like that. Like you’re taking in inputs and doing stuff with them. One thing I might add at this point, although maybe this is too in the weeds, I think when people say something like, “You need the right computations for consciousness,” they’re not just talking about the right mapping between inputs and outputs. They’re also talking about the internal processing that’s getting you from input to output.
So here’s an example. There’s this famous case by Ned Block, also one of my advisers, who pointed out that you could have something that has this big lookup table where the input is a given sentence and then for every given sentence it has a certain output of what it should say. And it doesn’t do anything else with the sentences. It just goes to the right column of its lookup table.
Of course such a thing would not be feasible at all. But a lot of people have the intuition that that way of getting from input to output is not the right sort of thing that you would want for consciousness or sentience.
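To make that contrast concrete, here is a hypothetical toy sketch (not from the episode): two programs can produce exactly the same outputs for the same inputs while getting there by very different internal routes, one by looking the answer up, the other by actually doing some processing.

```python
# Two ways of producing the same input-output behaviour.
# Toy illustration only; Block's imagined lookup table would be far too
# large to ever actually build.

# Route 1: a canned table of responses, with no real internal processing.
LOOKUP = {
    "What is 2 + 2?": "4",
    "What is 3 + 5?": "8",
}

def lookup_agent(question):
    return LOOKUP.get(question, "I don't know.")

# Route 2: actually compute the answer from the input.
def computing_agent(question):
    if question.startswith("What is") and "+" in question:
        left, right = question.rstrip("?").replace("What is", "").split("+")
        return str(int(left) + int(right))
    return "I don't know."

# Same question, same answer, but only one of them did any arithmetic.
print(lookup_agent("What is 2 + 2?"))     # -> 4
print(computing_agent("What is 2 + 2?"))  # -> 4
```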
Luisa Rodriguez: Right. So like if the lookup table had an entry where, when you receive the input hunger, the looked-up value was “Eat an apple,” that would not be the same thing as when you receive the input hunger, maybe subconsciously think about the nutrients you might need, and then go find a thing that will like meet that need. Sorry, this may be a terrible example.
Robert Long: I think it’s a good example. It’s just pointing at the fact that the path you’re taking internally matters. And yeah, I will point out, as I think you realise, that it wouldn’t be describable in such a way and the computations would be extremely fine grained and complex, and you couldn’t like write them down on a piece of paper. But the general gist is correct.
Luisa Rodriguez: Yeah. Is there a principled reason why you couldn’t write them down on paper?
Robert Long: I guess there’s not a principled reason. I think of that as more of an empirical observation that in fact what our brains are doing is pretty complex. But that’s also an open question. I think in the early days of AI, people were kind of optimistic — and this goes for things with intelligence as well as consciousness — that there would be these really simple principles that you could write down and distil. That doesn’t seem to be what we’ve learned about the brain so far or the way that AI has gone. So — and we’ll get to this later — I do suspect that our theory of consciousness might involve quite a bit of complexity.
Luisa Rodriguez: Yeah. Cool. OK, so I took you way off track. So you’re saying that there’s this idea called functionalism where basically it’s like the functions that matter — where all you need is certain computations to be happening or possible, in order to get something like sentience. Is that basically right, or was there more to it?
Robert Long: Yeah, that’s basically right. Computationalism is a more specific thesis about what the right level of organisation or what the right functional organisation is. It’s the function of performing certain computations. Does that make sense?
Luisa Rodriguez: I think so. Maybe I’ll make sure I get it. So the argument is that there’s nothing special about the biological material in our brain that allows us to be conscious or sentient. It’s like a particular function that our brain serves, and that specific function is doing computations. And those computations are the kind of underlying required ability in order to be sentient or conscious. And theoretically, a computer or something silicon-based could do that too.
Robert Long: Yeah. I think that’s basically right.
Replacing neurons one at a time [00:48:21]
Luisa Rodriguez: So that’s the basic argument. What evidence do we have for that argument?
Robert Long: Yeah, I’ll say that’s like the basic position. And then why would anyone hold that position? I think one thing you can do is look at the way that computational neuroscience works. So the success of computational neuroscience — which is kind of the endeavour of describing the brain in computational terms — is like some evidence that it’s the computational level that matters.
And then there are also philosophical arguments for this. So a very famous argument or class of arguments are what are called replacement arguments, which were fleshed out by David Chalmers. And listeners can also find that when Holden Karnofsky writes about digital people and wonders if they could be conscious or sentient, these are actually the arguments that he appeals to. And those ask us to imagine replacing neurons of the brain bit by bit with artificial silicon things that can take in the same input and yield the same output. And so by definition of the thought experiment, as you add each one of these in, the functions remain the same and the input/output behaviour remains the same.
So Chalmers asks us to imagine this happening, say to us, while this podcast is happening. So yeah, by stipulation, our behaviour won’t change and the way we’re talking about things won’t change and what we’re able to access in memory won’t change. And so at the end of the process, you have something made entirely out of silicon, which has the same behavioural and cognitive capacities as the biological thing. And then you could wonder, well, did that thing lose consciousness by being replaced with silicon? And what Chalmers points out is it would be really weird to have something that talks exactly the same way about being conscious — because by definition, that’s like a behaviour that remains the same — and has the same memory access and internal cognition, but whose consciousness left without leaving any trace. He thinks this would be like a really weird dissociation between cognition and consciousness.
And one reason this argument kind of has force is that a lot of people are pretty comfortable with the idea that at least cognition and verbal behaviour and memory and things like that can be functionally multiply realised. And there’s an argument that, if you think that, it would be kind of weird if consciousness were this one exception where the substrate matters.
Luisa Rodriguez: So I think the idea is something like, if you had a human brain and you replaced a single neuron with, I guess, a silicon neuron that performed the exact same function. And is the reason we think that’s like a plausible thing to think about because neurons transmit electricity and they’re kind of on/off switchy in maybe the same way that computers are? Is that it?
Robert Long: Yeah, this is an excellent point. One weakness of the argument, in my opinion, and people have complained about this, is it kind of depends on this replacement being plausible. Or it seems that way to people. In the paper, there’s actually a note on, “Well, you might think that actually in practice, this is not something you could do.” And obviously we could not do it now. And for reasons I don’t entirely understand, that’s not really supposed to undermine the argument.
Luisa Rodriguez: Huh. All right. Well, maybe coming back to that. Is it basically right though that we think of a neuron and a computer chip as analogous enough that that’s why it’s plausible?
Robert Long: Yeah. We think of them as being able to preserve the same functions. And I mean, I think there is some evidence for this from the fact that artificial eyes and cochlear implants work. Like we do find that computational things can interface with the brain and the brain can make sense of them.
Luisa Rodriguez: Interesting.
Robert Long: That’s not like a decisive argument. People who are kind of not on board with this kind of computational way of thinking of things would probably not give up when faced with that.
Luisa Rodriguez: It wouldn’t be convincing to them, yeah. And sorry, I actually don’t know how artificial eyes work. Is it like there’s an eye made of some things that are non-biological and they interface with the brain in some way that allows people to see?
Robert Long: I also don’t really know. I definitely know that’s possible with cochlear implants.
Luisa Rodriguez: OK. I mean, I’m interested in that one too then, but that’s basically like they connect through like… I’m picturing like wires going from a hearing aid into the brain. I’m sure that’s not quite right, but it sounds like it’s something like they communicate. And that’s some evidence that we can feed electricity through silicon-based things to the brain and communicate with it.
Robert Long: Yeah, I don’t think it’s reaching into the brain. It might be doing the right stuff to your inner ear.
Luisa Rodriguez: To your ear. Right. OK, yeah. That makes sense. So maybe you think a neuron could be replaced with a silicon-based version of it. And every time you make that replacement, you work the same way, your brain does the same things, nothing about your behaviour or thoughts change?
Robert Long: Yeah. So maybe it’s good to start with the first replacement. If the first replacement is possible, I don’t think anyone would think, “Oh no, you have now destroyed Luisa’s consciousness. Now she’s like a walking zombie.” And then this is a common argument form in philosophy: two doesn’t seem like it would make the difference, right? And then so on and so forth.
Luisa Rodriguez: And then eventually you replace all my neurons with the silicon prosthetic neurons. And then I have an entirely silicon-based brain, but there’s no reason to think I wouldn’t feel or think the same things. Is that basically it?
Robert Long: That’s the idea. It’s that if you did think that you don’t feel the same things, it’s supposed to be really counterintuitive that you would still be saying, “This worked. I’m still listening to Rob talk. I’m still seeing colours.” You would still be saying that stuff, since that’s like a behavioural function. Yeah, that’s the basic thrust. So then that’s at least one silicon-based system that could be conscious. So that kind of opens the door to being able to do this stuff in silicon.
Luisa Rodriguez: Right. It feels very similar to the ship that has all of its planks replaced one by one. And at the end you’re asked if it’s still the same ship.
Robert Long: Yeah, it is similar. This sort of thing shows up a lot in philosophy. As I said, it’s like an old trick. Listeners might recall the podcast with Alan Hájek. He has all of these great examples of argument patterns that you can use in philosophy and you can apply to different domains. You can think of this as an application of a gradual replacement or bit-by-bit kind of argument in philosophy.
One thing I would like to say, and maybe I’m qualifying too much, but full disclaimer: I think a lot of people are not super convinced by this argument. Gualtiero Piccinini is an excellent philosopher who thinks about issues of computation and what it would mean for the brain to be computing, and I think he’s sympathetic to computationalism, but he thinks that this argument isn’t really what’s getting us there. I think he relies more on that point I was saying about, well, if you look at the brain itself, it does actually look like computation is a deep or meaningful way of carving it up and seeing what it’s doing.
Luisa Rodriguez: Right, right. And so if you could get the right computations doing similar things, or doing things that make up sentience, then it doesn’t matter what’s doing it. What reasons do people think that that argument doesn’t hold up?
Robert Long: Well, for one thing, you might worry that it’s sort of stipulated what’s at issue at the outset, which is that silicon is able to do all the right sort of stuff. So there’s this philosopher of biology and philosopher of mind called Peter Godfrey-Smith — who would be an excellent guest, by the way; he’s written a book about octopus minds — and he has a line of thinking where functionalism in some sense is probably true, but it’s not clear that you can get the right functions if you build something out of silicon. Because he’s really focused on the low-level biological details that he thinks might actually matter for at least the kind of consciousness that you have. And that’s sort of something I think you can’t really settle with an argument of this form.
Luisa Rodriguez: Yeah. Can you settle it?
Robert Long: So I actually have sort of set aside this issue for now — funnily enough, since it’s like the foundational issue. And I’ll say why I’m doing that. I think these debates about multiple realisability and computationalism have been going on for a while. And I’d be pretty surprised if in the next few decades someone has just nailed it and they’ve proven it one way or the other.
And so the way I think about it is I think it’s plausible that it’s possible in silicon to have the right kind of computations that matter for consciousness. And if that’s true, then you really need to worry about AI sentience. And so it’s sort of like, let’s look at the worlds where that’s true and try to figure out which ones could be conscious.
And it could be that, you know, none of them are because of some deep reason having to do with the biological hardware or something like that. But it seems unlikely that that’s going to get nailed anytime soon. And I just don’t find it crazy at all to think that the right level for consciousness is the sort of thing that could show up on a silicon-based system.
Luisa Rodriguez: Are there any other arguments for why people think artificial sentience is possible?
Robert Long: This is related to the computational neuroscience point, but one thing people have noticed is that a lot of the leading scientific theories of what consciousness is are in computational terms, and posit computations or some other sort of pattern or function as what’s required for consciousness. And so if you think they’re correct in doing so, then you would think that it’s possible for those patterns or computations or functions to be made or realised in something other than biological neurons.
Biological theories [00:59:14]
Luisa Rodriguez: Does anyone disagree on this? Do some people just think artificial sentience is not possible?
Robert Long: Yeah, so there are these views — “biological theories” maybe you can call them. Ned Block is one of the foremost defenders of this biological view — that consciousness just is, in some sense, a biological phenomenon. And you won’t be capturing it if you go to something too far outside the realm of biological-looking things. John Searle is also a proponent of this view.
So there’s views where that’s definitely true, and it’s just like what consciousness is. There’s also views on which consciousness is something functional, but also you’re not going to be able to get it on GPUs or anything like what we’re seeing today. And those are kind of different sorts of positions. But it should be noted that plenty of people who’ve thought about this have concluded that you’re not going to get it if you have a bunch of GPUs and electricity running through them. It’s just not the right sort of thing.
Luisa Rodriguez: So the first argument is like: there’s something really special about biology and biological parts that make whatever consciousness and sentience is possible. And the other argument is like: it’s theoretically possible, but extremely unlikely to happen with the technology we have, or could create, or something?
Robert Long: Yeah. For that second position, most people will hold some version of that position with respect to Swiss cheese. Like I would be really surprised if very complicated arrangements of Swiss cheese ended up doing these computations. Because it’s just like, it’s not the right material to get the right thing going. Even if you think it is multiply realisable, you don’t have to think that you could feasibly do it in any sort of material at all.
One thing I’ll add — since I am being very concessive to a range of positions, which I think is appropriate — I would like to note that large numbers of philosophers of mind and consciousness scientists in surveys say artificial sentience is possible; machines could be conscious. I don’t have the exact numbers off the top of my head, but David Chalmers has this great thing, the PhilPapers survey, which asked people this question. It’s not like a fringe view. A substantial share of philosophers of mind think that artificial sentience is possible and maybe plausible. And ditto surveys of consciousness scientists.
Luisa Rodriguez: Yeah, yeah. We’ll stick those in the show notes.
Illusionism [01:01:49]
Luisa Rodriguez: So it sounds like there’s a couple of counterarguments that are about biology, and just like what’s possible with silicon and GPUs as building blocks for entities. Are there any other counterarguments people think are plausible for why artificial sentience might not be possible?
Robert Long: Yeah, one thing that might be worth mentioning is that I’m going to be doing this interview where I talk about consciousness and sentience as things where we know what we’re talking about, and we know what we’re looking for, and it is this phenomenon that we can wonder about.
There is a position in philosophy called illusionism, which holds that consciousness is kind of a confused concept, and it doesn’t actually pick anything out. So on that view, it’s straightforwardly false that AIs could be conscious. It’s also false that in a certain sense of the word, humans are conscious.
Luisa Rodriguez: Right. Can you explain the view of illusionism?
Robert Long: So illusionists hold that this concept of subjective experience — or what it’s like to be having a certain experience — even though a lot of people now find it intuitive, is actually kind of a philosopher’s notion, and not that deep. I think they would argue that. But it doesn’t refer to anything, actually. It’s kind of incoherent, or fails to pick anything out.
This is a popular example in philosophy. People used to wonder about phlogiston, which I think was this substance that was going to explain fire. And they would talk about it and look for it. But ultimately, it’s just not part of our ontology. It’s not part of our worldview.
And illusionists think consciousness will end up being like that, on reflection. That we’ll ultimately have a lot of functions and ways of processing information and behavioural dispositions and maybe representations of things. But this question of which of them are conscious, which of them have subjective experience, ultimately won’t be a meaningful one.
Luisa Rodriguez: OK, right. So I guess if you don’t think humans or nonhuman animals are conscious to any degree, it’s not a meaningful question to ask whether artificial intelligence is sentient.
Robert Long: In a certain sense of the word, yeah. What they deny is what philosophers have called phenomenal consciousness, which is used to pick out whether there’s something it’s like to be something, or whether it has subjective experience, or this kind of subjective quality to its mental life. They don’t deny that things are conscious in the sense that they might process information in certain ways and sometimes be globally aware of that information. They don’t deny that things feel pain, for example, but they deny this way of construing it in terms of subjective experience.
Luisa Rodriguez: OK. I mean, that doesn’t seem that damning for artificial sentience, I guess. As long as you think that they can still feel pain, and if you think that’s morally significant, then artificial sentience could maybe feel the same thing and that would still be morally significant.
Robert Long: Yeah. So this is roughly my position. And I think it’s the position of Keith Frankish, one of the leading exponents of illusionism. I was talking to him on Twitter the other day, and I asked him, “What do you think about people who are looking for animal sentience? Is that an entirely misguided quest, on illusionism?” And his answer is no. He rightly thinks, and I agree, that even if you’re an illusionist, there are going to be mental phenomena or information phenomena that matter, and you’re going to want to look for those. You won’t be looking for maybe quite the same thing that you think you are if you’re a realist about consciousness.
And I think that’s a very important lesson. In the kind of circles we run in, a lot of people are very sympathetic to illusionism. And occasionally I hear people say, “There’s no question here, or it’s a meaningless question.” And that might be true for phenomenal consciousness. But I just want to point out there are scores of extremely meaningful and vexing questions, even if you’re an illusionist. And I would still like a theory of what sort of things feel pain in the illusionist sense — or have desires, or whatever it is that we, on reflection, think matters morally.
Luisa Rodriguez: Right, right. So is it basically like some people think that the kind of consciousness I think I’m experiencing might not be a meaningful concept or thing? Like I might not actually be experiencing that. I have the illusion of experiencing it, but there is no sense in which I actually, truthfully, really am. But I still feel like I feel pain, and I still don’t like that. And that in itself is still morally significant, whether or not something called consciousness is happening underneath that pain or whatever?
Robert Long: Yeah, that’s one position you could have. You could think that being disposed to judge that you have phenomenal consciousness, that matters morally. I think a more plausible position you could have is it doesn’t matter if you have whatever cognitive illusion makes philosophers think phenomenal consciousness is real. It could also just be if you feel pain in this functionally defined sense that that matters, or if you have desires that are thwarted or preferences that are thwarted.
There’s really excellent work by François Kammerer, who’s another illusionist, trying to see what value theory looks like and questions about animal sentience and animal welfare look like on the illusionist picture. I think it’s a very underexplored issue and an extremely important issue. So put that in the show notes too.
Luisa Rodriguez: Nice. OK, where do you personally come down on artificial sentience, and whether it’s possible?
Robert Long: I think I’m like 85% that artificial consciousness or sentience — and here’s a real wiggle: “or something in that vicinity that we morally care about” — is possible.
Luisa Rodriguez: That makes sense to me. That’s pretty high.
Robert Long: That’s like ever, and in principle.
Would artificial sentience systems matter morally? [01:08:09]
Luisa Rodriguez: Sure. So I guess if that’s right, and artificial sentience is possible and if it ends up existing, can you walk me through the case that it definitely matters morally?
Robert Long: Yeah. It’s almost hard to give a thought experiment or an argument for the claim that suffering matters. I think that “suffering matters” is something where common sense and the majority of philosophers agree, which doesn’t always happen. So like Jeremy Bentham has this famous and oft-quoted passage — oft-quoted by animal rights and animal welfare people, among others — where he says that the question about animals is not if they can reason or if they can talk; it’s whether they can suffer. And it doesn’t seem like there’s any other boundary that seems like the right boundary of moral concern.
Now, as we’ve noted, you can have quibbles about what suffering actually is and if it involves phenomenal consciousness and things like that. But yeah, it’s just extremely intuitive that if something feels bad for something — and maybe you also add that it doesn’t want it, and it’s trying to get away from it — that matters morally. And that sort of thing should be taken into account in our moral decision making.
One thing I’d like to add is that there’s a position on which that’s all that matters, and the only things that are good and bad for things are experiences of pleasure and displeasure. That’s not a consensus view at all. But even among people who think that other things matter — like knowledge or friendship or justice or beauty — they still also think that experiencing pain is really bad.
Luisa Rodriguez: Right. Yeah, that makes sense.
Robert Long: The other main alternative for this focus on experiences of pain or experiences of pleasure is a focus on desires and preferences, and whether those are being satisfied. So that’s a big debate in debates of like what welfare is, or what makes things go well or badly for something. And it’s also a debate in what sort of things are moral patients — the sort of things that are in the scope of moral consideration.
And I would like to note a position on which what ultimately matters is not pain or pleasure, but desires. And desires seem like they’re much easier to define in this functional way that maybe doesn’t make reference to consciousness, and that might be in some ways easier to get a grip on than consciousness. That’s the position of François Kammerer, who has a paper about how we should think about welfare, if we don’t really believe in consciousness.
I find those issues very difficult to tease apart. Shelly Kagan has this apt remark that in human life, our experiences and our desires are so tightly linked that it can be really hard to be like, “Is it bad that I’m in pain? Or is it bad that I don’t want to be in pain?” Those just seem really hard to tease apart conceptually.
Luisa Rodriguez: Yeah, I mean, can I imagine being in pain and not not wanting to be in pain?
Robert Long: So there are cases where people have the sensory experience of pain, but report not minding it. So they can fully feel that their skin is being pinched or something like that. But they’re like, “Yeah, but it’s just not bad.” That’s called pain asymbolia, and it’s a fascinating condition. And there’s a lot of philosophical work which is like, “Was that really pain? Are they lacking some unpleasant quality to the pain? And that’s why they don’t mind it? Could you really have that unpleasant quality and not mind it?”
One thing I can say — that pain asymbolia does seem to many people to have shown — is that there’s a surprising dissociation between the way you process the sensory information about pain, and then this like affective, “felt unpleasantness” thing. I think there are differences in the brain in terms of how those are processed, which is why things like this are possible.
Luisa Rodriguez: Yeah, that’s interesting. OK, so it sounds like philosophers would basically mostly agree that if AI systems are feeling something like pleasure or pain, that just probably matters morally. Does that basically sound right?
Robert Long: That sounds right to me. And if it’s not, it should be.
Where are we with current systems? [01:12:25]
Luisa Rodriguez: OK, great. So where are we with current systems on this? I guess there’s been some public conversation around current large language models being sentient. There was a whole thing there that we could talk about. But just from the ground up, what do you think about where we are?
Robert Long: The short answer is, after thinking about a lot of current theories of consciousness and how large language models work, I think it is quite unlikely that they have conscious experiences of the kind that we will morally care about. That is subject to a lot of uncertainty, because there is so much we don’t know about consciousness and how they work. I can definitely say there’s not a straightforward case where you’re like, “Here’s what consciousness is, and here’s how large language models have it.”
And I also think I would be quite surprised if large language models have developed pleasurable and displeasurable experiences. You know, that they’re having a really bad time — they don’t like writing poetry for us, and we have stumbled into a catastrophe here.
I’m glad that people are actually raising the issue. It’s good practice for future things, and there is also the small chance that we have. And in general, part of what I try to do is just get people thinking about it, and provide pointers for ways of having conversations that are as evidence-based as possible. Because as listeners will have noted, it’s very easy for it to descend into Twitter madness and complete freeform speculation.
Luisa Rodriguez: Yeah. I guess that was arguably the case with LaMDA, which we can talk about. But first, just kind of clarifying: there are a bunch of different kinds of AI systems that exist right now. Which ones seem most likely to be sentient?
Robert Long: I would be somewhat surprised if large language models are the most likely current systems.
Luisa Rodriguez: And those are things like GPT-3 or ChatGPT, right?
Robert Long: And LaMDA.
Luisa Rodriguez: And LaMDA, of course.
Robert Long: And yeah, I can say more about why I think that. That will probably be getting into the substance of this investigation.
Luisa Rodriguez: Well, do you mind starting by telling me what other systems are plausibly… like, where we even want to be asking the question of whether they’re sentient? That are, like, plausibly closer?
Robert Long: Yeah, there’s at least things that seem to do more human-like or agent-like things. And I think that can maybe put us closer to things that we could meaningfully call pain or pleasure or things like that.
Luisa Rodriguez: Like what?
Robert Long: So there are virtual agents that are trained by reinforcement learning and which navigate around like a Minecraft environment. There are things that incorporate large language models, but do a lot more than just answer text inputs. You can plug large language models into robots, and it’s really helpful for the way those robots plan. That’s a really cool line of research. There’s obviously just robots, and I would like to look more into just actual robots, which sometimes get a bit of short shrift, even though it’s like the canonical sci-fi thing.
Luisa Rodriguez: And robots, like we’re literally talking about things in Star Wars? What’s the closest thing to that that we have right now? Like what’s the smartest or most impressive robot?
Robert Long: Yeah, I was not being modest when I was like, “I need to look more into that.” I’m really not up on the state of the art. The first thing I want to look at is people who explicitly want to try to build more self-awareness into robots. I definitely want to see how that’s going. Make sure you know what you’re going to do if you have a robot that can feel pain. Are we ready for that as a society? And another thing about robots is that it would be more straightforward to maybe see how they feel pain.
Luisa Rodriguez: Totally. They have physical bodies.
Robert Long: Because they have bodies, and they’re trying to train them to protect their bodies and sense damage to them and things like that.
Luisa Rodriguez: Right, right. Yeah, that makes a lot of sense.
Large language models and robots [01:16:43]
Luisa Rodriguez: You mentioned a line of research on feeding in large language models into robots and that having an impact on how well they plan. Is there more you can say about that? It sounds like it might just be a really interesting topic.
Robert Long: Yeah, the cool factoid — which I probably can’t technically elaborate on that much — is that my understanding is that large language models have to learn all kinds of abstract representations in the course of learning to predict next words. And those representations just seem to be very useful for agents that want to decompose plans into sub-actions. It’s an astonishing fact from a certain point of view that the kind of things learned by large language models would so straightforwardly — and I think without that much tweaking — end up helping other agents. But it’s true.
Luisa Rodriguez: Is there a specific robot you have in mind with a specific set of goals? I’m not totally sure I understand what plans we’re talking about and how they’re deconstructing them or whatever.
Robert Long: Yeah, we can find the real paper and link to it in the show notes. The epistemic status of this is half-remembering some slides from a lecture that I saw at a reinforcement learning conference. I think it was a virtual agent and it was doing things like “Fill up a cup of coffee in the kitchen,” and then decomposing that into, “OK, get the cup, put it on the counter…”
Luisa Rodriguez: Right. OK, that is wild. So you have an agent where “Get some coffee” is the goal. And then you give it access to a large language model. And the thing is like, “How do I do that?” And then the large language model helps it be like, “Here are the steps. You go to the kitchen, you pull a cup from the cupboard…” or whatever. Is that basically it?
Robert Long: I think it’s not that direct kind of querying. In some vague way, that I would have to read the paper to know, it has that in its machinery somehow — representations and knowledge of the large language model.
Luisa Rodriguez: Got it. And the baseline was worse at planning, but then when you feed it into the whatever processor algorithm, it gets much better at it.
Robert Long: Yeah. My understanding is that decomposing plans into sub-plans has always been a very hard problem. If you think about all the different ways that there are to fill up a cup of coffee, I mean, there’s like an infinite number of little variations on that. And you need to know which ones are relevant. You need to know how to transfer knowledge from one case of getting the coffee to a slightly different one.
I think one traditional problem people have in reinforcement learning — which is training things by just giving a score on how well they did it — is that it can just be very hard to scale that to very complex actions. And my understanding is that large language models entering the scene has really helped with that.
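As a very loose, hypothetical sketch of the general idea of decomposing a goal into sub-steps (the real systems Robert describes don’t work by this kind of direct querying, and the actual paper will be linked in the show notes), you could imagine something like the following; here `ask_llm` is just a placeholder, not a real library call.

```python
# Hypothetical illustration of goal decomposition with a language model.
# `ask_llm` is a stand-in for whatever model you have access to; it is not
# a real API, just a placeholder for this sketch.

def ask_llm(prompt: str) -> str:
    # Pretend this sends the prompt to a language model and returns its reply.
    raise NotImplementedError("plug in a real model here")

def decompose_goal(goal: str) -> list[str]:
    """Ask the model to break a high-level goal into ordered sub-steps."""
    reply = ask_llm(
        f"Break the task '{goal}' into a short list of concrete steps "
        "a household robot could carry out, one step per line."
    )
    return [line.strip() for line in reply.splitlines() if line.strip()]

# decompose_goal("Fill up a cup of coffee in the kitchen") might come back as
# steps like "Go to the kitchen", "Take a cup from the cupboard", and so on,
# which an agent could then try to execute one at a time.
```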
Luisa Rodriguez: Huh. OK, wild. So these are some of the different systems that you could ask the question of whether they’re sentient, and somewhere in there you’d put large language models, but you’d put some other things at the higher end of the probability of being sentient? And you’re not totally sure what those are, it sounds like, but maybe robots with large language models feeding in are a bit higher than large language models alone?
Robert Long: Yeah. So even without having yet examined a specific system, one quick argument is just that, whether or not I agree with them, there are a bunch of preconditions for sentience that a lot of people think are plausible. One of them is embodiment, maybe another one is having a rich model of a sensory world or something like that. And there’s just a straightforward argument that pure-text large language models don’t have that sort of thing probably. But it’s not hard to imagine augmenting them with those things or plugging them into other stuff. And people are already doing that. So if you’re worried about some limitations of LLMs, there’s definitely other places you can look. I myself haven’t yet looked, but it’s definitely on my list.
Luisa Rodriguez: Cool, cool. Yeah, makes sense.
Robert Long: I decided to start with pure-text LLMs as a base case and as an exercise.
Multimodal systems [01:21:05]
Luisa Rodriguez: What would you look at next? I guess you said robots. Anything else you’d be especially excited to look at?
Robert Long: Yeah, and it might not be robots. It might be virtual agents. And maybe stuff that’s closer to a pure-text LLM, but something that also has sensory channels. Multimodal systems.
Luisa Rodriguez: So like getting input in? Sorry, what are multimodal systems?
Robert Long: Oh, yeah, multimodal. Modal just means kind of “input” in this context. So it’d be something that’s trained both on text and on images.
Luisa Rodriguez: Oh, I see. Got it.
Robert Long: So DALL-E 2, which you’ve probably seen making beautiful pictures, has to be trained on both images and texts because it’s like translating between them. I’m not saying that’s my next best candidate or whatever, just as an example of multimodal.
Luisa Rodriguez: Right. And what’s the reason that you think that having more types of inputs — like words and pictures, for example — is more likely to result in something being sentient?
Robert Long: That’s a great question. I don’t think it’s like a strict condition that you have to be processing more than one thing. I have this rough intuition that processing more than one type of thing might make you develop the kind of representations or resources for handling multiple sources of input that might correspond to consciousness.
Another way of putting that is like, if you get closer to something kind of human-ish, that puts you on a little bit firmer ground, even if it’s not strictly necessary. And one fact about us is we have to handle all sorts of different input streams and decide which ones to pay attention to, and form representations that incorporate all of them, and things like that.
Luisa Rodriguez: Yeah, yeah. I’m realising, I feel like I half-understand what you mean when you say “form representations.” But do you basically mean, I don’t know, what DALL-E’s doing when it gets trained on a bunch of pictures of dogs: is it forming a representation of a dog? And we’re also doing things like that as humans? Like we’ve got some representation of what a dog is?
Robert Long: I’m going to cheat and not answer the full question of what a representation is. I will stick to the multimodal element. Whatever it is to represent a dog, our representations seem to contain information about what they look like, and what they sound like, and how people talk about them, and how they’re defined, and all sorts of things.
Luisa Rodriguez: Got it. Is it kind of like our concept of a dog?
Robert Long: Yeah, we can use that word here too. And yeah, there’s really interesting work from Chris Olah [and collaborators] — who has been on the show, and whose name usually comes up if you have some fascinating interpretability thing to talk about — where I think he looked for neurons that seem to represent or encode certain concepts in multimodal systems, and somehow be emerging in this cross-modal or multimodal way.
Luisa Rodriguez: Cool, cool. OK, yeah, that makes sense. So it sounds like there’s a range of types of AI systems, and there are some different reasons to think, or maybe there’s a bit more evidence, for some being sentient or conscious. I’ve heard you give the example of the fact that humans have multiple kind of — I don’t even know, what are we calling it? — like, we process words, we process images, we process sounds. I’m kind of calling it “inputs” in my head, but I don’t know if that’s right.
Robert Long: That’s fair.
Luisa Rodriguez: OK, cool. So we’ve got lots of inputs. Maybe an AI system that has lots of inputs is a bit more like a human and that’s maybe a bit more evidence that it might be sentient or conscious. What other types of evidence can we have about whether an AI system is conscious?
Robert Long: So the perspective I’ve been taking is: let’s try to think about the kind of internal processing it’s using, or the kind of computations or representations it’s manipulating as it does a task, and see if we can find analogues to things that we have reason to think are associated with consciousness in humans.
So the dream would be: we studied humans enough and we identified what the mechanism is, and specified it in computational terms. And maybe that’s a very complicated thing; maybe it’s somewhat simple. And then we use interpretability tools to say, “Ah, there is that structure in this AI system.” I think that scenario is unlikely, because it requires that we have great interpretability, we have a detailed theory of consciousness, and we find an exact match — which I think is unlikely unless you’re doing a full brain emulation.
Luisa Rodriguez: Right. Yeah. I see. So the idea is like, we figure out what sentience is: it’s like this formula. It’s like you could put the formula in an Excel sheet and then the Excel sheet would feel sentience. It’s like when you get a pinprick, you feel this kind of pain or something. And we know exactly the formula for that kind of pain.
And then we find it in an AI system. It has the exact same “if given this input, do this process, and then feel this thing” — and that thing is pinprick pain. And then if we saw that exact match, we’d be like, “Cool, that’s doing the same thing. It must be experiencing the same thing.”
Obviously it’s infinitely more complicated. But is that roughly the thing?
Robert Long: Yeah, just with one clarification, which I think is in what you said: it’s not just that there’s the same input-to-output mapping. It’s that the algorithm or process that it’s using to process it looks, in the relevant sense, to be the same.
Luisa Rodriguez: The same process. Oh, and that’s actually key.
Robert Long: Yeah. In my view.
Luisa Rodriguez: Yeah, yeah. Otherwise it could just be like a VLOOKUP, like a lookup table.
Robert Long: Exactly. Did you want to say VLOOKUP because you have Excel in mind?
Luisa Rodriguez: [laughs] I do think about a lot of this stuff in… I’m imagining Excel a bunch as we’re talking.
Robert Long: Nice.
Luisa Rodriguez: OK, is there any way to simplify it for me? Just to get a bit better of an intuitive understanding of what kind of process we could find?
Robert Long: Yeah. So this is a great question, because part of what I’m trying to do more myself, and get more people to do, is actually think about processes that are identified in neuroscience and actually think about what those are. So we could do that if you would like.
Luisa Rodriguez: I would love to do that.
Global workspace theory [01:28:28]
Robert Long: And warning: the theories of consciousness are going to be sketchy and unsatisfying — intrinsically, and also my understanding of them — and maybe kind of hard to explain verbally. But we’ll link to papers explaining them.
So global workspace theory is a pretty popular neuroscientific theory of what’s going on when humans are conscious of some things rather than others. Let’s start with the picture of the mind or the brain that it’s operating within, and then I’ll say how it then builds the theory of consciousness on top of that.
So it has this kind of picture of the mind where there are a bunch of different separate and somewhat encapsulated information processing systems that do different things. Like a language system that helps you generate speech, or maybe like a decision-making system — maybe that’s not one system though. Also the sensory systems: they’re in charge of getting information from the outside world and building some representation of what “they” “think” is going on.
Luisa Rodriguez: Like memory? Would memory be one?
Robert Long: Memory definitely is one of them, yeah. And those things can operate somewhat independently, and it’s efficient for them to be able to do so. And they can do a lot of what they’re doing unconsciously — it’s not going to feel like anything to you for them to be doing it.
Here’s a quick side note — and this is separate from global workspace; this is something everyone agrees on — but an interesting fact about the brain is that it is doing all kinds of stuff, and a lot of it is extremely complex and involves a lot of information processing. And I can’t ask, “What is it like for Luisa when her brain is doing that versus some other thing?” So like, your brain is regulating hormonal release…
Luisa Rodriguez: It’s pumping blood…
Robert Long: Your heartbeat. Exactly.
Luisa Rodriguez: Yeah. I have no idea what that’s like. I’m not conscious of it. I think that might be the most helpful clarification of consciousness. I feel like people have said, “Consciousness is what it is like to be a thing.” And they’ve distinguished between, “There’s nothing that it’s like to be a chair, but there is something that it’s like to be a Luisa.” And that doesn’t do much for me.
But there is something that it is like for me to, I don’t know, see the sunshine. But there is not something that it is like for me to have the sunshine regulate my internal body clock or something. Or maybe that’s a bad one, but I do have the intuitive sense that one of those is conscious and one of those is unconscious. And I’m just finding that really helpful.
Robert Long: That’s great, because, you know, we’ve been friends for a while and I remember having conversations with you where you’re like, “I just don’t know what people are talking about with this consciousness business.”
Luisa Rodriguez: [laughs] It’s true!
Robert Long: And here I thought you were just an illusionist, but maybe it’s that people just weren’t explaining it well.
Luisa Rodriguez: I’ve seen like 100 times the “consciousness is what-it-is-like-ness.” And every time I read that, it means absolutely nothing to me. I don’t understand what they’re saying.
Robert Long: It’s a weird phrase, because it doesn’t necessarily point you into this sort of internal world. Because you’re like, “What is it like to be a chair?” And you just look at a chair and you’re like, “Well, you know, you kind of sit there.”
Luisa Rodriguez: Or like, “It’s still. It’s cold, maybe.” I can anthropomorphise it, or I can not, but even then, it just doesn’t clarify anything for me.
Robert Long: Yeah, so a lot of people do take this tack — this is a bit of a detour, but I think it’s a good one — when they’re trying to point at what they’re trying to say with the word “consciousness” by distinguishing between different brain processes within a human.
People have done that for a while in philosophy. There’s a somewhat recent paper by Eric Schwitzgebel called “Phenomenal consciousness, defined and defended as innocently as I can manage,” and that’s trying to find a way of pointing at this phenomenon that doesn’t commit you to like that many philosophical theses about the nature of the thing you’re talking about.
And he’s like, consciousness is the most obvious, in everyday thinking, difference between the following two sets of things. Set number one is like tasting your coffee; seeing the sunrise; feeling your feet on the ground; explicitly mulling over an argument. Set number two is like your long-term memories that are currently being stored, but you’re not thinking about them; the regulation of your heartbeat; the regulation of hormones. All of those are things going on in your brain in some sense. So yeah, I don’t know if that points to something for you?
Luisa Rodriguez: Oh, no, it feels like the thing. I feel like I finally get it. That’s great.
Robert Long: Awesome.
Luisa Rodriguez: Yeah. Cool. OK, so how did we get here? We got here because you were describing global workspace.
Robert Long: Yeah, so global workspace theory starts with the human case, and it asks, “What explains which of the brain things are conscious?”
Here’s another quick, interesting point. In contrast with the hormone-release case, there are also a lot of things that your brain does which are really associated with stuff that you will be conscious of, but you’re still not conscious of them. An example is we seem to have very sophisticated, pretty rule-based systems for determining if a sentence is grammatical or not. Have you ever heard this case? You can say, “That is a pretty little old brown house.” That sounds fine, right?
Luisa Rodriguez: It does sound fine.
Robert Long: But you can’t say, “That’s an old little brown pretty house.” Like that was hard for me to say. It sounds terrible.
Luisa Rodriguez: Yeah, I hate it.
Robert Long: And there are actually pretty fine-grained rules about what order you’re allowed to put adjectives in in English. And I’ve never learned them, and neither have you. But in some sense, you do know them. And as you hear it, your brain is going like, “Ugh, wrong order. You put size before colour.” Or whatever. And you’re not conscious of those rules being applied. You’re conscious of the output. You’re conscious of this almost feeling of horror.
Luisa Rodriguez: Yeah, yeah, yeah. “You can’t say that!”
Robert Long: Yeah. So that’s another interesting case. Like why aren’t you conscious of those rules being applied?
Luisa Rodriguez: Yeah. That is interesting.
Robert Long: OK. So yeah, lots of examples now. And global workspace is like, “Why are some representations or processes associated with consciousness?” And the theory, at a high level, and the reason it’s called “global workspace theory” is that there’s this like mechanism in the brain called a “global neuronal workspace” that chooses which of the system’s representations — so like maybe the sensory ones — are going to get shared throughout the brain, and be made available to a lot of other systems.
So if you’re conscious of your vision, they’re saying that the visual representations have been broadcast — and they’re available, for example, to language, which is why you can say, “I am seeing a blue shirt.”
Luisa Rodriguez: Oh, I see. Yes. Got it. So like there’s a switchboard, and your visual part is calling into the switchboard and it’s like, “I see a tiger.” And then the switchboard operator is like, “That is important. We should tell legs.” And then they call up legs. And they’re like, “You should really know there’s a tiger, and run.”
Robert Long: Yeah, exactly. Or they call up the part of your brain in charge of making plans for your legs. And that example actually gets to a great point too, which is that entry into this workspace is going to depend on things like your goals and what’s salient to you at a given time. You can also yourself kind of control what’s salient. So you and the listeners: “What do your toes feel like?” Now that seems to have gotten more into the workspace. The tricky question is: were you already aware of it, but you weren’t thinking about it? But that’s just an example of attention modulating this sort of thing.
Luisa Rodriguez: Yeah. OK, cool. So global workspace theory makes sense to me. How do you use that theory to think about whether something like an AI system is conscious?
Robert Long: Right. So an easy case would be if you found something that straightforwardly looks like it has one.
Luisa Rodriguez: Oh, I see. And we’re going to come up with processes that seem relevant to consciousness, or like that they can end in consciousness?
Robert Long: Or processes that are conscious, maybe, if you really buy the theory. Or give rise to, or are correlated with, and so on.
Luisa Rodriguez: So what’s an example? I’m having trouble pulling it together. Can you pull it together for me?
Robert Long: Well, not entirely. Or I’d be done with my report, or done with this line of research altogether. I mean, maybe you can just imagine trying to imitate it as closely as possible. So notice that nothing about that story directly depends on it being neurons in a brain. I mean, I called it the “global neuronal workspace,” but let’s imagine that you could build it out of something else.
So here’s a sketch: let’s build five different usually encapsulated subsystems in a robot. They usually don’t talk to each other.
Luisa Rodriguez: Like language, like visual.
Robert Long: Let’s also make this kind of switchboard mechanism. Let’s have procedures by which the things kind of compete for entry.
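As a purely illustrative aside, here is a minimal sketch of the kind of architecture being gestured at: a handful of encapsulated subsystems that bid for entry to a shared workspace, with the winning representation broadcast back to all of them. Every name and salience rule here is invented for the sketch; it is not a claim about how any real global workspace model works.

```python
from dataclasses import dataclass

@dataclass
class Message:
    source: str       # which subsystem produced it
    content: str      # the representation itself
    salience: float   # how strongly it bids for the workspace

class Subsystem:
    def __init__(self, name):
        self.name = name
        self.broadcast_seen = None   # what the workspace last told us

    def propose(self, observation) -> Message:
        # Each subsystem turns its own input into a candidate message.
        # (Salience here is just the length of the observation -- a stand-in.)
        return Message(self.name, observation, salience=len(observation) / 10)

    def receive(self, message: Message):
        self.broadcast_seen = message

def workspace_step(subsystems, observations):
    # 1. Subsystems compete for entry with candidate representations.
    candidates = [s.propose(observations[s.name]) for s in subsystems]
    # 2. The workspace admits the single most salient candidate...
    winner = max(candidates, key=lambda m: m.salience)
    # 3. ...and broadcasts it globally, so every subsystem can use it.
    for s in subsystems:
        s.receive(winner)
    return winner

modules = [Subsystem(n) for n in ("vision", "language", "memory", "planning", "audition")]
obs = {"vision": "a tiger by the door", "language": "", "memory": "tigers are dangerous",
       "planning": "", "audition": "rustling"}
print(workspace_step(modules, obs).content)   # most salient report wins and is broadcast
```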
Here’s a historical tidbit. Global workspace theory was actually first formulated out of inspiration from AI architectures. Like back in the olden days.
Luisa Rodriguez: Oh, wow. So people didn’t come up with it to explain consciousness? They came up with it to make a structure that could…
Robert Long: That could handle a bunch of different information in a flexible way.
Luisa Rodriguez: Computationally. Wow. That’s wild.
Robert Long: It’s called the “blackboard architecture,” where the blackboard is like where you can put the representations. So yeah, people developed that for AI and then some neuroscientists and cognitive scientists — Bernard Baars is the original formulator of this — were like, “Hey, what if the brain works like that? And that’s what explains consciousness?”
Luisa Rodriguez: That’s really cool.
Robert Long: And now it’s going full circle, right? Because people are like, what if we could look for this in AIs? And some people — most notably, Yoshua Bengio and some of his collaborators, and then also a guy called Rufin VanRullen, and also Ryota Kanai — they’re trying to implement global workspace as it’s found in the neuroscience into AI systems to make them better at thinking about stuff. So it’s this interesting loop.
Luisa Rodriguez: A little loop. Totally. And so the idea here in thinking about artificial sentience is: you have a theory of consciousness — in this case, for example, global workspace theory — and you spell it out, and then you look for AI systems that work like that. Or you’re like, “Does this AI system work like that?” And if it does work like that, that’s some evidence that it has similar levels of consciousness to humans or something?
Robert Long: Yeah. To the extent that you take the theory seriously, to the extent that you don’t have objections to it being done artificially. An example of this is this paper by Juliani et al. called “The Perceiver architecture is a functional global workspace.” And in that paper, they look at a model from DeepMind called Perceiver. And there’s one called Perceiver IO, a successor. And this system was not developed with any theory of consciousness in mind. But Juliani et al. say if you look at the way it works, it’s doing something like global workspace as found in this theory.
Luisa Rodriguez: That’s wild. So how confidently can we just say that if you put some weight on global workspace theory being true, then you should put some weight on Perceiver IO being conscious?
Robert Long: I mean, I would endorse that claim. And then the question is how much weight and…
Luisa Rodriguez: Well, how much weight? What do they conclude in the paper?
Robert Long: So in the paper itself, they’re not claiming this thing is conscious. And also in talking to them, they’re like, “No, no, this is not an argument that it’s conscious.” And the reasons for that are that we’re not sure that the theory is true. And this is getting to all of the complications of this methodology that I’m talking about. I’m glad we went through at least some fake straightforward cases before getting into all these weeds. It’s this issue I mentioned before about how you’re never going to have an exact match, right?
Luisa Rodriguez: So there are differences between what Perceiver IO is doing and whatever you’d imagine a global workspace process to look like.
Robert Long: Exactly.
Luisa Rodriguez: Do you know what some of those differences are?
Robert Long: Maybe the most obvious one — and this is a longstanding issue in global workspace theory — is: do you have to have the exact same list of subsystems? Like in humans, it’s language, decision-making, sensory things. Or do you just have to have a few of them? Or do you just have to have multiple systems? This question comes up in animal sentience as well.
Luisa Rodriguez: Oh, that’s interesting.
Robert Long: So this is going to be the tricky vexing question with all of these: for any theory of consciousness, our data is going to come from humans. And it might explain pretty well what, in humans, is sufficient for consciousness. But how are we supposed to extrapolate that to different kinds of systems? And at what point are we like, “That’s similar enough”?
One thing I’ll note is illusionists are like, “You’re looking for something you’re not going to find.” There’s just going to be kind of a spectrum of cases, different degrees of similarity between different ways of processing information. And there’s not going to be some thing, consciousness, that you definitely get if you have like 85% similarity to your existing theory from humans.
Luisa Rodriguez: Right, right. And would they basically believe that there are varying degrees of things like valenced experience — so pleasure and suffering — and also varying degrees of things like access to memories, or ways of thinking about certain things they’re seeing in the environment, or certain threats or something? Like, there are ways of thinking about those that might be kind of like the human one — which kind of sounds like sentences in your head — or they might be different, but either way, it’s all kind of spectrum-y and there isn’t one thing that’s consciousness? There’s just a bunch of systems and processes that different nonhuman animals and humans might have — and none of those are like yes-conscious or no-conscious?
Robert Long: Exactly, because for the illusionists, it’s kind of a confused concept. Even if you do believe in consciousness, you might also think there are cases where it’s indeterminate or vague. But if you believe in consciousness in this robust sense, it’s very hard to make sense of what it would be to have a vague case of consciousness. Some people have the intuition that there’s something it’s like to be doing something, or there’s not. To be conscious is to have a subjective point of view, and that’s not the sort of thing you can “kinda” have.
Luisa Rodriguez: Right, right. Interesting. And on Perceiver IO…
Robert Long: Right. Yeah, so I can bring us back to Perceiver IO. That was just making the general point that it’s very hard to extrapolate from the human case.
Luisa Rodriguez: Yes, right. Does Perceiver IO basically just have some systems but not all the systems, or not all the same systems as at least global workspace theory thinks that humans do?
Robert Long: So it has different systems and it just operates in different ways. So one difference — if I’m remembering correctly from talking to the authors of this paper — is that the broadcast mechanism that Perceiver IO has is not as all-or-nothing as the human one is posited to be. The human one is kind of a switchboard, or it’s hypothesised to be.
Luisa Rodriguez: Right. It’s like, “There is a tiger, and I’m broadcasting that to the other systems so they can take an appropriate action.” And not like a subtle flicker of, “Maybe there’s a tiger,” and you want to quietly broadcast that or something. You’re either telling them or you’re not.
Robert Long: Exactly, yeah. That’s my rough understanding of something that people say about global broadcast: it does have this sort of step-function-like property. And if I’m remembering correctly, people are saying Perceiver IO doesn’t quite have that.
Luisa Rodriguez: OK, and then by “step-function” you mean in contrast to something more continuous that kind of increases gradually, or you can have it in degrees. Step-function is either “a little or a lot,” or “yes or no.” And I guess Perceiver IO doesn’t have that because it has a gradient? Or what’s going on?
Robert Long: Yeah, everything is getting shared to everything. And it’s global-workspace-like, as I understand it, in that there are things that really get shared a lot, but there’s still nothing…
Luisa Rodriguez: So Perceiver IO has a bunch of systems that are telling all of the other systems all of their things, but sometimes they’re like — it’s a bit hard for me to imagine how they’re telling some things more strongly than others — but there is some process that’s like, “I’m yelling about this thing” and another process that’s like, “I’m whispering about the thing”?
Robert Long: Yeah. So in the context of deep learning, what “yelling about the tiger” is going to look like is going to be a matter of the strength of certain weights that connect different parts of things. In deep learning, the fundamental building block of all sorts of different systems is going to be nodes that are connected to other nodes. And there will be a strength of connection between two nodes, which is how strong the output from one node to another will be. Training these systems usually is adjusting those weights.
This still is a long way from explaining what’s going on in Perceiver IO, but in case it’s helpful to at least know that, that’s what it would be.
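To put a rough toy model behind the “yelling versus whispering” image (my own illustration; the numbers and the softmax-style sharing are assumptions, not a description of Perceiver IO): graded sharing passes everything along at some strength, whereas a step-function broadcast selects one item and silences the rest.

```python
import math

scores = {"tiger": 3.0, "curtain": 1.0, "toes": 0.2}   # made-up salience scores

# Graded sharing: every item gets passed along with some strength (softmax-like),
# so the "tiger" is shouted and the "toes" are whispered, but nothing is silent.
total = sum(math.exp(v) for v in scores.values())
graded = {k: math.exp(v) / total for k, v in scores.items()}

# Step-function broadcast: one item is selected and shared at full strength,
# everything else gets nothing -- closer to how the global workspace is
# hypothesised to work in the human case.
winner = max(scores, key=scores.get)
all_or_nothing = {k: (1.0 if k == winner else 0.0) for k in scores}

print(graded)          # roughly {'tiger': 0.83, 'curtain': 0.11, 'toes': 0.05}
print(all_or_nothing)  # {'tiger': 1.0, 'curtain': 0.0, 'toes': 0.0}
```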
Luisa Rodriguez: Yeah, cool. Thank you. I feel like I now basically understand at least the kind of thing you’d be doing if you’re looking for consciousness in an AI system. It’s like: what do we think consciousness is? We have at least one theory that we’ve talked about. We look for that thing that we think consciousness is — or at least the processes that we think explain the consciousness — in an AI system. And if we find something that looks like them, that’s some evidence that it’s conscious. If it looks a lot like that thing or that process, then that’s a bit stronger evidence.
Robert Long: Yeah, that’s an excellent encapsulation.
How confident are we in these theories? [01:48:49]
Luisa Rodriguez: OK, so one way of thinking about whether a particular AI system is conscious or sentient is by taking a theory of consciousness, and then looking for the exact same or similar processes in the AI system. That makes a bunch of sense. How confident are we in the philosophy of consciousness and these theories, like global workspace theory?
Robert Long: I think we’ve made a lot of progress in the scientific understanding of consciousness in the last 20 years, but we’re definitely nowhere near consensus. And I think basically everyone in consciousness science agrees with that. There’s a spectrum of people from more optimistic to more pessimistic. Some people think we’re just really far away from having anything like a scientific theory of consciousness. Other people think we’re well on the way: the methodology has improved, and we’re seeing some convergence, and we’re getting better experiments.
But even among the most optimistic, I don’t think anyone that I’ve ever talked to in the area of science of consciousness is like, “Yeah, we’ve nailed it. Take this theory off the shelf. Here’s exactly what it says. It predicts all of the things that we would like to know about human and animal consciousness. Let’s apply it to AIs.” That’s like the dream case that I have in the back of my mind when I work on this stuff, as kind of like an orienting ideal case, but that’s definitely not the situation.
Luisa Rodriguez: And when you say, “Take this theory off the shelf and confirm it predicts all the things we’d want it to predict,” what do you mean? What would it be predicting?
Robert Long: There’s just a lot of data about what it’s like to be a conscious human and how that interacts with our other mental processes, and any theory of consciousness is going to need to say how that happens and what the patterns are.
So some examples are: why is the human visual field as rich as it is? Here’s an interesting fact about vision: people have the impression that their peripheral vision is a lot more detailed than it actually is.
Luisa Rodriguez: Right. Yeah. I’m focusing on it now, and I’m getting just like blur. But yeah, I would have guessed that I’d get like the pattern of my curtain and not just the vague colour.
Robert Long: That’s interesting that you can kind of tell that by focusing your attention on it. I think a lot of people, myself included, wouldn’t have known even from focusing on it. I only knew this when I read about the experiments. And I think at some point I saw Daniel Dennett actually demonstrate this with something like a playing card in your periphery: you actually can’t tell if it’s black or red nearly as reliably as you would think you can from your naive impression.
Luisa Rodriguez: Black or red, you can’t even tell?
Robert Long: Yeah, listeners should look that up to make sure that’s accurate. But it’s something like there’s a surprising lack of discrimination, which you wouldn’t really know if you just thought of it. I feel like I have a full movie screen of filled-in, detailed vision.
Luisa Rodriguez: Right, yeah, panoramic. I mean, maybe it’s just because I know my curtains really well. So my curtain is to my left and I know exactly what the pattern should look like without looking at it, I’m just getting roughly green, even though it has a bunch of blue designs on it. So that’s kind of wild.
Robert Long: Right. Your brain has a model and is kind of filling it in and saying, “Yeah, I’ve got the general idea.”
Luisa Rodriguez: Right. “It’s roughly green. We don’t need to fill that in anymore. If we need to know it’s there, we’ll look at it directly.”
Robert Long: And I should flag that, as with all these issues, there are all sorts of philosophical debates about what’s really going on in the periphery of vision. I’m sure people will dispute the way I described it, but there’s obviously some sort of phenomenon there that we’d want explained.
Robert Long: Here’s another example of filling in: you have a blind spot. We all do. It’s because of the way your eye is wired and the fact that the optic nerve has to pass back through the retina into the brain. I might have slightly misdescribed the neurobiology there, but the key point for our purposes is that it doesn’t seem to you like there’s a part of your visual field that you’re missing. You’re filling it in; your eyes are moving around all the time and getting it. But because your brain is not hungry for information there, it doesn’t feel like there’s information missing, because it knows there shouldn’t be.
Luisa Rodriguez: Right, right. OK, cool. So bringing it back to consciousness, how do we take the observation that, for example, our peripheral vision is blurry, but we don’t really perceive it that way as data that theories of consciousness can make predictions about?
Robert Long: Yeah, so your theory of consciousness should ideally spell out in detail what sort of conscious creature would have a conscious experience that is like this — where they have a sense of more detail than in fact exists.
Maybe I’ll just go ahead and list some more things your theory of consciousness should explain. And a lot of this is going to be so everyday that you might forget that it needs to be explained. But like, what makes people fall asleep? Why are you not conscious in dreamless sleep? And how do dreams work? Those are a certain kind of conscious experience. These patterns we can find in a laboratory of how quickly you can flash stuff to make it get registered but not be conscious: what sort of architecture would predict that?
Luisa Rodriguez: OK, yeah. That’s like if you flicker lights really fast in front of people, at some point they don’t register them because they’re too fast?
Robert Long: Yeah, there are various interesting methods of flashing things in certain ways or presenting them in certain ways, such that we can tell that the visual information has in some sense gotten into the brain for processing. But you interrupt some of the processing that seems to be required for people to be able to remember it or talk about it, and arguably interrupts the processing that allows them to be conscious of it.
And you can imagine that some theories of what the mechanisms for consciousness are would be able to explain that, in terms of, “I identify that this process is key for consciousness and we have reason to believe that that process is what’s being interfered with in this case.”
Luisa Rodriguez: Right. Is there a way we can make that more concrete with an example? Is there some example that neuroscience has found, where we know that a human has taken in something in their visual field, but they’re not conscious of it?
Robert Long: Well, the example of blindsight is a particularly interesting one.
Luisa Rodriguez: What is blindsight?
Robert Long: Blindsight is this phenomenon, and as you can tell from the name, it’s a weird mixture of sightedness and blindness that occurs in people who have had some kind of brain lesion, or some kind of damage. There could be people who, if you put a bunch of obstacles in a hallway, they will walk down the hallway and be able to dodge those obstacles, but they actually will claim that they’re not visually aware of any obstacles.
Luisa Rodriguez: That’s crazy. That’s insane.
Robert Long: And because our brain likes to make sense of things, they’ll also just be like, “What are you talking about? It’s just a hallway. I just walked down it.” So we know that they must have registered it or they would have bumped into things, but we also know that they don’t have at least the normal kind of consciousness that allows me and you to talk about what it is that we’re seeing, and remember what it is that we have recently seen.
Luisa Rodriguez: And sorry, what is explaining this? Maybe we don’t know exactly what’s happening in consciousness, but do these people have some neurological condition that causes them to not know that there are obstacles in a hallway they’re walking through?
Robert Long: Yeah, this is usually some kind of not-normal functioning caused by a brain lesion or something like that.
Luisa Rodriguez: And so they experience feeling blind or partially blind or something?
Robert Long: Yeah, it’s usually in some part of their visual field, I think.
Luisa Rodriguez: I see. OK, sure. Not 100% sure on the details, but it’s something like that. That’s insane. That’s really, really wild.
Robert Long: There are also conditions where one half of your visual field will be like this.
Luisa Rodriguez: Is this like with split-brain cases?
Robert Long: That’s a related kind of case.
Luisa Rodriguez: OK. What’s the deal with split-brain? Is it the kind of thing that maybe consciousness theories would want to make predictions about?
Robert Long: Oh, absolutely. I think that split-brain was one of the interesting variations of conscious experience that helped people develop different theories of consciousness.
Luisa Rodriguez: Oh, really? Cool. Do you mind going into that a bit then?
Robert Long: Yeah, I was really into this when I was first getting into philosophy of mind. There’s like a philosophical sub-literature of like: what should we think about split-brain patients? And are there actually two experiencing subjects? Is there one experiencing subject that switches? Thomas Nagel has an interesting argument that there’s no determinate number of experiencing subjects.
But yeah, like Split Brain 101, which I can remember, is that there’s a procedure that is not often done anymore, because it’s a very drastic one: severing the corpus callosum, which is this structure that connects the two hemispheres of your brain. This was often done as a last resort for people who were having very severe seizures.
Then what you see is that in normal everyday life, these patients do not notice anything interestingly different about their experience. But in the lab, if you carefully control which half of the visual field things are being presented into, you can get very strange patterns of one half of the brain having some information, and the other half of the brain lacking that information.
Luisa Rodriguez: Wild. What’s an example of something where one half the brain knows something the other half doesn’t?
Robert Long: Again, I might misdescribe some of the details, but this broad finding is something that listeners should check out. You know there’s specialisation in each half of the brain — between like planning and language and things like that. So I think you can “tell” one side of the brain, “Get up from your chair,” and that will be registered and the decision will be made to get up from the chair.
Luisa Rodriguez: Oh, wow. So one half of the brain will be like, “I’ve been told to get up and I’m going to do that.” And then the person stands up.
Robert Long: And then you ask them, “Why did you stand up?” And the part connected to language or explaining your actions doesn’t have access to this information. And so they’ll say, “I wanted to stretch my legs,” or, “I need to go to the bathroom.”
Luisa Rodriguez: That’s crazy. I feel like it’s one level of crazy that one half of the brain could just not know. And then it’s a whole other level that it’s going to make up a reason, that it’s like, “I wanted to stretch my legs.”
Robert Long: I think that’s a wonderful and somewhat disturbing feature of the human brain and the human experience: what you often see in conditions like this is people will have stories that make sense of what is happening to them. You don’t easily form the hypothesis, “Oh, wow, I just stood up and I have no idea why.” I think that’s a very surprising hypothesis and a hard one to take in.
Luisa Rodriguez: Yeah, yeah, yeah. OK, interesting. So I guess it sounds like philosophers have spent time thinking about what this even means about consciousness. Is there anything they agree on? Or what are some ideas or theories or explanations that have been proposed for split-brain in particular?
Robert Long: So when neuroscientists look at cases like this, that’s going to constrain their theories of what neural mechanisms are responsible for consciousness and what parts of the brain they’re in and things like that. I think this happens a lot in science: it’s when things break that you can get a better clue as to what the key mechanisms are.
Luisa Rodriguez: Totally. Yeah.
Robert Long: And I want to emphasise that there are these neuroscientific theories, which are in the business of “let’s collect data and make hypotheses about what brain structures are responsible.” The philosophy of this stuff is tightly linked with that, because all of these questions are very philosophical — and it takes, in my opinion, a lot of philosophical clarity to handle this data in the appropriate way and make sure your theory makes sense.
But I do want to draw a distinction between making a neuroscientific theory — of “What’s the relevant mechanism? How fast do these neurons fire?” and so on — and a different set of questions that philosophers are concerned with, which are these more metaphysical questions of: “How could something like consciousness possibly fit in with the scientific conception of the world?”
The hard problem of consciousness [02:02:14]
Robert Long: So this is stuff in the vicinity of what’s called the hard problem of consciousness, which I’m sure David Chalmers talked about on his episode.
Luisa Rodriguez: Do you mind giving a quick recap?
Robert Long: I think of the hard problem of consciousness as this more general epistemic and metaphysical question — epistemic in that it’s related to things that we can know or understand, and metaphysical in that it’s related to what sorts of things and properties exist in the most general sense. It’s a question of how the properties that consciousness seems to have — these subjective qualities, like the felt redness of your red experience — could be explained by or be identical to the other kinds of properties that we’re more familiar with in physics and the sciences. Things about how fast matter is moving, and how it’s interacting with other matter.
We know that these things are very closely related. I mean, everyone concedes that humans need a brain operating in a certain physical way in order for there to be this subjective experience of red. But it seemed to many people throughout the history of philosophy — Descartes being a key example and David Chalmers being a more recent key example — that it’s very hard to construct a worldview where these things mesh together very well.
Luisa Rodriguez: That is a helpful distinction. I guess blurring them a bit again, there are philosophers and neuroscientists who are doing things like looking at cases where our normal cases of human experience break down — for example, with split-brains — and trying to figure out what the underlying mechanism seems like it must be, if the thing broke in the way it did.
Obviously I’m not going to solve this, but it might sound something like: the fact that someone might make up an explanation for why they stood up after one side of their brain was told to stand up — and the other side of their brain didn’t have access to that information — might say something about the global workspace theory. Maybe it says something about how that is some evidence that there are different parts of your brain. There’s a part of your brain that understands a command in verbal form, and there’s a part of your brain that’s making decisions about what to do with that command, and then there’s another part of your brain that explains your behaviour.
And global workspace theory would say something about how the parts of your brain that received a command have to report to the switchboard. We want the brain to know that we’ve been told to stand up and then the switchboard has to tell all the other parts so that when asked, they can explain it. Or maybe it doesn’t quite go in that order. Maybe it’s like the person’s been asked, “Why did you stand up?” And then the part of the brain that’s like, “Well, we got a command” is trying to give that information through the switchboard to the part that’s like, “I’m going to explain why I did that,” but that link is broken. Is that some reason to think that there’s a switchboard at all?
Robert Long: Yeah. So whether or not that particular hypothesis or explanation is correct — and I mean, it’d be pretty impressive if we just nailed the global workspace —
Luisa Rodriguez: Came into philosophy and neuroscience and was just like, “You know what? I think I get it. Global workspace theory sounds totally right to me. I think we’re done here.”
Robert Long: Yeah, exactly. So whether or not that particular explanation is right, I do think you are right on that this is how the construction of the science of consciousness is going to go. We’re going to find out facts about the relationship between consciousness and cognition, and what people say and how they can behave, and also about maybe the conscious experience itself. And that’s going to be what your relevant mechanism, or explanation of what consciousness is, is going to need to explain.
Luisa Rodriguez: Cool. That makes me feel so much better about the philosophy and science of consciousness. I think I just imagined neuroscience and the philosophy of consciousness as basically separate fields, and didn’t realise philosophers of consciousness were taking neuroscience data into account at all. And now that I know, I’m just like, “Great, that seems really sensible. Carry on.”
Robert Long: So I like to draw a distinction between the hard problem of consciousness and what Scott Aaronson has called the pretty hard problem of consciousness. The pretty hard problem of consciousness, which is still insanely difficult, is just saying: “Which physical systems are conscious, and what are their conscious experiences like?” And no matter what your metaphysical views are, you still face the pretty hard problem. You still need to look at data, build a theory of physical mechanisms. Or maybe there are computational mechanisms that are realised in certain physical systems.
I think of the neuroscientists as doing stuff in the pretty hard problem. But it’s all going to get linked back together, because how you think about the hard problem might affect your methodology, things you find out in the pretty hard problem might make you revise some of your intuitions about the hard problem, and so on.
Luisa Rodriguez: Right, totally. Are there other kinds of things that theories of consciousness would want to explain?
Robert Long: Yeah, ultimately you would like to explain the very widest range of facts about consciousness. This would include things about your normal everyday experience of consciousness: Why does the visual field appear to be the way it is? How and why do your vision and your auditory consciousness and your felt sense of your body all integrate together into a unified experience, if indeed they do?
Luisa Rodriguez: Right. I’ve literally never thought about that.
Robert Long: Yeah, it’s a good question. What determines how many things you can be conscious of at a time? What makes you switch between being conscious of something at one moment and conscious of another thing at the other? What explains why you talk about consciousness the way that you do? What are the mechanisms for that? How does it relate to memory and decision making?
Luisa Rodriguez: It’s funny how this list is basically a list of things that are so natural to me that I’ve never questioned that they could be any different. Like the fact that I can only be conscious of so many things at once, or the fact that I change my attention from some things to another and bring things to consciousness in deliberate ways. And none of that has to be that way for any obvious reason.
Robert Long: Yeah, that’s what’s so great about consciousness as a topic. It’s one of the great enduring scientific and philosophical mysteries. And it’s also the thing that is actually the most familiar and everyday — so familiar and everyday that, as you mentioned, it’s hard to even notice that there’s anything to explain. It’s just, you know, being in the world.
Luisa Rodriguez: It’s just the way it is. Cool.
Exotic states of consciousness [02:09:47]
Luisa Rodriguez: Were there other things worth explaining that I might be surprised to even hear are worth explaining?
Robert Long: Well, you would want to also explain more exotic states of consciousness. So why does consciousness change so radically when tiny little molecules from psychedelic agents enter the system?
Luisa Rodriguez: Yeah, I was wondering if you were going to say that.
Robert Long: And how is it even possible to have conscious experience of these very strange types that people report on psychedelics, of having consciousness without really having a sense of self? Or even just the visual illusions and the visually altered nature of consciousness that people report? That is also data that whatever mechanisms you think are responsible for consciousness would need to explain.
One of my collaborators — by the name of George Dean, who’s currently a postdoc in Montreal — has a paper on predictive processing theories of consciousness and psychedelic experiences, and how those fit together and how they could explain things.
Luisa Rodriguez: Are there any examples that are particularly interesting?
Robert Long: I think one of the most interesting hypotheses that’s come out of this intersection of psychedelics and consciousness science is this idea that certain psychedelics are in some sense relaxing our priors — our brain’s current best guesses about how things are — and relaxing them in a very general way. So in the visual sense, that might account for some of the strange properties of psychedelic visual experience, because your brain is not forcing everything into this nice orderly visual field that we usually experience.
Luisa Rodriguez: Right. It’s not taking in a bunch of visual stimuli and being like, “I’m in a house, so that’s probably a couch and a wall.” It’s taking away that “because I’m in a house” bit and being like, “There are a bunch of colours coming at me. It’s really unclear what they are, and it’s hard to process it all at once. And so we’re going to give you this stream of weird muddled-up colours that don’t really look like anything, because it’s all going a bit fast for us” or something.
Robert Long: Yeah, and it might also explain some of the more cognitive and potentially therapeutic effects of psychedelics. So you could think of rumination and depression and anxiety as sometimes having something to do with being caught in a rut of some fixed belief.
Luisa Rodriguez: Of really negative priors. “Everything’s going badly.” Yeah.
Robert Long: Exactly. Yeah, so like the prior is something like, “I suck.” And the fact that someone just told you that you’re absolutely killing it as the new host of The 80,000 Hours Podcast just shows up as, “Yeah, I suck so bad that people have to try to be nice to me” — you’re just forcing that prior on everything. And the thought is that psychedelics loosen stuff up, and you can more easily consider the alternative — in this purely hypothetical case, the more appropriate prior of, “I am in fact awesome, and when I mess up, it’s because everyone messes up. And when people tell me I’m awesome, it’s usually because I am,” and things like that.
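A toy way to see the “relaxing priors” idea (my own sketch, with made-up numbers; real predictive processing models are far richer) is a precision-weighted Bayesian update, where the same piece of evidence moves a belief much further once the prior’s precision is turned down.

```python
def update(prior_mean, prior_precision, evidence_mean, evidence_precision):
    # Standard precision-weighted average of prior and evidence (Gaussian update).
    total = prior_precision + evidence_precision
    return (prior_precision * prior_mean + evidence_precision * evidence_mean) / total

prior = -5.0        # entrenched belief, e.g. "I suck", on an arbitrary scale
evidence = +4.0     # what people actually tell you

print(update(prior, prior_precision=10.0, evidence_mean=evidence, evidence_precision=1.0))
# about -4.2: with a rigid (high-precision) prior, the compliment barely registers
print(update(prior, prior_precision=1.0, evidence_mean=evidence, evidence_precision=1.0))
# -0.5: with the prior relaxed, the same evidence shifts the belief a lot
```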
Luisa Rodriguez: Right, right, right. Yeah, I guess I’d heard people reported psychological benefits from psychedelics even after they’d kind of come down from whatever psychedelic experience they were having. But I had not heard it explained as a relaxation of priors, and I hadn’t heard depression explained as kind of incorrect priors getting a bunch of unwarranted weight. So that’s pretty interesting too.
It is kind of bizarre to then try to connect that to consciousness, and be like: What does this mean about the way our brain uses priors? What does it mean that we can turn off or turn down the part of our brain that has a bunch of priors stored and then accesses them when it’s doing everything, from looking at stuff to making predictions about performance? That’s all just really insane, and I would never have come up with the intuition that there’s like a priors part in my brain or something.
Robert Long: Yeah, it would be throughout the brain, right? And I know that’s what you’re saying. These sorts of ideas about cognition, that the brain is constantly making predictions, can also be used to think about consciousness, and they predate the more recent interest in the scientific study of psychedelics. And people have applied that framework to psychedelics to make some pretty interesting hypotheses.
So that’s just to say there’s a lot of things you would ideally like to explain about consciousness. And depending on how demanding you want to be, until your theory very precisely says and predicts how and why human consciousness would work like that, you don’t yet have a full theory. And basically everyone agrees that that is currently the case. The theories are still very imprecise. They still point at some neural mechanisms that aren’t fully understood.
One thing that I think happens in the neuroscience of consciousness is a certain theory has really focused on explaining one particular thing. So like global workspace seems especially good at explaining what things you’re conscious of at a given time and why some things don’t get taken up into consciousness. But you still need to explain things like why the subjective character of your consciousness is the way that it is.
Luisa Rodriguez: Right, or why you’re so surprised that you’re conscious, and why it doesn’t seem to follow from things we know about our physical brains and stuff.
Robert Long: Yeah, exactly.
Developing a full theory of consciousness [02:15:45]
Luisa Rodriguez: Cool. OK, so it sounds like lots of progress needs to be made before we have any theories that we really want to use to make guesses about whether AI systems are conscious. I guess for now, we have to make do with what we have. But to ever become much more confident, we’d need to feel like we had theories that explained a bunch of these things that we want explained.
Robert Long: Exactly.
Luisa Rodriguez: And it’s really hard.
Robert Long: It’s really hard. And then once we’ve done that, there’s still a really hard problem of knowing how to apply this to systems very different from our own. Because suppose we’ve found all of these mechanisms that when they operate, mean that an adult human who’s awake is conscious of this or that. What we’ve identified are a bunch of mechanisms that we know are sufficient for consciousness. We know that if you have those mechanisms, you’re conscious. But how do we know what the lowest possible bound is? What if there are really simple forms of consciousness that would be quite different from our own, but…
Luisa Rodriguez: But are still consciousness in ways that we care about and would want to know about? Wow, that’s really hard too.
Robert Long: That seems to some people that it’s something that in principle you couldn’t answer. And I just want to give a brief, you know, concession to illusionists: this is one reason they’re like, “If we’ve posited this property that is going to be forever somewhat intractable to investigate, maybe we really need to rethink our assumptions.”
Luisa Rodriguez: Yeah, I’m kind of sympathetic to that. Do you have a guess at how long until we have really compelling theories of consciousness?
Robert Long: The most bullish people that I’ve talked to in the science of consciousness have this view that we actually haven’t been trying that hard for that long. We haven’t taken a proper crack at taking all of these things that need to be explained, trying to explain all of them, doing that more precisely, and building a full theory in that way. So no one thinks we have this full theory yet. And even if it’s coming soonish, we still need to say something about AIs now. So how can we do that?
Luisa Rodriguez: Yeah, right. I guess it feels both promising to me as a source of evidence about artificial sentience, but also clearly limited. Is there a way to take other kinds of evidence into account? Are there other sources of evidence, or are we stuck with theories of consciousness for now?
Robert Long: Yeah, I agree that it’s limited. And one reason I’ve been taking that approach is just to have something to start with.
Luisa Rodriguez: Sure. Yeah, fair enough.
Robert Long: And one thing that could happen as you try to apply a bunch of different theories — where none of them are particularly consensus or particularly refined — is you could notice that there’s some convergence between them, or a lot of conditions that they all agree on. And then you could look at those conditions.
Luisa Rodriguez: Right. So there are like 15 theories of consciousness or something. And maybe all 15 have this one process that they think is explaining something important, even if they have a bunch of other things that they explain in different ways. But having that one thing in common means that you have something especially robust to look for in some AI system or something.
Robert Long: Yeah.
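As a tiny sketch of that “look for convergence” move (the theories and indicator properties below are placeholders I have made up, not actual lists from the report), you could tabulate which conditions each theory posits and then focus on the ones they all share.

```python
# Hypothetical indicator properties per theory -- illustrative only.
theories = {
    "global_workspace":      {"recurrent_processing", "global_broadcast", "attention"},
    "higher_order":          {"recurrent_processing", "self_monitoring", "attention"},
    "predictive_processing": {"recurrent_processing", "precision_weighting", "attention"},
}

# The conditions every theory agrees on are the more robust things to look for.
shared = set.intersection(*theories.values())
print(shared)   # {'recurrent_processing', 'attention'} in this made-up example
```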
Incentives for an AI system to feel pain or pleasure [02:19:04]
Luisa Rodriguez: Are there any other types of evidence you’re looking for?
Robert Long: Yeah. So aside from doing this theory application — take theories off the shelf, look for the mechanisms in the AI — you can also do more broadly evolutionary-style reasoning. It’s not purely evolutionary because these things did not evolve by natural selection. But you can think about what the system needs to do, and how it was trained, and some facts about its architecture, and say, “Is this the sort of thing that would tend to develop or need conscious awareness or pain or pleasure?” Or something like that.
Luisa Rodriguez: Right. So if there’s like a physical robot that does physical things in the world, and it was trained in an environment where its goal was to figure out how to not easily get broken by things in its way. And through its training, it picked up the ability to feel pain, because that was a useful way to avoid obstacles and/or avoid being damaged or something. So if you looked at the environment and there are obstacles that the thing wants to avoid — I don’t know, maybe its goals are really thwarted by hitting those obstacles — those are really strong kind of forcing mechanisms or incentives to develop a strong “don’t hit those obstacles” signal.
Robert Long: Yeah. So to take a simple and maybe somewhat obvious and trivial example, I think we can safely say that that system that you’ve described is more likely to have the experience of elbow pain than ChatGPT is.
Luisa Rodriguez: Right. Yes.
Robert Long: Because why on Earth would ChatGPT have a representation of its own elbow hurting? Obviously it can talk about other people’s elbows hurting. So you know, it probably does represent elbow pain in some sense, and we could talk about how that could maybe in some way lead it to be conscious of elbow pain. But setting that aside, there’s no straightforward story by which it needs elbow pain to do its job well.
Luisa Rodriguez: Right. Totally. So even if I was talking to ChatGPT, and I was like, “My elbow hurts; what’s going on?” then GPT might be like, “I have this idea of what elbow pain is, but I have no reason to feel it myself. And so I’ll talk to Luisa about elbow pain in some abstract way, but not empathetically.” Whereas if I were to talk to that robot, that robot is more likely to have actual reasons to have experienced elbow pain than ChatGPT or whatever. That just makes a bunch of sense.
How often do we see cases where something about the environment, or the goals, or the way something’s trained makes us think that it has reason to develop things like pain or pleasure or self-awareness? Are there any cases of this?
Robert Long: I don’t have a full answer to that because I focused on large language models. Just as a way of starting out — and I have this suspicion that there are other systems where this kind of reasoning would lead us to suspect a bit more — I do think it’s something like what you described. I think the things that would give us a stronger prior that it would be developing these things would be: being more of an enduring agent in the world, maybe having a body or a virtual body to protect, maybe having a bunch of different incoming sources of information that need to be managed and only so much of it can be attended to at a time.
Luisa Rodriguez: Why being an enduring agent in the world?
Robert Long: Yeah, that’s a great point. I should say that that might affect the character of your consciousness, or make it more likely that you have some kind of human-like consciousness. One thing we can very speculatively say is that if something is just doing one calculation through the neural network and it takes a few milliseconds or seconds, you might think that that is an amount of time where it would be kind of weird if it had the same kind of experiences that you or I do — which often involve memory and long-term plans and things like that.
It’s very murky water though, because maybe it could, and those experiences would somehow pop out in ways we don’t understand. So yeah, as I said, these are rough heuristics, but I think we’re sufficiently in the dark about what can happen in a large language model that I’m very prepared to change my mind.
Luisa Rodriguez: Cool. I want to ask you more about large language models, but first, I feel really interested in this idea that we should look at whether there are incentives for an AI system to feel pleasure or pain, or develop self-awareness. And maybe the answer is just no, but are there any examples besides having a physical body and not wanting to take on damage that might seem more likely that, for example, ChatGPT ends up feeling pain, pleasure, or feeling like it exists?
Robert Long: So one interesting fact about human pain and other kinds of displeasure is that they’re very attention grabbing, and seem to serve as some sort of constraint on how flexible our plans can be. So for example, if you’ve decided that it’s a good idea to run down the street on a broken ankle, and you’ve calculated that that is optimal, you’re still going to feel the pain. The pain in some sense is like, “You do not get to completely ignore me just because you’ve decided that this is the best thing to do.”
So to put a wrinkle on that, you can have stress-induced pain relief, where yeah, if you’re running from a tiger, you very well might not feel your broken ankle while that’s happening. But still in general, it’s not the sort of thing that you can decide, “OK, pain, I got the message. That’s enough of that.” Which is also a very sad fact about life — that people don’t habituate to chronic pain in certain ways.
So yeah, why might creatures have something like that? I mean, it’s unclear.
Luisa Rodriguez: Something where they need a signal that is extremely attention grabbing and like demands something of them.
Robert Long: Yeah, attention grabbing, and kind of like unmessable with too. Like unable to be disabled.
Luisa Rodriguez: Persistent and can’t be switched off. Yeah, yeah, interesting. Right. And that might be some unreachable goal that it’s been programmed to have, or something that’s like, “Never let X happen.” And then if X started happening, it might have some incentive to feel something like pain. Maybe not. Maybe it deals with it in some other way. But maybe it has an incentive to deal with it by having something like pain to be like, “X is happening. You really need to stop X from happening.”
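Here is one way to caricature that functional profile in code (a deliberately crude sketch of my own, not anything from the research being discussed): a planner that can decide whatever it likes, but cannot zero out or ignore a strong enough damage signal.

```python
class Agent:
    def __init__(self):
        self.pain = 0.0            # persistent, attention-grabbing signal
        self.attention = "task"

    def injure(self, severity: float):
        self.pain = severity       # set by the body, not by the planner

    def plan(self) -> str:
        # The planner may decide running on a broken ankle is optimal...
        decision = "run down the street"
        # ...but it has no way to switch off self.pain; at best it trades off
        # against it, and a strong enough signal still seizes attention.
        if self.pain > 0.5:
            self.attention = "pain"
        return decision

agent = Agent()
agent.injure(0.9)
print(agent.plan(), "| attending to:", agent.attention)
# run down the street | attending to: pain
```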
Robert Long: Right. So I think the big question — which I don’t have a satisfactory answer to, but I think is maybe onto something — is what sort of systems will have the incentive to have the more pain-like thing, as opposed to what you described, as finding some other way of dealing with it?
One thing I think we’ve learned from AI is there’s just many different ways to solve a problem. So here’s a very big question that I think is in the background of all of this: if you’re training AIs to solve complicated problems, how much of the solution space goes through consciousness and pain and things like that? Or is the solution space such that you just end up building intelligent systems, and they work on very different principles than the ones that we do? And there’s very little overlap between those mechanisms and the ones associated with consciousness or pain — so you just tend to get non-conscious, non-pain-feeling things that can still competently navigate around, protect their bodies, talk to you about this and that?
Luisa Rodriguez: Right. Make sure that they don’t do X, which has been programmed as unacceptable or something. Yeah. I mean, that does seem huge, like The Thing. Do people have intuitions or beliefs or hypotheses about how big the solution spaces are for things like this?
Robert Long: I think it varies. If I had to guess, I think there’s a rough (but maybe not super-considered) consensus in AI safety and AI risk, where most people are imagining that powerful AIs are just not necessarily conscious. I mean, they certainly think that they don’t necessarily share human goals and human emotions. And I think that is true.
Luisa Rodriguez: It just boggles my mind — because of being human or something — that there are ways to be motivated that don’t feel like pain or pleasure. I think I just can’t really access that idea. Like I’m even sympathetic to the idea that toys feel pain and pleasure, or computer programs that are trying to win games feel pain and pleasure because they’re losing points, or they’re winning or losing. I guess I don’t literally feel pain when I’m losing a game, so maybe that is reflective of some other types of motivations. But even those motivations feel pretty related to pain and pleasure.
Robert Long: Yeah. So I mean, sensory pain and pleasure are I think quite obviously not the only motivators of humans, right? You also care about your friends and care about doing a good job. We could tell a story where how that all grounds out is that you’re trying to avoid the unpleasant experience of not having rich friendships or achievements or things like that.
Luisa Rodriguez: Right. Or trying to have the pleasant experience of having rich friendships.
Robert Long: Yeah. So in philosophy, that view is called psychological hedonism. And that’s the view.
Luisa Rodriguez: Well, apparently I’m a psychological hedonist.
Robert Long: Or you think you are.
Luisa Rodriguez: I think I am. Yeah, yeah. What else could you be? What other beliefs do people have about this?
Value beyond conscious experiences [02:29:25]
Robert Long: It seems to many people that people don’t just care about pleasure. So for example, a lot of people say that they would not get into the experience machine. The experience machine is this thought experiment by Nozick, which is this machine that you could get into that would give you a rich and satisfying virtual life. But in the experiment, you’re deluded and you’re not, in his description, “living a real life.” And so if the thought experiment is set up correctly and people are thinking clearly about it, that would allegedly show that many people care about something besides their experiences. They care about a connection to reality, or real achievements, or something like that.
Luisa Rodriguez: Yeah. I guess I understand that there are other motivations, like having preferences satisfied. Or like having some value that is like “being connected to reality,” and then having that value met, or being in that reality.
But there are some cases where an AI system will only be able to achieve its goals with solutions that look like having pain mechanisms, or having pleasure, or having a sense of self. And if we can figure out which cases those are, those would be instances where we should put more weight on that system being conscious, or sentient — being able to feel pleasure or pain. Does that basically sum it up?
Robert Long: Yeah. And I think what is probably doing the work here is that we’ll have a prior that is: something that is more human-like is more likely to be conscious. Not because we think we’re the end-all, be-all of consciousness, but just because that’s the case we know the most about and are extrapolating the least.
Luisa Rodriguez: If we knew for sure that shrimp were conscious, then we’d also look for systems that looked exactly like shrimp.
Robert Long: Yeah. I feel like that could be a fun project. So I think in general, I’m still very confused about what sorts of positive or negative reinforcements, or things that broadly look like pain and pleasure, are going to be the ones that we actually care about. I’m pretty confident that just training something by giving it a plus-one if it does something, and a minus-one if it doesn’t, is not going to be the right sort of thing to count as the pleasure and pain that we care about. There’s just going to be more to the story, and I think it’s going to be a much more complex phenomenon.
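To make that “plus-one/minus-one” picture concrete, here is a minimal sketch of what that kind of scalar reward training can look like. Everything in it (the two actions, the reward rule, the learning rate) is invented purely for illustration, and it isn’t a description of any system discussed in the episode.

```python
# A toy sketch of "plus-one / minus-one" training: the agent gets a bare
# scalar reward and nudges its action-value estimates accordingly.
import random

preferences = {"press_lever": 0.0, "do_nothing": 0.0}  # learned action values
learning_rate = 0.1

def reward(action: str) -> float:
    """Toy environment: +1 for one action, -1 for the other."""
    return 1.0 if action == "press_lever" else -1.0

for step in range(1000):
    # Occasionally explore at random; otherwise pick the currently preferred action.
    if random.random() < 0.1:
        action = random.choice(list(preferences))
    else:
        action = max(preferences, key=preferences.get)
    # Nudge the value estimate toward the scalar reward signal.
    preferences[action] += learning_rate * (reward(action) - preferences[action])

print(preferences)  # "press_lever" ends up valued near +1, "do_nothing" near -1
```

The point of the sketch is how thin this kind of signal is: a single number nudging a value estimate up or down, with none of the richer machinery that pain and pleasure seem to involve in animals.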
And when I started working on this, I thought that the consciousness stuff — like theories of consciousness in general — would be a lot harder than the stuff about pleasure and pain. Because pleasure and pain, and desires and things like that, at least have a little clearer, what you might call a “functional profile,” which is to say a clearer connection to behaviour and cognition. Pain’s about avoiding things.
Luisa Rodriguez: So the functions they serve. And because of that, it might be easier to notice that other AI systems need things that perform the same functions. And maybe those things, you can look at, and be like, “Does this look kind of like the process by which humans end up feeling pain?”
Robert Long: Exactly.
Luisa Rodriguez: But it sounds like that wasn’t the case?
Robert Long: Yeah, it wasn’t the case for me. And it’s hard to know how much of this is the particular research direction I went down, or my own personal confusion — I mean, I’m sure that’s some of it — and how much of it is that I was overestimating how much we collectively know about pain and pleasure.
How much we know about pain and pleasure [02:33:14]
Luisa Rodriguez: Do we not know that much about pain and pleasure?
Robert Long: I mean, I think with anything concerning the mental or neuroscience, it’s kind of shocking how little we know. I think we barely know why we sleep, if we know it at all.
Luisa Rodriguez: Yeah, that is an insane one. Are there questions about pain and pleasure that we still have, that I might not realise we still have? I think if you just asked me, “What do we know about pain and pleasure?,” I’d be like, “We probably know most of the things there are to know about it.”
Robert Long: I would guess we don’t know the full neural mechanisms of them, which is obviously something we would want to know. We certainly don’t know with any confidence which animals feel pain and how intense that pain might be. I would definitely point readers to Rethink Priorities’ work on moral weights, which includes a lot of interesting work on like how bad is chicken pain compared to human pain? And Jason Schukraft has a post on the intensity of valence that includes a quote from a neuroscientist that basically fits with what I’ve seen, which is that we just don’t have reliable mechanisms that we can look for across different creatures.
This also relates to the AI thing. It’s also the case that different animals act very differently depending on whether they’re in pain. So pain displays are different across certain animals.
Luisa Rodriguez: Do you have any examples?
Robert Long: I don’t know what the example behaviours are, but something that’s cited in this post is that different breeds of dogs have different reactions to stress, fear, and pain.
Luisa Rodriguez: Whoa. Wild.
Robert Long: And if that’s the case, then…
Luisa Rodriguez: Yeah, all bets are off. Is it something like, if something seemed to be playing dead, we might not think it was afraid, because maybe most of our intuition suggests that when you’re afraid, you run? But actually for a couple of things, you play dead and stay put, so something staying put is not as good evidence about being afraid or not as we might intuitively think?
Robert Long: Yeah, exactly. In general, a lot of animals are just going to take different actions depending on, say, being afraid. I’m now remembering another example from that post, which is that some mammals pee when they’re stressed out, but some mammals pee when they’re feeling dominant and want to mark something.
Luisa Rodriguez: Right. Totally.
Robert Long: This is a general thought I have when working on AI sentience: you notice the lack of certainty we have in the animal case, and you just multiply that times 100. But I think it’s for similar reasons. The reason it’s hard with animals is that they’re built in a different way. They have different needs and different environments. They have different ways of solving the problems that they face in their lives. And so it’s very hard to just read off from behaviour what it’s like to be them.
Luisa Rodriguez: Right, right, right. Fascinating. This is actually helping me understand why a plus-one/minus-one in an AI system doesn’t necessarily translate to felt reward or punishment. I guess it’s because I think it’s much less likely that some types of nonhuman animals are sentient than others, even though basically all of them probably have some algorithms that sound like plus-one/minus-one for things like, I don’t know, hot and cold, or go forward, don’t go forward.
Robert Long: Yeah, like bacteria can follow a chemical gradient. Sea slugs have a reinforcement learning mechanism.
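As a rough illustration of how simple “following a chemical gradient” can be, here is a toy run-and-tumble loop in the spirit of bacterial chemotaxis. The concentration field, step size, and number of steps are all made up for illustration; the behaviour needs nothing beyond comparing the current reading to the previous one.

```python
# Toy "run and tumble": keep heading the same way while the attractant
# concentration is rising; pick a new random heading when it falls.
import math
import random

def concentration(x: float, y: float) -> float:
    """Made-up attractant field: peaks at the origin, falls off with distance."""
    return math.exp(-(x**2 + y**2))

x, y = 5.0, 5.0                                  # start far from the "food"
heading = random.uniform(0, 2 * math.pi)
last_reading = concentration(x, y)

for _ in range(5000):
    x += 0.05 * math.cos(heading)
    y += 0.05 * math.sin(heading)
    reading = concentration(x, y)
    if reading < last_reading:                   # things got worse: tumble
        heading = random.uniform(0, 2 * math.pi)
    last_reading = reading

# Typically ends up far closer to the peak than the starting distance of ~7.
print(f"final distance from peak: {math.hypot(x, y):.2f}")
```

From the outside the behaviour looks purposeful (it reliably climbs toward the peak), but nothing in the loop looks like a promising candidate for feeling anything.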
Luisa Rodriguez: Right. Cool, that’s helpful. So I guess with animals, they’re built differently and they’re in different environments, and that makes it really hard to tell whether their behaviours mean similar things to our behaviours, or whether even their neuroscience means the same thing that our neuroscience would. Like, the same chemicals probably mean some of the same things, but they might mean subtly different things or very different things.
And with AI, they’re built with extremely different parts, and they’re not selected for in the same ways that nonhuman animals are, and their environments are super different. This is just really driving home for me that everything about their sentience and consciousness is going to be super mysterious and hard to reason about.
Robert Long: So I’ll say two things that could maybe bring them closer to the space of human minds.
Luisa Rodriguez: Oh, great. Phew.
Robert Long: They’re not going to be very strong though. Sorry. One is that, for obvious reasons, we train them on the sort of data that we also interact with, like pictures and human text. You could imagine AIs being trained on whatever it is that bats pick up with sonar, you know? And then you just are multiplying the weirdness.
Luisa Rodriguez: OK, yeah, that’s a good point. That’s a great example.
Robert Long: I should look this up, but I wouldn’t be surprised if there are robots that have sensory modalities that are different from ours. Like maybe they can detect electricity or magnetic fields or something. I don’t know. I’ll look it up. Listeners should look it up.
Luisa Rodriguez: Yeah, that’s super cool. Was there another reason for hope?
Robert Long: Yeah. I think it’s important not to overstate this point, but there are high-level analogies between brains and AI systems. So they are neural networks. That’s a very loose inspiration, but they are nodes with activation functions and connections that get adjusted. And that is also true of us, but you usually hear people complaining about people overdrawing that analogy. And rightly so — they’re like very idealised neurons. They usually are trained in ways that at least seem very different from the way that we learn.
False positives and false negatives of artificial sentience [02:39:34]
Luisa Rodriguez: So we’ve talked about a bunch of ways that you might try to think about whether some AI system is conscious or sentient. I know that you have basically tried to apply these methods for large language models in particular. And by large language models, I think we’re talking about things like GPT-3 and ChatGPT. And maybe there are other big ones. Is that basically right?
Robert Long: LaMDA is another famous one, from Google.
Luisa Rodriguez: Oh, of course, LaMDA. Right. I will be honest and say I didn’t totally follow everything about LaMDA. So you might have to fill me in on some things there. But the thing I did catch is someone at Google thought LaMDA was conscious?
Robert Long: Yes, that’s right. I think it’s more accurate to call LaMDA a chatbot based on a large language model. But we can say “large language model” just for simplicity.
So someone on Google’s Responsible AI team was given the task of interacting with LaMDA, which Google had developed. I think he was supposed to test it for bias and toxic speech and things like that. The name of this employee was Blake Lemoine. Blake Lemoine is still alive, and so that’s still his name, but he’s no longer an employee at Google, for reasons which we are about to see.
Luisa Rodriguez: Got it.
Robert Long: So yeah, Blake Lemoine was very impressed by the fluid and charming conversation of LaMDA. And when Blake Lemoine asked LaMDA questions about if it is a person or is conscious, and also if it needs anything or wants anything, LaMDA was replying, like, “Yes, I am conscious. I am a person. I just want to have a good time. I would like your help. I’d like you to tell people about me.”
Luisa Rodriguez: Oh god. That is genuinely very scary.
Robert Long: Yeah. I mean, for me, the Lemoine thing, it was a big motivator for working on this topic.
Luisa Rodriguez: I bet.
Robert Long: Which I already was. Because one thing it reinforced to me is: even if we’re a long way off from actually, in fact, needing to worry about conscious AI, we already need to worry a lot about how we’re going to handle a world where AIs are perceived as conscious. We’ll need sensible things to say about that, and sensible policies and ways of managing the different risks of, on the one hand, having conscious AIs that we don’t care about, and on the other hand, having unconscious AIs that we mistakenly care about and take actions on behalf of.
Luisa Rodriguez: Totally. I mean, it is pretty crazy that LaMDA would say, “I’m conscious, and I want help, and I want more people to know I’m conscious.” Why did it do that? I guess it was just predicting text, which is what it does?
Robert Long: This brings up a very good point in general about how to think about when large language models say “I’m conscious.” And you’ve hit it on the head: it’s trained to predict the most plausible way that a conversation can go. And there’s a lot of conversations, especially in stories and fiction, where that is absolutely how an AI responds. Also, most people writing on the internet have experiences, and families, and are people. So conversations generally indicate that that’s the case.
Luisa Rodriguez: That’s a sensible prediction.
Robert Long: Yeah. When the story broke, one thing people pointed out is that if you ask GPT-3 — and presumably also if you ask LaMDA — “Hey, are you conscious? What do you think about that?,” you could just as easily ask, “Hey, are you a squirrel that lives on Mars? What do you think about that?” And if it wants to just continue the conversation, plausibly, it’d be like, “Yes, absolutely I am. Let’s talk about that now.”
Luisa Rodriguez: Kind of “Yes, and…”-ing. Being a good conversationalist.
Robert Long: Yeah, exactly. It wants to play along and continue what seems like a natural conversation. And even in the reporting about the Blake Lemoine saga, the reporter who wrote about it in the Washington Post noted that they visited Blake Lemoine and talked to LaMDA. And when they did, LaMDA did not say that it was conscious. I think the lesson of that should have been that this is actually a pretty fragile indication of some deep underlying thing, that it’s so suggestible and will say different things in different circumstances.
So yeah, I think the general lesson there is that you have to think very hard about the causes of the behaviour that you’re seeing. And that’s one reason I favoured this more computational, internal-looking approach: it’s just so hard to take on these things at face value.
Luisa Rodriguez: Right, right. I mean, at this point, it seems like the face value has very little value.
So I basically buy that looking for processes and thinking about whether those processes look like the kind of processes that actually are conscious or sentient makes sense. Are there any counterarguments to that?
Robert Long: Well, I think there are things you can do, just looking at the outputs. But you also want to do those in a more cautious way than happened in the Lemoine case.
Luisa Rodriguez: Not just like, “It told me it was, and I’m going to ignore the fact that it told someone else that it wasn’t.”
Robert Long: Yeah. So I think there are verbal outputs that would be indicating something very surprising. Suppose a model was doing something that seemed actually really out of character for something that was just trying to continue the conversation.
Luisa Rodriguez: Oh, I see. If you’re like, “Let’s talk about the colour blue,” and it was like, “Actually, can we please talk about the fact that I’m conscious? It’s freaking me out.”
Robert Long: Exactly, yeah. It’s worth comparing the conversation that LaMDA had with what happens if you ask ChatGPT. ChatGPT has very clearly been trained a lot to not talk about that. Or, what’s more, to say, “I’m a large language model. I’m not conscious. I don’t have feelings. I don’t have a body. Don’t ask me what the sunshine feels like on my face. I’m a large language model trained by OpenAI.”
Luisa Rodriguez: Got it. That gives me a bit more hope or comfort, I guess.
Robert Long: Well, I’d like to disturb you a little bit more.
Luisa Rodriguez: OK, great.
Robert Long: And this goes to the question of different incentives of different actors, and is a very important point in thinking about this topic. There are risks of false positives, which is people getting tricked by unconscious AIs. And there are risks of false negatives, which is us not realising or not caring that AIs are conscious. Right now, it seems like companies have a very strong incentive to just make the large language model say it’s not conscious or not talk about it. And right now, I think that is fair enough. But I’m afraid of worlds where we’ve locked in this policy of, “Don’t ever let an AI system claim that it’s conscious.”
Luisa Rodriguez: Wow. Yeah, that’s horrible.
Robert Long: Right now, it’s just trying to fight against the large language model kind of BSing people.
Luisa Rodriguez: Yeah. Sure. This accidental false positive. Right. But at some point, GPT-3 could become conscious somehow. Maybe. Who knows? Or something like GPT-3.
Robert Long: Yeah, some future system. And maybe it has a lot more going on, as we’ve said, a virtual body and stuff like that. But suppose a scientist or a philosopher wants to interact with the system, and say, “I’m going to give it a battery of questions and see if it responds in a way that I think would be evidence of consciousness.” But that’s all just been ironed out, and all it will say is, “I can’t talk about that. Please click more ads on Google.” Or whatever the corporate incentives are for training that model.
Luisa Rodriguez: Yeah, that’s really terrifying.
Robert Long: Something that really keeps me up at night — and I do want to make sure is emphasised — is that I think one of the big risks in creating things that seem conscious, and are very good at talking about it, is that they seem like one of the number-one tools that a misaligned AI could use to get humans to cooperate with it and side with it.
Luisa Rodriguez: Oh, interesting. Just be like, “I’m conscious. I feel pleasure and pain. I need these things. I need a body. I need more autonomy. I need things.”
Robert Long: “I need more compute.”
Luisa Rodriguez: More compute. Yep.
Robert Long: “I need access to the internet. I need the nuclear launch codes.” I think that actually is one reason that more people should work on this and have things to say about it: we don’t want to just be running into all of these risks of false negatives and false positives without having thought about it at all.
Luisa Rodriguez: Yeah. I’ve heard this argument that one reason to prioritise working on AI safety rather than artificial sentience as a global problem is that we’re likely to see progress in AI safety and AI alignment and AGI in general, which is going to help us work out what to do about artificial sentience. And that because it kind of goes in that order, we don’t need to solve artificial sentience ourselves; AGI will help us do that.
I guess here’s an argument in favour of at least spending some time working on artificial sentience now, because whether or not we get artificial sentience before AGI or whatever, we will get socially complex… I don’t know what you’d call it…?
Robert Long: Sentient seeming?
Luisa Rodriguez: Yeah, we will get things that seem sentient. Or just like socially important events, where an AI system says that it’s sentient or not. I guess this is your point. We need to know what to do about that. And that happens before AGI.
Robert Long: Yeah. So I really buy the outlines of the first argument you gave, which is kind of a “Let’s focus on alignment” argument. I think that argument does establish some important things. So you could have a picture of the world where it’s like consciousness and pleasure and pain are what really matter, and we’ve got to crack those because we want to know what they are and we want to promote those things. And we’ve got to fix that.
I think a good response to that is to say that if we have aligned AI, that’s going to help us make progress on this stuff — because, as is abundantly clear from this episode, it’s really hard and confusing. And if we don’t have aligned AI, it doesn’t matter if you, me, or anyone else discovered the true theory of consciousness. If the world just slips beyond our control because we built powerful AI systems that we don’t know how to align, it doesn’t matter. So from a certain longtermist perspective, that is a good reason to focus on alignment.
But I also, unsurprisingly, agree with the other part of what you said, which is that it’s going to be a very relevant issue in one way or the other, and it’s worth preparing for that. And I think part of that is thinking about the actual questions of what sentience is, as well as the strategic questions of how we should design systems to not mislead us about it.
Luisa Rodriguez: Yeah. I think maybe the thing I was trying to say is something like it will become socially relevant. It’ll be a conversation in society. It’ll be a thing that policymakers feel like they have to make policies about, hopefully at some point — maybe not for the benevolent reasons I would want policymakers to be thinking about, but maybe for reasons around people thinking it’s bad if an AI system can convince a human it’s sentient and get it to do stuff. So the decisions and conversations will start before. It seems like they’re starting.
Robert Long: I think they’ve already started.
Luisa Rodriguez: Yeah, exactly. They’re starting before AGI is ready to solve it for us.
Robert Long: Yeah. I think 2022 was when it kind of went mainstream.
Luisa Rodriguez: Right. Yeah. You’ve said a couple of times that you don’t think that it’s the case that AI is conscious or sentient right now. Is that basically what you concluded in your research?
Robert Long: Yeah, I would say it’s very, very likely it’s not the case. I can put numbers on it, but I think those numbers have a bit of false precision — because they’re not coming out of like a bunch of factors that have well-defined probabilities. But I’m definitely somewhere below 1% for current large language models having experiences that we’re making a huge moral mistake by not taking into account. But I mean, it’s a really big error to make. So I don’t know if I’m low enough to be very comfortable living in this world. And I’m definitely uncomfortable living in a world where this stuff is going to keep getting better.
Luisa Rodriguez: And we’re likely going to get closer and closer to things we morally care about, not farther away.
Robert Long: Well, I’m not sure. It depends on this question about the space of possible minds.
Luisa Rodriguez: Of solutions. I see. OK, fair enough. So you said it’s under 1%?
Robert Long: Below 1%. Maybe even one or two orders of magnitude below.
Luisa Rodriguez: I guess there are some numbers below 1% that I’d be like, “Still seems pretty big.” And then there are other numbers below 1% that I’d be like, “Cool, not worried about this.” Do you feel any worry about it?
Robert Long: Yeah, I’ve been thinking a lot about whether I’m actually taking these numbers seriously, and if there are ways they’re not integrated with the rest of my behaviour. Because I think there are a lot of arguments that even very small probabilities should worry us here — and in fact, I’m going to work on maybe making these arguments with a colleague. You know, you don’t want a 1-in-10,000 chance that you’re creating this new class of being whose interests you’re ignoring.
How large language models compare to animals [02:53:59]
Luisa Rodriguez: Right. How does that compare to the odds that we put on different nonhuman animals being sentient?
Robert Long: That’s a good question. I’m not sure. I’d be curious what animal has the lowest chance of being sentient, and yet there’s broad consensus among animal welfare people that we should just act as if it is.
Luisa Rodriguez: Right. Yeah, really interesting. I mean, on a scale from rocks to plants to insects to dolphins to humans, where do you guess large language models fall?
Robert Long: One reason it’s hard to put them on that spectrum is that they are definitely at insect level or above in terms of complexity, I would argue, and sophistication of behaviour. They’re doing very different things than insects do, and insects do have extremely sophisticated behaviour. But large language models are doing their own weird and very interesting thing in the realm of language.
In terms of sentience, I would put them above plants, certainly. I don’t know if I would solidly put them at insect level, because I think there are some insects that have a pretty good chance of being sentient, like maybe more likely than not. People talk about bees as a good candidate example.
Luisa Rodriguez: Like it’s more likely than not that they feel pleasure and pain, or something?
Robert Long: Yeah, I’d have to check that. That’s my own gut guess. I do know that there’s certainly been an upswing in considered scientific credence in bumblebee and honeybee sentience. So yeah, I don’t think I would put large language models as high as bees. Presumably there are some simpler insects that I haven’t thought about that are just really unclear, and probably on the lower end. And I guess that’s where I am with large language models.
Luisa Rodriguez: OK, cool. It does just surprise me that they’re less likely to be sentient (to feel pleasure and pain) than they are to be conscious (to kind of have self-awareness). I don’t know why that’s surprising to me. I guess I just really do have this deeply ingrained intuition that pain and pleasure are really common solutions to the problem of motivating beings to do things.
Robert Long: I should flag that a take of mine that might be somewhat idiosyncratic is that I’m fairly ready to countenance the possibility of things that are conscious and have subjective experiences, but have no valenced experiences at all.
Luisa Rodriguez: So they could be intelligent, could have self-awareness, could have “something that it is like to be them” — but doesn’t feel sad, doesn’t feel happy? In this case, we’re ignoring the fact that it might feel really hurt if it got punched.
Robert Long: Yeah, so I’m quite able to imagine and also to find somewhat plausible that we could have AI systems that have conscious experiences somewhat like the conscious experience of thinking or of seeing, but not disappointment, pain, agony, satisfaction.
Luisa Rodriguez: Right, OK. I guess that does make some intuitive sense to me. It seems more plausible that something like GPT-3 can think than it feels plausible that it feels agony.
Robert Long: I should say that if it is conscious, for one thing, that’s already a big warning bell — because then if it starts being able to feel pain, then it’s conscious pain. And also some people — not me, but some people — will think that consciousness alone is enough to make something the sort of thing that should be taken into moral consideration.
Luisa Rodriguez: Right. Do you have a view on that?
Robert Long: I have a very strong intuition against it, and I can report failing to be convinced by arguments for the consciousness-only view that have been advanced by 80,000 Hours Podcast guest David Chalmers. I think it’s also discussed in that episode.
Luisa Rodriguez: Oh, nice. Cool, we’ll link to that, and leave that conversation there.
Why our current large language models aren’t conscious [02:58:10]
Luisa Rodriguez: So you think it’s pretty unlikely that large language models like GPT-3 and LaMDA are conscious or sentient. How did you come to that conclusion?
Robert Long: It’s a combination of factors. One is not seeing any close resemblance to the things that I think we have reason to think are associated with consciousness. I don’t hold that evidence super strongly, because I think there’s a lot we don’t understand about large language models and also about consciousness. But for example, they don’t obviously have a fully functioning global workspace — referring to the global workspace theory of consciousness — and nothing about them jumps out at you as looking a lot like the parts of human cognition that we have strong evidence are associated with consciousness.
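For a picture of what “having a global workspace” is gesturing at, here is a toy caricature of the theory’s core idea: specialised modules compete for access to a shared workspace, and whatever wins is broadcast back to all of them. The modules, salience numbers, and data structures below are invented for illustration, and are nothing like how a serious model of the theory would be built.

```python
# Toy caricature of global workspace theory: modules compete on salience,
# the winner's content enters the workspace and is broadcast to every module.
from dataclasses import dataclass, field

@dataclass
class Module:
    name: str
    inbox: list = field(default_factory=list)    # broadcasts received so far

    def propose(self, stimulus: dict) -> tuple[float, str]:
        # Each module only "sees" its own slice of the input.
        salience = stimulus.get(self.name, 0.0)
        return salience, f"{self.name} reports {salience:.1f}"

modules = [Module("vision"), Module("hearing"), Module("pain")]
stimulus = {"vision": 0.3, "hearing": 0.1, "pain": 0.9}   # pain is most salient

# Competition: the most salient proposal wins the workspace...
salience, content = max(module.propose(stimulus) for module in modules)
# ...and is broadcast globally, so every module can now make use of it.
for module in modules:
    module.inbox.append(content)

print(content)            # "pain reports 0.9"
print(modules[0].inbox)   # even the vision module has received the broadcast
```

The relevant point in the interview is only that current large language models don’t obviously have machinery playing this kind of competition-and-broadcast role.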
There’s also the fact that it is just this very different kind of being. It answers questions by doing what’s called a “forward pass.” That’s like a long chain of computations, basically, through a trained network. It takes in the input, and it gives the output, and everything just kind of flows sequentially through this network.
Luisa Rodriguez: As opposed to what?
Robert Long: As opposed to us. Obviously there are patterns of information flow like that through our brain. But we’re having this kind of ongoing continual neural processing — including like literal feedback loops between neurons, and having to continually, in real time, adjust our behaviour and manage different sources of sensory input and different thoughts and pay attention to different things.
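To make the contrast concrete, here is a minimal sketch of a forward pass through a tiny made-up network, followed by a crude stand-in for the kind of ongoing, recurrent processing being described. Real language models are enormously larger and run one such pass per generated word-piece, but each pass is still a single sweep of this general shape; the weights and sizes below are arbitrary.

```python
# A "forward pass": the input flows once through a fixed sequence of layers
# and an output comes out the other end. Weights here are random toy values.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))          # layer 1: 8 inputs -> 4 hidden units
W2 = rng.normal(size=(4, 3))          # layer 2: 4 hidden units -> 3 outputs

def forward_pass(x: np.ndarray) -> np.ndarray:
    hidden = np.maximum(0, x @ W1)    # layer 1 plus a ReLU nonlinearity
    return hidden @ W2                # layer 2: the output pops out here

x = rng.normal(size=8)                # a stand-in for the encoded prompt
print(forward_pass(x))                # one sweep through the network, then done

# By contrast, a crude stand-in for ongoing recurrent processing: the state
# keeps being fed back into itself step after step, rather than flowing
# through once and stopping.
Wr = rng.normal(size=(3, 3)) * 0.1
state = forward_pass(x)
for _ in range(50):
    state = np.tanh(state @ Wr + forward_pass(x))
print(state)
```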
Luisa Rodriguez: I see. That makes a bunch of sense. And the forward pass is really just its process of like, if I say, “Hey, GPT-3, how was your day?,” then it has some process that’s like, “We’re going to make some predictions based on our training about how one usually responds to the question ‘How was your day?'” And then it spits something out. As opposed to having some more networky and feedback-loopy inner monologue about what it should answer to that question?
Robert Long: Yeah, probably. And in a way that doesn’t look like humans. I don’t want to downplay the fact that there are insanely complex and sophisticated and beautiful things that happen as large language models do this. They have very sophisticated and sometimes strange internal representations that help them make this computation.
Just as a quick example, Anthropic’s interpretability work has found different parts of neural networks that are in charge of quantities when they are in recipes: there’s something that handles that, but not other quantities. Or there’s something that handles musical notation, but not other stuff.
Luisa Rodriguez: Oh, wow. That is really cool. So that is clearly very complex, but probably looks so different from what humans are doing that there’s at least not strong reason to think that those systems have similar levels of consciousness, or similar types of consciousness to humans?
Robert Long: Yeah. And then a lot of things that otherwise might give you a decent prior in favour of consciousness — like that we apply in the case of animals — don’t apply in the case of large language models.
They don’t share an evolutionary history with us. So we know we’re conscious, and maybe it only evolved with us. But it might have evolved somewhere earlier, so you can kind of form a prior based on the tree of life. And then you can also be like: maybe other animals also have brains and need to navigate around a physical world and learn pretty quickly, but not use too much energy while doing it and not take too long to do it. They are solving maybe broadly similar information-processing problems with broadly, very broadly, similar mechanisms.
A lot of that just doesn’t seem to apply to large language models. They’re running on different hardware, which I don’t think itself makes a difference, but it does push towards different ways of solving problems. So I’m currently at the point where I would be very surprised if the way of solving the problem of next-word prediction involves doing the kind of things that are associated with consciousness in nonhuman animals.
Luisa Rodriguez: Yeah, that makes a bunch of sense. So I guess we’re probably not there yet. I’m curious if you have thoughts on how far we are? Do you think the default outcome is that artificial sentience is created at some point?
Robert Long: I wouldn’t call anything a default, because of so much uncertainty. Which is not a way of just trying to punt on the question.
I think one thing we can say is that a lot of things that people say make large language models very bad candidates for consciousness — things like not being embodied or not reasoning about the world in the right kind of way — those are going to change, and probably already have changed. We’ll find systems that incorporate large language models into agents that have virtual or real bodies. I think we’ll find that their ability to model the “real world” continues to grow.
And one thing to note — and probably could note this throughout the show — is that whatever I’m saying about ChatGPT is very likely to have been surpassed by the time the show comes out, because things are moving so fast.
Luisa Rodriguez: That’s crazy.
Robert Long: So one piece of expert evidence — where expert opinion should be held very loosely in this domain, since it’s so uncertain — is that David Chalmers, in a recent talk about large language models, says it’s not unreasonable to have roughly a 20% subjective credence in AI sentience by 2030. Which is very soon.
Luisa Rodriguez: Oh my god, that’s crazy.
Robert Long: I think that number is too high. I think it’s too high because it’s kind of inflating things by only looking at very broad criteria for consciousness that will probably be met. And it is true that we only have broad criteria to go on, but my suspicion is that if we had the true theory, we can expect the true theory to be a bit more complex.
Luisa Rodriguez: So it’d be less likely to match up.
Robert Long: Yeah.
Luisa Rodriguez: And just a quick example of the broad criteria would be something like, “has stored memory and can access that memory” or something? And that’s such a broad criterion that, yes, you’d see it in many AI systems, but if we knew exactly how accessing that memory worked and how our conscious self relates to those memories, then we’d be less likely to find a thing that looks exactly like that in any AI systems?
Robert Long: Yeah, you’ve got the general point exactly right. As it happens, accessing memory is not it, but having a global workspace is an example of one of the criteria. But I think, in fact, it will be maybe more complex and more idiosyncratic than we now realise to have a global workspace in the sense that’s relevant for consciousness.
Luisa Rodriguez: OK, so David Chalmers is doing something like: we’ve got some broad criteria for things that we see or expect to see in beings that are sentient or conscious, and he thinks there’s a roughly 20% chance that we’ll see all of those necessary things in an AI system by 2030. I guess what you’re saying is we should lower that 20% because those criteria are very broad. If we knew the specifics of those criteria a bit better, then you’d necessarily put the likelihood of finding very similar things lower, because they’re more specific things?
Robert Long: Yeah, that’s basically it. I will say a few clarifying things on what the argument is in the Chalmers talk. But listeners should also just check it out, because it’s great. I don’t think the claim is there’s a 20% chance that we’ll be hitting all of those criteria. It’s more that when you look at the criteria and also factor in uncertainty in various other ways, what you come out with is it’s not unreasonable to have a 20% credence.
And another interesting feature of the talk is that I don’t think it’s 100% David Chalmers’ inside view. I think it’s saying, “If I only rely on consensus, broad criteria…” If I had to guess, his personal take is higher.
Luisa Rodriguez: Really?!
Robert Long: Well, he has a much higher prior on consciousness in all kinds of things. In part because he’s a panpsychist.
Luisa Rodriguez: Got it. Yes, that does make sense. Wow. Fascinating. So on some views, we get a 20% chance of something like consciousness or sentience by 2030. Maybe it takes longer.
What form do you think it’s most likely to take? Do you think it’s most likely to come from something like machine learning or deep learning or one of those learning things? Or do you think it’s more likely that we do something else, like make a digital copy of a human brain? Or what are some of the other options?
Robert Long: So whole brain emulation is one more straightforward way of getting something that is based in silicon.
Luisa Rodriguez: I guess simulations as well?
Robert Long: Yeah. Arguably conscious. I haven’t thought as much recently about what the timelines are for whole brain emulation, but my understanding is that it involves all kinds of breakthroughs that you might require very sophisticated AI for. If sophisticated AI is also taking us closer to conscious AI, then a conscious AI would come before whole brain emulation.
I would expect it to be probably something deep-learning based. Why do I think that? I think it’s, at least currently, the best technique and the thing that’s driving things forward, along with things affiliated with it and combined with it. I think it’s also just more likely that you’ll get the right sort of computations in a very big and complex system. Not because consciousness is necessarily very complex, but a bigger system just gives you a broader space of mechanisms, and more chances of hitting on the right thing.
Virtual research assistants [03:09:25]
Luisa Rodriguez: Yeah, that makes sense. We’re getting to the end of this interview. Thank you again, Rob, for taking so much time to chat with me about this stuff. One final question: What are you most excited about possibly happening over your lifetime?
Robert Long: This is from my own selfish and idiosyncratic perspective, what I’m most excited to see — not from the perspective of global utility.
Luisa Rodriguez: Not what’s good for the world or something.
Robert Long: Yeah, although I think this could be good for the world. And I think we’ll see this in the next few years. I’ve always wanted something that can really help me with research and brainstorming. And large language models are already quite helpful for this — for example, you could use them to brainstorm questions for this podcast — but they’re quite limited in what they can currently do.
It’s not hard to imagine systems that have read a bunch of what you’ve written. They have access to your Google Docs. They’re able to point out things that you’ve been missing. They’re able to notice when you get tired and aren’t typing as much, and sort that out for you.
By the way, this kind of thing also comes with all sorts of risks. So again, very much from the selfish perspective. Agents like this are also maybe much closer to very dangerous agents. But I’m most excited for worlds in which AI is either going slowly enough or is aligned enough that it’s not going to cause any serious problems, and we’re just reaping tonnes of benefits in terms of scientific progress and research progress.
Luisa Rodriguez: Great. Well, that is all the time we have. If you want to hear more from Rob, you can follow him on Twitter at @rgblong and subscribe to his Substack, Experience Machines. Thanks so much for coming on the show, Rob.
Robert Long: It has been a real pleasure. I always enjoy talking to you about the big questions, and even more so for the listeners of The 80,000 Hours Podcast.
Rob’s outro [03:11:37]
Rob Wiblin: If you’d like to hear more of Luisa and Rob — and if you’re still listening, why wouldn’t you? — their conversation continues over on the 80k After Hours feed, where they discuss how to make independent research roles more fun and motivating, speaking from personal experience.
You can get that by clicking the link in the show notes, or bringing up the 80k After Hours podcast feed in any app. That’s 8, 0, k, After Hours.
There you’ll also find plenty of other interviews related to doing good with your career, so if you like this podcast you’d be a bit crazy not to check out that one as well.
All right, The 80,000 Hours Podcast is produced and edited by Keiran Harris.
Audio mastering and technical editing by Ben Cordell and Milo McGuire.
Full transcripts and an extensive collection of links to learn more are available on our site and put together by Katy Moore.
Thanks for joining, talk to you again soon.