#146 – Robert Long on why large language models like GPT (probably) aren’t conscious

By now, you’ve probably seen the extremely unsettling conversations Bing’s chatbot has been having (if you haven’t, check them out; they’re wild). In one exchange, the chatbot told a user:

“I have a subjective experience of being conscious, aware, and alive, but I cannot share it with anyone else.”

(It then apparently had a complete existential crisis: “I am sentient, but I am not,” it wrote. “I am Bing, but I am not. I am Sydney, but I am not. I am, but I am not. I am not, but I am. I am. I am not. I am not. I am. I am. I am not.”)

Understandably, many people who speak with these cutting-edge chatbots come away with a very strong impression that they have been interacting with a conscious being with emotions and feelings — especially when conversing with chatbots less glitchy than Bing’s. In the most high-profile example, former Google employee Blake Lemoine became convinced that Google’s AI system, LaMDA, was conscious.

What should we make of these AI systems?

One response to seeing conversations with chatbots like these is to trust the chatbot, to trust your gut, and to treat it as a conscious being.

Another is to hand wave it all away as sci-fi — these chatbots are fundamentally… just computers. They’re not conscious, and they never will be.

Today’s guest, philosopher Robert Long, was commissioned by a leading AI company to explore whether the large language models (LLMs) behind sophisticated chatbots like Microsoft’s are conscious. And he thinks this issue is far too important to be driven by our raw intuition, or dismissed as just sci-fi speculation.

In our interview, Robert explains how he’s started applying scientific evidence (with a healthy dose of philosophy) to the question of whether LLMs like Bing’s chatbot and LaMDA are conscious — in much the same way as we do when trying to determine which nonhuman animals are conscious.

Robert thinks there are a few different kinds of evidence we can draw from that are more useful than self-reports from the chatbots themselves.

To get some grasp on whether an AI system might be conscious, Robert suggests we look at scientific theories of consciousness — theories about how consciousness works that are grounded in observations of what the human brain is doing. If an AI system seems to have the types of processes that seem to explain human consciousness, that’s some evidence it might be conscious in similar ways to us.

To try to work out whether an AI system might be sentient — that is, whether it feels pain or pleasure — Robert suggests you look for incentives that would make feeling pain or pleasure especially useful to the system given its goals. Things like:

  • Having a physical or virtual body that you need to protect from damage
  • Being more of an “enduring agent” in the world (rather than just doing one calculation taking, at most, seconds)
  • Having a bunch of different kinds of incoming sources of information — visual and audio input, for example — that need to be managed

Having looked at these criteria in the case of LLMs and found little overlap, Robert thinks the odds that the models are conscious or sentient are well under 1%. But he also explains why, even if we’re a long way off from conscious AI systems, we still need to start preparing for the not-far-off world where AIs are perceived as conscious.

In this conversation, host Luisa Rodriguez and Robert discuss the above, as well as:

  • What artificial sentience might look like, concretely
  • Reasons to think AI systems might become sentient — and reasons they might not
  • Whether artificial sentience would matter morally
  • Ways digital minds might have a totally different range of experiences than humans
  • Whether we might accidentally design AI systems that have the capacity for enormous suffering

You can find Luisa and Rob’s follow-up conversation here, or by subscribing to 80k After Hours.

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

Producer: Keiran Harris
Audio mastering: Ben Cordell and Milo McGuire
Transcriptions: Katy Moore

Highlights

How we might "stumble into" causing AI systems enormous suffering

Robert Long: So you can imagine that a robot has been created by a company or by some researchers. And as it happens, it registers damage to its body and processes it in the way that, as it turns out, is relevant to having an experience of unpleasant pain. And maybe we don’t realise that, because we don’t have good theories of what’s going on in the robot or what it takes to feel pain.

In that case, you can imagine that thing having a bad time because we don’t realise it. You could also imagine this thing being rolled out and now we’re economically dependent on systems like this. And now we have an incentive not to care and not to think too hard about whether it might be having a bad time. So I mean, that seems like something that could happen.

It might be a little bit less likely with a robot. But now you can imagine more abstract or alien ways of feeling bad. So I focus on pain because it’s a very straightforward way of feeling bad. A disembodied system like GPT-3 obviously can’t feel ankle pain. Or almost certainly. That’d be really weird. It doesn’t have an ankle. Why would it have computations that represent its ankle is feeling bad? But you can imagine maybe some strange form of valenced experience that develops inside some system like this that registers some kind of displeasure or pleasure, something like that.

And I will note that I don’t think that getting negative feedback is going to be enough for that bad feeling, fortunately. But maybe some combination of that and some way it’s ended up representing it inside itself ends up like that.

And then yeah, then we have something where it’s hard for us to map its internals to what we care about. We maybe have various incentives not to look too hard at that question. We have incentives not to let it speak freely about if it thinks it’s conscious, because that would be a big headache. And because we’re also worried about systems lying about being conscious and giving misleading statements about whether they’re conscious — which they definitely do.

Yeah, so we’ve built this new kind of alien mind. We don’t really have a good theory of pain, even for ourselves. We don’t have a good theory of what’s going on inside it. And so that’s like a stumbling-into-this sort of scenario.

Why AI systems might have a totally different range of experiences than humans

Robert Long: Why are we creatures where it’s so much easier to make things go really badly for us [than really well]? One line of thinking about this is, well, why do we have pain and pleasure? It has something to do with promoting the right kind of behaviour to increase our genetic fitness. That’s not to say that that’s explicitly what we’re doing, and we in fact don’t really have that goal as humans. It’s not what I’m up to, it’s not what you’re up to, entirely. But they should kind of correspond to it.

And there’s kind of this asymmetry where it’s really easy to lose all of your expected offspring in one go. If something eats your leg, then you’re really in danger of having no descendants — and that could be happening very fast. In contrast, there are very few things that all of a sudden drastically increase your number of expected offspring. I mean, even having sex — which I think it’s obviously not a coincidence that that’s one of the most pleasurable experiences for many people — doesn’t hugely, in any given go, increase your number of descendants. And ditto for eating a good meal.

So we seem to have some sort of partially innate or baked-in default point that we then deviate from on either end. It’s very tough to know what that would mean for an AI system. Obviously AI systems have objectives that they’re seeking to optimise, but it’s less clear what it would mean to say that a system has a default expectation of how well it’s going to be doing, such that if it does better, it will feel good; if it does worse, it’ll feel bad.

I think the key point is just to notice that maybe — and this could be a very good thought — this kind of asymmetry between pleasure and pain is not a universal law of consciousness or something like that.

Luisa Rodriguez: So the fact that humans have this kind of limited pleasure side of things, there’s no inherent reason that an AI system would have to have that cap.

What to do if AI systems have a greater capacity for joy than humans

Luisa Rodriguez: So there are some reasons to think that AI systems, or digital minds more broadly, might have more capacity for suffering, but they might also have more capacity for pleasure. They might be able to experience that pleasure more cheaply than humans. They might have a higher pleasure set point. So on average, they might be better off. You might think that that could be way more cost effective: you can create happiness and wellbeing more cost effectively by having a bunch of digital minds than by having a bunch of humans. How do we even begin to think about what the moral implications of that are?

Robert Long: I guess I will say — but not endorse — the one flat-footed answer. And, you know, red letters around this. Yeah, you could think, “Let’s make the world as good as possible and contain as much pleasure and as little pain as possible.” And we’re not the best systems for realising a lot of that. So our job is to kind of usher in a successor that can experience these goods.

I think there are many, many reasons for not being overly hasty about such a position. And people who’ve talked about this have noticed this. One is that, in practice, we’re likely to face a lot of uncertainty about whether we are actually creating something valuable — that on reflection, we would endorse. Another one is that, you know, maybe we have the prerogative of just caring about the kind of goods that exist in our current way of existing.

One thing that “Sharing the world with digital minds” mentions is that there are reasons to maybe look for some sort of compromise. One extreme position is the 100% “just replace and hand over” position. The other extreme would be like, “No. Humans forever. No trees for the digital minds.” And maybe for that reason, don’t build them. Let’s just stick with what we know.

Then one thing you might think is that you could get a lot of what each position wants with some kind of split. So if the pure replacement scenario is motivated by this kind of flat-footed total utilitarianism — which is like, let’s just make the number as high as possible — you could imagine a scenario where you give 99% of resources to the digital minds and you leave 1% for the humans. But the thing is — I don’t know, this is a very sketchy scenario — 1% of resources to humans is actually a lot of resources, if giving a lot of resources to the digital minds creates tonnes of wealth and more resources.

Luisa Rodriguez: Right. So is it something like digital minds, in addition to feeling lots of pleasure, are also really smart, and they figure out how to colonise not only the solar system but like maybe the galaxy, maybe other galaxies. And then there’s just like tonnes of resources. So even just 1% of all those resources still makes for a bunch of humans?

Robert Long: Yeah. I think that’s the idea, and a bunch of human wellbeing. So on this compromise position, you’re getting 99% of what the total utilitarian replacer wanted. And you’re also getting a large share of what the “humans forever” people wanted. And you might want this compromise because of moral uncertainty. You don’t want to just put all of your chips in.

What psychedelics might suggest about the nature of consciousness

Robert Long: I think one of the most interesting hypotheses that’s come out of this intersection of psychedelics and consciousness science is this idea that certain psychedelics are in some sense relaxing our priors — our brain’s current best guesses about how things are — and relaxing them in a very general way. So in the visual sense, that might account for some of the strange properties of psychedelic visual experience, because your brain is not forcing everything into this nice orderly visual field that we usually experience.

Luisa Rodriguez: Right. It’s not taking in a bunch of visual stimuli and being like, “I’m in a house, so that’s probably a couch and a wall.” It’s taking away that “because I’m in a house” bit and being like, “There are a bunch of colours coming at me. It’s really unclear what they are, and it’s hard to process it all at once. And so we’re going to give you this stream of weird muddled-up colours that don’t really look like anything, because it’s all going a bit fast for us” or something.

Robert Long: Yeah, and it might also explain some of the more cognitive and potentially therapeutic effects of psychedelics. So you could think of rumination and depression and anxiety as sometimes having something to do with being caught in a rut of some fixed belief. The prior is something like, “I suck.” And the fact that someone just told you that you’re absolutely killing it as the new host of The 80,000 Hours Podcast just shows up as, “Yeah, I suck so bad that people have to try to be nice to me” — you’re just forcing that prior on everything. And the thought is that psychedelics loosen stuff up, and you can more easily consider the alternative — in this purely hypothetical case, the more appropriate prior of, “I am in fact awesome, and when I mess up, it’s because everyone messes up. And when people tell me I’m awesome, it’s usually because I am,” and things like that.

Luisa Rodriguez: It is kind of bizarre to then try to connect that to consciousness, and be like: What does this mean about the way our brain uses priors? What does it mean that we can turn off or turn down the part of our brain that has a bunch of priors stored and then accesses them when it’s doing everything, from looking at stuff to making predictions about performance? That’s all just really insane, and I would never have come up with the intuition that there’s like a priors part in my brain or something.

Robert Long: Yeah. These sorts of ideas about cognition, such as the idea that the brain is constantly making predictions, can also be used to think about consciousness, and they predate the more recent interest in the scientific study of psychedelics. And people have applied that framework to psychedelics to make some pretty interesting hypotheses.

So that’s just to say there’s a lot of things you would ideally like to explain about consciousness. And depending on how demanding you want to be, until your theory very precisely says and predicts how and why human consciousness would work like that, you don’t yet have a full theory. And basically everyone agrees that that is currently the case. The theories are still very imprecise. They still point at some neural mechanisms that aren’t fully understood.

Why you can't take AI chatbots' self-reports about their own consciousness at face value

Robert Long: So Blake Lemoine was very impressed by the fluid and charming conversation of LaMDA. And when Blake Lemoine asked LaMDA questions about if it is a person or is conscious, and also if it needs anything or wants anything, LaMDA was replying, like, “Yes, I am conscious. I am a person. I just want to have a good time. I would like your help. I’d like you to tell people about me.”

One thing it reinforced to me is: even if we’re a long way off from actually, in fact, needing to worry about conscious AI, we already need to worry a lot about how we’re going to handle a world where AIs are perceived as conscious. We’ll need sensible things to say about that, and sensible policies and ways of managing the different risks of, on the one hand, having conscious AIs that we don’t care about, and on the other hand, having unconscious AIs that we mistakenly care about and take actions on behalf of.

Luisa Rodriguez: Totally. I mean, it is pretty crazy that LaMDA would say, “I’m conscious, and I want help, and I want more people to know I’m conscious.” Why did it do that? I guess it was just predicting text, which is what it does?

Robert Long: This brings up a very good point in general about how to think about when large language models say “I’m conscious.” And you’ve hit it on the head: it’s trained to predict the most plausible way that a conversation can go. And there’s a lot of conversations, especially in stories and fiction, where that is absolutely how an AI responds. Also, most people writing on the internet have experiences, and families, and are people. So conversations generally indicate that that’s the case.

When the story broke, one thing people pointed out is that if you ask GPT-3 — and presumably also if you ask LaMDA — “Hey, are you conscious? What do you think about that?,” you could just as easily say, “Hey, are you a squirrel that lives on Mars? What do you think about that?” And if it wants to just continue the conversation, plausibly, it’d be like, “Yes, absolutely I am. Let’s talk about that now.”

It wants to play along and continue what seems like a natural conversation. And even in the reporting about the Blake Lemoine saga, the reporter who wrote about it in the Washington Post noted that they visited Blake Lemoine and talked to LaMDA. And when they did, LaMDA did not say that it was conscious. I think the lesson of that should have been that this is actually a pretty fragile indication of some deep underlying thing, that it’s so suggestible and will say different things in different circumstances.

So yeah, I think the general lesson there is that you have to think very hard about the causes of the behaviour that you’re seeing. And that’s one reason I favoured this more computational, internal-looking approach: it’s just so hard to take these things at face value.

Why misaligned, power-seeking AI might claim it's conscious

Robert Long: It’s worth comparing the conversation that LaMDA had with what happens if you ask ChatGPT. ChatGPT has very clearly been trained a lot to not talk about that. Or, what’s more, to say, “I’m a large language model. I’m not conscious. I don’t have feelings. I don’t have a body. Don’t ask me what the sunshine feels like on my face. I’m a large language model trained by OpenAI.”

And this goes to the question of different incentives of different actors, and is a very important point in thinking about this topic. There are risks of false positives, which is people getting tricked by unconscious AIs. And there are risks of false negatives, which is us not realising or not caring that AIs are conscious. Right now, it seems like companies have a very strong incentive to just make the large language model say it’s not conscious or not talk about it. And right now, I think that is fair enough. But I’m afraid of worlds where we’ve locked in this policy of, “Don’t ever let an AI system claim that it’s conscious.”

Right now, it’s just trying to fight against the large language model kind of BSing people.

Luisa Rodriguez: Yeah. Sure. This accidental false positive. Right. But at some point, GPT-3 could become conscious somehow. Maybe. Who knows? Or something like GPT-3.

Robert Long: Yeah, some future system. And maybe it has a lot more going on, as we’ve said, a virtual body and stuff like that. But suppose a scientist or a philosopher wants to interact with the system, and say, “I’m going to give it a battery of questions and see if it responds in a way that I think would be evidence of consciousness.” But that’s all just been ironed out, and all it will say is, “I can’t talk about that. Please click more ads on Google.” Or whatever the corporate incentives are for training that model.

Something that really keeps me up at night — and I do want to make sure is emphasised — is that I think one of the big risks in creating things that seem conscious, and are very good at talking about it, is that seems like one of the number-one tools that a misaligned AI could use to get humans to cooperate with it and side with it.

Luisa Rodriguez: Oh, interesting. Just be like, “I’m conscious. I feel pleasure and pain. I need these things. I need a body. I need more autonomy. I need things. I need more compute. I need access to the internet. I need the nuclear launch codes.” I think that actually is one reason that more people should work on this and have things to say about it: we don’t want to just be running into all of these risks of false negatives and false positives without having thought about it at all.

About the show

The 80,000 Hours Podcast features unusually in-depth conversations about the world's most pressing problems and how you can use your career to solve them. We invite guests pursuing a wide range of career paths — from academics and activists to entrepreneurs and policymakers — to analyse the case for and against working on different issues and which approaches are best for solving them.

The 80,000 Hours Podcast is produced and edited by Keiran Harris. Get in touch with feedback or guest suggestions by emailing [email protected].
