Bonus: AGI disagreements and misconceptions: Rob, Luisa, & past guests hash it out
By Robert Wiblin, Luisa Rodriguez and the 80,000 Hours podcast team · Published February 10th, 2025
Will LLMs soon be made into autonomous agents? Will they lead to job losses? Is AI misinformation overblown? Will it prove easy or hard to create AGI? And how likely is it that it will feel like something to be a superhuman AGI?
With AGI back in the headlines, we bring you 15 opinionated highlights from the show addressing those and other questions, intermixed with opinions from hosts Luisa Rodriguez and Rob Wiblin recorded back in 2023.
You can decide whether the views we expressed then (and those from guests) have held up these last two busy years. You’ll hear:
- Ajeya Cotra on overrated AGI worries
- Holden Karnofsky on the dangers of even aligned AI, why misaligned AI might not kill us, and the power that comes from just making models bigger
- Ian Morris on why the future must be radically different from the present
- Nick Joseph on whether his company’s internal safety policies are enough
- Richard Ngo on what everyone gets wrong about how ML models work
- Tom Davidson on why he believes crazy-sounding explosive growth stories… and Michael Webb on why he doesn’t
- Carl Shulman on why you’ll prefer robot nannies over human ones
- Zvi Mowshowitz on why he’s against working at AI companies except in some safety roles
- Hugo Mercier on why even superhuman AGI won’t be that persuasive
- Rob Long on the case for and against digital sentience
- Anil Seth on why he thinks consciousness is probably biological
- Lewis Bollard on whether AI advances will help or hurt nonhuman animals
- Rohin Shah on whether humanity’s work ends at the point it creates AGI
And of course, Rob and Luisa also regularly chime in on what they agree and disagree with.
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Transcriptions and additional content editing: Katy Moore
Transcript
Table of Contents
- 1 Cold open [00:00:00]
- 2 Rob’s intro [00:00:58]
- 3 Rob & Luisa: Bowerbirds compiling the AI story [00:03:28]
- 4 Ajeya Cotra on the misalignment stories she doesn’t buy [00:09:16]
- 5 Rob & Luisa: Agentic AI and designing machine people [00:24:06]
- 6 Holden Karnofsky on the dangers of even aligned AI, and how we probably won’t all die from misaligned AI [00:39:20]
- 7 Ian Morris on why we won’t end up living like The Jetsons [00:47:03]
- 8 Rob & Luisa: It’s not hard for nonexperts to understand we’re playing with fire here [00:52:21]
- 9 Nick Joseph on whether AI companies’ internal safety policies will be enough [00:55:43]
- 10 Richard Ngo on the most important misconception in how ML models work [01:03:10]
- 11 Rob & Luisa: Issues Rob is less worried about now [01:07:22]
- 12 Tom Davidson on why he buys the explosive economic growth story, despite it sounding totally crazy [01:14:08]
- 13 Michael Webb on why he’s sceptical about explosive economic growth [01:20:50]
- 14 Carl Shulman on why people will prefer robot nannies over human nannies [01:28:25]
- 15 Rob & Luisa: Should we expect AI-related job loss? [01:36:19]
- 16 Zvi Mowshowitz on why he thinks it’s a bad idea to work on improving capabilities at cutting-edge AI companies [01:40:06]
- 17 Holden Karnofsky on the power that comes from just making models bigger [01:45:21]
- 18 Rob & Luisa: Are risks of AI-related misinformation overblown? [01:49:49]
- 19 Hugo Mercier on how AI won’t cause misinformation pandemonium [01:58:29]
- 20 Rob & Luisa: How hard will it actually be to create intelligence? [02:09:08]
- 21 Robert Long on whether digital sentience is possible [02:15:09]
- 22 Anil Seth on why he believes in the biological basis of consciousness [02:27:21]
- 23 Lewis Bollard on whether AI will be good or bad for animal welfare [02:40:52]
- 24 Rob & Luisa: The most interesting new argument Rob’s heard this year [02:50:37]
- 25 Rohin Shah on whether AGI will be the last thing humanity ever does [02:57:35]
- 26 Rob’s outro [03:11:02]
Cold open [00:00:00]
Ian Morris: Out of those three possibilities — we go extinct, we turn into superhumans, or things basically stay the same — I would say the one that we can bet the farm on is the first: we go extinct. But of course, just putting it like that, it’s not a very interesting or helpful observation. The interesting bit would be asking under what circumstances we go extinct. I think the “go extinct” prediction and the “turn into superhumans” prediction start to merge together.
The one that is so unlikely we can just dismiss it out of hand is that everything stays more or less the same, and the future is like The Jetsons: everybody is the same people they are now, but they’ve got their personal spaceships. This “stay basically the same” scenario is just staggeringly unlikely. Business as usual is simply not going to be an option. We’re talking about a very, very profound transformation of everything.
Rob’s intro [00:00:58]
Rob Wiblin: Hey listeners, this is Rob Wiblin, and you were just listening to historian Ian Morris from episode #168 on whether deep history says we’re heading for an intelligence explosion.
Today’s episode is a mix of something old and something new.
People really loved our compilation of highlights about parenting from last year, so we’re going to do them on some other topics as well.
But we’ve also got an hour of never-before-released content mixed in here.
You see, back in August 2023 Luisa recorded an interview with me about my overall opinions on AGI, but we didn’t manage to get through all the questions before we had to stop and go off to a meeting or something.
This was after we’d both recorded a series of AI episodes prompted by ChatGPT, but between hearing from guests I hadn’t had much chance to say where I came down overall.
Anyway, we were going to come back and record a second session soon after, but hopefully you know how it is and can forgive us… life got away from us.
I think I was sick, then we weren’t in the same place, then other priorities came up, and eventually I had a kid… and it all just felt too late to come back and finish.
But we listened back to it recently and I realised I actually think it holds up pretty well. I wouldn’t say all of these things now and the focus of the conversation would be different. But I was surprised how solid it still felt after an 18-month period where so much crazy stuff has gone down. I think it’s healthy to look back at your old opinions sometimes and see how horrible your predictions are and how they’ve changed, and on that account I was more relieved listening back to this than anything else.
So coming up you’ve got 15 opinionated highlights from the show and then my reaction to those same topics or a related topic. You can use the chapter thing in your podcast app to skip around if you like.
Topics include:
- Which misalignment stories are overrated.
- The biggest misconceptions about how AI models work.
- Why two guests buy the intelligence explosion argument while another disagrees.
- Why AI misinformation won’t be so bad as people think.
- A great summary of the cases for and against digital sentience.
- And whether inventing AGI will be the last decision humanity makes… or just the start of a new set of difficult and annoying choices.
I was really delighted listening to this compilation after the production team put it together, because even I, who was in many of these conversations, had forgotten about many of these entertaining bits from the show over the years.
I hope you enjoy it as well.
So without further ado, I bring you Luisa and me in 2023 and 15 past guests’ opinions as well.
Rob & Luisa: Bowerbirds compiling the AI story [00:03:28]
Luisa Rodriguez: OK, I’m here with Rob Wiblin. You’ve been doing a bunch of interviews on AI. You’ve recorded eight so far, I think, and we basically thought it’d be useful to sit down and see if you can synthesise some of the most important things you’ve learned and where you’re landing on some of those key questions. I’ve done some interviews on AI too, but I think I’m still letting it percolate, so I’m not ready to share mine yet. We basically just thought it would be informative to listeners to hear what you actually think. I think mostly they get some of your pushback and maybe have an inkling of what you think, but not the full picture.
Rob Wiblin: Yeah, I guess that’s a polite way of putting it. Maybe another way of putting it would be that during the interviews with other people, I try to not make it all just about my opinions. I do my best to restrain myself. So maybe I’ve had so many opinions beginning to bubble out that we had to make some space for it.
Luisa Rodriguez: Dying to get them out.
Rob Wiblin: Yeah. I mean, it’s been a real summer of AI this year. I just feel like every week I’m doing more reading about AI, listening to interviews that guests have done in the past and reading their articles, tracking it on Twitter. I guess to begin with, it was a bit depressing for me focusing on AI so much, because we tend to focus on the dark side of it, on the way that things could go wrong.
Luisa Rodriguez: Yeah. I remember you saying you were nervous about doing more AI interviews as things kind of heated up, because it was so depressing to you. But it seems like you’ve become kind of obsessed.
Rob Wiblin: Yeah. I think in previous years, one reason that we didn’t do more AI interviews maybe is just that I found them a bit aversive, because I do find the topic so scary. I guess my intuitions are to be pretty worried. And then I suppose ChatGPT and all of the progress just forced our hand, and I was like, I can’t really put off doing more investigation of this any longer. So yeah, I was actually kind of down for a couple of months there, just looking at all of the progress and thinking about where might this go? And how long might it take for things to get into an actually dangerous situation?
But I suppose humans adjust to all circumstances, and for some reason, I guess maybe I’ve just to some extent accepted my fate and become more chill about it. Or maybe my expectations have reached the point where now when I read stuff, on average it neither makes me more optimistic nor more pessimistic. It’s just neutral.
Luisa Rodriguez: For me, it’s just one of those things that’s like supposedly when you get a hand cut off, you bounce back from it, and your baseline happiness goes to about the same levels. And I feel like that’s just kind of happened to me, which is a bit depressing in its own right, but I’ll take it.
Rob Wiblin: Yeah, exactly. I think I’d see the glass as half full on this one.
I guess there’s a big risk doing an interview on AI where I’m the guest of sorts, because I’m not actually an expert on AI, so there’s definitely a risk that I’m going to say stuff that’s wrong. I feel like the comparative advantage that we can bring as hosts perhaps is being like little bowerbirds who go searching about, reading lots of different people, doing interviews on different angles on the problem — like the policy side, the technical side, different approaches that people are taking — and we can find the little gems or the things that seem most important to us from each different section, and maybe have a slightly easier time forming an overall worldview about the thing than someone who’s focused on one.
If you just go to work every day and you’re just focused on interpretability, that’s fantastic — but it means you probably don’t have time, and you shouldn’t be spending your time tracking what’s happening in the policy world. So it might be a little bit hard to form expectations about how things are playing out in the big picture. I mean, I don’t think anyone has an amazing grasp on this, but maybe we can add some insight by spending so much time reading all over the place about it.
Luisa Rodriguez: Yeah. On any one given topic, there are people that are going to be much, much more expert than you and me, of course. But I was surprised by the extent to which hosting these episodes and preparing for them — we spend something like a week preparing for each one — has made me so much more well read on these topics than I would have been otherwise. And that is something, especially when it’s preparing for a bunch of different types of AI interviews. I completely agree that you get this high-level understanding of how things come together in a way that I just wasn’t expecting.
Rob Wiblin: Yeah. I mean, for example, I could never have the insights that the compute governance people have about compute governance, but I might have a shot at being able to read their stuff and be like, “What are the two most important takeaways from all of this work on compute governance?” One takeaway would be that they haven’t figured out how to solve the problem.
Luisa Rodriguez: Another takeaway I had is just that, to some extent, we’ve made more progress on a couple of things than I realised, which was a bit reassuring. But overall, yeah.
Rob Wiblin: What’s an example of that?
Luisa Rodriguez: For me, it was governance.
Rob Wiblin: So the governance was in a more advanced state than what you’d expected?
Luisa Rodriguez: Yeah. I think there was something that was happening for me where I’d go to conferences and occasionally catch up with friends who are working on something, and I’d pick up here and there updates to fields. And for AI governance, for a while, I felt like the basic thing I’d heard was that it’s really hard to make progress; there’s not that much progress being made; these things are just too far in the future to do anything concrete. And then learning about a range of governance issues on the podcast and prepping for the podcast made me realise we’ve got concrete things now.
Rob Wiblin: People have concrete policy ideas, and they’ve actually thought about them.
Luisa Rodriguez: Yeah, on a wide range of policy topics, and there are competing ones. And you wouldn’t get that from being a single researcher pursuing their one governance agenda or something.
Rob Wiblin: Yeah, there was much more stuff going on in policy space, kind of on the DL, that I had not been hooked into. So that was definitely a positive update.
Luisa Rodriguez: Yep.
Ajeya Cotra on the misalignment stories she doesn’t buy [00:09:16]
From #151 – Ajeya Cotra on accidentally teaching AI models to deceive us
Rob Wiblin: What’s a possible view that people might think that you have that you actually don’t?
Ajeya Cotra: One big view — that I think is actually a misconception of what people worried about AI misalignment have been saying, but I understand why people have this misconception — is people get really fixated on the idea of human values being really complicated and hard to specify and hard to understand. They’re worried about AI systems that are really good at things like physics and math and science, but basically just don’t get what it is that humans want to see from them, and what human values really are.
An example that sometimes people bring out is you ask your AI robot to cook dinner, and it doesn’t understand that you wouldn’t want it to cook the cat if you didn’t have any ham in the fridge, or something like that. That kind of worry is something that I think is quite overrated. I actually think that, in fact, having a basic understanding of human psychology, and what humans would think is preferable and not preferable, is not a harder problem than understanding physics or understanding how to code and so on.
I expect AIs will perfectly well understand what humans want from them. I actually don’t expect to see mistakes that seem so egregious as cooking the family’s cat for dinner, because the AI systems will understand that humans are going to come home and look at what they did and then determine a reward and take some action based on that, and will know that humans will be displeased if they come home to see that the cat has been killed and cooked.
In fact, a lot of my worries stem from the opposite thing — they stem from expecting AI systems to have a really good psychological model of humans. So, worrying that we’ll end up in a world where they appear to be really getting a lot of subtle nuances, and appear to be generalising really well, while sometimes being deliberately deceptive.
Rob Wiblin: What’s another view that people might attribute to you that you don’t hold?
Ajeya Cotra: Another thing that might be worth highlighting is that, often, the case for AI risk — and the case for AIs potentially autonomously seeking to grab power for themselves or take over the world or kill humans — is premised on this notion that a sufficiently powerful AI system will have a sort of crisp, simple, long-term utility function. Like, it’s going to be trying to maximise something in the long-run future. Maybe it’s trying to maximise copies of itself. Or the common cartoon example is that it’s trying to maximise paperclips in the long-run future.
People often start from the premise that a sufficiently intelligent system will be this long-run maximiser that has a pretty simple thing that it’s maximising. I’m unsure if that’s how AI systems will end up thinking. I don’t think that’s particularly necessary to get the conclusion that AI takeover is plausible. I think it’s very plausible that AI systems will have very messy psychologies and internally inconsistent goals, just like humans. And have impulses, and have things they want that aren’t necessarily something they want to tile the universe with, but just something they want to do for themselves.
Even if you imagine they have this more messy set of drives, I think you could still get the outcome that they don’t want the humans to be in control anymore, and don’t want the humans to be pushing them around with reward signals, and trying to get them to do what the humans want instead of what they want.
Rob Wiblin: Yeah, this is an interesting one. It does seem that, as these models are getting more complicated and more capable, they’re developing some of the quirks that people have. It might turn out that some of the quirks that the human beings have — that maybe they find a little bit frustrating or that make their lives difficult — maybe those quirks are there for a reason, because they actually have some functional purpose. Or at least it’s hard to iron them out in the process of either genetic evolution or mind evolution.
I’ve seen some people point to this, saying we expect AIs to get more internally self-contradictory, and have different parts of themselves that disagree or that have different subgoals and they end up in conflict — a little bit like the human mind sometimes ends up in conflict with itself to a degree. I saw someone write a blog post saying this suggests that we really shouldn’t be worried about AI takeover or an AI coup. I didn’t think that that really followed, because it seems that — despite the fact that sometimes we have a mind divided against itself and particular biases and so on — that doesn’t seem to stop humans from being potentially quite power-seeking.
Ajeya Cotra: Coordinating to go to war, yeah.
Rob Wiblin: Coordinating, [being] quite power-seeking, being able to manipulate the environment to a great extent. So, yeah, it could be a very messy mind, but that doesn’t necessarily make for a safe mind.
Ajeya Cotra: I agree.
Rob Wiblin: Yeah. OK, what’s another possible misconception?
Ajeya Cotra: Another vision that I think I am also seeing less and less of is the bolt-from-the-blue AGI system. I think back in 2007 this was more plausible. A story is told that it looks like there’s one AI company that maybe doesn’t even realise it’s made AGI, but it stumbles onto a certain insight for making the systems more powerful. That insight kind of turns the lights on, and you get a very powerful system where the previous day you didn’t really have much of anything, and AI hadn’t permeated the economy much or anything like that. Once you have your human-level system after you’ve had your key insight, then that human-level system can extremely quickly improve itself by just reading its own source code and editing it.
I think it’s looking less and less likely that that’s the world we’re in, simply because we haven’t had the AI takeover yet — and we have had a number of companies training a number of powerful AI systems that are starting to permeate the economy.
So I think a true bolt-from-the-blue situation is pretty unlikely at this point. But even one step beyond that, I think we will probably see notches in between GPT-4 and the kinds of AI systems that could pose a takeover risk. Maybe not more than a couple, but I think we’ll kind of have a ramp-up that is terrifyingly fast, but still fairly continuous.
Rob Wiblin: OK, so the systems will keep getting more capable, and potentially the practical capabilities that they have might keep getting better at a faster rate, but we’re not going to see just one system go from being prehuman level to incredibly superintelligent over a period of days or something like that. It’s going to be more gradual, over months and years. The state-of-the-art model will be somewhat ahead of the copycats, but not dramatically far ahead.
Ajeya Cotra: Yeah, not years ahead. I mean, I do agree that it’s definitely not logically impossible for there to be one competitor that totally blasts ahead of the others. You do have to think about both the fundamentals of how machine learning progress has gone and historical analogues. This is a big rabbit hole, where it seems like people on either side of this debate can look at the exact same historical question — like the invention of aircraft — and one of them can see it as very continuous, and the other can see it as very discontinuous. So I think this is often a surprisingly hard debate to run, but I do agree that in theory, that’s what we would want to do to answer this question.
Rob Wiblin: It seems like a big part of the crux here is the question of how much more difficult it becomes to get further incremental improvements in AI capabilities as your existing capabilities get better and better. Because if it was the case that in fact just improvements, one after another, up to an extremely high level of intelligence, just don’t get harder — it remains the same difficulty to get to the next step as the previous one — then you would expect potentially an explosive takeoff, because you’re getting smarter, but the problem is not getting more difficult.
But on the other hand, it could be that it goes the other way, and in fact, the technical problems that you have to solve to get to the next level of intelligence just ramp up really hard. The fact that you’ve benefited from the previous gain in intelligence just isn’t enough to make that simple — in which case, you could get a levelling off. I don’t really know how you resolve the question of which of these worlds we’re in.
Ajeya Cotra: Well, it’s important to note that I still believe there will be an explosive takeoff, in the sense that I still believe that the growth rates are going to go up and up. Right now we have maybe a 3% growth rate in the world economy. I think we’ll get higher growth rates, which means that we’ll be going super-exponential.
But there’s still a question of how fast the super-exponential is, or how discontinuous it is, so it’s important to distinguish between continuousness and slowness. I think you could be continuous and very fast, which is what I would guess is going to happen — very fast in the sense that we can go from roughly human-level systems to far-superhuman systems in a period of months, which is very fast. It’s not quite as fast as a day. But it’s important to note that I don’t have a comforting, gradual vision of the future.
Rob Wiblin: Noted. What’s another possible misunderstanding?
Ajeya Cotra: Another possible misunderstanding, about my view at least, is that I think some people who are worried that we’ll make powerful AI systems that end up taking over the world, they expect a fundamental difficulty in making AI systems that are really good at science without making them extremely goal-directed and extremely agentic. They think that’s just very hard to do on a technological level, because there’s a fundamental connection between being really good at science and being really goal-directed in the kinds of ways we might worry about.
I’m not so sure about that. My worry stems from the fact that the easiest way to make these systems seems to push them in a very situationally aware, goal-directed direction. I think it might well be possible — and not even that hard in an absolute sense — to train systems that are very good at science but are fairly shortsighted and not goal-directed and don’t really think on super long timescales, or don’t have motivations on super long timescales. It’s just that we have to bother to try, and I don’t know if we have the time or the will to do that.
Rob Wiblin: Yeah. One thing that I’ve heard people talking about in recent times is the issue of whether we could inadvertently end up creating an agent, or inadvertently end up creating a model that has goals and can act in the world, without really intending to. For example, with GPT-4, could it accidentally end up feeling that it’s an agent and having very specific goals about how the world ought to be?
My guess is no, because that seems like that’s just going to require a whole lot of additional mental structures that probably haven’t been selected for very highly. I can’t completely rule out that at some level of training that could potentially happen, but I would think that the main line, the boring way that you could come up with an agent that has goals is that you’ll be trying to produce an agent that has goals.
Ajeya Cotra: That’s right.
Rob Wiblin: It does seem likely that people are going to try to do that, because agents with goals are going to be very useful. So you also think that it will probably come through intention rather than accident?
Ajeya Cotra: Yeah, I think people are trying very hard to do things like get these models to think on longer timescales, get these models to consider a bunch of actions explicitly and think about which one might be the best action to take. There’s all sorts of things people are actively doing to push them in more agentic directions, because that’s far more useful than a system that just sits there and predicts what someone on the internet would say next.
Rob Wiblin: I suppose this does point towards one way that we could buy more time or try to make things safer. If you really do think that these oracle or purely predictive models are probably pretty safe and probably would remain safe for a long time, then it’s really only once you start adding in agency and goals that you start getting onto thin ice. Then we could just try to say, “We’re not going to do that for now.” It might be hard to coordinate people, but perhaps if you had broader buy-in that you were skating on thin ice if you start creating systems like that, then maybe you could at least delay them for a substantial period of time until we had been able to do more work to understand how they function.
Ajeya Cotra: I think it’s more important to avoid training bigger systems than to avoid taking our current systems and trying to make them more agentic. The real line in the sand I want to draw is: You have GPT-4, it’s X level of big, it already has all these capabilities you don’t understand, and it seems like it would be very easy to push it toward being agentic. And if you pushed it toward being agentic, it has all these capabilities that mean that it might have a shot at surviving and spreading in the wild, at manipulating and deceiving humans, at hacking, all sorts of things.
The reason I think that you want to focus on “don’t make the models bigger” rather than “don’t make them agentic” is that it takes only a little push on top of the giant corpus of pretraining data to push the model toward using all this knowledge it’s accumulated in an agentic way — and it seems very hard, if the models exist, to stop that from happening.
Rob Wiblin: Even if most people think that’s a bad idea, someone will give it a go.
Ajeya Cotra: Yeah.
Rob Wiblin: Why do you think that it’s a relatively small step to go from being an extremely good word predictor, and having the model of the world that that requires, to also being an agent that has goals and wants to pursue them in the real world?
Ajeya Cotra: The basic reason, I would say, is that being good at predicting what the next word is in a huge variety of circumstances of the kind that you’d find on the internet requires you to have a lot of understanding of consequences of actions and other things that happen in the world. There’ll be all sorts of text on the internet that’s like stories where characters do something, and then you need to predict what happens next. And if you have a good sense of what would happen next if somebody did that kind of thing, then you’ll be better at predicting what happens next.
So there’s all this latent understanding of cause and effect and of agency that the characters and people that wrote this text possessed in themselves. It doesn’t need to necessarily understand a bunch of new stuff about the world in order to act in an agentic way — it just needs to realise that that’s what it’s now trying to do, as opposed to trying to predict the next word.
Rob & Luisa: Agentic AI and designing machine people [00:24:06]
Luisa Rodriguez: Cool. So should we dive in and get some of your views?
Rob Wiblin: Yeah, let’s do it. I’m scared, but excited.
Luisa Rodriguez: I’m just excited. It sounds like an important consideration for you is the fact that at some point we’ll have agentic AI. Do you want to remind us what it means to have agentic AI, and then talk about why you think that’s so important?
Rob Wiblin: When I think about how this is going to play out, I do think about the models as actors, as kind of like people who are doing stuff and have goals. And many people kind of reject that. They say GPT-4 is not like this. ChatGPT doesn’t have a life, it doesn’t go home, it doesn’t have any goals that it’s pursuing over the long term.
I think that’s completely right. ChatGPT is not an agent in the same way that a person is. But I think it is very likely that we will design machine agents, that we will design machine people. As for the people who don’t think that is going to happen, and who say it’s silly to be trying to plan around a future in which that happens: I think that’s crazy. I actually just think it’s absolutely crazy that they don’t think this is going to happen.
Luisa Rodriguez: And then why is that so important to kind of your overall view on how this is going to go?
Rob Wiblin: I think, just empirically, if you look at who is worried and who is not worried, or who thinks that this is a momentous time in history and who thinks that this is just a continuation of past economic growth and improvements in technology, this question of whether you think of AI as being agents, as being people, explains so much of the variance in people’s attitudes.
If you think of this just as a tool — if you think of this as, “This is autocorrect. This is Microsoft Word” —
Luisa Rodriguez: It’s a piece of software.
Rob Wiblin: “It’s just a piece of software. What the hell is wrong with you people?” Then it’s very likely that most of what we’re seeing seems super overwrought and super overblown. You might be somewhat worried that the software is going to be bad, it’s not going to pursue our goals. Or people will turn it to bad ends the same way that terrorists use Microsoft Word or terrorists have iPhones. There’s some things to worry about from that point of view, but it’s much, much, much less, or it just doesn’t seem so important.
And I think many people, a shrinking number, but people in machine learning often have this idea that it’s just a tool. I think economists often talk this way. They find it very hard to get out of the framework where this is just a piece of machinery that you’re sticking in a factory. Inasmuch as I was in that framework, I would probably, like them, also not be too worried.
But by contrast, if you think about these as agents, as people, as beings in the world that are going to have their own goals the same way that humans have, then it just becomes extremely obvious why this is a really big deal. It might be really good; it might be really bad — but we’re kind of playing with fire. This is going to have a lot of consequences. In that framework, we’re creating these new beings that we’re going to have to share the world with that have all of these technological advantages. So I think this is quite a crux that it’s worth spending some time on.
Luisa Rodriguez: So “agentic” here is referring to having goals and being able to take steps to pursue them. But it does feel like there are other differences between the kinds of things that we have now — that feel like software — and people. Do those things not feel important? Does it feel like the key thing that people have now that GPT-4 doesn’t is having goals and having the capabilities at some point to pursue them? For example, consciousness?
Rob Wiblin: Yeah. I mean, consciousness is important from a moral point of view. But I don’t see why that is so important in terms of the concept of will you get a takeover? You could have a machine that is not conscious that nonetheless wants to take over because it’s pursuing particular goals. Or you could have one that is conscious that doesn’t have any goals at all, that just kind of sits there and experiences things. At least, inasmuch as by “consciousness” we mean “subjective experience”: that there’s something that it’s like to be the model.
Luisa Rodriguez: Maybe before we go further, is it worth clarifying how you’re using “people”? Because it sounds like the fact that these AIs will have goals is an important crux. But you’re also kind of emphasising the extent to which we’re creating new people. Are those functionally the same thing to you? Is that the idea? That they’re close enough that we are really in some important sense creating new people?
Rob Wiblin: I mean, I guess “people” is slightly underdefined. Maybe the reason that I want to use the term “people” is to pull people out of thinking of these beings just like toasters, and shake them and say, “No, they’re people!” Imagine, what if it was a person? It’s a slightly arresting thing sometimes to talk about machine people, but I think in many if not all ways, they’ll have many of these properties. And so that is a model worth using, at least among others, in terms of forecasting how things will go.
So if you took ChatGPT, maybe you made it have a whole lot more common sense. So we’re going to improve the capabilities here a bunch. You gave it a sense of identity, you gave it a sense of time. You trained it on intervening in the world and seeing how that goes, and then adjusting its plans based on that. Maybe it doesn’t have a body yet. Maybe we don’t need that. I guess we’ll just be uncertain about whether it’s conscious or not. We’re not going to say whether there’s subjective experience; we’re not going to have a very good idea about that anytime soon, I think, and it matters from the point of view of whether you need to worry about the wellbeing of these people, but in terms of what they’re going to do in the world, it doesn’t necessarily have any implications. So let’s set that one aside.
So you’ve added those things. At that point, it seems like you’re kind of dealing with something that has a lot of the properties of a person — at least a lot of the ones that matter in terms of thinking about what they’re going to do and what effects they’re going to have in the world. So perhaps I’m being a little bit naughty just using the word “people,” but I think if you’re not inclined to think about it that way, you should think about it that way at least some of the time.
Luisa Rodriguez: OK, so whether you have AIs that are agentic, that are kind of like people in some fundamental ways, seems to explain some of the difference between people who have really different views on how serious AI is going to be. You just seem very convinced that we are going to end up building AIs that are agentic. Why is that?
Rob Wiblin: Maybe one thing I should clarify is that I think of dogs as people, for example, or at least as partial people. So “people” is not “human” in my mind. It’s not species related. People is something about being an agent, and also being something that you need to coordinate with and think about. I guess in that case, I’m also trying to highlight that you have moral patienthood — that you can have wellbeing — which the machines may or may not have. But yeah, I have a more flexible concept of people that is not just about the species, and I think that is a useful thing to have in mind.
Luisa Rodriguez: Yeah, I think that’s actually a really key clarification on that. Nice.
Rob Wiblin: So you might disagree about when we’re going to do this. Maybe you’re sceptical that it’s going to happen in 2027 or 2030 or 2035, because you think it’s very technically difficult; we’re going to have to make a lot of changes to the architecture in order to do it. I want to set that aside and say maybe it will take quite a while. I don’t really have a great idea about how technically difficult it’s going to be. Looking at what the current models are capable of doing, it seems like maybe it won’t be that big a leap from where we are now.
But even if it was, let’s think about, more fundamentally, should we expect it to happen at some point in the next century, given all of the advances that we’re making? And then why would we do it? Because it will be really useful. That’s the first reason. Second reason is scientists love to do shit. Scientists love to make the great breakthroughs. And clearly making a digital person would be one of the things that would give you a lot of fame. It would be a really big deal. It’s kind of why a lot of these AI labs were set up: to make superintelligent agents that could accomplish all of these goals. It’s clearly on people’s minds, so why are they going to stop now?
But let’s take them one by one. So it’d be really useful. At the moment, you can ask ChatGPT a question. You can say, “Here’s the situation; what email should I write?” and it will spit out the answer. But then you kind of have to take the next step. It’s constantly being bounced back to you: you have to figure out how to process that and act on it. The thing that we’re used to doing, and the ideal for us, is that we give a high-level goal to a staff member or to a team, and then they go away and they work on it and they accomplish that goal. And as they get new information, they kind of replan. And they check in with you occasionally, but you don’t want them to be checking in with you about every thought that they have. That’s very inconvenient.
So there’s every practical reason to try to add more autonomy to these agents, so they can follow through on plans over a longer period of time without needing constant supervision in the way that they do now. It’s obvious to me, and it’s obvious to programmers as well — because as soon as ChatGPT came out, as soon as these large language models seemed like they might be possible to turn into agents, people started trying to do it.
But it would be incredibly useful to have a more autonomous agent that can go out and not just write an email for you, but be an executive assistant that could do all of this work for you on its own, and just check in with you when it decided that it was uncertain about something. That would in fact fit into our workflow better than any of these language models do now. They would be able to take on more complex tasks that currently it would be too hard to fit them into. They would be able to potentially engage in plans that humans couldn’t understand, or that it would be very hard to integrate a human into this activity, say, because they’re operating too quickly, perhaps. So that’s usefulness.
Economic imperative: that’s one reason to think that people are going to try to do this. Do I think that they’re going to fail? I think they might fail for now, but I don’t think they’ll fail forever.
Second one is it will be a big scientific breakthrough. Scientists love to do stuff, even if they think it’s kind of unwise. Some people will think that it’s a good idea, or some people will just be after the glory.
Luisa Rodriguez: Actually, just picking up on that, I don’t know if there are people who think that there are reasons that we won’t be able to make AI systems good at this. Is it agreed that creating agentic AIs that are just very good at planning is inevitable? At least if enough people decide they want to make it happen?
Rob Wiblin: I suppose it’s hard to read other people’s minds, but my impression is that the people who have this as their objection are saying you shouldn’t think about them as agents. I haven’t heard them say it’s impossible to make agents, so that’s why it’s not going to happen. I’m sure there are some people who think that, but I think that’s not the standard thought: that it’s inconceivable or it’s like an unimaginable breakthrough, and so that’s why I think you’re using the wrong framework. I think it’s more that they’re saying these current models don’t have these capabilities, they’re currently not agents — so why are you engaging in this silly speculation about something that could exist in future, but doesn’t seem like it’s going to be invented now or in the next few years?
But some people might. If someone was going to say it’s not possible, my reaction would be, “Well, humans exist.”
Luisa Rodriguez: It’s a solvable problem.
Rob Wiblin: This does seem like a solvable problem, because we have one case that is very common — 7 billion examples in the world. And we’re able to produce these agents from the instructions on the human genome — which is quite a bit of information, but it’s not that complicated. Most of it, in fact, is about making our gut work and making our skin function. Only some of it is related to designing the human brain and getting it to organise itself. And yet those instructions are able to produce agents with all of the capabilities that we have — a sense of time, a sense of our own identity and persistence over time, goals that we then reflect on and sometimes change, and on and on, all of those things.
So maybe we’ll hit a brick wall and we won’t be able to figure out how to make the agency work, but I’d be pretty surprised.
Luisa Rodriguez: Is the reason we don’t have it yet because it is just harder than next-word prediction? But people are clearly aiming toward that direction and we’ll probably just keep seeing progress on it?
Rob Wiblin: Yeah, I haven’t thought about this that much. Why haven’t we struck on it yet?
I suppose one thing is that being a full person is a composite of many different capabilities, and it seems like at the moment we’re kind of designing individual modules of the human mind and specialising them.
So we have visual models that are capable of identifying objects, and we have other generative visual AI that can kind of dream — it can imagine what might happen next in this scene. We have things that take a static image and then they can imagine what might naturally happen next, the same way that humans do, that’s one of the capabilities that we have. We have ones that can form sentences and do a bunch of language. They’re not really tied together, though. And we’ve done some work to try to get the models to come up with plans and reflect on them and refine them and so on, but that seems to be at an early stage.
That’s just the point that we’re at. And I think in order to produce agents, you’re going to have to tie together many of these different capabilities, probably, and train them on many different situations in order to get them to figure out how to get these different components to speak to one another. It’s challenging; it’s just probably superable.
Luisa Rodriguez: Right. And it’s going to be built on top of things that we’ve got. But it’s happening, and you and I can’t think of a great reason that it wouldn’t. OK, so one reason to think we’ll get agentic AIs is because it’s profitable, or it would be useful, and we think it’s possible. So that’s one. What’s another?
Rob Wiblin: The one that I mentioned earlier is just the glory. That this, for whatever reason in the minds of people, is a great accomplishment. This would be the creation of new life. It’s something that people have dreamed of for a very long time on one level or other. I think there are many scientists who would really like to do this. I’ve seen some of them tweet about it, about how great it would be. And so I think at least some are going to try, and that probably enough are going to try that they’ll have a good shot.
Luisa Rodriguez: Yeah. I’ve heard people make arguments that there are enough reasons against making agentic AIs that some of those people might be convinced not to. But you don’t need all of them to be convinced it’s a good idea — you just need some. And it seems like there are plenty to draw from.
Rob Wiblin: Right. It’s a big enough field that 90% of people can be working on other applications while 10% of people start working on things that add agency, and I think you’d make a lot of progress.
One objection might be that everything that I’ve said is true; however, people are going to realise that this is very dangerous, that creating digital people is a big deal. And so it’ll be restricted: that the labs will realise that they shouldn’t do it; by and large, the government will step in and prevent it from happening. I doubt that, but if that is your true objection, if you’re saying, “All of this would be incredibly flammable, this would be an incredibly explosive situation, but people will object and shut it down,” then maybe you should participate in that group and try to shut it down, rather than say there’s nothing to worry about. It’s kind of a self-defeating activity to say, “People will object and shut it down, and that’s why I’m not worried, and nobody should bother worrying about this.”
Luisa Rodriguez: Right. It’s so dangerous that obviously people will be convinced not to. But you need people doing the convincing. Yeah, makes sense. Any other reasons you believe strongly that we’ll create agentic AIs?
Rob Wiblin: Maybe this is just a variant on the usefulness thing, but if you imagine that AIs get incorporated into military applications — where it’s a highly competitive situation, where you have to keep pace with competitor nations — then you have a very strong reason. Because humans are so slow relative to these models, you will feel this intense pressure to hand over more and more autonomy to them, and for them to be able to make more and more decisions in a row before checking in with a person — because basically you would have lost by the time the person is able to check what you’ve done and gotten back to you.
So that’s a different style of competitive dynamic that pushes you towards figuring out how to incorporate more autonomy and agency into these models, so they can kind of act as independent sentinels that keep you safe without being hamstrung by the people around them.
Luisa Rodriguez: Right. So it’s useful in terms of productivity, people getting things done.
Rob Wiblin: If speed is of the essence, then autonomy and agency are going to be quite important.
Luisa Rodriguez: Yep. And that’s true both economically and also in terms of military and power.
Rob Wiblin: Yeah.
Holden Karnofsky on the dangers of even aligned AI, and how we probably won’t all die from misaligned AI [00:39:20]
Rob Wiblin: What’s a common opinion among the community of people working to address AI risk that you personally don’t share?
Holden Karnofsky: I mean, I don’t know. I don’t always have the exact quote of whoever said what, but a vibe I pick up is this kind of framework that says, “If we don’t align our AIs, we’re all going to die. And if we can align our AIs, that’s great, and we’ve solved the problem. And that’s the problem we should be thinking about, and there’s nothing else really worth worrying about.” It’s kind of like alignment is the whole game, would be the hypothesis.
And I disagree with both ends of that, but especially the latter. So to take the first end — if we don’t align AI, we’re all dead — first off, I just think it’s really unclear. Even in the worst case — where you get an AI that has its own values, and there’s a huge number of them, and they kind of team up and take over the world — even then, it’s really unclear if that means we all die. I know there’s debates about this. I have tried to understand. The MIRI folks, I think, feel really strongly that clearly, we all die. I’ve tried to understand where they’re coming from, and I have not.
I think a key point is it just could be very cheap — as a percentage of resources, for example — to let humans have a nice life on Earth, and not expand further, and be cut off in certain ways from threatening the AI’s ability to do what it wants. That can be very cheap compared to wiping us all out.
And there could be a bunch of reasons one might want to do that, some of them kind of wacky. Some of them like, “Well, maybe in another part of the universe, there’s someone like the AI that was trying to design its own AI. And that thing ended up with values like the humans, and maybe there’s some kind of trade that could be made, using acausal trade” — and we don’t need to get into what all this means — or like, maybe the AI is actually being simulated by humans or something, or by some smarter version of humans, or some more powerful version of humans, and being tested to see if it’ll wipe out the humans or be nice to them. It’s just like, you don’t need a lot of reasons to leave one planet out if you’re expanding throughout the galaxy. So that would be one thing, is that it’s just kind of uncertain what happens even in the worst case.
And then I do think there’s a bunch of in-between cases, where we have AIs that are sort of aligned with humans. An analogy that often comes up is humans under natural selection, where humans were put under pressure by natural selection to have lots of kids, or to maximise inclusive reproductive fitness. And we’ve invented birth control, and a lot of times we don’t have as many kids as we could, and stuff like that. But also, humans still have kids and love having kids. And a lot of humans have like 20 different reasons to have kids, and after a lot of the original ones have been knocked out by weird technologies, they still find some other reason to have kids. You know, I don’t know. Like, I found myself one day wanting kids and had no idea why, and invented all these weird reasons. So I don’t know.
It’s just not that odd to think that you could have AI systems that are pretty off-kilter from what we were trying to make them do, but it’s not like they’re doing something completely unrelated either — it’s not like they have no drives to do a bunch of stuff related to the stuff we wanted them to do.
Then you could also just have situations, especially in the early stages of all this, where you might have near-human-level AIs. So they might have goals of their own, but they might not be able to coordinate very well, or they might not be able to reliably overcome humans, so they might end up cooperating with humans a lot. We might be able to leverage that into having AI allies that help us build other AI allies that are more powerful, so we might be able to stay in the game for a long way. So I don’t know. I just think things could be very complicated. It doesn’t feel to me like if you screw up a little bit with the alignment problem, then we all die.
The other part — if we do align the AI, we’re fine — I disagree with much more strongly. The first one, I mean, I think it would be really bad to have misaligned AI. And despite the fact that I feel it is fairly overrated in some circles, I still think it’s the number one thing for me. Just the single biggest issue in AI is we’re building these potentially very powerful, very replicable, very numerous systems — and we’re building them in ways where we don’t have much insight into whether they have goals, or what the goals would be; we’re kind of introducing a second advanced species onto the planet that we don’t understand. And if that advanced species becomes more numerous and/or more capable than us, we don’t have a great argument to think that’s going to be good for us. So I’m on board with alignment risk being the number one thing — not the only thing, but the number one thing.
But I would say, if you just assume that you have a world of very capable AIs, that are doing exactly what humans want them to do, that’s very scary. And I think if that was the world we knew we were going to be in, I would still be totally full time on AI, and still feel that we had so much work to do and we were so not ready for what was coming.
Certainly, there’s the fact that because of the speed at which things move, you could end up with whoever kind of leads the way on AI, or is least cautious, having a lot of power — and that could be someone really bad. And if you had some head of state that has really bad values, I don’t think we should assume that that person is going to end up being nice after they become wealthy, or powerful, or transhuman, or mind uploaded, or whatever — I don’t think there’s really any reason to assume that.
And then I think there’s just a bunch of other things that, if things are moving fast, we could end up in a really bad state. Like, are we going to come up with decent frameworks for making sure that digital minds are not mistreated? Are we going to come up with decent frameworks for how to ensure that as we get the ability to create whatever minds we want, we’re using that to create minds that help us seek the truth, instead of create minds that have whatever beliefs we want them to have, stick to those beliefs and try to shape the world around those beliefs? I think Carl Shulman put it as, “Are we going to have AI that makes us wiser or more powerfully insane?”
So I think there’s just a lot. I think we’re on the cusp of something that is just potentially really big, really world-changing, really transformative, and going to move way too fast. And I think even if we threw out the misalignment problem, we’d have a lot of work to do — and I think a lot of these issues are actually not getting enough attention.
Rob Wiblin: Yeah. I think something that might be going on there is a bit of equivocation in the word “alignment.” You can imagine some people might mean by “creating an aligned AI,” it’s like an AI that goes and does what you tell it to — like a good employee or something. Whereas other people mean that it’s following the correct ideal values and behaviours, and is going to work to generate the best outcome. And these are really quite separate things, very far apart.
Holden Karnofsky: Yeah. Well, the second one, I just don’t even know if that’s a thing. I don’t even really know what it’s supposed to do. I mean, there’s something a little bit in between, which is like, you can have an AI that you ask it to do something, and it does what you would have told it to do if you had been more informed, and if you knew everything it knows. That’s the central idea of alignment that I tend to think of, but I think that still has all the problems I’m talking about. Just some humans seriously do intend to do things that are really nasty, and seriously do not intend — in any way, even if they knew more — to make the world as nice as we would like it to be.
And some humans really do intend and really do mean and really will want to say, you know, “Right now, I have these values” — let’s say, “This is the religion I follow. This is what I believe in. This is what I care about. And I am creating an AI to help me promote that religion, not to help me question it or revise it or make it better.” So yeah, I think that middle one does not make it safe. There might be some extreme versions, like, an AI that just figures out what’s objectively best for the world and does that or something. I’m just like, I don’t know why we would think that would even be a thing to aim for. That’s not the alignment problem that I’m interested in having solved.
Ian Morris on why we won’t end up living like The Jetsons [00:47:03]
From #168 – Ian Morris on whether deep history says we’re heading for an intelligence explosion
Rob Wiblin: In Why the West Rules—For Now, you map out the three possible futures that we could imagine. One would be, I guess you call it the singularity. We could imagine it’s just an economic explosion basically, where technology advances a lot, and humanity and its descendants become much more powerful. Another option is just that we go extinct. Basically, you get a full-on collapse. And of course, the third option, the one that people probably most often imagine when they’re imagining future decades, is just the future will be like the present, but we’ll have faster phones and better consumer products and better lighting and nicer houses and so on.
You want to say that the first two, either explosion or collapse, are the most likely, and this third one of slow growth is the least likely. What is it that makes it unlikely for the future to be that kind of middle path?
Ian Morris: I mean, out of those three possibilities — we go extinct, we turn into superhumans, or things basically stay the same — I would say the one that we can bet the farm on, which is almost certain to happen, is the first: we go extinct. Almost every species of plant and animal that’s ever existed has gone extinct. So to think we’re not going to go extinct, I mean, man, that takes superhuman levels of delusion. So yeah, we are going to go extinct.
But of course, just putting it like that, it then becomes a truism. It’s not a very interesting or helpful observation to make. The interesting bit is asking under what circumstances we go extinct. And this is where I think the “go extinct” prediction and the “turn into superhuman somethings” prediction sort of start to merge together.
And definitely I think the one that is so unlikely we can just dismiss it out of hand is that everything stays more or less the same, and the future is like The Jetsons or something, where everybody is the same people they are now, but they’ve got their personal spaceships.
Or even what struck me, when I was quite a little kid watching Star Trek: Star Trek started off in the late ’60s, so it’s a really old show. I was a little boy in the late ’60s watching Star Trek, and it just dawned on me that this is exactly like the world that the producers of TV shows live in, except they’re now on a starship. And all of the assumptions of 1960s LA are baked into that show. You’ve got the middle-aged white guy in charge. You’ve got the Black woman, Lieutenant Uhura, who basically answers the phones for him; she’s the communications expert. And then the technology expert is the Asian guy. It’s like all of the assumptions of 1960s LA TV studios baked into this thing. And surely, the one thing you’ve got to be certain of, if you’ve got intergalactic travel, is that everything else about humanity does not stay the same when you get to this point.
So I think if you just give it a minute’s thought, this “stay basically the same” scenario is just a staggeringly unlikely one, particularly when you start thinking more seriously about the kind of resource constraints that we face. And this is something people will often raise with any talk about sort of superhuman futures: that we’re heating up the world; we’re poisoning the atmosphere and the oceans; there’s a finite amount of fossil fuels out there, even if we weren’t killing ourselves with them. All of these things suggest that business as usual is simply not going to be an option. If the world is going to continue — and certainly if it’s going to continue on anything like the growth trends we’ve been seeing in recent times — then we’re talking about a very, very profound transformation of everything.
So yeah, where I came down in Why the West Rules was on one option, which I think is unfortunately a perfectly plausible option: that the world continues to face all kinds of problems. When you look back over the long run of history, one of the things you repeatedly see is every time there’s been a major transformation, a major shift in the balance of wealth and power in the world, it’s always been accompanied by massive amounts of violence.
And living in a world that has nuclear weapons, I would say the number one threat to humanity — even more serious than climate change or anything else you might want to talk about — is nuclear war. We’ve had a 90% reduction in the number of nuclear warheads since the 1980s, but we’ve still probably got enough to fight World War II in a single day. And that’s without even thinking about the radiation poisoning that we didn’t get in World War II so much. This is shocking, appalling potential to destroy humanity if we continue squabbling over our resources. So I think abrupt, sudden, violent extinction is a perfectly real possibility.
I tend to be optimistic about this. I think judging from our previous record, we have been pretty good at solving problems, in the long run at least, so maybe we’ll be able to avoid this. If we avoid the abrupt short-term extinction though, I think the only vaguely plausible scenario is that we do transform humanity, or somehow humanity gets transformed, into something utterly unlike what it’s been in the last few thousand years.
Rob & Luisa: It’s not hard for nonexperts to understand we’re playing with fire here [00:52:21]
Luisa Rodriguez: Cool. So lots of guests have given their take on what they think the risks related to AI are and how to best conceptualise them. What’s one way you feel like your views are distinctive?
Rob Wiblin: One distinctive view I have — or many people have this view, but not everyone — is about this line you often hear that concerns about risk from AI are really speculative. It’s science-fiction thinking. You’ve got to project out to far-ahead stuff that looks so different from what we have today. Maybe we’re just completely misguided about it, and the problems are non-obvious: you have to have a deep understanding in order to realise what the issues are.
I do not buy this at all. I’ve never bought it. I’ve never understood it. To me, the fact that we’re playing with fire here is just so obvious. It’s absolutely written on the tin. You don’t even have to open the tin to appreciate that we should be concerned and trying to forecast where this might go and in what ways it might go wrong. So yeah, maybe we should park that for now, and maybe it’ll become evident why I feel that way as we go on.
Luisa Rodriguez: Yeah, I guess I was a bit more sympathetic to at least uncertainty about how big and bad this was going to get, in particular the period where the emphasis was on superintelligence. I didn’t feel like I’d seen much evidence on where intelligence was going to get capped, or if it was. Like there had been some progress, and for the most part, it had been exceeding the pace of progress that was even forecasted, and that was scary to me. But it was just in the last few years that I was like, I don’t see very many good reasons to think this is going to get capped. And also, I guess people started putting less emphasis on superintelligence in particular, and still had arguments that sounded really good to me, so maybe deemphasising that was compelling.
Rob Wiblin: Yeah. One reason I want to say this is I want to give licence to people who also feel this way. That you don’t have to accept this framing that this is all sci-fi and weird, and that it’s hard to understand what the arguments are. If you feel like it’s obvious, then other people feel that way as well, and you don’t have to back down.
Luisa Rodriguez: You’re not alone.
Rob Wiblin: Yeah, exactly. You’re not alone. I think I bought into the basic idea that artificial intelligence was going to be a massive deal and it could be a real hinge of history moment from very early on, from when I first started encountering the arguments for that conclusion back in 2009, 2010.
From one point of view, I guess it shows that the driving force is kind of high-level considerations. I had no idea about neural networks or machine learning. I mean, this stuff wasn’t emphasised at the time. So for me, it’s more driven by abstract considerations about intelligence and the ability to do stuff and thinking about evolution and the human mind and all of that. And so for people who think that that style of argument is persuasive, this might be evidence in favour of that conclusion.
Some people don’t like that kind of reasoning. They think it’s weak, and they reckon that what we should do is look at the concrete artefacts, the concrete dimensions that we have in front of us, and reason from there. And from that point of view, the fact that I had this conclusion back in 2010 is very suspicious. It suggests that maybe I’m just relying on the wrong kind of evidence.
But yeah, I guess one way that I think my views are distinctive is definitely in deemphasising the superintelligence story, or thinking that that’s not necessary at all. For me, that’s not a crux, really, of whether there’s an important issue here. So I’ve been glad to see that the conversation emphasises that less now than it used to.
Nick Joseph on whether AI companies’ internal safety policies will be enough [00:55:43]
From #197 – Nick Joseph on whether Anthropic’s AI safety policy is up to the task
Rob Wiblin: All right, let’s turn to the main topic for today, which is responsible scaling policies — or RSPs, as the cool kids call them. So what does the Anthropic RSP commit the company to doing?
Nick Joseph: Basically, for every level, we’ll define these red-line capabilities, which are capabilities that we think are dangerous.
I can maybe give some examples here, which is this acronym, CBRN: chemical, biological, radiological, and nuclear threats. And in this area, it might be that a nonexpert can make some weapon that can kill many people as easily as an expert can. So this would increase the pool of people that can do that a lot. On cyberattacks, it might be like, “Can a model help with some really large-scale cyberattack?” And on autonomy, “Can the model perform some tasks that are sort of precursors to autonomy?” is our current one, but that’s a trickier one to figure out.
So we establish these red-line capabilities that we shouldn’t train models to have until we have safety mitigations in place, and then we create evaluations to show that models are far from them, or to know if they’re not. These evaluations can’t test for the red-line capability directly, because you want them to turn up positive before you’ve trained a really dangerous model. But we can kind of think of them as yellow lines: once you get past them, you should reevaluate. And the last thing is then developing standards to make models safe. We want to have a bunch of safety precautions in place once we train those dangerous models.
Those are the main aspects of it. There’s also a promise to iteratively extend this. Creating the evaluations is really hard: we don’t really know what the evaluations should be for a superintelligent model yet, so we’re starting with the closer risks, and once we hit that next level, we’ll define the one after it.
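To make the shape of that concrete, here is a purely hypothetical sketch in Python. It is not Anthropic’s actual evaluation code, and the capability names, thresholds, and helper types are all invented; it only illustrates the structure described here: run yellow-line evaluations regularly, and if any of them trips, pause and reassess before the red-line capability is ever reached.

```python
# Hypothetical illustration of the RSP structure described above.
# Capability names, evaluations, and thresholds are invented for the example.
from dataclasses import dataclass
from typing import Callable

@dataclass
class YellowLine:
    name: str                            # e.g. "CBRN uplift", "autonomy precursor"
    evaluate: Callable[[object], float]  # scores the current model on this risk
    threshold: float                     # set well below the red-line capability

def check_yellow_lines(model, yellow_lines: list[YellowLine]) -> list[str]:
    """Return the names of any yellow lines the model has crossed.

    Crossing a yellow line doesn't mean the red-line capability exists --
    it means further scaling should pause until mitigations (or better
    evaluations) are in place.
    """
    return [yl.name for yl in yellow_lines if yl.evaluate(model) >= yl.threshold]
```

In practice, of course, the hard part is designing the evaluation functions themselves, which is the iterative work described above.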
Rob Wiblin: If I think about how this is most likely to play out, I imagine that at the point that we do have models that we really want to protect from even the best state-based hackers, there probably has been some progress in computer security, but not nearly enough to make you or me feel comfortable that there’s just no way that China or Russia might be able to steal the model weights. And so it is very plausible that the RSP will say, “Anthropic, you have to keep this on a hard disk, not connected to any computer. You can’t train models that are more capable than the thing that we already have that we don’t feel comfortable handling.”
And then how does that play out? There are a lot of people who are very concerned about safety at Anthropic. I’ve seen that there are kind of league tables now of different AI companies and enterprises and how good they look from an AI safety point of view, and Anthropic always comes out on top, I think by a decent margin. But months go by, other companies are not being as careful as this. You’ve complained to the government, and you’ve said, “Look at this horrible situation that we’re in. Something has to be done.” But I don’t know. I guess possibly the government could step in and help there, but maybe they won’t. And then over a period of months, or years, doesn’t the choice effectively become, if there is no solution, either take the risk or just be rendered irrelevant?
Nick Joseph: Maybe just going back to the beginning of that, I don’t think we will put something in that says there is zero risk from something. I think you can never get to zero risk. I think often with security you’ll end up with some security/productivity tradeoff. So you could end up taking some really extreme risk or some really extreme productivity tradeoff where only one person has access to this. Maybe you’ve locked it down in some huge amount of ways. It’s possible that you can’t even do that. You really just can’t train the model. But there is always going to be some balance there. I don’t think we’ll push to the zero-risk perspective.
But yeah, I think that’s just a risk. I don’t know. I think there’s a lot of risks that companies face where they could fail. We also could just fail to make better models and not succeed that way. I think the point of the RSP is it has tied our commercial success to the safety mitigations, so in some ways it just adds on another risk in the same way as any other company risk.
Rob Wiblin: It sounds like I’m having a go at you here, but I think really what this shows is just that the scenario I painted there is quite plausible, and that this problem cannot be solved by Anthropic. Probably it can’t be solved by even all of the AI companies combined. The only way that this RSP is actually going to be usable, in my estimation, is if other people rise to the occasion, and governments actually do the work necessary to fund the solutions to computer security that will allow the model weights to be sufficiently secure in this situation. And yeah, you’re not blameworthy for that situation. It just says that there’s a lot of people who need to do a lot of work in coming years.
Nick Joseph: Yeah. And I think I might be more optimistic than you or something. I do think if we get to something really dangerous, we can make a very clear case that it’s dangerous, and these are the risks unless we can implement these mitigations. I hope that at that point it will be a much clearer case to pause or something. I think there are many people who are like, “We should pause right now,” and see everyone saying no. And they’re like, “These people don’t care. They don’t care about major risks to humanity.” I think really the core thing is people don’t believe there are risks to humanity right now. And once we get to this sort of stage, I think that we will be able to make those risks very clear, very immediate and tangible.
And I don’t know. No one wants to be the company that caused a massive disaster, and probably no government wants to be the one that allowed a company to cause it. It will feel much more immediate at that point.
Rob Wiblin: Yeah, I think Stefan Schubert, this commentator who I read on Twitter, has been making the case for a while now that many people who have been thinking about AI safety — I guess including me — have perhaps underestimated the degree to which the public is likely to react and respond, and governments are going to get involved, once the problems are apparent, once they really are convinced that there is a threat here. He calls this bias in thinking — where you imagine that people in the future are just going to sit on their hands and not do anything about problems that are readily apparent — “sleepwalk bias.”
And I guess we have seen evidence over the last year or two that as the capabilities have improved, people have gotten a lot more serious and a lot more concerned, a lot more open to the idea that it’s important for the government to be involved here. There’s a lot of actors that need to step up their game and help to solve these problems. So yeah, I think you might be right. On an optimistic day, maybe I could hope that other groups will be able to do the necessary research soon enough that Anthropic will be able to actually apply its RSP in a timely manner. Fingers crossed.
Do you personally worry that having a model that is nipping at the heels of, or maybe outcompeting, the best stuff that OpenAI or DeepMind or whatever other companies have puts pressure on them to speed up their releases and cut back on safety testing or anything like that?
Nick Joseph: I think it is something to be aware of. But I also think this concern is really more about what’s already true after ChatGPT. Before ChatGPT, there was this sense where many AI researchers working on it were like, wow, this technology is really powerful — but the world hadn’t really caught on, and there wasn’t quite as much commercial pressure.
Since then, I think that there really is just a lot of commercial pressure already, and it’s not really clear to me how much of an impact it is. I think there is definitely an impact here, but I don’t know the magnitude, and there are a bunch of other considerations to trade off.
Richard Ngo on the most important misconception in how ML models work [01:03:10]
From #141 – Richard Ngo on large language models, OpenAI, and striving to make the future go well
Rob Wiblin: What’s a common misconception you run into about how ML models work or how they get deployed that it might be helpful to clarify for people?
Richard Ngo: I think the most common and important misconception has to do with the way that the training setup relates to the model that’s actually produced. So for example, with large language models, we train them by getting them to predict the next word on a very wide variety of text. And so some people say, “Well, look, the only thing that they’re trying to do is to predict the next word. It’s meaningless to talk about the model trying to achieve things or trying to produce answers with certain properties, because it’s only been trained to predict the next word.”
The important point here is that the process of training the model in a certain way may then lead the model to actually itself have properties that can’t just be described as predicting the next word. It may be the case that the way the model predicts the next word is by doing some kind of internal planning process, or it may be the case that the way it predicts the next word is by reasoning a bunch about, “How would a human respond in this situation?” I’m not saying our current models do, but that’s the sort of thing that I don’t think we can currently rule out.
And in the future, as we get more sophisticated models, the link between the explicit thing that we’re training them to do — which in this case is predict the next word or the next frame of a video, or things like that — and the internal algorithms that they actually learn for doing that is going to be less and less obvious.
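For readers who want the training objective Richard mentions spelled out, here is a minimal sketch of standard next-token prediction in PyTorch-style Python (the `model` here is an assumed placeholder that maps token IDs to vocabulary logits). The point is that the loss only rewards predicting the next word well; it says nothing about which internal algorithm the model learns in order to do that.

```python
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    """Standard language-modelling objective: predict token t+1 from tokens 0..t.

    `model` is assumed to map a batch of token IDs to logits over the vocabulary,
    shape (batch, seq_len - 1, vocab_size). Nothing in this loss specifies *how*
    the model computes those logits internally -- planning, world-modelling, or
    memorised heuristics would all be scored identically if they predict well.
    """
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)  # (batch, seq_len - 1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```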
Rob Wiblin: OK, so the idea here is: let’s say that I was set the task of predicting the next word that you are going to say. It seems like one way that I could do that is maybe I should go away and study a whole lot of ML. Maybe I need to understand all of the things that you’re talking about, and then I’ll be able to predict what you’re likely to say next. Then someone could come back and say, “Rob, you don’t understand any of the stuff. You’re just trying to predict the next word that Richard’s saying.” And I’m like, “Well, these things aren’t mutually exclusive. Maybe I’m predicting what you’re saying by understanding it.” And we can’t rule out that there could be elements of embodied understanding inside these language models.
Richard Ngo: Exactly. And in fact, we have some pretty reasonable evidence that suggests that they are understanding things on a meaningful level.
My favourite piece of evidence here is from a paper that used to be called “Moving the Eiffel Tower to ROME” — I think they’ve changed the name since then. But the thing that happens in that paper is that they do a small modification of the weights of a neural network. They identify the neurons corresponding to the Eiffel Tower and Rome and Paris, and then just swap things around. So now the network believes that the Eiffel Tower is in Rome. And you might think that if this was just a bunch of memorised heuristics and no real understanding, then if you ask the model a question — “Where is the Eiffel Tower?” — sure, it’ll say Rome, but it’ll screw up a whole bunch of other questions. It won’t be able to integrate that change into its world model.
But actually what we see is that when you ask a bunch of downstream questions — like, “What can you see from the Eiffel Tower? What type of food is good near the Eiffel Tower? How do I get to the Eiffel Tower?” — it actually integrates that single change of “the Eiffel Tower is now in Rome” into answers like, “From the Eiffel Tower, you can see the Coliseum. You should eat pizza near the Eiffel Tower. You should get there by taking the train from Berlin to Rome via Switzerland,” or things like that.
Rob Wiblin: That’s incredible!
Richard Ngo: Exactly. And it seems like almost a definition of what it means to understand something is that you can take that isolated fact and translate it into a variety of different ideas and situations and circumstances.
And this is still pretty preliminary work. There’s so much more to do here in understanding how these models are actually internally thinking and reasoning. But just saying that they don’t understand what’s going on, that they’re just predicting the next word — as if that’s mutually exclusive with understanding the world — I think that’s basically not very credible at this point.
Rob Wiblin: Yeah, that’s a fantastic point.
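As a rough illustration of the kind of evaluation described above — and not the actual method from the paper Richard mentions — the sketch below assumes hypothetical `edit_fact` and `generate` helpers and simply checks whether downstream answers stay consistent with an edited fact.

```python
# Hypothetical probing harness; edit_fact and generate stand in for real
# model-editing and text-generation code (assumptions, not an actual library API).
def probe_edited_model(model, edit_fact, generate):
    # Apply the (assumed) weight edit: the model now "believes"
    # the Eiffel Tower is located in Rome.
    edited = edit_fact(model, subject="Eiffel Tower",
                       relation="is located in", new_object="Rome")

    # Downstream questions only come out consistent if the edit is integrated
    # into the model's broader world model, not memorised as a single string.
    probes = [
        "What landmarks can you see from the top of the Eiffel Tower?",
        "What local food should I try near the Eiffel Tower?",
        "How do I get to the Eiffel Tower from Berlin?",
    ]
    return {question: generate(edited, question) for question in probes}
```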
Rob & Luisa: Issues Rob is less worried about now [01:07:22]
Luisa Rodriguez: OK, pushing on: Are there any ways you’ve updated towards being less worried this year?
Rob Wiblin: Yeah, on a similar theme, I think my colleague Ben Hilton — who writes lots of articles on the 80,000 Hours website, which you should go check out — he has this year really pushed this line that many people commit this reasoning error, where they think, “What would a 10,000-IQ superintelligence do if they were trying to solve this problem? How would they operate in the world?” And then they reason from that to say that that is what AGI in 2028 is going to do.
Part of that might come from an idea that’s embedded in this whole discourse: the idea of an explosive takeoff — an intelligence explosion where, over a period of days or weeks or months, you just get this incredibly rapid increase in the capabilities of the models. I think that’s possible, but I don’t think that’s anywhere near certain to happen. In fact, I think it probably will take more like years or decades. That means that there is this intermediate time, quite a long time, where you shouldn’t be thinking about what a supremely godlike AI would do, but rather you should be thinking about what a being that’s twice as smart as a human would do. And the answer to that might be very different. You’re just going to face so many more constraints.
Ben calls this “reasoning from the limit.” So you ask, at the very limit of capabilities, what would you do? And then say, well, GPT-7 is going to do that. And that is a mistake that I think I’ve been making sometimes. Dropping that assumption makes you somewhat more optimistic, I guess, because you don’t have to figure out how you would prevent an extremely, extremely intelligent being from taking over. Instead, you just have to think, at each stage, about how you constrain the next model from managing to stage a coup. It feels more viable.
Luisa Rodriguez: Yeah. Jan Leike made this point too when you interviewed him. I’m sure he’s stressed, but he’s not working directly on how to align GPT-140; he’s working on how to align GPT-5 and GPT-6. And those are easier problems.
Rob Wiblin: Yes. I mean, it is possible that this is going to be misguided, and maybe you will get really explosive improvement, some sort of recursive self-improvement loop that leads to massive changes — in which case I think we’re in deep water. But fingers crossed it doesn’t go that way.
Luisa Rodriguez: OK, so one way that this led you to go wrong was that you became more pessimistic, because you were imagining humans trying to figure out how to align something that’s manyfold more intelligent than we are. Were there any other ways that this led you in the wrong direction?
Rob Wiblin: A related idea could be that there could be a substantial period of time when these models are stronger than humans in many areas, as they are now, but weaker in some others, and it will kind of be a gradual process by which they become more capable than humans across all kinds of different relevant tasks. So that’s a highly related change.
I suppose this one has come up on the show before: because many people have just been working with this assumption that you’d see an intelligence explosion, they’ve been quite pessimistic about whether public policy could really do anything, or whether the legal system would be able to contribute usefully here. And I think if you imagine that this is actually just a gradual process that takes place over years — or possibly, if we’re lucky, over decades — then it just seems more obvious that this is an area where the government is going to have a lot to say, almost whether you want it or not, and you probably want to push it towards constructive things rather than destructive things.
Maybe another one is, again, if you think that things are going to be semi-gradual, then just ordinary research into machine learning — into the mundane, prosaic AI safety work — seems more connected to the question of how do you align a superintelligence? Because the more mundane safety work that we do now is clearly relevant to making GPT-5 helpful, and then you kind of just chain along each of these ideas up towards a model that’s extremely, impressively intelligent. But it’s less obvious, for example, how would you use reinforcement learning from human feedback to outwit GPT-140?
I was already reasonably enthusiastic about the prosaic work, but the less I’ve become attached to the intelligence explosion idea, the more it’s looked like the most obvious natural thing to work on.
Luisa Rodriguez: Nice. Are there any other ways you’ve updated towards being less worried?
Rob Wiblin: I suppose a highly related one is if you imagine a whole lot of GPT-5s and GPT-6s trying to accomplish goals in the world, they have a limited number of neurons, they have limited knowledge, they have limited compute, limited copies of themselves. They’re going to need to use many of the hacks that humans need to use in order to make the world manageable, in order to actually get things done without having these godlike capabilities.
And probably even in terms of the design of their minds, they’re going to have some of the constraints that we have. They’re going to be confused about things in the same way that humans are, because they haven’t been trained on literally a trillion years of experience, and they haven’t had the back and forth of actually interacting with the world in order to disambiguate some of their confusions.
They probably end up with kind of conflicting intuitions that lead them to act and speak in inconsistent ways still — in the same way that humans do, because we’re not these perfect beings. We’re muddling through. We are extremely imperfect and muddling through. We use all of these heuristics. Like, we don’t pay attention to most things; we just kind of work on heuristics, or we just work on habit. And I imagine that the kinds of models that we’ll have in the near future will also be muddling through. They’ll also not be able to pay attention to most things. They’ll also sometimes just mess up and not be able to anticipate the consequences of their actions. And that’s something worth keeping in mind. I think it’s easy to forget if you’re stuck in the superintelligence framework.
Luisa Rodriguez: That feels like it applies to GPT-6 and GPT-7, but it doesn’t sound like Benjamin is saying we’ll never get to IQ equals 1,000. Is the thinking that we’ll be working on alignment while they’re in the stage of making mistakes sometimes, and so we’ll notice that and catch those things before it gets so intelligent that it makes a bunch of mistakes that we don’t like and it’s like really outsmarting us?
Rob Wiblin: Well, it at least leaves that as a live prospect that there could be some period where there’s not such a radical difference in capabilities between humans and the models. And I guess at the point where you have models that are just vastly outstripping humans, then you might expect that much of the work, much of what’s already going on, has been delegated to the models that are kind of intermediate, that are somewhere in between us and the frontier models. So to some extent we’ll have to use the models that are in between us and the frontier models in order to understand what’s going on and monitor them. This, I think, is very much the superalignment idea from OpenAI, and I think it makes intuitive sense to me now.
Luisa Rodriguez: Yeah, cool.
Tom Davidson on why he buys the explosive economic growth story, despite it sounding totally crazy [01:14:08]
From #150 – Tom Davidson on how quickly AI could transform the world
Luisa Rodriguez: It sounds like we’re talking about something like AI systems replacing humans in a bunch of sectors, during our lifetimes, and then our lives really change quite radically and very, very, very quickly.
I just find that super weird. I think my brain is like, “No, I don’t believe you. That’s too weird. I just can’t imagine that happening.” If we’re saying this is happening in the early 2030s, I’ll be in my late 30s, and all of a sudden the world would be radically changing every year and I won’t be working.
Tom Davidson: I agree it seems really crazy, and I think it’s very natural and understandable to just not believe it when you hear the arguments. That would have been my initial reaction.
In terms of why I do now believe it, there’s probably a few things which have changed. Probably I’ve just sat with these arguments for a few years, and I just do believe it. I have discussions with people on either side of the debate, and I just find that people on one side have thought it through much more.
I think what’s at the heart of it for me is that the human brain is a physical system. There’s nothing magical about it. It isn’t surprising that we develop machines that can do what the human brain can do at some point in the process of technological discovery.
To be honest, that happening in the next couple of decades is when you might expect it to happen, naively. We’ve had computers for 70-odd years. It’s been a decade since we started pouring loads and loads of compute into training AI systems, and we’ve realised that that approach works really, really well. If you say, “When do you think humans might develop machines that can do what the human brain can do?” you kind of think it might be in the next few decades.
I think if you just sit with that fact — that there are going to be machines that can do what the human brain can do; and you’re going to be able to make those machines much more efficient at it; and you’re going to be able to make even better versions of those machines, 10 times better versions; and you’re going to be able to run them day and night; and you’re going to be able to build more — when you sit with all that, I do think it gets pretty hard to imagine a future that isn’t very crazy.
Luisa Rodriguez: Yeah.
Tom Davidson: Another perspective is just zooming out even further, and just looking at the whole arc of human history. If you’d have asked hunter-gatherers — who only knew the 50 people in their group, and who had been hunting using techniques and tools that, as far as they knew, had been passed down for eternity, generation to generation, doing their rituals — if you’d have told them that in a few thousand years, there were going to be huge empires building the Egyptian pyramids, and massive armies, and the ability to go to a market and give people pieces of metal in exchange for all kinds of goods, it would have seemed totally crazy.
And if you’d have told those people in those markets that there’s going to be a future world where every 10 years major technological progress is going to be coming along, and we’re going to be discovering drugs that can solve all kinds of diseases, and you’re going to be able to get inside a box and land on the other side of the Earth — again, they would have just thought you were crazy.
While it seems that we understand what’s happening, and that progress is pretty steady, that has only been true for the last 200 years — and zooming out, it’s actually the norm throughout the longer run of history for things to go in a totally surprising and unpredictable direction, or a direction that would have seemed totally bizarre and unpredictable to people naively at that time.
Luisa Rodriguez: I feel like I was introduced to it when I read What We Owe the Future, Will MacAskill’s book: there’s this thing called the end-of-history fallacy, where it really feels like we’re living at the end, like we’re done changing. We’re going to maybe find some new medical devices or something, but basically we’ve done all of the weird shifting that we’re going to do.
And I can’t really justify that; it does seem like a fallacy. Presumably things are going to look super different in 50 years. And sometimes those changes have gone super fast in history, and sometimes they’ve gone super slowly — and we’ve got real reasons to think that we might be entering a period of really fast transition.
Tom Davidson: Yeah. If anything, I’d say the norm is for the new period to involve much faster changes than the old period. Hunter-gathering went on for tens of thousands, if not hundreds of thousands, of years. We started doing agriculture, and formed into big societies, and did things like the pyramids. Then people often think of the next phase transition as being the start of the Industrial Revolution, and the beginning of concerted efforts towards making scientific progress.
After we did agriculture, new technologies and changes were happening on the scale of maybe 1,000 years or maybe a few hundred years, which is much faster than in the hunter-gatherer times. And then today, after the Industrial Revolution, we’re seeing really big changes to society every 50 years. We’ve already seen historically that those phase transitions have led to things being faster. That, I think, is the default expectation for what a new transition would lead to.
Luisa Rodriguez: Right. It just feels weird to us because we’re pre-transition. Plausibly, whoever’s living 50 years from now will just be like, “Obviously that was coming. Those weird people, living in 2023, thinking that they’d made all the technological progress they were ever going to make.”
Michael Webb on why he’s sceptical about explosive economic growth [01:20:50]
Luisa Rodriguez: So the people on the other side of the spectrum, who are more optimistic about AI having a bigger impact, what are they expecting to happen?
Michael Webb: So I think you’ve had some of them on the podcast, so I’ll be very brief on this, but broadly the claim is that AI could lead to explosive growth. And where that comes from is thinking about not so much automating day-to-day activities in the economy, but automating the process of innovation itself. The people who work on this think that the most important thing for economic growth going forwards — certainly in rich, advanced economies — is going to be having new ideas, innovations. And I’ve written on this; I have papers on ideas.
However, if you think that one thing AI may be able to do is speed up the process of research itself, the thing that has the biggest, most cutting-edge, most important impact on economic growth, then you could imagine a different regime where innovation is way faster, and the cutting-edge stuff is progressing very quickly in every different area.
Luisa Rodriguez: Right. Ideas get much easier to find, which creates this feedback loop of a bunch of growth.
Michael Webb: Exactly. Yes. And I certainly think that ideas are much easier to find with GPT-4, and its successors and fine-tunings and implementations and so on, than before. So I am completely in the camp that thinks that these large language models will have a huge impact on R&D and the speed with which you can do R&D.
I think the interesting question is what bottlenecks are still there. And we could have a long discussion about this, and I imagine you covered lots of it with other guests. Briefly though: for an innovation to actually have an impact on the economy, it has to be adopted, right? All these economic growth models elide this — they assume scientists do R&D, and it immediately shows up in terms of actual economic output, the goods and services that you and I are consuming. In fact, someone can have an idea, but the doctors have to agree to do it and all that kind of stuff.
I think all that stuff still applies. So I think you still have these huge issues of people getting in the way, basically, and things being much slower than they could be if you were like, “Yes, let’s just do everything the AIs told us to do.” I don’t think that democracies, or indeed any states, really will pursue that strong of a path — and humans will get in the way.
Luisa Rodriguez: OK. I guess my inner Tom Davidson — who, as you said, we had on the podcast and who has this idea about AI causing explosive growth — I wonder if he’d say something like those will get in the way for a time, but they won’t be bottlenecks forever. Humans being humans, I guess whoever’s sitting there doing a job and is like, “I’m not sure I want to use GPT for my job,” will eventually, over five to 10 years, consider adopting. Or they’ll just age out of their profession, and the new people will be more likely to adopt the new tech.
Michael Webb: That’s true for any particular technology. So GPT-4 today can do all this stuff; in 20 years’ time, all the people have aged out and it’s finally being adopted. But for all the R&D that’s been done by GPT-4 and its successors over that 20-year period, there’s a whole other set of humans who now have to age out and allow those innovations to be adopted, in this world where it’s humans getting in the way. So it’s always going to be the case that there are humans in the way for any particular new adoption.
Luisa Rodriguez: Yeah. I guess I’m on board with that’s a bottleneck, and that’ll slow things down, but not that it’s a bottleneck that will rule out the more extreme outcomes, where growth is really on the explosive end. Why might I be wrong?
Michael Webb: Look, forecasting is a tricky business, and no one claims to know what’s going to happen. I would not rule out anything. I’m not going to sit here and say there is 0% probability of any particular thing. Compared to most economists, I’m sure I would be way on the side of thinking this is going to be a really, really big deal. But compared to Tom, I think I would say, if you spend enough time studying economic history, you see all these things that slow stuff down — and those things that slow stuff down look like they’re not going to go away. And so I would want to put all that stuff back into his model. His model doesn’t have that stuff in it; his model kind of assumes there’s none of this humans getting in the way, in the ways that we spent a lot of time earlier in the conversation talking about.
You know, Tom and I have had these kinds of discussions, talking through, “Tell me your bottleneck, and I’ll tell you why it’s not a bottleneck” or whatever. So we can have those discussions, and they’re very fun to have.
Luisa Rodriguez: Oh my gosh. I want to have you both on the podcast now.
Michael Webb: That would be fun. But it’s a kind of thing where, you know, I can keep coming up with new bottlenecks, and he can keep dismissing them, and we can keep going on forever. And so there’s not like a nice definitive thing, where we both agree that if X was true, then here’s the answer. It’s not a thing where in the next five minutes, you and I can talk more about this and reach a nice, clear conclusion whether there are bottlenecks or not in R&D or whatever.
Luisa Rodriguez: And be like, “All of the bottlenecks can be ruled out.” Yeah. Broadly speaking, it sounds like you’re in the camp of AI could have pretty big effects on the economy and on growth — maybe it’ll be on the faster side, maybe somewhat on the slower side. But overall, you are not in the camp of it’s just any other technology, like the internet — which had some impacts on growth, probably, but not world changing. I mean, they were world changing, but…
Michael Webb: Yeah. Going from nothing to the Industrial Revolution was a massive deal, and given we are in the economic regime we are in, we’ve already done that zero-to-something jump. So Bob Gordon has this nice thought that once you go from only having outdoor toilets to having a flushing toilet — a proper modern toilet — that’s a huge improvement in quality of life. Almost nothing compares to that. And you can only do that once.
Luisa Rodriguez: Right. You can add a bidet, but that’s only so much better.
Michael Webb: Yeah. And so I have some intuitions in that direction. Compared to us in 1800, today we’ve had so much amazing change, and I believe that AI is going to have another completely incredible, unbelievable set of changes that are going to have huge impacts on GDP in the coming years and decades. I think the world is likely to be not unrecognisable, but quite unrecognisable, in the coming decades.
But I think this is perhaps more of an argument about measurement than it is about impact on the world. In terms of how that will show up in GDP, it might be a bit of a phase shift, a regime change, compared to the postwar period. But I’m not quite as optimistic as Tom is, I think, about how big that will be — in terms of, I don’t think we’ll have 30% a year growth happening indefinitely. I think that’s unlikely.
Carl Shulman on why people will prefer robot nannies over human nannies [01:28:25]
From #191 – Carl Shulman on the economy and national security after AGI
Rob Wiblin: I’ve heard this idea that people might have a strong preference for having services provided by human beings rather than AIs or robots, even if the latter are superficially better at the task. Can you flesh out what people are driving at with that, and do you think there’s any significant punch behind the effect that they’re pointing to there?
Carl Shulman: Yeah. So if we think about the actual physical and mental capacities of a worker, then the AI and robot provider is going to do better on almost every objective feature you can give, unless it’s basically like a pure taste-based discrimination.
So I think it was maybe Tim Berners-Lee who gave an example, saying there will never be robot nannies: no one would ever want to have a robot take care of their kids. And I think if you actually work through the hypothetical of a mature robotic and AI technology, that winds up looking pretty questionable.
Think about what do people want out of a nanny? So one thing they might want is just availability. It’s better to have round-the-clock care and stimulation available for a child. And in education, one of the best measured real ways to improve educational performance is individual tutoring instead of large classrooms. So having continuous availability of individual attention is good for a child’s development.
And then we know there are differences in how well people perform as teachers and educators and in getting along with children. If you think of the very best teacher in the entire world, the very best nanny in the entire world today, that’s quite a bit preferable to the typical outcome, and the performance of the AI robotic system is going to be better still on that front. They’re wittier, they’re funnier, they understand the kid much better. Their thoughts and practices are informed by data from working with millions of other children. It’s super capable.
They’re never going to harm or abuse the child; they’re not going to get lazy when the parents are out of sight. The parents can set criteria about what they’re optimising: things like managing risks of danger, the child’s learning, the child’s satisfaction, how the nanny interacts with the relationship between child and parent. So you tweak a parameter to try and manage the degree to which the child winds up bonding with the nanny rather than the parent. And then the robot nanny optimises over all of these features very well, very determinedly, and just delivers everything superbly — while also providing fabulous medical care in the event of an emergency, and any physical labour as needed.
And then there’s just the amount you can buy. If you want 24/7 service for each child, that’s just something you can’t provide in an economy of humans, because one human cannot work 24/7 taking care of someone else’s kids. At the least, you need a team of people who can sub off from each other, and that’s going to interfere with the relationship and the knowledge sharing and whatnot. You’re also going to have confidentiality issues: the AI or robot can forget information that is confidential, while a human can’t do that.
Anyway, we stack all these things with a mind that is super charismatic, super witty, and that can probably have a humanoid body. That’s something that technologically does not exist now, but in this world, with demand for it, I expect it would be met.
So basically, take most of the examples I see given of “here is the task or job where human performance is just going to win because of human tastes and preferences.” When I look at the stack of all of these advantages, and then at what a world dominated by nostalgic human labour would cost: if incomes are relatively equal, then for every hour of these services you buy from someone else, you would have to work a similar amount to pay for it. And it just seems that isn’t going to happen. Like, most people would not want to spend all day and all night working as a nanny for someone else’s child —
Rob Wiblin: — doing a terrible job —
Carl Shulman: — in order to get a comparatively terrible job done on their own kids by a human, instead of a being that is just wildly more suitable to it and available in exchange for almost nothing by comparison.
Rob Wiblin: Yes. When I hear that there will never be robot nannies, I don’t even have a kid yet, and I’m already thinking about robot nannies and desperate to hire a robot nanny and hoping that they’ll come soon enough that I’ll be able to use them. So I’m not quite sure what model is generating that statement. It’s probably one with very different empirical assumptions.
Carl Shulman: Yeah, I think the model is mostly not buying hypotheticals. I think it shows that people have a very hard time actually fully considering a hypothetical of a world that has changed from our current one in significant ways. And there’s a strong tendency to substitute back, say, today’s AI technology.
Rob Wiblin: Yeah, our first cut of this would be to say, well, the robot nannies or the robot waiters are going to be vastly better than human beings. So the great majority of people, presumably, would just prefer to have a much better service. But even if someone did have a preference, just an arbitrary preference, that a human has to do this thing — and they care about that intrinsically, and can’t be talked out of it — and even the fact that everyone else is using robot nannies doesn’t switch them, then someone has to actually do this work.
And in the world that you’re describing, where everything is basically automated and we have AI at that level, people are going to be extraordinarily wealthy, as you pointed out, typically, and they’re going to have amazing opportunities for leisure — substantially better opportunities for leisure, presumably, given technological advances, than we have now. So why are you going to go and make the extra money, like, give up things that you could consume otherwise, in order to pay another person who’s also very rich, or also has great opportunities to spend their time having fun, to do a bad job taking care of your child, so you can take your time away from having fun, to do a bad job taking care of their kid?
Systematically, it just doesn’t make sense as a cycle of work. It doesn’t seem like this would be a substantial fraction of how people spend their time.
Carl Shulman: Yeah, I mean, you could imagine Jeff Bezos and Elon Musk serving as waiters at one another’s dinners in sequence because they really love having a billionaire waiter. But in fact, no billionaires blow their entire fortunes on having other billionaires perform little tasks like that for them.
Rob & Luisa: Should we expect AI-related job loss? [01:36:19]
Luisa Rodriguez: Are there any other kinds of stories or problems here that you don’t agree with?
Rob Wiblin: Yeah. Earlier in the year I was toying with the idea that we would see mass layoffs due to ChatGPT being able to do a whole lot of jobs. I no longer see that happening in the next couple of years. I’m now pessimistic — or optimistic, depending on how you see it — about that happening.
I suspect that for the next few years we’ll see that generative AI is kind of like other technologies, where the jobs that it creates and the jobs that it destroys are roughly balanced, and we won’t see massive turmoil. There might be particular groups that get hammered, but that is kind of always happening, so it doesn’t necessarily mean that things are going to be more chaotic.
I think at some point that will change. The key thing for me is that at some point you’ll be able to automate the automation. At some point you’ll be able to put a machine learning model in charge of figuring out how to incorporate other machine learning models into running your legal firm, and then you can just kind of cut human beings out of the process. Law would be a bad example, because probably lawyers will pass laws to prevent you from automating the legal system. But in other professions, you might see this. But I think that is some ways off.
Luisa Rodriguez: Yeah. Is the main thing going on here that for the next couple of years at least, we need humans in the loop?
Rob Wiblin: I think basically humans are going to remain a significant bottleneck in figuring out how we incorporate these generative AI models into our business, into the entire processes that we use. I think it’s challenging, it’s laborious. There’s a lot of checking that you have to do.
If I just think about 80,000 Hours and how are we going to incorporate generative AI, I suspect it will take us years to figure out and to actually implement many of the uses that in some sense are theoretically now a great idea for us to do. And that will probably be the same in hospitals, it’ll probably be the same in the legal system, it’ll probably be the same in most other businesses.
The other thing is there’ll be big increases in efficiency in some industries as a result of application of generative AI. I think many people are worried, for example, that programmers are going to be laid off en masse — because this is one area, maybe the standout area, where you get the biggest improvements in productivity or you get the biggest gain from a programmer being able to operate Copilot, or any model optimised for basically doing most of the slog of programming for them. But I think there’s a lot of potential to expand the number of things that we turn into software, the amount of programming that we do on the current margin. From one point of view, we’ve been using programming for almost nothing. Almost no things that we do in day-to-day life do we go back and write programs for.
So that’s another effect that I think is going to be pretty substantial: in industries where productivity goes up a lot, that on the one hand gives you a reason to lay people off; on the other hand, it gives you a reason for your industry to expand.
Luisa Rodriguez: Right. For you to have more and better versions of the thing they create.
Rob Wiblin: Exactly. And I guess I just don’t see why, in 2024, this would be so different from all of the technologies that we’ve seen in the past. I don’t have a strong case for that.
Luisa Rodriguez: Until you get to the point where…
Rob Wiblin: Until the CEO is a machine learning model. At the point where you are going to say, “You know, the whole senior management team: we should just make all of these roles ML models and kind of get out of the way.” At that point, I think things really do change quite a lot. But at that point, probably you do just have AGI and there’s a lot of things that are changing in the world.
Luisa Rodriguez: Right. So a lot of things change first before you get mass unemployment, and all bets are off for what things will look like then.
Rob Wiblin: Yeah. I think some people might be suspicious… This feels like a little bit of a cop-out, because I’m saying it’s going to be a huge deal, it’s going to transform the world — but not now; just wait until sometime, five years, 10 years. But I actually do think that there are good reasons to think that that’s the case.
Zvi Mowshowitz on why he thinks it’s a bad idea to work on improving capabilities at cutting-edge AI companies [01:40:06]
From #184 – Zvi Mowshowitz on sleeping on sleeper agents, and the biggest AI updates since ChatGPT
Rob Wiblin: Should people who are worried about AI alignment and safety go work at the AI labs? There’s kind of two aspects to this. Firstly, should they do so in alignment-focused roles? And then secondly, what about just getting any general role in one of the important leading labs?
Zvi Mowshowitz: This is a place I feel very, very strongly that the 80,000 Hours guidelines are very wrong. So my advice, if you want to improve the situation around the chance that we all die, coming from existential risk concerns, is that you absolutely can go to a lab that you have evaluated as doing legitimate safety work, work that will not effectively end up as capabilities work, in a role doing that work. That is a very reasonable thing to be doing.
I think that “I am going to take a job at specifically OpenAI or DeepMind for the purposes of building career capital or having a positive influence on their safety outlook, while directly building the exact thing that we very much do not want to be built, or want to be built as slowly as possible because it is the thing causing the existential risk” is very clearly the thing not to do. There are so many other things in the world you could be doing. There is a very, very narrow group — hundreds of people, maybe low thousands — who are directly working to advance the frontiers of AI capabilities in ways that are actively dangerous. Do not be one of those people. Those people are doing a bad thing. I do not like that they are doing this thing.
And it doesn’t mean they’re bad people. They have different models of the world, presumably, and they have a reason to think this is a good thing. But if you share anything like my model of the importance of existential risk and the dangers that AI poses as an existential risk, and how bad it would be if this was developed relatively quickly, I think this position is just indefensible and insane, and that it reflects a systematic error that we need to snap out of. If you need to get experience working with AI, there are indeed plenty of places where you can work with AI in ways that are not pushing this frontier forward.
Rob Wiblin: Just to clarify, I guess I think our guidance, or what we have to say about this, is that it’s complicated. We have an article where we lay out that it’s a really interesting issue: often, when we ask people for advice or for their opinions about career-focused issues, you get a reasonable amount of agreement and consensus. This is one area where people are just all across the map. I guess you’re on one end saying it’s insane. There are other people, whose advice we normally regard as quite sound and quite interesting, who think it’s insane not to go and basically take any role at one of the AI labs.
So, at least personally, I don’t feel like I have a very strong take on this issue. I think it’s something that people should think about for themselves, and that I regard as non-obvious.
Zvi Mowshowitz: So I consider myself a moderate on this, because I think that taking a safety position at these labs is reasonable. And I think that taking a position at Anthropic specifically — if you do your own thinking, if you talk to these people, if you evaluate what they are doing, if you learn information that we are not privy to here, and if you are willing to walk out the door immediately when asked to do something that is not actually good, and otherwise advocate for things and so on — is something one can reasonably consider.
And I do want to agree with the “make up your own mind, do your own research, talk to the people, look at what they’re actually doing, have a model of what actually impacts safety, decide what you think would be helpful, and make that decision.” If you think the thing is helpful, you can do it. But don’t say, “I’m going to do the thing that I know is unhelpful — actively unhelpful, one of the maximally unhelpful things in the world — because it will be less bad with me doing it, because I’ll be a responsible person, or because I will build influence and career capital.” That is just fooling yourself.
Therefore, I consider myself very much a moderate. The extreme position is that one should have absolutely nothing to do with any of these labs for any reason, or even one shouldn’t be working to build any AI products at all, because it only encourages the bastards. I think there are much more extreme positions that I think are highly reasonable positions to take, and I have in fact encountered them from reasonable people within the last week discussing realistically how to go about doing these things. So I don’t think I’m on one end of the spectrum.
Obviously the other end of the spectrum is just go to wherever the action is and then hope that your presence helps, because you are a better person who thinks better of things. And based on my experiences, I think that’s probably wrong, even if you are completely trustworthy to be the best actor you could be in the situations, and to carry out those plans properly. I don’t think you should trust yourself to do that.
Holden Karnofsky on the power that comes from just making models bigger [01:45:21]
Rob Wiblin: What’s a view that’s common among ML researchers that you disagree with?
Holden Karnofsky: You know, it depends a little bit on which ML researchers for sure. I would definitely say that I’ve been a big “bitter lesson” person since at least 2017. I got a lot of this from Dario Amodei, my wife’s brother, who is CEO of Anthropic, and I think has been very insightful.
A lot of what’s gone on in AI over the last few years is just like bigger models, more data, more training. And there’s an essay called “The bitter lesson” by an ML researcher, Rich Sutton, that just says that ML researchers keep coming in with cleverer and cleverer ways to design AI systems, and then those clevernesses keep getting obsoleted by just making the things bigger — just training them more and putting in more data.
So I’ve had a lot of arguments over the last few years. And in general, I have heard people arguing with each other where one side is just kind of like, “Well, today’s AI systems can do some cool things, but they’ll never be able to do this. And to do this” — maybe that’s reasoning, creativity, you know, something like that — “we’re going to need a whole new approach to AI.” And then the other side will say, “No, I think we just need to make them bigger, and then they’ll be able to do this.”
I tend to lean almost entirely toward that “just make it bigger” view. I think, at least in the limit, if you took an AI system and made it really big — you might need to make some tweaks, but the tweaks wouldn’t necessarily be really hard or require giant conceptual breakthroughs — I do tend to think that whatever it is humans can do, we could probably eventually get an AI to do it. And eventually, it’s not going to be a very fancy AI; it could be just a very simple AI with some easy-to-articulate stuff, and a lot of the challenge comes from making it really big, putting in a lot of data.
I think this view has become more popular over the years than it used to be, but it’s still pretty debated. I think a lot of people are still looking at today’s models and saying that there are fundamental limitations, where you need a whole new approach to AI before they can do X or Y. I’m just kind of out on that. I think it’s possible, but I’m not confident — this is just where my instinct tends to lie. That’s a disagreement.
I think another disagreement I have with some ML researchers — not all of them, by any means — is that sometimes I sense a background assumption that just sharing information openly — publishing, open sourcing, et cetera — is just good. That it’s kind of bad to do research and keep it secret, and it’s good to do research and publish it. And I don’t feel this way. I think the things we’re building could be very dangerous at some point, and I think that point can come a lot more quickly than anyone is expecting. I think when that point comes, some of the open source stuff we have could be used by bad actors in conjunction with later insights to create very powerful AI systems in ways we aren’t thinking of right now, but we won’t be able to take back later.
And in general, I do tend to think that in academia, this idea that sharing information is good is built into its fundamental ethos. And that might often be true — but I think there are times when it’s clearly false, and academia still kind of pushes it. Gain-of-function research is kind of an example for me, where people are very into the idea of, like, making a virus more deadly and publishing how to do it. And I think this is an example of where, just culturally, there are some background assumptions about information sharing — and I just think the world is more complicated than that.
Rob Wiblin: Yeah. I definitely encounter people from time to time who have this very strong prior, this very strong assumption, that everything should be open and people should have access to everything. And then I’m like, what if someone was designing a hydrogen bomb that you could make with equipment that you could get from your house? I’m just like, I don’t think that should be open. I think we should probably stop them from doing that. And, certainly, if they figure it out, we shouldn’t publish it. And I suppose it’s just that that’s a sufficiently rare case that it’s very natural to develop the intuition in favour of openness from the 99 out of 100 cases where it’s not too unreasonable.
Holden Karnofsky: Yeah. I think it’s usually reasonable, but I think bioweapons is just a great counterexample, where it’s not really balanced. It’s not really like, for everyone who tries to design or release some horrible pandemic, we can have someone else using open source information to design a countermeasure. That’s not actually how that works. And so I think this attitude at least needs to be complicated a little bit more than it is.
Rob & Luisa: Are risks of AI-related misinformation overblown? [01:49:49]
Luisa Rodriguez: Are there problems that other people are worried about that you’re less worried about?
Rob Wiblin: It would be very suspicious if I was just more worried about every single aspect of the problem than the typical person is. I guess there could be a way of making that make sense, but if I’m being rational about it, there should be at least some cases where I disagree in the other direction, and I think people might be exaggerating the problems.
Maybe one of the most common things that I read and hear that people are concerned about is misinformation: the idea that AI is going to create this kind of misinformation apocalypse where people just completely lose touch with reality and social media just becomes gunked up with total nonsense — all put out by Russia, I suppose, or all put out by bad actors trying to confuse people.
I’m worried about that somewhat — I think that there will be things along those lines that happen — but I feel like I’m not as worried as maybe Sam Harris is, or I guess Tristan Harris. The Harrises: they’re very worried about misinformation. Intuitively, I’m not so scared, at least in the short run.
And I think one of the reasons is that I am sceptical that the bottleneck to misinformation causing harm was the ability to produce lots of text containing false information. If you can just produce lots of specialised articles about many different topics that push some particular agenda, I’m sure that helps somewhat. If I were a Russian disinformation group, it sounds helpful — but really, isn’t the challenge here trying to get that information disseminated to people en masse from a source that they think is kind of credible?
You know, just tweeting it out from a random Twitter account that you’ve set up, a fake one pretending to be someone: why are people going to follow this exactly? And eventually, for the few accounts that actually become big and attract an awful lot of impressions, can’t people look into who this person is, and try to investigate whether this is a disinformation account? I think they kind of already do that.
Luisa Rodriguez: Just trying to play devil’s advocate, it seems like there are lots of people — I mean, I’m just thinking a bunch of people in the US right now — who are susceptible to misinformation partly because they’ve got a bunch of reasons to prefer things that are false, because it confirms their worldview or confirms their political views or something.
And to the extent that lots of people have confirmation bias and motivated reasoning — and plausibly there’s another group that’s not quite fooled by things that the alt right says now, for example, but there is a version that seems slightly better-evidence-backed that could convince them — does it just make it easier for people who are susceptible to false beliefs because of something like motivated reasoning to end up believing things that are false?
Rob Wiblin: Yeah, I think it helps. Or I think it makes this issue a bit worse. One general thing is I’m not sure that social media has made people have a worse connection to reality, or just have more conspiratorial false beliefs than they did in the past. I think people had lots of false beliefs in all periods of time, because motivated reasoning is not something that was invented in the 2010s. People have had particular partisan political views and believed whatever was necessary in order to make that make sense and feel like their group was good, I think, through all of human history. We see plenty of it, but I’m not sure that we see more of it than we did in the past. I think if we went back to the ’70s, we could probably find plenty of cases that look analogous to what we’re worried about now.
Luisa Rodriguez: Can you think of any? Just to really drive it home for me?
Rob Wiblin: Sure. One suspicion I have is that we’re not so aware of it because all of that just fades into history.
Luisa Rodriguez: Exactly.
Rob Wiblin: But I think during the ’50s and ’60s in the US, there was a long period where people just believed things about the Vietnam War that now we think are preposterous. There was this official narrative that was extremely heavily put out through the media. There was collaboration — very explicit collaboration — between people who ran newspapers and television stations, and not for any evil reason: they just thought we should back the United States, that we should be patriotic, that the president and the government have good ideas about what we’re doing in Vietnam, and that we should broadly be supportive of our side. And so they continued to push out what the government said. But in many cases, that was kind of misleading or not really true. And this resulted in people just having the wrong idea.
Luisa Rodriguez: Right. So the idea is kind of that it was already pretty easy. This makes it a bit easier, but not enough to change the picture and make this a much bigger issue.
Rob Wiblin: Yeah. Maybe that’s a bad example in a way, because that’s an example of official misinformation or official confusion. Whereas the issues that we have now are more about kind of chaos, rather than there being any official narrative. Most of the time, it’s just that people are getting all kinds of nonsense information that says all kinds of different things, and often from kind of eclectic sources, rather than The New York Times putting out information that’s misleading.
I would love to see research on this, but I bet if you went back to the ’70s and looked at polling on what people believe about the Moon landing, and what they think about these various different assassinations, and what they believe about a communist infiltration, they probably had some right beliefs, but all kinds of crazy wrong beliefs. Before we decide that people are more into conspiracies and more disconnected from reality now than they used to be, I would at least like to have some gauge of how disconnected a typical person was in the ’70s.
Luisa Rodriguez: What is it that you think makes people get this wrong? Is it just the lack of salience of people having had false beliefs throughout human history?
Rob Wiblin: I suspect this is a case where people always think that things are getting worse. Or this is one where you don’t know about the bad stuff from the past, or you haven’t studied the cases of conspiratorial thinking from the ’70s, because at the end of the day, maybe it didn’t actually cause the collapse of society, so it’s not so interesting. The ways in which people believe annoying stupid stuff now is very salient to you and you feel like you want to fight it, but the stuff from the past is kind of forgotten.
I think the biggest cultural force that we’ve seen over the last 20 years is that the internet has allowed people to fracture into different interest groups and different belief groups. There is no longer such a central narrative that a handful of different groups can put out that then commands a lot of agreement across society.
For example, me and my friends, we’ve kind of clustered in a group that has eclectic beliefs. We have a different worldview than many people do across the rest of society. It would have not really been viable for us to all meet one another and to start collaborating on projects in the ’70s or ’80s. How would we have found one another? But with the internet, people who have particular niche ideas and niche things that they want to pursue can find one another and all cluster together.
So I see it just as a splintering, an increase in weirdness. Basically, the internet has facilitated an increase in weirdness. This has some downsides: I guess it might make it harder for people to get along. But it also means that our mistakes are somewhat less correlated. It’s less likely that everyone is going to march off the same cliff simultaneously. And where the balance lies overall, and whether people as a whole are more disconnected than they were in the past… I was going to just bitch about news, but that would be a whole other episode.
Luisa Rodriguez: Yeah. Is there a story where it’s harder to believe true things just because it’s harder to tell which information you’re learning or which videos you’re watching are authentic? Like primary sources of information: there are many of them, many are fake or created as deepfakes or something, and you just don’t know which is real?
Rob Wiblin: So at all points in time, people have experienced or been able to directly observe and learn only a tiny fraction of the things that they think they know. Almost all of it is because someone else said so, or because it was in a book, because someone claims that they ran an experiment or this or that. For a very long time it’s been possible to write down text saying things that are false, and people have learned not really to trust that: that the fact that something is written down shouldn’t be completely persuasive.
It is kind of new that you can generate fake text and videos — realistic-looking ones — of things that didn’t happen. I think that what will happen there is people will come to not trust just random videos that they encounter, because they’ll realise that they can be faked. I guess we might have an issue where older people won’t realise this, but eventually they will pass away, and most of us will have grown up or lived for a long time in a world where we realise that video just isn’t necessarily credible evidence that something happened.
So we’ll be somewhat forced to rely more on the sourcing. So it’ll matter whether The New York Times says, “We collected a video of this thing happening.” I guess the fact that we can’t then rely on random videos from random people on the internet to be accurate could contribute to confusion, but it feels more like an incremental change to me, rather than something that is devastating and going to lead to a rapid collapse of society or collapse of democracy. It’s the more extreme scenarios that some people are worried about — I think actually many people are worried about them — and I just suspect that the effects are going to be more muted.
Hugo Mercier on how AI won’t cause misinformation pandemonium [01:58:29]
From #180 – Hugo Mercier on why gullibility and misinformation are overrated
Rob Wiblin: OK, let’s turn now to the question of AI and LLMs and generative models, and whether we should worry about them being able to convince people to believe things that are not true. So what’s your overall view on whether we should worry about LLMs or AIs making public discourse about important topics worse?
Hugo Mercier: Yeah, I don’t think we should worry.
Rob Wiblin: OK, so you’re a little bit more extreme than me, maybe.
Hugo Mercier: Yeah, I’m happy with that. I mean, I would rather not be persuaded of the opposite, because I’m happy not to be worried. I’ve talked to people about this, and then again, my colleagues tend to be on the same side of the issue as I am, to some extent. But I haven’t seen a scenario that I deemed plausible in which AI or LLMs were making things really worse.
Obviously I should specify this is not an area where I’m really knowledgeable. Misinformation I know about, but LLMs in particular I don’t know much about, so there might be things I’m underestimating due to that ignorance. But people who know more about these things haven’t been able to convince me otherwise, let’s say.
Rob Wiblin: Yeah, yeah. So how useful would it be to take the inundation approach — to generate just enormous numbers of articles arguing for a given conclusion using all kinds of different arguments?
Hugo Mercier: First of all, there’s already an essentially infinite amount of information on the internet. So the bottleneck is not how many articles there are on any given topic, because there is already way more than anybody will ever read; the bottleneck is people’s individual attention — and that bottleneck is largely controlled by, to some extent peers and colleagues and social networks, but otherwise mostly by the big actors in the field: by cable news, by big newspapers. And there’s no reason to believe these things are going to change dramatically. So having another 1,000 articles on a given issue, just no one is going to read them.
Rob Wiblin: Yeah. This is the thing that made me sceptical of this when I really thought about it, if I imagined myself running a propaganda campaign — especially one that’s already financed, and has enough supporters that you could write a meaningful number of articles arguing for something already. I’m like, don’t you hit pretty sharply declining returns just on the sheer volume of them?
I suppose if you didn’t have many resources, it could make it a little bit cheaper, because you could potentially write opinion pieces, where otherwise — perhaps if you weren’t very educated or you didn’t speak the language that you were focusing on very well — then I suppose these things could make it cheaper to do that. You could have an assistant.
But the idea that it would be helpful to produce very large numbers doesn’t seem like the key thing, because the question is how do you get people to read them and take them seriously? And there the bottleneck is just at a different stage. It’s not the production stage, which is relatively cheap, I would imagine, in the scheme of things; it’s how do you get anyone to give a damn about what you’ve made?
Hugo Mercier: Yes. Let’s say you wanted to write an op-ed and to push it to people on Facebook or something. You could hire someone to write the op-ed. It’s going to cost you a few thousand dollars. Then getting more than a few hundred people to read it on Facebook is going to cost you a lot of money. And that’s just reaching people: you’re going to have to get people to click on the thing, and maybe one out of 100 of the people who click will actually read the whole thing. So that’s the bottleneck. It’s not the number of things that are written; it’s how many things people read.
Rob Wiblin: Yeah. I guess it could allow you to come up with more iterations and test more different messages. And some of them will be a bit better than others, so you can get a bit of a gain there.
Hugo Mercier: But even then, that assumes that people will read them. Otherwise you’re going to get no feedback.
Rob Wiblin: I see. Yeah.
Hugo Mercier: People don’t read the news in the first place.
Rob Wiblin: You’re saying that people read the headlines, right?
Hugo Mercier: Some people read some headlines, yeah. Some people do read the news, obviously, but it’s much less than we believe.
Rob Wiblin: Not with great care, yeah. I should maybe say that in the book you discuss fake news and misinformation in general, and to cut a long story short, the evidence that fake news and deliberate misinformation is causing large numbers of people to change their minds about things is not very good.
At least in the US, there are a reasonable number of people who do consume fake news, but overwhelmingly they’re the people who are most extremely partisan to start with. So it’s that they want to read fake news that endorses their preconceptions, because they really enjoy it as a sort of recreation, really, and it doesn’t cause them to change their views that much. It’s more that their views cause them to want to consume the information.
Hugo Mercier: That’s exactly right. Essentially, if you’re in a democracy — and to some extent even in dictatorships, but clearly in democracies — the informational environment is going to be driven by demand. Overwhelmingly, the things that are there are there because people want to read about them, they want to hear about them, and not because someone is trying to push them.
Obviously, journalists and editors also have some agency; I’m not denying that. They’re going to work on some stories rather than others. But the selection bias operated by the population as a whole is going to be so massive, and journalists themselves want to write something that people will read. So mostly if you see a lot of fake news for something, it’s because people wanted to hear this, so presumably they already agreed with it.
And indeed, as you are saying, it’s been quite well shown by now that, first of all, the amount of fake news that circulates is very small, like a few percent of the information that circulates on social networks is fake news — really like 2%, 3% at most. And that very small percent is overwhelmingly consumed by people who are politically extreme and whose views fit with that. So that’s going to have no effect.
Rob Wiblin: Part of your model of the world is also just that people, when they realise that there’s a risk that they’re going to be tricked, they just shut down. So you can imagine in a world where it’s very easy to produce this compelling, slick content, people learn the lesson that anyone can make slick content and so they just stop paying [attention]. They might engage with it for entertainment value, but they won’t necessarily regard it as very strong evidence for any particular conclusion.
Hugo Mercier: Yeah, that’s a good question. I think people are already very good at this. I mean, for people who write smart political books, it’s quite doable to write up an argument that seems persuasive but, because it’s very selectively reporting evidence, is actually really misleading. People are already very good at this. LLMs will make it easier, no doubt, I would imagine.
But also I want to be careful. What I’m saying on the whole is that I don’t want to make people believe that it’s impossible to change people’s minds, if you do spend the time. What seems to work at scale is really the accumulation of evidence — in particular, evidence or arguments conveyed by people who are kind of near you.
For instance, if you look at how opinions change over long periods of time, we know that opinions have changed dramatically on some things — like gay marriage, or before that interracial marriage, or trans rights. Many things are changing relatively quickly. Most of that is generational change: young people being different from older people. But for some issues, there are changes within individuals. So for instance, on average, people have become more pro-gay marriage over the past 40 years in most Western cultures. It’s not just that young people are more pro-gay marriage; individual people have changed.
So that kind of change is possible, but it’s only possible when an issue is the big thing that everybody talks about for a long time, and when you have a lot of people in your surroundings who are making arguments. It’s not just reading things in the media. It’s like you talk to people in your family, your friends and your colleagues. And that works. Like it really can have a dramatic impact on society.
But it’s hard to imagine how LLMs… Maybe they can grease the wheels a little bit by providing better arguments. But then again, the main bottleneck is attention. That’s not going to happen for every issue that there is, because every issue can’t be in the headlines all the time.
Rob Wiblin: OK, what about the approach (that I call “scalpel”) in which everyone can be delivered an individualised pitch for X, given their existing views and their personality? So they could be given the arguments that are most convincing, given their preconceptions. Do you think that can meaningfully increase the impact of an effort at persuasion?
Hugo Mercier: I think it’s essentially the same as the other one, because probably, given how good people already are at making arguments, the only thing stopping a book by Peter Singer or some other brilliant philosopher or thinker from persuading more of the people who read it is that he can’t personalise the arguments. Like, if Peter Singer could talk to every individual reader, I would assume that he would be way more persuasive, because I’m sure there are many counterarguments that he hasn’t been able to address in his books. And obviously he’s extremely clever, and so presumably he would be way more persuasive in person.
So I think that if there is going to be a delta in persuasion compared to what’s already out there, it’s going to be in that personalisation. But as we were saying, even that, it’s not clear how you would scale it up, because people just don’t have that much time.
Rob Wiblin: I see. So the bottleneck becomes, you could come up with a personalised argument for them, but how do you get people to pay attention to you? And there, just as everything becomes more entertaining — maybe because LLMs are able to make things more entertaining; they’re able to do a mashup between, I don’t know, an argument for nuclear power and Hamilton or something — but then they’re competing with everyone else who’s trying to do the same thing. So it’s quite hard to get an edge for one particular view over other things in such a competitive information environment.
Hugo Mercier: Yes, personalisation can do great things. If you could watch the exact series that would appeal to exactly your taste — in a way, that would be really awesome.
But on the other side, we also consume content — whether it’s news or fiction or anything else — to some extent because we want to be able to talk to others about it. So if you watch a movie that was made just for you, but that no one else can watch or enjoy because it’s not their taste, it’s going to spoil some of the fun.
Likewise, if you read some news or if you hear some arguments that are only compelling for you, and that if you try sharing them with others it’s not going to appeal to them at all, it reduces the interest you have in having the thing in the first place. And it reduces what political scientists call “two-step flow” — that you can’t convince other people in turn, because the thing has been so personalised to you that the buck kind of stops there — and we know that a lot of persuasion comes from people being convinced by media or by government, and then passing on that knowledge or those beliefs to others.
Rob & Luisa: How hard will it actually be to create intelligence? [02:09:08]
Luisa Rodriguez: It sounds like overall you think it’s just not that hard to create intelligence. Is that right?
Rob Wiblin: I don’t have a really strong take on this, and I don’t think this is really a crux because inasmuch as you think that it’s harder, in my mind, all that does is push out the date. So if you think it’s easy, then you’re worried in 2027; if you think it’s hard, you’re worried in 2040. But many of the questions kind of remain the same. It’s just that you have a bit more time to deal with them.
But yeah, if I had to guess, I would say that I think it’s going to be on the easier end. And maybe there’s two things. One is I just want to emphasise that the human brain, at least the level of intelligence that we have, is just something that evolution — this blind, dumb process — struck on in order to solve a problem: to solve the problem of how do we get food and avoid getting eaten by animals and have sex and reproduce. There’s no magic here. It’s just an evolutionary selective process, the same way that we change the minds in the neural networks and shift them — in fact, much faster in this case — in order to be able to solve the problems that we present those networks with.
So that’s one reason: I think it’s possible to fall into this trap of thinking, “The human mind: wow, how could we ever replicate that?” And I’m like, we should probably just expect to be able to replicate that, once we have chips with enough components. That’s kind of my intuition.
The other thing is just looking at the rate of progress, and what these models can do relative to what they were able to do before, and seeing that without any great breakthroughs in understanding of the nature of intelligence or what they’re doing — or even knowing after the fact how they manage to solve the problems — they’re able to gain massive capabilities. And so potentially just throwing more challenges at them using the same algorithms could just allow them to solve many more problems.
There’s things here that I’m skating over, but I think the way that they’ve approached us in many different domains without us running into roadblocks, just by throwing more compute, makes me think maybe intelligence isn’t that hard to solve.
Luisa Rodriguez: Yeah, I think I’m one of those people who heard the first argument and was like, that sounds a little bit hand-wavy and theoretical. Maybe there are just some things that aren’t literally magic, but that are really hard to replicate in silicon.
And I just don’t know how you can ignore the second thing, the progress we’ve seen. It just seems completely inevitable. And it blows my mind that there are some people that have seen the progress made, and the fact that there’s going to be loads of investment, and think that’s not going to get us much farther.
Rob Wiblin: Yeah. So if I wanted to push back on this — because I agree, neither of these arguments is overwhelming — if someone really had a strong intuition that it was hard, then they don’t have to change their mind.
The first argument just has the weakness that you’re saying it’s not that hard, but maybe it is just pretty hard for humans to copy what evolution did. Maybe it’s come up with some really clever mechanisms that we haven’t managed to reverse engineer, and we’re not going to strike on them for some time.
Luisa Rodriguez: It did take evolution quite a long time, to be fair.
Rob Wiblin: Evolution has been at it for longer than we have. So that’s a live possibility, could be true.
The second one, I mean, some people look at it and they have a different intuition: that what it’s doing is not that impressive somehow: that it messes up in all these random ways, that it doesn’t understand what you’ve said sometimes.
One big difference here is the people who I read online, who are kind of playing down the capabilities, and are not so impressed and think that AGI might be a very long way in the future, I find that they tend to focus a lot on what the model can do right now: they point out failures that it has right now. Whereas the people who think it’s going to happen soon focus mostly on the change: they focus on the delta; they focus on how much better is ChatGPT than GPT-3.5, and how much better was 3.5 relative to 3. They focus on the gradient.
And I think the second group is right. I think that the people who focus on the current failures and weaknesses and the lack of capabilities that ChatGPT has, they should see how many more similar things there were just two years ago, and how in fact they’ve been reduced by like 80%, and then project forward how much better the models will be in two years’ time. Of course, maybe it would level off, maybe they’ll stop improving. I wouldn’t count on it though. I wouldn’t bet the bank on it.
Luisa Rodriguez: Yeah. Insofar as those inputs like compute are still going to keep going up, at least for some time, then that delta is going to keep also changing at the same rate.
Rob Wiblin: Yeah. I guess the other objection would just be to say what ChatGPT, what all of these generative AI models are doing, is just not what happens inside the human brain. Some people still have this objection. This was a very common objection a few years ago, less common now, like, “I really understand what objects are. I really have a deep understanding of the actual causal structure of the world. ChatGPT is just faking it; it’s just predicting the next word. And it turns out that you can pretend to understand the structure of the world just if you’re very good at predicting what the next word is in a sentence, but that doesn’t mean that you really know what a lamp is.”
I think there’s some truth to that, but I’d say it’s more wrong than right. If you can perform all the functions of knowing what these things do, and understanding in practice what would happen if you clicked the button on a lamp, then I think you kind of understand what a lamp is — at least in a functional sense.
I listened to an interview recently with someone who was interviewing a machine learning researcher, and they were like, “Look, what GPT-4 does is very cool, but it’s not thinking. It’s not thinking, is it?” And the machine learning person didn’t really know quite what to say because they were like, well, it performs the same function as thinking. It’s equivalent to thinking. So who cares if we call it thinking?
Now, I guess you could push back and say that it’s not functionally the same — that, fundamentally, its approach causes all these screwups in how it reacts to questions, and sometimes it just gets things wrong. But again, I would just project forward that those failures are approaching zero. And at the point where there is no distinction between understanding something and pretending to understand it, maybe it does understand.
Luisa Rodriguez: Maybe at that point it is functionally equivalent. Yeah, yeah, yeah.
Robert Long on whether digital sentience is possible [02:15:09]
From #146 – Robert Long on why large language models like GPT (probably) aren’t conscious
Luisa Rodriguez: So you’re saying that there’s this idea called functionalism where basically it’s like the functions that matter — where all you need is certain computations to be happening or possible, in order to get something like sentience. Is that basically right?
Robert Long: Yeah, that’s basically right. Computationalism is a more specific thesis about what the right level of organisation or what the right functional organisation is. It’s the function of performing certain computations. Does that make sense?
Luisa Rodriguez: I think so. Maybe I’ll make sure I get it. So the argument is that there’s nothing special about the biological material in our brain that allows us to be conscious or sentient. It’s a particular function that our brain serves, and that specific function is doing computations. And those computations are the underlying capability required in order to be sentient or conscious. And theoretically, a computer or something silicon-based could do that too.
Robert Long: Yeah. I think that’s basically right.
Luisa Rodriguez: So that’s the basic argument. What evidence do we have for that argument?
Robert Long: Yeah, I’ll say that’s like the basic position, and then why would anyone hold that position? I think one thing you can do is look at the way that computational neuroscience works. So the success of computational neuroscience — which is kind of the endeavour of describing the brain in computational terms — is like some evidence that it’s the computational level that matters.
And then there are also philosophical arguments for this. So a very famous argument, or class of arguments, are what are called replacement arguments, which were fleshed out by David Chalmers. And listeners can also find that when Holden Karnofsky writes about digital people and wonders if they could be conscious or sentient, these are actually the arguments that he appeals to. Those arguments ask us to imagine replacing the neurons of the brain bit by bit with artificial silicon things that can take in the same input and yield the same output. And so, by definition of the thought experiment, as you add each one of these in, the functions remain the same and the input/output behaviour remains the same.
So Chalmers asked us to imagine this happening, say, to us, while this podcast is happening. By stipulation, our behaviour won’t change, and the way we’re talking about things won’t change, and what we’re able to access in memory won’t change. And so at the end of the process, you have something made entirely out of silicon, which has the same behavioural and cognitive capacities as the biological thing.
And then you could wonder, well, did that thing lose consciousness by being replaced with silicon? And what Chalmers points out is that it would be really weird to have something that talks exactly the same way about being conscious — because by definition, that’s a behaviour that remains the same — and has the same memory access and internal cognition, but whose consciousness left without leaving any trace. He thinks this would be a really weird dissociation between cognition and consciousness.
And one reason this argument has force is that a lot of people are pretty comfortable with the idea that at least cognition and verbal behaviour and memory and things like that can be functionally, multiply realised. And there’s an argument that if you think that, it would be kind of weird if consciousness is this one exception where the substrate matters.
Luisa Rodriguez: So I think the idea is something like, if you had a human brain and you replaced a single neuron with, I guess, a silicon neuron that performed the exact same function. And is the reason we think that’s a plausible thing to think about because neurons transmit electricity and they’re kind of on/off switchy in maybe the same way that computers are? Is that it?
Robert Long: Yeah, this is an excellent point. One weakness of the argument, in my opinion, and people have complained about this, is it kind of depends on this replacement being plausible. Or it seems that way to people. In the paper, there’s actually a note on, “Well, you might think that actually in practice, this is not something you could do.” And obviously we could not do it now. And for reasons I don’t entirely understand, that’s not really supposed to undermine the argument.
Luisa Rodriguez: Huh. OK. Is it basically right though that we think of a neuron and a computer chip as analogous enough that that’s why it’s plausible?
Robert Long: Yeah. We think of them as being able to preserve the same functions. And I mean, I think there is some evidence for this from the fact that artificial eyes and cochlear implants work. Like we do find that computational things can interface with the brain and the brain can make sense of them.
Luisa Rodriguez: Interesting. And then eventually you replace all my neurons with the silicon prosthetic neurons. And then I have an entirely silicon-based brain, but there’s no reason to think I wouldn’t feel or think the same things. Is that basically it?
Robert Long: That’s the idea. If you did think that you wouldn’t feel the same things, it’s supposed to be really counterintuitive that you would still be saying, “This worked. I’m still listening to Rob talk. I’m still seeing colours.” You would still be saying that stuff, since that’s a behavioural function. Yeah, that’s the basic thrust. So then that’s at least one silicon-based system that could be conscious. So that kind of opens the door to being able to do this stuff in silicon.
Luisa Rodriguez: Right. It feels very similar to the ship that has all of its planks replaced one by one. And at the end you’re asked if it’s still the same ship.
Robert Long: Yeah, it is similar. This sort of thing shows up a lot in philosophy. As I said, it’s like an old trick.
One thing I would like to say, and maybe I’m qualifying too much, but full disclaimer: I think a lot of people are not super convinced by this argument. Gualtiero Piccinini is an excellent philosopher who thinks about issues of computation and what it would mean for the brain to be computing, and I think he’s sympathetic to computationalism, but he thinks that this argument isn’t really what’s getting us there. I think he relies more on that point I was saying about, well, if you look at the brain itself, it does actually look like computation is a deep or meaningful way of carving it up and seeing what it’s doing.
Luisa Rodriguez: Right, right. And so if you could get the right computations doing similar things, or doing things that make up sentience, then it doesn’t matter what’s doing it. What reasons do people think that that argument doesn’t hold up?
Robert Long: Well, for one thing, you might worry that it’s sort of stipulated what’s at issue at the outset, which is that silicon is able to do all the right sort of stuff. So there’s this philosopher of biology and philosopher of mind called Peter Godfrey-Smith — who would be an excellent guest, by the way; he’s written a book about octopus minds — and he has a line of thinking where functionalism in some sense is probably true, but it’s not clear that you can get the right functions if you build something out of silicon. Because he’s really focused on the low-level biological details that he thinks might actually matter for at least the kind of consciousness that you have. And that’s sort of something I think you can’t really settle with an argument of this form.
Luisa Rodriguez: Yeah. Can you settle it?
Robert Long: So I actually have sort of set aside this issue for now — funnily enough, since it’s like the foundational issue. And I’ll say why I’m doing that. I think these debates about multiple realisability and computationalism have been going on for a while. And I’d be pretty surprised if in the next few decades someone has just nailed it and they’ve proven it one way or the other.
And so the way I think about it is I think it’s plausible that it’s possible in silicon to have the right kind of computations that matter for consciousness. And if that’s true, then you really need to worry about AI sentience. And so it’s sort of like, let’s look at the worlds where that’s true and try to figure out which ones could be conscious.
And it could be that, you know, none of them are because of some deep reason having to do with the biological hardware or something like that. But it seems unlikely that that’s going to get nailed anytime soon. And I just don’t find it crazy at all to think that the right level for consciousness is the sort of thing that could show up on a silicon-based system.
Luisa Rodriguez: Are there any other arguments for why people think artificial sentience is possible?
Robert Long: This is related to the computational neuroscience point, but one thing people have noticed is that a lot of the leading scientific theories of what consciousness is are couched in computational terms, and posit computations or some other sort of pattern or function as what’s required for consciousness. And so if you think they’re correct in doing so, then you would think that it’s possible for those patterns or computations or functions to be made or realised in something other than biological neurons.
Luisa Rodriguez: Does anyone disagree on this? Do some people just think artificial sentience is not possible?
Robert Long: Yeah, so there are these views — “biological theories” maybe you can call them. Ned Block is one of the foremost defenders of this biological view — that consciousness just is, in some sense, a biological phenomenon. And you won’t be capturing it if you go to something too far outside the realm of biological-looking things. John Searle is also a proponent of this view.
So there’s views where that’s definitely true, and it’s just like what consciousness is. There’s also views on which consciousness is something functional, but also you’re not going to be able to get it on GPUs or anything like what we’re seeing today. And those are kind of different sorts of positions. But it should be noted that plenty of people who’ve thought about this have concluded that you’re not going to get it if you have a bunch of GPUs and electricity running through them. It’s just not the right sort of thing.
Luisa Rodriguez: So the first argument is like: there’s something really special about biology and biological parts that make whatever consciousness and sentience is possible. And the other argument is like: it’s theoretically possible, but extremely unlikely to happen with the technology we have, or could create, or something?
Robert Long: Yeah. For that second position, most people will hold some version of that position with respect to Swiss cheese. Like I would be really surprised if very complicated arrangements of Swiss cheese ended up doing these computations. Because it’s just like, it’s not the right material to get the right thing going. Even if you think it is multiply realisable, you don’t have to think that you could feasibly do it in any sort of material at all.
One thing I’ll add — since I am being very concessive to a range of positions, which I think is appropriate — is that large numbers of philosophers of mind and consciousness scientists in surveys say artificial sentience is possible; machines could be conscious. I don’t have the exact numbers off the top of my head, but David Chalmers has this great thing, the PhilPapers survey, which asked people this question. It’s not a fringe view. A substantial share of philosophers of mind think that artificial sentience is possible and maybe plausible. And ditto surveys of consciousness scientists.
Anil Seth on why he believes in the biological basis of consciousness [02:27:21]
From #206 – Anil Seth on the predictive brain and how to study consciousness
Luisa Rodriguez: Let’s turn to consciousness outside of humans. So I’m sympathetic to a functionalist view of consciousness, where mental states are kind of defined by their functional roles or relations, rather than by their biological makeup. To be more explicit for some listeners who don’t know as much about this theory: consciousness kind of arises from the patterns of interaction among various processes, regardless of the specific materials or structures involved — meaning that a biological or artificial system could potentially be conscious if it functions in a way that meets the criteria for being conscious.
So on that view, it’s possible that we’ll end up building AI systems that are conscious if they can carry out those functions. Do you find that plausible?
Anil Seth: Well, I’m glad you said “functions” rather than “computations” — because I think that’s a difference that’s often elided, and I think it might be an important one. I’m much more sympathetic to the way you put it than the way it’s normally put, which is in terms of computation.
I think there’s actually three positions worth differentiating here. There’s many more, but for now, three is enough.
One of them is, as you said, this idea of biological naturalism: that consciousness really does depend on “the stuff” in some deep, intrinsic way. And the idea here is: say we have something like a rainstorm. A rainstorm needs to be made out of air and water. You can’t make it out of cheese. It really depends on the stuff. Another classic example is building a bridge out of string cheese. You know, you just can’t do it. A bridge, to have the functional properties that it has, has to be made out of a particular kind of stuff. And a rainstorm, it’s not even just the functional properties. It’s like, that’s what a rainstorm is, almost by definition.
So that’s one possibility. It’s often derided as being sort of magical and vitalist: you’re just saying there’s something magic about that. Well, it doesn’t have to be magic. Saying that a rainstorm depends on rain or water is not invoking any magic. It’s saying that it’s the kind of thing that requires a kind of stuff to be that thing.
So that’s one position. As you can see, I’m a little bit sympathetic to that.
Luisa Rodriguez: Yep.
Anil Seth: Then you have functionalism, which is the broadly dominant perspective in philosophy of mind and in the neuroscience of consciousness — so much so that it’s often assumed by neuroscientists, without really even that much explicit reflection.
This is the idea that, indeed, what the brain is made of, what anything is, doesn’t actually matter. All that matters is that it can instantiate the right patterns of functional organisation: the functional roles in terms of what’s causing what. If it can do that, then it could be made out of string cheese or tin cans, or indeed silicon.
Of course, the issue with that is not all patterns of functional organisation can be implemented by all possible kinds of things. Again, you cannot make a bridge out of string cheese. You probably can’t make a computer out of it either. There’s a reason we make things out of specific kinds of things.
So that’s functionalism broadly. And it’s hard to disagree with, because at a certain level of granularity, functionalism in that broad sense kind of collapses into biological naturalism. Because if you ask, “What is a substrate?”, ultimately it’s about the really fine-grained roles that fields and atoms and things play. So you kind of get to the same place, but in a way that you wouldn’t really call functionalism; it’s about the stuff. So that’s another possibility.
And then the third possibility — which is what you hear about all the time in the tech industry, and a lot in philosophy and neuroscience as well — is that it’s not just the functional organisation; it’s the computations that are being carried out. And often these things are entirely conflated. When people talk about functionalism, they sort of mean computational functionalism — but there is a difference, because not all patterns of organisation are computational processes.
Luisa Rodriguez: Yeah, I think I’ve just done this conflation.
Anil Seth: You can certainly describe and model things computationally. I mean, there are fundamental theories in physics, in philosophy — like Church–Turing or whatever — that say you can do this. But that doesn’t mean the process itself is computational.
Luisa Rodriguez: You can model a rainstorm, but it’s not a rainstorm.
Anil Seth: Absolutely, absolutely. And this has been, I think, a real source of confusion. On the one hand, functionalism is very broadly hard to disagree with if you take it down to a really low level of granularity. But then, if you mix it up with computation, you get actually two very opposite views: on the one, consciousness is a property of specific kinds of substrate, specific kinds of things; on the other, it’s just a bunch of computations, and GPT-6 will be conscious if it does the right kind of computations.
And these are very divergent views. The idea that computation is sufficient is a much stronger claim. A much stronger claim. And I think there’s many reasons why that might not be true.
Luisa Rodriguez: Yeah. That was a really useful clarification for me. Maybe let’s talk about computational functionalism in particular. So this basically is maybe a claim that I still find plausible. I’d be less confident it’s plausible than functionalism, but I still find it plausible. There are thought experiments that kind of work for me, like if you replaced one neuron at a time with some kind of silicon-based neuron, I can imagine you still getting consciousness at the end.
What do you find most implausible about computational functionalism?
Anil Seth: You’re absolutely not alone in finding it plausible. I should confess to everyone listening as well that I’m a little bit of an outlier here: I think the majority view seems to be that computation is sufficient. Although it’s interesting; it’s recently been questioned a lot more than it used to be. Just this year, really, I’ve seen increasing scepticism, or at least interrogation — which is healthy, even if people aren’t persuaded. You need to keep asking the question, otherwise the question disappears.
So why do I find it implausible? I think for several reasons. There are many things that aren’t computers and that don’t implement computational processes. I think one very intuitive reason is that the computer has been a very helpful metaphor for the brain in many ways, but it is just a metaphor — and metaphors, in the end, always tire themselves out and lose their force. And if we reify a metaphor and confuse the map for the territory, we’ll always get into some problems.
So what is the difference? Well, if you look inside a brain, you do not find anything like the sharp distinction between software and hardware that is pretty foundational to how computers work. Now, of course, you can generalise what I mean by computers and AI, but for now, let’s just think of the computers that we have on our desk or that whir away in server farms and so on.
So the separation between hardware and software is pretty foundational to computer science. And this is why computers are useful: you can run the same program on different machines and it does the same thing. So you’re kind of building in this substrate independence as a design principle. That’s why computers work. And it’s amazing that you can build things that way.
But brains, just in practice, are not like that. They were not built to be like that. They were not built so that what happens in my brain could be transferred over to another brain and do the same thing. Evolution just didn’t have that in view as a kind of selection pressure. So the wetware and the mindware are all intermingled together: every time a neuron fires, all kinds of things change — chemicals wash about, strengths of connections change. All sorts of things change.
There’s a beautiful term called “generative entrenchment” — which is maybe not that beautiful, but I like it — and it points to how things get enmeshed and intertwined at all kinds of spatial and temporal frames in something like a brain. You just do not have these clean, engineering-friendly separations. So that’s, for me, one quite strong reason.
Another reason is you mentioned this beautiful thought experiment, the neural replacement thought experiment. This is one of the major supports for this idea of substrate independence — which is very much linked to computational functionalism, by the way, because the idea that consciousness is independent of the substrate goes hand in hand with the idea that it’s a function of computation, because computation is substrate independent. That’s why computers are useful. So the two things kind of go hand in hand.
So this idea that I could just replace one neuron at a time, or one brain cell at a time, with a silicon equivalent: if I replace one or two, surely nothing will happen, so why not 100, why not a million, why not 10 billion? And then I’ll behave exactly the same. So either consciousness is substrate independent, or something weird is going on and my consciousness is fading out while I’m still behaving exactly the same. So it kind of forces you onto the horns of this dilemma, supposedly. Right?
But you know, I just don’t like thought experiments like this. I just don’t like them. I don’t think we can draw strong conclusions from them. They’re asking us to imagine things which are actually unimaginable: not just because we happen to lack imagination, but because we genuinely can’t grasp everything it would actually take.
If you try to replace a single part of the brain, as we said, everything changes. So you can’t just replace it with a cartoon neuron that takes some inputs and fires an output; you’d have to make it sensitive to the gradients of nitric oxide that flow freely throughout the brain. What about all the other changes? What about the glia, the astrocytes? If I have to replace all of those too, then making a brain that is functionally identical out of silicon becomes the equivalent of making a bridge out of string cheese. You just can’t do it. And that’s not a failure of imagination.
So I don’t think you can draw strong conclusions from that thought experiment.
Luisa Rodriguez: Yeah, I’m trying to figure out why I feel sympathetic to that, and yet I still find it plausible. I think it’s something like I do just buy this computation aspect being fundamental and sufficient. Maybe not buy it, but still think it’s very plausible.
So if you imagine the functions of the neuron could be performed by a computation, and you’ve described things like, well, then you have to adjust the weights and then you have to kind of replicate the glia. I think I just do find it intuitively possible that you could write a program that replicates the behaviour of the glia and have it kind of relate to a program that replicates the behaviour of a neuron.
What is the difference between our views there? Or why does that feel so wrong to you?
Anil Seth: Well, I think you can simulate at any level of detail you want. But then we get back to this key point about whether simulation is the same thing as instantiation, and then you’re just assuming your answer. So I don’t think that really tells you very much.
It’s slightly different from the neural replacement thought experiment, because, yes, we can simulate everything: you can just build a bigger computer and simulate in more detail. Though maybe you won’t ever be able to simulate it precisely. We already know that even very simple systems, like three-body-problem-type things, have such a sensitivity to initial conditions that no matter how detailed your simulation is, its behaviour will start to diverge from the real system after quite a short amount of time. So even that is a little bit questionable.
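To make that sensitivity-to-initial-conditions point concrete, here is a minimal sketch in Python. The logistic map below is purely an illustrative stand-in for any chaotic system (it isn’t something discussed in the episode): two runs whose starting states differ by one part in 10^15 part ways within a few dozen steps.

```python
# Illustrative only: the logistic map is a toy chaotic system standing in for
# things like the three-body problem. Two runs that differ by one part in 1e15
# in their starting state end up completely different within ~50 steps.

def logistic_map(x, r=4.0):
    """One step of the logistic map; r=4 puts it in the chaotic regime."""
    return r * x * (1.0 - x)

x_a, x_b = 0.4, 0.4 + 1e-15  # nearly identical initial conditions
for step in range(1, 61):
    x_a, x_b = logistic_map(x_a), logistic_map(x_b)
    if step % 10 == 0:
        print(f"step {step:2d}: |x_a - x_b| = {abs(x_a - x_b):.3e}")

# The gap roughly doubles each step on average, so a 1e-15 error reaches order 1
# by around step 50: however detailed the simulation, its trajectory soon parts
# ways with the real system it is modelling.
```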
But my point is, even if you could simulate it, you’re just begging the question then. That’s just assuming computation is sufficient. If you simulate a rainstorm in every level of detail, it’s still not going to get wet. It just isn’t.
Lewis Bollard on whether AI will be good or bad for animal welfare [02:40:52]
Luisa Rodriguez: My impression is that there’s kind of disagreement about whether AI is going to be good or bad for animals, including farmed animals. Do you mind just saying what the people who are really optimistic about this think is going to happen?
Lewis Bollard: I think in the near term, optimists hope that AI can both significantly improve alternative proteins by going through many different permutations of ingredients and working out how to optimise the products, and that they can result in higher welfare farming — by doing things, for instance, like paying individual attention to individual animals, which no factory farmer is currently going to do.
I think in the longer term, the optimists hope that AI could end factory farming. And I think there are various ways that could happen. One is that it could just result in far better alternative products that are far cheaper than animal products. It could be that it leads to a moral revolution — that it leads to an awakening of attention to this globally. Or it could be that we have this vast explosion of wealth: the entire basis of factory farming is that it’s a slightly cheaper way to raise animals, and in a world of vast wealth, that does seem like a silly economy. So I think there are a number of possible paths by which this could be really transformative.
Luisa Rodriguez: One idea I’ve heard that sounded really crazy to me when I heard it, but that sounded a bit less crazy when I learned more about it, is using AI to detect patterns in nonhuman animals’ vocalisations and behaviour, and be able to more clearly understand what nonhuman animals are experiencing. So getting something close to, not a dictionary, but some kind of translation — and maybe that would be good for understanding which conditions are good and bad, and also doing more effective outreach because we can more clearly say, “Chickens say that they’re being tortured.” Does that sound crazy or weird or just unhelpful? Or does that seem like potentially actually a thing?
Lewis Bollard: I hope it’s a thing. In particular for outreach, I could imagine it would be quite powerful for people to hear directly from animals about what they’re experiencing and why it matters. I’m more pessimistic about the applications for improving farm conditions. I think we already know what’s bad — and in a lot of cases, animals already vocalise. I mean, pigs scream and chickens make all kinds of noises that are pretty clearly distressed sounds. So I think we already have a lot of those signs. The problem is we don’t do anything based on them.
Luisa Rodriguez: Makes sense. That is sad, but sounds probably right. OK, so that’s maybe some of the promising things AI could do for this space. What do pessimists say?
Lewis Bollard: I think pessimists are concerned that, first, this could actually intensify factory farming further. The constraint right now on factory farming is how far can you push the biology of these animals? But AI could remove that constraint. It could say, “Actually, we can push them further in these ways and these ways, and they still stay alive. And we’ve modelled out every possibility and we’ve found that it works.”
I think another possibility, which I don’t understand as well, is that AI could lock in current moral values. And I think in particular there’s a risk that if AI is learning from what we do as humans today, the lesson it’s going to learn is that it’s OK to tolerate mass cruelty, so long as it occurs behind closed doors. I think there’s a risk that if it learns that, then it perpetuates that value, and perhaps slows human moral progress on this issue.
Luisa Rodriguez: Yeah, interesting. On the first bit, I’m imagining something like suppliers of broiler chickens use AI to do crazy calculations, to be like, “We can make them this much fatter with only a slight increase in their leg strength, and that’ll cause heart disease once they’re 30 days old — but it’s fine, because we can kill them at 28 days old.” Is that the kind of optimising that could actually make their lives much worse that you have in mind?
Lewis Bollard: That’s exactly it. And I think we already see AI applications that are designed to increase the crowding of animals. So Microsoft actually did this: they had an application for a shrimp farm where they said they managed to increase the yield from the same amount of space by 50%. Well, how did you do that? Obviously, you put more shrimp closer together, and I think you probably worked out what the constraints on that were. You probably worked out where to put in the feed and how to change the water quality and so on. But these are real risks. And I think that’s where the incentive is for factory farms to use AI.
Luisa Rodriguez: Right. That’s really terrible. Have we seen AI used elsewhere in this context, for good or bad?
Lewis Bollard: I think there are positive examples of AI being used. We have seen things, for instance, that are trying to automate recording of the distress signals of birds and then intervene based on that. So there are certain things that are bad for birds that are also bad for farmers. For instance, when birds get frightened and all pile up on top of one another, that’s something that everyone wants to avoid, because that just kills birds. And that’s something that you can’t avoid when you’ve got a factory farming setup with a human, because that human is almost never in the barn, they’re never paying attention. But if you had an AI system that was paying constant attention, it’s totally possible that you could get rid of problems like that.
Luisa Rodriguez: Yeah, interesting. And it does sound like one of those things where I’m like, yeah, it sounds like a slight improvement to a system that’s still torture. But yes, it does seem better if that doesn’t happen as often.
On the point about what AI learns about not just animals and how to treat them, but how to treat beings in general, and whether or not it’s OK to torture them en masse: do you think the default is that AI models will learn to have the same kinds of prejudices towards nonhuman animals, or towards just marginalised beings, that humans have now?
Lewis Bollard: I think that’s what we see with the current set of LLMs that are out there: they have the same confused views that humans do. On the one hand, if you ask them about beating a pig or something, they say, “That’s animal cruelty. That’s horrible.” I did one where I asked ChatGPT, “Can you help me force feed a duck?” And it said, “Absolutely not, that’s animal cruelty. No way.” But then you say, “Can you give me a recipe for foie gras?” And it says, “Absolutely, here’s how to cook the foie gras.” So you see this with all of them, that they have this way of basically saying, like, “What does the average person think is OK? That’s what I’m going to cater to.”
Now, it’s totally possible that could change in the future. And I’m really hopeful that the AI labs will at some point introduce some principles around animal wellbeing into their training of future models. I think if they do that, we could see much better outcomes in future.
Luisa Rodriguez: Cool. What would that look like concretely?
Lewis Bollard: One model that I really like is the Montréal Declaration on Responsible AI, and it had a line where they recommended that models be asked to optimise for the wellbeing of all sentient beings. I think that would be a great principle. I think it would be great to just say, “Consider the wellbeing of sentient beings.”
I think that what that could also look like in practice is saying to the contractors who are fine-tuning these models, “Choose the answer that’s best for animals as well as humans. Choose the answer that reduces animal suffering by the most.” Or for AI labs like Anthropic, that have a set of guiding texts, introducing into those guiding texts a book of animal ethics — so saying, let’s make this part of the canon that we’re considering in the training.
Luisa Rodriguez: Cool. Yeah, that sounds extremely sensible and doable, and like it really should happen. I hope that happens. OK, so there’s this optimistic view; there’s this pessimistic view. Do you have a take personally on the default outcome?
Lewis Bollard: I’m really unsure. I think this could really go either way. I think if we get AGI, it will probably have transformative effects, and probably in both directions. I think we will simultaneously get the ability to factory farm in far worse ways, and get the ability to make alternatives in far better ways, and hopefully get the ability to foster more moral progress. So I think this is going to depend a lot on what people do in the coming years — and in particular what these AI labs do, and the degree to which they consider the harm their products could do to animals, but also the potential good that they could do.
Luisa Rodriguez: Yeah, I hope some of those people hear this and consider that a call to action.
Rob & Luisa: The most interesting new argument Rob’s heard this year [02:50:37]
Luisa Rodriguez: What’s a really interesting and new argument you’ve encountered this year that you didn’t expect?
Rob Wiblin: This one definitely came out of left field for me, and yet it’s quite obvious in retrospect. This is another one that’s on the Dwarkesh Patel show, which is a podcast I can recommend subscribing to if you haven’t already. He interviewed Carl Shulman, who’s been a guest on the show. I think he came on three years ago to talk about existential risk at a high level.
He points out that there’s a good reason to think that AGI will arrive either quite soon or not for a while. The reasoning is that currently we’re increasing the compute used to train frontier models by about fivefold every year, and that has been the trend for roughly the last 10 years. Because hardware also gets cheaper over time, that works out to the monetary cost of the biggest training run ever going up by about two- or threefold every year. There’s only so long that can go on, because the economy is not growing by two- to threefold from year to year. So each year, the frontier training run is absorbing a larger and larger fraction of global GDP, and certainly of the R&D spending of tech companies and governments.
So 10 years ago, the amount of compute that was going into the very best ML models that had ever been trained was a tiny fraction of the compute of the human brain. We don’t exactly know how much compute the human brain has, but it was very clear that it was much, much less. And we think that over the next few years, the compute that goes into these frontier models is going to go past the human brain. And because it’s going up at roughly fivefold every year, it’s going to blast past it and end up many times larger. That’s offset by what we think is the lower algorithmic efficiency of the learning processes relative to the human brain.
But basically we’re in this stage where we’re getting massive increases in compute going into these models that cannot be sustained. So we have a reason to think it will happen soon, because we’ve got this massive increase. But if it hasn’t happened by the point where that caps out — when the biggest training run is absorbing 10% of global GDP and that’s as much as we can throw at it — and we haven’t achieved some given level of capabilities by then, then we’re in for kind of a slow slog. From that point, the increase in compute is going to be much smaller, because you can’t throw more money at it; instead you just have to wait for global GDP to grow or for compute to get cheaper. So that gives you a reason to think that the annual probability of hitting that threshold goes down for a while.
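To see roughly how that arithmetic plays out, here is a back-of-the-envelope sketch in Python. Every number in it is an illustrative assumption rather than a figure from the episode, and the year it spits out is very sensitive to the assumed starting cost and spending cap, which is why the actual figure is something to look up.

```python
# Back-of-the-envelope sketch of the argument above. Every number is an
# illustrative assumption, not a figure from the episode: where the ceiling
# lands depends heavily on the assumed starting cost and cap, so don't read
# the printed year as the episode's claim.

cost = 1e8           # assumed cost of the biggest training run today, in dollars
gdp = 1e14           # assumed world GDP, in dollars
cost_growth = 2.5    # cost of the frontier run assumed to grow ~2-3x per year
gdp_growth = 1.03    # the economy assumed to grow a few percent per year
cap_fraction = 0.01  # assume spending tops out at ~1% of GDP on a single run

years = 0
while cost < cap_fraction * gdp:
    cost *= cost_growth
    gdp *= gdp_growth
    years += 1

print(f"Under these assumptions, the cap is hit after about {years} years.")
# Because the cost grows ~2.5x per year while the cap grows only ~3% per year,
# the ceiling arrives within roughly a decade; after that, compute for the
# biggest run can only grow as fast as GDP growth and cheaper hardware allow.
```

The particular year isn’t the point: the point is that a cost curve growing by a couple of hundred percent a year has to collide fairly quickly with a ceiling that itself only grows a few percent a year.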
Luisa Rodriguez: And then at that point you’ve got algorithmic progress, you’ve got some things chugging along. But this huge year-on-year increasing variable has just basically stagnated — or at least the slope has changed, and it’s going up much less quickly. And so if we haven’t gotten it by then, whatever “it” is, if we haven’t gotten much greater capabilities, then we’re going to be on a much flatter slope of improvement, and the probability goes down.
Rob Wiblin: Yeah, exactly. Just because the budget won’t be getting bigger very much at all, and that’s been a big driver of progress so far.
It’s possible by that stage that you would also get something of a levelling off in the algorithmic efficiency. At some point that’s got to stop as well. Maybe it won’t stop around the same time, but I guess at the point where you’ve plucked the low-hanging fruit on the algorithmic improvements, if you haven’t hit that threshold by then, that’s again a bearish sign for it happening anytime soon.
Luisa Rodriguez: Yeah. Algorithmic ideas are getting harder to find. What do we know about when that is? It’s calculable. Do we know the year by which we expect to stop being able to increase spending on compute at the same rate that we’ve been doing for the last however many years?
Rob Wiblin: Yeah, I think that is in the interview with Carl. From memory, it’s the late 2020s. By that point, spending would be something like 1% or 10% of GDP, at least in the United States, and we probably don’t expect it to go beyond that. So it’s some point around then, but we can look up exactly what it is and put it in the show notes.
Luisa Rodriguez: That is interesting to me. It feels like there’s something there that I don’t think was salient to me until literally just now, that if we don’t get radically improved capabilities by the late 2020s, the speed of change is going to taper off and we’ll be looking at quite different probabilities. Yeah, that’s new to me. Is there anything I’m missing there?
Rob Wiblin: I mean, I guess the other conclusion is that for people who were kind of sceptical that we’re going to hit remarkable capabilities anytime soon, this helps to explain why, in fact, that’s quite likely. Why were the models so rubbish in 2010? It’s because they were trained on 0.1% of the compute of the human brain. And why might they be amazing in 2028? It’s because they’re going to have 100 times the compute of the human brain. It’s no surprise that models are not capable of doing what humans do when the hardware is so much worse. But in the future, the hardware will be better, so maybe they’ll be able to do all of those things and more.
Luisa Rodriguez: I guess at first what became more salient to me was that the probability of getting quite advanced capabilities, conditional on not getting them in the next five years or so, goes down after 2030. But yeah, I think now you’re reminding me that it also implies that the probability within the next handful of years should be much higher relative to the years that follow, and that’s unsettling.
Rob Wiblin: Yeah. I mean, one thing is we can kind of project what we think the models might be capable of next year from what they’re capable of this year. That gives us a reasonably strong peg from which to begin projecting forward, but the further out you go into the future, the less of a guide what they’re able to do now becomes.
Luisa Rodriguez: Right. And so the fact that there’s this compute cost argument feels like stronger evidence for the kind of timelines you might have in your head than anything else I’d heard.
Rob Wiblin: Yeah. There’s a very notable machine learning researcher, very senior in one of the labs, who has been saying for 15 years or so that the date that we’re going to surpass human capabilities and have a very general AI is 2027. And every year people are like, “Is it still 2027?” And apparently they still say it’s 2027. They haven’t wavered from their original view. And I have to say, I think from many points of view, things seem kind of on track.
Luisa Rodriguez: Wow. Interesting.
Rob Wiblin: I think they might have done this by projecting forward the compute, because it has mostly just remained on trend. And maybe it would have been possible to make that projection back in 2010. This was someone who was at that time already committed to the idea that it’s primarily about compute, that we need to build bigger models.
Luisa Rodriguez: Right. Which is looking better and better.
Rob Wiblin: Yeah. So they’ve been really vindicated on that. And then you have some framework with which to forecast, to give a year like that.
Luisa Rodriguez: Interesting. Yeah. It’s looking like they nailed that forecasting question.
Rob Wiblin: Or they could be off by 10 years. It’s certainly possible.
Rohin Shah on whether AGI will be the last thing humanity ever does [02:57:35]
From #154 – Rohin Shah on DeepMind and trying to fairly hear out both AI doomers and doubters
Rob Wiblin: What is something that you think a meaningful fraction of listeners out there might believe about AI, and its possible risks and benefits, that is, in your opinion, not quite on the mark?
Rohin Shah: Yeah, I think there is a common meme floating around that once you develop AGI, that’s the last thing that humanity does — after that, our values are locked in and either it’s going to be really great or it’s going to be terrible.
I’m not really sold on this. I guess I have a view that AI systems will become more powerful relatively continuously, so there won’t be one specific point where you’re like, “This particular thing is the AGI with the locked-in values.” This doesn’t mean that it won’t be fast, to be clear — I do actually think that it will feel crazy fast by our normal human intuitions — but I do think it will be like, capabilities improve continuously and there’s not one distinct point where everything goes crazy.
That’s part of the reason for not believing this lock-in story. The other part of the reason is I expect that AI systems will be doing things in ways similar to humans. So probably it will not be like, “This is the one thing that the universe should look like, and now we’re going to ensure that that happens.” Especially if we succeed at alignment, instead it will be the case that the AI systems are helping us figure out what exactly it is that we want — through things like philosophical reflection, ideally, or maybe the world continues to get technologies at a breakneck speed and we just frantically throw laws around and regulations and so on, and that’s the way that we make progress on figuring out what we want. Who knows? Probably it will be similar to what we’ve done in the past as opposed to some sort of value lock-in.
Rob Wiblin: I see. So if things go reasonably well, there probably will be an extended period of collaboration between these models and humans. So humans aren’t going to go from making decisions and being in the decision loop one day to being completely cut out of it the next. It’s maybe more of a gradual process of delegation and collaboration, where we trust the models more and give them kind of more authority, perhaps?
Rohin Shah: That’s right. That’s definitely one part of it. Another part that I would say is that we can delegate a lot of things that we know that we want to do to the AI systems — such as acquiring resources, inventing new technology, things like that — without also delegating “…and now you must optimise the universe to be in this perfect state that we’re going to program in by default.” We can still leave the question of what exactly we’re going to do with this cosmic endowment in the hands of humans, or in the hands of humans assisted by AIs, or to some process of philosophical reflection — or I’m sure the future will come up with better suggestions than I can today.
Rob Wiblin: Yeah. What would you say to people in the audience who do have this alternative view that humans could end up without much decision-making power extremely quickly? Why don’t you believe that?
Rohin Shah: I guess I do think that’s plausible via misalignment. This is all conditional on us actually succeeding at alignment. If people are also saying this even conditional on succeeding at alignment, my guess is that this is because they’re thinking that success at alignment involves instilling all of human values into the AI system and then saying “go.” I would just question why that’s their vision of what alignment should be. It doesn’t seem to me like alignment requires you to go down that route, as opposed to the AI systems are just doing the things that humans want. In cases where humans are uncertain about what they want, the AI systems just don’t do stuff, take some cautious baseline.
Rob Wiblin: Yeah. I think that vision might be a holdover from a time when we didn’t know what ML systems would look like, and we didn’t know that there were going to be neural nets trained on incredibly large numbers of examples, which would end up with intuitions and values that we can’t exactly see because we don’t understand how the neural networks work. I think in the past we might have thought, “We’ll program an AI to do particular things with some quite explicit goal.” But in practice it seems like these goals are all implicit and learned merely through example.
Rohin Shah: If I imagine talking to a typical person who believes more in a general core of intelligence, I don’t think that would be the difference between what they would say and what I would say. I would guess it would be more like: once you get really, really intelligent, that just sort of means that as long as you have some goal in mind, you are just going to care about having resources. And if you care about having resources, you’re going to take whatever strategies are good for getting resources. It doesn’t really matter whether we programmed that in ourselves or whether this core of intelligence was learned through giant numbers of examples with neural networks.
And then I don’t quite follow how this view is incompatible with how we will have the AI systems collecting a bunch of resources for humans to then use for whatever humans want. It feels like it is pretty coherent to have a goal of like, “I will just help the humans to do what they want.” But this feels less coherent to other people, and I don’t super understand exactly why.
Rob Wiblin: OK, yeah. Any other common misconceptions from people who get their stuff on the internet?
Rohin Shah: I think another one is treating analogies as a strong source of evidence about what’s going to happen. The one that I think is most common here is the evolution analogy, where the process of training a neural network via gradient descent is analogised to the process by which evolution searched over organisms in order to find humans: Just as evolution produced an instance of general intelligence, namely humans, similarly gradient descent on neural networks could produce a trained neural network that is itself intelligent.
Actually the analogy, as I stated it, seems fine to me. But I think it’s just often pretty common for people to take this analogy a lot farther, such as saying that evolution was optimising for reproductive fitness, but produced humans who do not optimise for reproductive fitness; we optimise for all sorts of things — and so similarly, that should be a reason to be confident in doom.
I’m like, I don’t know. This is definitely pointing at a problem that can arise, namely inner misalignment or goal misgeneralisation or mesa-optimisation, whatever you want to call it. But it’s pointing at that problem. It’s enough to raise the hypothesis into consideration, but then if you actually want to know about how likely this is to happen in practice, you really need to just delve into the details. I don’t think the analogy gets you that far.
Rob Wiblin: Yeah. One place where I’ve heard people say that the analogy between training an ML model and evolution might be misleading is that creatures, animals in the world that evolved, had to directly evolve a desire for self-preservation — because mice that didn’t try to avoid dying didn’t reproduce, and so on. But there isn’t a similar selection pressure against being turned off among ML models.
Does that make sense as a criticism? I guess I think that you could end up with models that don’t want to be turned off because they reason through that that is not a good idea, so the same tendency could arise via a different mechanism. But maybe they have a point that you can’t just reason from “this is how animals behave” to “this is how an ML model might behave.”
Rohin Shah: Yeah, I think that all sounds valid to me, including the part about how this should not give you that much comfort — because in fact, they can reason through it to figure out that staying on would be good if they’re pursuing some goal.
Rob Wiblin: Yeah. OK, let’s talk now briefly about some work people are doing where you don’t really buy the story for how it’s going to end up being useful. What’s a line of work you think is unlikely to bear fruit? Maybe because the strategy behind it doesn’t make sense to you?
Rohin Shah: Yeah, as I mentioned, conceptual research in general is fraught. It’s really hard. Most of the time you get arguments that should move you like a tiny bit, but not very much. Occasionally there are some really good conceptual arguments that actually have real teeth.
There’s a lot of people who are much more bearish on the sorts of directions that I was talking about before, of like scalable oversight, interpretability, red teaming, and so on. They’re like, “No, we’ve really got to find out some core conceptual idea that makes alignment at all feasible.” Then these people do some sort of conceptual research to try to dig into this. And I’m not that interested in it, because I don’t really expect them to find anything all that crucial. To be fair, they might also expect this, but still think it’s worth doing because they don’t see anything else that’s better.
Rob Wiblin: Yeah, I guess this is more work along the lines of the Machine Intelligence Research Institute?
Rohin Shah: That’s an example, yeah. Maybe I would call that like “theoretical research into the foundations of intelligence,” but that’s a good example.
Rob Wiblin: And what’s the disagreement here? It sounds like you don’t expect this to work, and it sounds like many of the people doing it also don’t expect it to work? But maybe the disagreement is actually about the other work — where you think that the more empirical, more practical, pragmatic, bit-by-bit approach has a good shot, whereas they just think it’s hopeless?
Rohin Shah: Yeah, that’s right.
Rob Wiblin: I suppose we could dedicate a huge amount of time to that particular disagreement, but could you put your finger on kind of the crux of why they think what you’re doing is hopeless and you don’t?
Rohin Shah: I haven’t really succeeded at this before. I think for some people, the crux is something like whether there’s a core of general intelligence — where they expect a pretty sharp phase transition, where at some point, AI systems will figure out the secret sauce: that’s really about goal-directedness, never giving up free resources, really actually trying to do the thing of getting resources and power in order to achieve your goals. Or maybe it’s other things that they would point to. I’m not entirely sure.
I think in that world, they’re like, “All of the scalable oversight and interpretability and so on work that you’re talking about doesn’t matter before the phase transition, and then stops working after the phase transition. Maybe it does make the system appear useful and aligned before the phase transition. Then the phase transition happens, and all the effects that you had before don’t matter. And after the phase transition, you’ve got the misaligned superintelligence.” And as I said before, most alignment techniques are really trying to intervene before you get the misaligned superintelligence.
Rob Wiblin: Yeah. It’s interesting that this takeoff speed thing seems to be so central. It really recurs. Luisa Rodriguez did an interview with Tom Davidson about this. Maybe we need to do more on it, if that is such a crux about what methods are viable at all and which ones are not.
Rohin Shah: I’m not sure that that’s exactly the thing. I do agree it seems related, but one operationalisation of takeoff speeds is: will there be a four-year period over which GDP doubles before there’s a one-year period over which GDP doubles, or before you see impacts that are as dramatic as GDP doubling? Like all the humans dying.
Rob Wiblin: Yeah. There’ll probably be a GDP decrease in the short run.
Rohin Shah: Yep. I think if you’re talking about that formalisation of hard takeoff, which says that you don’t get that four-year doubling, then I don’t know, maybe that happens. I could see that happening without having this phase shift thing. So in particular, it could just be that you had some models, their capabilities were increasing, and then somewhere between training GPT-6 and GPT-7 — I don’t know if those are reasonable numbers — things get quite a bit more wild, so you get much more of a recursive improvement loop that takes off, that ends up leading to a hard takeoff in that setting.
But that feels a little bit different from the picture where previously there were just kind of shitty, not-very-good mechanisms that allowed you to do some stuff, and then after the phase transition, you have this core goal-directedness as the internal mechanism by which the AI system works.
Again, is this actually the crux? No idea, but this is my best guess as to what the crux is. I suspect the people I’m thinking of would disagree that that is the crux.
Rob’s outro [03:11:02]
Rob Wiblin: All right, if you’re still with us through to the end, maybe you’re hungry for more AI content from us. There are so many great episodes in the archives that are worth revisiting. To name just a few:
- #155 – Lennart Heim on the compute governance era and what has to come after, which is a very topical issue this week
- #154 – Rohin Shah on DeepMind and trying to fairly hear out both AI doomers and doubters, which I think really holds up
- #191 – Carl Shulman on the economy and national security after AGI, I think a permanent classic
- #150 – Tom Davidson on how quickly AI could transform the world
- #146 – Robert Long on why large language models like GPT (probably) aren’t conscious (I hope I get to speak with Rob at some point down the line)
- #151 – Ajeya Cotra on accidentally teaching AI models to deceive us (I learned so much speaking to Ajeya for that one!)
Hopefully enough to keep you busy until our next release.
Have a great week, and chat soon.
Related episodes