
…there’s two parts to the problem. The first is calling someone’s attention to a place. I think that’s the harder part by far. You can’t just bury a thing, because hundreds of millions of years is long enough that the surface of the earth is no longer the surface of the earth…

Paul Christiano

Imagine that, one day, humanity dies out. At some point, many millions of years later, intelligent life might well evolve again. Is there any message we could leave that would reliably help them out?

In his second appearance on the 80,000 Hours Podcast, machine learning researcher and polymath Paul Christiano suggests we try to answer this question with a related thought experiment: are there any messages we might want to send back to our ancestors in the year 1700 that would have made history more likely to go in a better direction than it did? It seems there probably are.

We could tell them hard-won lessons from history; mention some research questions we wish we’d started addressing earlier; hand over all the social science we have that fosters peace and cooperation; and at the same time steer clear of engineering hints that would speed up the development of dangerous weapons.

But, as Christiano points out, even if we could satisfactorily figure out what we’d like to tell a future civilization, that’s just the first challenge. We’d need to leave the message somewhere that they could identify and dig up. While there are some promising options, this turns out to be remarkably hard to do, as anything we put on the Earth’s surface gets buried far underground over these timescales.

But even if we figure out a satisfactory message, and a way to ensure it’s found, a civilization this far in the future won’t speak any language like our own. And being another species, they presumably won’t share as many fundamental concepts with us as humans from 1700 would. If we knew a way to leave them thousands of books and pictures in a material that wouldn’t break down, would they be able to decipher what we meant to tell them, or would it simply remain a mystery?

That’s just one of many playful questions discussed in today’s episode with Christiano — a frequent writer who’s willing to brave questions that others find too strange or hard to grapple with.

We also talk about why divesting a little bit from harmful companies might be more useful than I’d been thinking, whether creatine might make us a bit smarter, and whether conference rooms filled with carbon dioxide make us a lot stupider.

Finally, we get a big update on progress in machine learning and efforts to make sure it’s reliably aligned with our goals, which is Paul’s main research project. He responds to the views that DeepMind’s Pushmeet Kohli espoused in a previous episode, and we discuss whether we’d be better off if AI progress turned out to be most limited by algorithmic insights, or by our ability to manufacture enough computer processors.

Some other issues that come up along the way include:

  • Are there any supplements people can take that make them think better?
  • What implications do our views on meta-ethics have for aligning AI with our goals?
  • Is there much of a risk that the future will contain anything optimised for causing harm?
  • An outtake about the implications of decision theory, which we decided was too confusing and confused to stay in the main recording.

Interested in applying this thinking to your career?

If you found this interesting, and are thinking through how considerations like these might affect your career choices, our team might be able to speak with you one-on-one. We can help you consider your options, make connections with others working on similar issues, and possibly even help you find jobs or funding opportunities.


Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app. Or read the transcript below.

The 80,000 Hours Podcast is produced by Keiran Harris.



If you imagine the first time that humans could have discovered a message sent by a previous civilization, it would have been, I mean, it depends a little bit on how you’re able to work this out, but probably at least a hundred years ago. At that point, the message might have been sent from a civilization which was much more technologically sophisticated than they were, and which had experienced the entire arc of civilization followed by extinction.

At a minimum, it seems like you could really change the path of their technological development by selectively trying to spell out for them or show them how to achieve certain goals. You could also attempt, although it seems a little bit more speculative, to help set them on a better course and be like, “Really, you should be concerned about killing everyone.” It’s like, “Here’s some guidance on how to set up institutions so they don’t kill everyone.”

I’m very concerned about AI alignment, so I’d be very interested in, as much as possible, being like, “Here’s a thing which, upon deliberation, we thought was a problem. You probably aren’t thinking about it now, but FYI, be aware.” I do think that would put the community of people working on that problem in that future civilization into a qualitatively different place.

It’s very hard to figure out what the impact would have been had we stumbled across these very detailed messages from a past civilization. I do think it could have a huge technological effect on the trajectory of development, and it’s also reasonably likely to have a meaningful effect either on deliberation and decisions about how to organize ourselves or on other intellectual projects.

I think ethical consumption is actually a really good comparison point for divestment. You could say, “I want to consume fewer animal products in order to decrease the number of animals that get produced.” And there you have a very similar discussion about what the relative elasticities are like. One way you could think about it is: if you decrease demand by 1%, decrease the labor force by 1%, and decrease the availability of capital by 1%, then you would roughly decrease the total amount produced by 1%, under some assumptions about how natural resources work and so on.

The credit for that 1% decrease is somehow divided up across the various factors on the supply side and demand side, and the elasticities determine how it is divided up. I think it’s not 100% consumption or 100% labor; all of those factors are participating to a nontrivial extent.
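The idea that elasticities decide how the credit for a 1% drop gets divided can be illustrated with a textbook supply-and-demand sketch. This is a minimal illustration under our own assumptions, not a model from the episode: constant-elasticity supply and demand curves for a single product, `eps_demand` taken as the absolute value of the demand elasticity, and hypothetical function names. It computes how much equilibrium output falls when demand at every price drops by 1%. To a first approximation the answer is eps_supply / (eps_supply + eps_demand) of the cut, with the remainder absorbed by a lower price.

```python
def equilibrium_quantity(demand_scale: float, eps_supply: float, eps_demand: float,
                         supply_scale: float = 1.0) -> float:
    """Equilibrium quantity for constant-elasticity curves (illustrative assumption).

    demand: Q = demand_scale * P ** (-eps_demand)
    supply: Q = supply_scale * P ** eps_supply
    Equating them gives P = (demand_scale / supply_scale) ** (1 / (eps_supply + eps_demand)).
    """
    price = (demand_scale / supply_scale) ** (1.0 / (eps_supply + eps_demand))
    return supply_scale * price ** eps_supply


def quantity_response_to_demand_shift(eps_supply: float, eps_demand: float,
                                      shift: float = 0.01) -> float:
    """Fractional fall in equilibrium output when demand at every price falls by `shift`."""
    before = equilibrium_quantity(1.0, eps_supply, eps_demand)
    after = equilibrium_quantity(1.0 - shift, eps_supply, eps_demand)
    return 1.0 - after / before


def pass_through_share(eps_supply: float, eps_demand: float) -> float:
    """Log-linear approximation: share of a small demand cut that shows up as lower output.

    The rest of the adjustment is absorbed by a lower price (other buyers purchase more).
    """
    return eps_supply / (eps_supply + eps_demand)


if __name__ == "__main__":
    # Hypothetical elasticity pairs, chosen only to show how the split moves.
    for es, ed in [(1.0, 1.0), (3.0, 1.0), (0.5, 2.0)]:
        fall = quantity_response_to_demand_shift(es, ed)
        print(f"supply elasticity {es}, demand elasticity {ed}: "
              f"output falls {fall:.3%} (approx. {pass_through_share(es, ed):.0%} of the 1% cut)")
```

Under these assumptions, a consumption cut reduces output by close to the full 1% only when supply is very elastic relative to demand. This sketch models only the demand side; it does not attempt the labor or capital side of the split.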

I haven’t done this analysis really carefully, and I think it would be a really interesting thing to do and would be a good motivation if I wanted to put together an animal welfare divestment fund. I think under pretty plausible assumptions you’re getting a lot more bang for your buck from the divestment than from the consumption choices. Probably the investment side would be relatively small compared to your total consumption pattern, so it wouldn’t be replacing your ethical consumption choices. But if, where you would have bought $1 of animal agriculture companies, you instead sell $10, I think stuff like that could be justified if you thought that ethical consumption was a good thing.


Robert Wiblin: Hi listeners, this is the 80,000 Hours Podcast, where each week we have an unusually in-depth conversation about one of the world’s most pressing problems and how you can use your career to solve it. I’m Rob Wiblin, Director of Research at 80,000 Hours.

Last year for episode 44, I interviewed Paul Christiano for almost four hours about his views on a wide range of topics, including how he thinks AI will affect life in the 21st century and how he thinks we can increase the chances that those impacts are positive.

That episode was very popular, and Paul is a highly creative and wide-ranging thinker, who’s always writing new blog posts about topics that range from the entirely sensible to the especially strange, so I thought it would be fun to get him back on to talk about what he’s been thinking about lately.

On the sensible side, we talk about how divesting from harmful companies might be a more effective way to improve the world than most people have previously thought.

On the stranger side, we think about whether there are any messages we could leave future civilizations that could help them out should humanity go extinct and intelligent life then re-evolve on Earth at some point in the far future.

We also talk about some speculative papers suggesting that taking creatine supplements might make people a bit sharper, while being in a stuffy carbon-dioxide-filled room might make people temporarily stupider.

Honestly we just have a lot of fun chatting about some things we personally find interesting.

On the more practically useful side of things though, I get his reaction to my interview with Pushmeet Kohli at DeepMind for episode 58 a few months back.

I should warn people that in retrospect this episode is a bit heavy on jargon, and might be harder to follow for someone who is new to the show. That’s going to get harder to avoid over time, as we want to dig deeper into topics that we’ve already introduced in previous episodes, but we went a bit further than I’d ideally like this time.

Folks might get more out of it if they first listen to the previous interview with Paul back in episode 44 – that’s Dr. Paul Christiano on how we’ll hand the future off to AI, & solving the alignment problem. But a majority of this episode should still make sense even if you haven’t listened to that one.

This episode also has the first outtake we’ve made. I encouraged Paul to try recording a section in the interview on a subfield of philosophy called decision theory, and some heterodox ideas that have come out of it, like superrationality and acausal cooperation.

I planned to spend half an hour on that, but that was really very silly of me. We’d need a full hour just to clearly outline the problems of decision theory and the proposed solutions, if we could do it at all. And explaining and justifying the possible implications of various unorthodox solutions that are out there could go on for another hour or two, and it is hard to do it all without a whiteboard.

So, by the end of that section, we thought it was more or less a trainwreck, though potentially quite a funny trainwreck for the right listener. We do come across as a touch insane, which I’m fairly sure we’re not.

So if you’d like to be a bit confused and hear what it sounds like for a technical interview to not really work out, you can find a link to the MP3 for that section in the show notes. If you happen to have the same level of understanding of decision theory that I did going into the conversation, you might even learn something. But I can’t especially recommend listening to it as a good use of time.

Instead, we’ll come back and give decision theory a proper treatment in some other episode in the future.

Alright, with all of that out of the way, here’s Paul Christiano.

Robert Wiblin: My guest today is Dr. Paul Christiano. Back by popular demand, making his second appearance on The 80,000 Hours Podcast. Paul completed a PhD in Theoretical Computer Science at UC Berkeley and is now a technical researcher at OpenAI, working on aligning artificial intelligence with human values. He blogs about that work at ai-alignment.com and about a wide range of other interesting topics at sideways-view.com. On top of that, Paul is not only a scholar, but also always and everywhere a gentleman. Thanks for coming on the podcast, Paul.

Paul Christiano: Thanks for having me back.

Robert Wiblin: I hope to talk about some of the interesting things you’ve been blogging about lately, as well as what’s new in AI reliability and robustness research. First, what are you doing at the moment and why do you think it’s important work?

Paul Christiano: I guess I’m spending most of my time working on technical AI safety at OpenAI. I think the basic story is similar to a year ago, that is, building AI systems that don’t do what we want them to do, that push the long-term future in a direction that we don’t like, seems like one of the main ways that we can mess up our long-term future. That still seems basically right. I maybe moved a little bit more towards that being a smaller fraction of the total problem, but it’s still a big chunk. It seems like this is a really natural way for me to work on it directly, so I think I’m just going to keep hacking away at that. That’s the high level. I think we’re going to get into a lot of the details, probably in some questions.

What’s new in AI research?

Robert Wiblin: We started recording the first episode last year, almost exactly a year ago, actually. When it comes to AI safety research, and I guess your general predictions about how advances in AI are going to play out, have your opinions shifted at all? If so, how?

Paul Christiano: I think the last year has felt a lot like there are no big surprises and things are settling down. Maybe this has been part of a broader trend: five years ago, my view was bouncing around a ton every year; three years ago, it was bouncing around a little bit; over the last year, it has bounced around even less, so my views haven’t shifted a huge amount. I think we haven’t had either big downward or upward surprises in terms of overall AI progress. That is, I think we’ve seen things that are probably consistent both with concerns about AI being developed very quickly and with the possibility of it taking a very, very long time.

In terms of our approach to AI alignment, again, I think my understanding of what there is to be done has solidified a little bit. It continues to move from some broad ideas of what should be done toward particular groups implementing things. That’s continuing to happen, but there haven’t been big surprises.

Robert Wiblin: Yes, last time we spoke about a bunch of different methods, including AI safety via debate, where different AIs debate one another and then, hopefully, we’re in a position to adjudicate which one is right. Has there been any progress on that approach or any of the other ones that we spoke about?

Paul Christiano: Yes. I work on a sub-team at OpenAI that primarily works on that idea, safety via debate, as well as amplification. I would say that over the last year, a lot of the work has been on building up capacity and infrastructure to make those things happen, such as scaling up language models and integrating with good large language models, that is, systems that understand some of the reasoning humans do when they talk or when they answer questions. We’re trying to get that to the point where we can actually start to see the phenomena that we’re interested in.

I think there’s generally been some convergence in how different groups, at least different parts within OpenAI but I think also across organizations, have been thinking about possible approaches. For example, within OpenAI, people thinking about this really long-term problem mostly think about amplification and debate.

There’s this on-paper argument that those two techniques ought to be very similar. I think they maybe suggest different emphases on which experiments you run in the short term. As we’ve been trying things, the people who started more on the amplification side are running experiments that look more similar to what you might expect from the debate perspective, and vice versa, so I think there are fewer and fewer big disagreements about that.

Similarly, with the people who think about long-term safety at DeepMind, I feel like there’s less of a gap between us now. Maybe that’s good, because it’s easier to communicate, be on the same page, and have more shared understanding of what we’re doing. Compared to a year ago, I think this is just related to things settling down and maturing. It’s still a long way from being like almost any normal field of academic inquiry; it’s nowhere close to that.

Robert Wiblin: Because people disagree more, or just have very different perspectives, or…?

Paul Christiano: Yes, they disagree more, and they have less of a common sense. They have less of a mature method of inquiry which everyone expects to make progress. It’s still a long way from more mature areas, but it is moving in that direction.

Robert Wiblin: This is maybe a bit random, but do you feel like academic fields are often held back by the fact that they codify particular methods, and particular kinds of evidence, and particular worldviews that blinkers them to other options? Maybe it’s an advantage to have this sort of research to be a bit more freewheeling and diverse.

Paul Christiano: I think that’s an interesting question. I would normally think of an academic field as characterized by its set of tools and its understanding of what constitutes progress. If you think of the field as characterized by problems, then it makes sense to talk about the field being blinkered in this way or leaving value on the table. If you think of the field as characterized by its set of tools, then that’s the thing it’s bringing to the table.

From that perspective, it’s bad that you can’t just use some existing set of tools. That’s a bummer, though it’s not clear: there’s a lot of debate about how much we should ultimately expect the solutions to use existing sets of tools. It’s also a little bit bad not to have mature tools yet that are specific to this kind of inquiry. That’s more how I think of it.

If you think of many academic fields as answering a particular set of questions, and as the only people answering those questions, then maybe they’re not set up that optimally to do that. I’ve shifted away from mostly thinking of academic fields that way.

Robert Wiblin: I guess economics has problems with its method, but then those are covered by other fields that use different methods. That would be the hope.

Paul Christiano: That’s the hope.

Robert Wiblin: [laughs]

Paul Christiano: I think economics is an interesting case. I think most fields do have this: there’s a bunch of problems that sort of fit in economics, and there’s the set of tools economists use. If there’s a problem that nominally falls under their purview but is not a good fit for their tools, then you’re in sort of a weird place. Economics is also maybe a special case, because there’s this broad set of problems that fit in its domain, so this distinction is not obvious. There’s some…

Robert Wiblin: It’s kind of notoriously an imperial field. It goes and colonizes every other set of questions that it can touch on. Then sometimes, I guess, the method may or may not be well suited to the questions it wants to tackle.

Paul Christiano: Yes, although in some sense, if you view it as a field that has a set of tools it’s using, it’s very reasonable to go out and find other problems, if you’re actually correct about them being amenable to those tools. There’s also a flip side: you don’t really want to be staking a claim on these questions. You should be willing to say, “Look, these are questions that we’ve traditionally answered, but there are other people, and sometimes they can be answered in other ways.”

Robert Wiblin: Yes. It’s an interesting framing of problems with academic fields: it’s not so much that the field is bad, but that it’s tackling the wrong problems, or problems that are mismatched to its methods.

Paul Christiano: I think about this a lot, maybe because in computer science you more clearly have problems that aren’t staked out by one field. It’s not like, “Here’s a problem and this problem fits in a domain.” It’s more like there are several different approaches: there are people who come in with statistics training, people who come in as theorists, and people who come in as various flavors of practitioners or experimentalists, and you can see the sub-fields have different ways they would attack these problems. You understand that this sub-field’s going to attack this problem in this way, and it’s a reasonable division of labor.

Robert Wiblin: Let’s back up. You talked about running experiments, what kind of experiments are they concretely?

Paul Christiano: Yes. I think last time we talked, we discussed three kinds of big uncertainties or room for making progress. One of them, which isn’t super relevant to experiments, is figuring out conceptual questions about how we’re going to find some scalable approach to alignment. The other two difficulties were both very amenable to different kinds of experiments.

One is experiments involving humans, where you start to understand something about the character of human reasoning. We have some hopes about human reasoning: we hope that, in some sense, given enough time or enough resources, humans are universal and could answer some very broad set of questions if they just had enough time and enough room to reflect. That’s one class of experiments, getting at that understanding. In what sense is that true? In what sense is that false? That’s a family of experiments I’m very excited about.

OpenAI has recently started hiring people. We just hired two people who will be scaling up those experiments here. Ought has been focused on those experiments and is starting to really scale up its work. That’s one family of experiments. The third difficulty is understanding how both theoretical ideas about alignment and these facts about how human reasoning works tie together with machine learning.

Ultimately, at the end of the day, we want to use these ideas to produce objectives that can be used to train ML systems. That involves actually engaging with a bunch of detail about how ML systems work. Some of the experiments are directly testing those details, asking, “Can you use this kind of objective? Can machine learning systems learn this kind of pattern or this kind of behavior?” Others are more in the family of, “We expect this to work if you just iterate a little bit.” So you expect there is going to be some way that we can apply language models to this kind of task, but we need to think a little bit about how to do that and take a few swings at it.

Robert Wiblin: I saw that OpenAI was trying to hire social scientists and kind of making the case that social scientists should get more interested in AI alignment research. Is this the kind of work that they’re doing, running these experiments or designing them?

Paul Christiano: Yes, that’s right. We’re aiming initially to hire one person in that role. I think we’ve now made that hire and they’re starting on Monday. They’ll be doing experiments trying to understand: if we want to use human reasoning in some sense as a ground truth or gold standard, how do we think about that? In what sense could you scale up human reasoning to answer hard questions? In what sense are humans a good judge of correctness, or able to incentivize honest behavior between two debaters?

Some of that is asking what, empirically, the conditions are under which humans are able to do certain kinds of tasks. Some of it is more conceptual, where humans are just the way you get traction, because humans are the only systems we have access to that are very good at this kind of flexible, rich, broad reasoning.

Robert Wiblin: I mentioned on Twitter and Facebook that I was going to be interviewing you again and a listener wrote in with a question. They had heard, I think, that you thought there’s a decent probability that things would work out okay or that the universe would still have quite a lot of value, even if we didn’t have a solid technical solution to AI alignment and AI took over and was very influential. What’s the reasoning there, if that’s a correct understanding?

Paul Christiano: I think there’s a bunch of ways you could imagine ending up with AI systems that do what we want them to do. One approach which is, as a theorist, the one that’s most appealing to me, is to have some really good understanding on paper. Like, “Here’s how you train an AI to do what you want,” and we just nail the problem in the abstract before we’ve even necessarily built a really powerful AI system.

This is the optimistic case where we’ve really solved alignment; it’s really nailed. There’s maybe a second category, or a broad spectrum, where you’re like, “We don’t really have a great understanding on paper of a fully general way to do this, but as we actually get experience with these systems, we get to try a bunch of stuff.” We get to see what works. If we’re concerned about a system failing, we can try running it in a bunch of exotic cases, throw stuff at it, and see. Maybe if we stress-test it enough on something, that actually works.

Maybe we can’t really understand a principled way to extract exactly what we value, but we can do well enough at constructing proxies. There’s this giant class of cases where you don’t really have an on-paper understanding but you can still wing it. I think that’s probably not what the asker was asking about. There’s a further case where you try to do that and you do really poorly, and as you’re doing it, you’re like, “Man, it turns out these systems do just fail in increasingly catastrophic ways. Drawing the line out, we think that could be really bad.”

I think even in that worst case, where you don’t have an on-paper understanding and you can’t really wing it very well, there’s still certainly more than a third of a chance that everything is just good. That would have to come through people understanding that there’s a problem, having a reasonable consensus that it’s a serious problem, and being willing to make some sacrifices in terms of how they deploy AI. I think that, at least on paper, many people would be willing to say, “If really rolling AI out everywhere would destroy everything we value, then we are happy to be more cautious about how we do that, or roll it out in a more narrow range of cases, or take development more slowly.”

Robert Wiblin: People showing restraint for long enough to kind of patch over the problems well enough to make things okay.

Paul Christiano: Yes. There’s a spectrum, some substitution between how much restraint you show and how much you are able to either ultimately end up with a clean understanding or wing it. One-third is my number if it turns out that winging it doesn’t work at all, like we’re totally sunk, such that you have to show very large amounts of restraint. People have to actually just be like, “We’re going to wait until we are so much smarter.” We’ve either used AI to become much smarter, or better able to coordinate, or better able to resolve these problems, or something like that. You have to wait until that’s happened before you’re actually able to deploy AI in general.

I think that’s still reasonably likely. That’s a point where lots of people disagree, on both ends. A lot of people are much more optimistic; they have the perspective that’s like, “Look, people aren’t going to walk into razorblades and have all the resources in the world get siphoned away, or deploy AI in a case where catastrophic failure would cause everyone to die.” Some people have the intuition that, “That’s just not going to happen and we’re sufficiently well coordinated to avoid that.”

I’m not really super on the same page there. I think if it was a really hard coordination problem, I don’t know, it looks like we could certainly fail. On the other hand, some people are like, “Man, we can’t coordinate on anything.” Like if there was a button you could just push to destroy things or someone with $1 billion could push to really mess things up, things would definitely get really messed up. I just don’t really know.

In part, this is just me being ignorant, and in part, it’s me being skeptical of both of the extreme perspectives, when the people advocating them are about as ignorant as I am of the facts on the ground. I certainly think there are people who have more relevant knowledge and who could have much better calibrated estimates if they understood the technical issues better than I do. I’m at some form of pessimism: if things were really, really bad, if we really, really don’t have an understanding of alignment, then I feel pessimistic, but not radically pessimistic.

Robert Wiblin: Yes. It seems like a challenge there is that you’re going to have a range of people with a range of confidence about how safe the technology is. Then you have this problem that whoever thinks it’s the safest is probably wrong about that, because most people disagree, and they’re also the most likely to deploy it prematurely.

Paul Christiano: Yes. I think it depends a lot on what kind of signals you get about the failures you’re going to have. We can talk about various kinds of near misses that you could have. I think the clearer those are, the easier it is for there to be enough agreement. That’s one thing.

A second thing is that I’m concerned about a particular kind of failure that really disrupts the long-term trajectory of civilization. You can be in a world where that’s the easiest kind of failure to fall into: getting things to work in practice is much easier than getting them to work in a way that preserves our intentions over the very long term.

You could also imagine worlds, though, where a system which is going to fail over the very long term is also reasonably likely to be a real pain in the ass to deal with in the short term, in which case, again, it will be more obvious to people. Then, I think a big thing is that we do have techniques, especially if we’re in a world where AI progress is very much driven by giant computing clusters.

In those worlds, it’s not really like any person can press this button. One, there’s a small number of actors: the people who are willing to spend, say, tens of billions of dollars. And two, those actors have some room to sit down and reach agreements, which could be formalized to varying degrees, but it won’t be like people sitting separately in boxes making these calls.

In that world, at worst, it’ll still be a small number of actors who can talk amongst themselves. At best, it’ll be a small number of actors who agree, “Here are norms. We’re going to actually have some kind of monitoring and enforcement to ensure that even if someone disagreed with the consensus, they wouldn’t be able to mess things up.”

Robert Wiblin: Do you think you or OpenAI have made any interesting mistakes in your work on AI alignment over the years?

Paul Christiano: I definitely think I have made a lot of mistakes, which I’m more in a position to talk about.

Robert Wiblin: [laughs] Go for it. [laughs]

Paul Christiano: I guess there’s, yes, one category. I’ve been thinking about alignment for a lot of years, so that’s a lot of time to rack up mistakes, many of which aren’t as topical though. There was a class of intellectual mistakes I feel like I made four or five years ago, when I was much earlier in thinking about alignment, which we could try and get into.

I guess my overall picture of alignment has changed a ton since six years ago. I would say that’s basically because six years ago, I reasoned incorrectly about lots of things. It’s a complicated area. I reached a bunch of conclusions, and lots of them were wrong. That was a mistake. Maybe an example of a salient update is that I used to think you really needed to have an AI system that understands exactly what humans want over the very long term.

I think my perspective shifted to something more like a commonsense perspective: if you have a system which respects short-term human preferences well enough, then you can retain this human ability to course-correct down the line. You don’t need to appreciate the full complexity of what humans want; you mostly just need a sufficiently good understanding of what we mean by this course correction, or remaining in control, or remaining informed about the situation.

I think it’s a little bit hard to describe that update concisely, but it does really change how you conceptualize the problem and what kinds of solutions are possible. That’s an example of a long-ago mistake, and there’s a whole bunch of those that have been racked up over many years. Certainly I also made a ton of low-level tactical mistakes about what to work on. Maybe a more recent mistake that is salient is that I don’t feel like I’ve done very well at communicating my overall perception of the problem. That’s not just expressing that view to others, but also really engaging with the reasons why people with more mainstream perspectives are skeptical of it.

I’ve been working on that a little bit more, and I’m currently trying to better pin down a reasonably complete, reasonably up-to-date statement of my understanding of the problem and how I think we should attack it. Then really iterating on that to get to the point where it makes sense to people who haven’t spent years thinking in this very weird style that’s not well-vetted. I’m pretty excited about that. I think that’s probably something I should’ve been doing much more over the last two years.

Robert Wiblin: Have you seen Ben Garfinkel’s recent talk and blog post about how confident should we be about all of this AI stuff?

Paul Christiano: I think I probably have seen a Google Doc or something. Yeah.

Robert Wiblin: Do you have any views on it, if you can remember it? [laughs]

Paul Christiano: I think there are lots of particular claims about AI that were never that well-grounded but that people were kind of confident in, which I remain pretty skeptical about. I don’t remember exactly what he touches on in that post, but on claims about takeoff, about really, very rapid AI progress, and particularly claims about the structure of that transition, I think people have had pretty strong, pretty unconventional views. To me it feels like I’m just taking more of an agnostic opinion, but I think to people in the safety community, it feels more like I’m taking this outlier position. That’s definitely a place where I agree with Ben’s skepticism.

In terms of the overall question, how much is there an alignment problem? I think it’s right to have a lot of uncertainty thinking about it, and to understand that the kind of reasoning we’re doing is pretty likely to go wrong. I think you have to have that in mind. That said, I think it is clear there’s something there. I don’t know if he’s really disagreeing with that.

Robert Wiblin: I think his conclusion is that it’s well worth quite a lot of people working on this stuff, but a lot of the arguments that people have made for that are not as solid as maybe we thought when you really inspect all the premises.

Paul Christiano: Yes. I definitely think it’s the case that people have made a lot of kind of crude arguments and they put too much stock in those arguments.

Robert Wiblin: One point that he made which stood out to me was that there have been technologies that dramatically changed the world in the past, electricity, for example, but it’s not clear that working on electricity in the 19th century would have given you a lot of leverage to change how the future went. Even though it was very important, it was just on a particular track, and there was only so much that even a group of people could have done to steer how electricity was used in the future. It’s possible that AI will be similar: it could be very important, but you don’t get a ton of leverage by working on it.

Paul Christiano: Yes. I think it’s an interesting question. A few random comments. One, it does seem like you can accelerate adoption. If you understood the problem early enough (and I’m not exactly sure how early you would have had to act to get much leverage), you could really change the timeline for adoption. You can really imagine small groups having pushed adoption forward by six months or something, to the extent that–

There were a lot of engineering problems and conceptual difficulties that were distinctive to this weird, small thing, which in fact did play a big– it was just one part of the overall machine and trajectory of civilization, but it really was well-leveraged, and faster progress in that area seems like it would have paid unusually high dividends for overall technological progress.

Going along with that, I think it is also reasonable to think that if a small group had positioned themselves to understand that technology well, and had been pushing it and making investments in it, they probably could have had– They couldn’t easily have steered directly from a great distance, but they could’ve ended up in a future situation where they’d made a bunch of money, or were in a position to understand well an important technology which not that many people understood as it got rolled out.

I think that’s, again, a little bit different from the kind of thing he’s expressing skepticism about, but it seems like an important part of the calculus if one is thinking about trying to have leverage by working on or thinking about AI.

I do think the alignment problem is distinctive from anything you could have said in the context of electricity. I’m not mostly making the argument, “Making investments in AI puts you in a better position to have influence later, or to make a bunch of money.” I’m mostly making the argument, “I think we can identify an unusually crisp issue,” which seems unusually important, and we can just hack away at that. That should have a lot of question marks around it, but I don’t really know of historical cases where a similar heuristic went badly.

Sometimes people cite them, and I’ve tried to look into a few of them, but I don’t know of historical cases where you would have made a similarly reasonable argument and then ended up feeling really disappointed.

Robert Wiblin: Do you have any thoughts on what possible or existing AI alignment work might yield the most value for an additional person or million dollars that it receives at the moment?

Paul Christiano: Yes. I mentioned earlier these three categories of difficulties. I think different resources will be useful in different categories, and each of them is going to be best for some resources, like some people, or some kinds of institutional will. Briefly going over those again, one was conceptual work on how this is all going to fit together: if we imagine what kinds of approaches potentially scale to very, very powerful AI systems, what are the difficulties in that limit as systems become very powerful? I’m pretty excited for anyone who has reasonable aptitude in that area to try working on it.

It’s been a reasonable fraction of my time over the last year. Over my entire career, it’s been a larger fraction of my attention, and it’s something that I’m starting to think about scaling up again. This is theoretical work aimed directly at the alignment problem. Asking on paper, “What are the possible approaches to this problem? How do we think this will play out, moving towards a really nailed-down solution that we’ll feel super great about?” That’s one category. It’s a little bit hard to spend money on that, but for people who like doing theoretical or conceptual work, I think that’s a really good place for them.

There’s a second category that’s about understanding facts about human reasoning. Understanding, in the context of debate, whether humans can be good judges when arbitrating between different, competing perspectives. How would you set up a debate such that, in fact, the honest strategy wins in equilibrium? Or, on the amplification side, asking about this universality question: is it the case that you can decompose questions into at least slightly easier questions?

I’m also pretty excited about throwing people at that. Just running more experiments, trying to actually get practice engaging in this weird reasoning, and really seeing: can people do this? Can we iterate and try to identify the hard cases? I’m pretty excited about that. I think there’s some overlap with the people who do the conceptual work, but it maybe involves different people.

There’s this third category of engaging with ML and actually moving from theory to implementation. Also getting to a place with the infrastructure, expertise, and so on to implement whatever we think is the most promising approach. I think that, again, requires a different kind of person still, and maybe also requires a different kind of institutional will and money. It’s also a pretty exciting thing to me. It can both help provide a sanity check for various ideas coming out of the other kinds of work, and also put us a little bit more in a position to do stuff in the future.

We don’t necessarily know exactly what kind of alignment work will be needed, but just having institutions, infrastructure, expertise, and teams that have experience thinking hard about that question, actually building ML systems, and trying to implement, say, “Here’s our current best guess. Let’s try to integrate these kinds of ideas about alignment into state-of-the-art systems.” Just having a bunch of infrastructure able to do that seems really valuable.

Anyway, those are the three categories where I’m most excited about throwing resources at alignment work. It’s very hard to talk in the abstract about which one’s more promising, just because there are going to be lots of comparative advantage considerations, but I think there’s definitely a reasonable chunk of people for whom going in any of those three directions would be best.

Sending a message to the future

Robert Wiblin: Changing gears to something a bit more whimsical: a blog post that I found really charming. You’ve argued recently that a potentially effective way to reduce existential risk would be to leave messages somewhere on Earth for our successors to find, in case civilization goes under or humans go extinct and then intelligent life reappears on Earth. We might want to tell them something to help them succeed where we failed. Do you want to outline the argument?

Paul Christiano: Yes. The idea is, say humanity– if every animal larger than a lizard was killed and you still had the lizards, the lizards would have a long time left before they all died, before photosynthesis began breaking down. Based on our understanding of evolution, it seems reasonably likely that in the available time, lizards would again be able to build up to a spacefaring civilization. It’s definitely not a sure thing, and it’s a very hard question to answer, but my guess would be that, more likely than not, lizards will eventually be in a position to also go and travel to space.

Robert Wiblin: It’s a beautiful image.

Paul Christiano: That’s one place where we’re like, “That’s a weird thing.” There’s another question: “How much do you care about that lizard civilization?” That’s maybe related to these other arguments, these weird decision theory arguments about how nice you should be to other value systems. I’m inclined to be pretty happy if the lizards take our place. I’d prefer we do it, but if it’s going to be the lizards or nothing, I would really be inclined to help the lizards out.

Robert Wiblin: Maybe this is too much of an aside, but in that case I had the intuition of, “Future lizard people or humans now…” I’m not sure which is better. Humans were drawn out of the pool of potential civilizations, and it’s not obvious whether we’re better or worse than what you’d get if you reran history with lizards rather than people.

Robert Wiblin: I just wanted to jump in because some of my colleagues pointed out that apparently there’s some insane conspiracy theory out there about so-called ‘lizard people’ secretly running the world, which I hadn’t heard of. To avoid any conceivable confusion, what we’re talking about has nothing to do with any such ‘lizard people’. [laughter]

‘Lizard people’ is just our jokey term for whatever intelligent life might one day re-evolve on Earth, many millions or hundreds of millions of years into the future, should humans at some point die out. Perhaps ‘lizard people’ was a slightly unfortunate turn of phrase in retrospect! OK, on with the show.

Paul Christiano: Yes. I think it’s an interesting question. To me, it’s related to one of the most important open philosophical questions, which is, in general, what kinds of other value systems should you be happy to have replace you? I think the lizards would want very different things from us, and on the object level, the world they created might be quite different from the world we would’ve created.

I share this intuition that I’m pretty happy for the lizards. I’d feel pretty great. If I’m considering whether we should run a significant risk of extinction rather than let the lizards take over, I’m more inclined to let the lizards take over. Yes, I would be happy. If there’s anything we could do to make life easier for the lizards, I’m pretty excited about doing it.

Robert Wiblin: I’m glad we’ve made this concrete with the lizard people. Okay, carry on.

Paul Christiano: I say lizards also in part because if you go too much smaller than lizards, at some point it becomes more dicey. If you only had plants, it’s a little bit more dicey whether they have enough time left. Lizards, I think, are kind of safe-ish. Lizards are pretty big and pretty smart. They’re most of the way to spacefaring.

Then there’s this question of, what could we actually do? Why is this relevant? The next question is, is there a realistic way that we could kill ourselves and all the big animals without totally wiping out life on Earth, or without replacing ourselves with, say, AIs pursuing very different values? I think by far the most likely way that we’re going to fail to realize our values is that we don’t go extinct, but we’re just sort of doing the wrong thing and pointing in the wrong direction. I think that’s much, much more likely than going extinct.

My rough understanding is that if we go extinct at this point, we will probably take most of the Earth’s ecosystem with us. If you thought that climate change could literally kill all humans, then you’d be more excited about this. There are some plausible ways that you could kill humans but not literally kill everything, like a really brutal, total collapse of civilization. Maybe there are some kinds of bioterrorism that kill all humans, or all large animals, but don’t necessarily kill everything.

If those are plausible, then there’s some chance that you end up in this situation where we’ve got the lizards and now it’s up to the lizards to colonize space. In that case, it does seem like we have this really interesting lever. Lizards will be evolving over some hundreds of millions of years; they’ll be in our position some hundreds of millions of years from now. It does seem probably realistic to leave messages, that is, to somehow change the Earth such that a civilization that appeared several hundred million years later could actually notice the changes we’ve made and start investigating them.

If we’re able to call the attention of some future civilization to a particular thing, then I think we can encode lots of information for them, and we could decide how we want to use that communication channel. When people talk about this, they’re normally imagining radically shorter time periods than hundreds of millions of years, and they’re normally not being super thoughtful about what they’d want to say. My guess would be that you could really substantially change the trajectory of civilization by being able to send a message from a much, much more–

If you imagine the first time that humans could have discovered a message sent by a previous civilization, it depends a little bit on how you’re able to work this out, but it would probably have been at least a hundred years ago. At that point, the message might’ve been sent from a civilization which was much more technologically sophisticated than they were, and one which had experienced the entire arc of civilization followed by extinction.

At a minimum, it seems like you could really change the path of their technological development by selectively spelling out for them, or showing them, how to achieve certain goals. You could also attempt, although it seems a little bit more speculative, to help set them on a better course: “Really, you should be concerned about killing everyone. Here’s some guidance on how to set up institutions so they don’t kill everyone.”

I’m very concerned about AI alignment, so I’d be very interested in saying, as clearly as possible, “Here’s the thing which, upon deliberation, we thought was a problem. You probably aren’t thinking about it now, but FYI, be aware.” I do think that would put a community of people working on that problem in that future civilization into a qualitatively different place than if– I don’t know.

It’s very hard to figure out what the impact would have been had we stumbled across very detailed messages from a past civilization. I do think it could have a huge effect on the trajectory of technological development, and also reasonably likely a meaningful effect on deliberation and decisions about how to organize ourselves, or on other intellectual projects.

Robert Wiblin: Yes. Consider this hypothetical again: could we have made history go better if we could just send as much text as we wanted back to people in 1600 or 1700? On reflection, it does seem like, “Well, yes, we could just send them lots of really important philosophy and lots of important discoveries in social science, and also tell them the things that we value that maybe they don’t value. Speed up the strains of philosophical thought that we think are particularly important.”

Paul Christiano: You could also just [chuckles] pick and choose from all the technologies that exist in our world and be like, “Here are the ones we think are good on balance.”

Robert Wiblin: Right, yes. You don’t give them the recipe for nuclear weapons. Instead, you give them the game theory of mutually assured destruction, or you tell them everything we know about how to sustain international cooperation, so whenever they do develop nuclear weapons, they’re in a better position to not destroy themselves.

Paul Christiano: Yes, and “Here’s a way to build a really great windmill.”

Robert Wiblin: [laughs] Yes, “Here are solar panels. Why not? Yes, get some solar panel stuff.”

Paul Christiano: I don’t know how much good you could do with that kind of intervention and it’s a thing that would be interesting to think about a lot more. My guess would be that there’s some stuff which in expectation is reasonably good, but it’s hard to know.

Robert Wiblin: Yes. There’s a pretty plausible case that if humans went extinct, intelligent life might reemerge. Probably, if we thought about it long enough, we could figure out some useful things that we could tell them that would probably help them and give them a better shot at surviving, and thriving, and doing things that we value. How on earth would you leave a message that could last hundreds of millions of years? It seems like it could be pretty challenging.

Paul Christiano: Yes, I think there are two parts to the problem. One part is calling someone’s attention to a place. I think that’s the harder part by far. For example, you can’t just bury a thing in most places on Earth, because hundreds of millions of years is long enough that the surface of the Earth is no longer the surface of the Earth. I think the first and more important problem is calling someone’s attention to a spot, or to one of a million spots or whatever.

Then the second part of the problem is, after having called someone’s attention to a spot, how do you actually encode information? How do you actually communicate it to them? It’s also probably worth saying that this comes from a blog post that I wrote. I think there are people who have a much deeper understanding of these problems, and who have probably thought about many of these exact problems in more depth than I have. I don’t want to speak as if I’m a–

Robert Wiblin: An authority on leaving messages for future civilizations. [laughs]

Paul Christiano: That’s right. I thought about it for some hours. [laughter]

In terms of calling attention, I thought of a bunch of possibilities in the blog post and started some discussions online with people brainstorming possibilities. I think if we thought about it a little bit more, we could probably end up with a clearer sense.

Probably the leading proposal so far is one from Jan Kulveit. There’s this particularly large magnetic anomaly in Russia, which is very easy for a civilization to discover quite early, and which is located such that it’s unlikely to move as tectonic plates move. It’s a little bit difficult to do, but it seems pretty plausible that you could modify that structure, or place things at Schelling points in the structure, in a way that at least our civilization would very robustly have found. It’s hard to know how much a civilization quite different from ours would have…

Robert Wiblin: You mentioned just the straightforward idea of a really big, hard rock that juts out of the Earth. Hopefully, it would survive long enough to be– [crosstalk]

Paul Christiano: Yes, it’s really surprisingly hard to make things like that work. [chuckles]

Robert Wiblin: Yes, I guess over that period of time, even a very durable rock is going to be broken down by erosion.

Paul Christiano: Yes. Also stuff moves so much. Like you put the rock on the surface of the earth, it’s not going to be on the surface of the earth in hundreds of millions of years anymore.

Robert Wiblin: It just gets buried somehow. Yes, interesting. [crosstalk]

Paul Christiano: I really updated a lot towards it being surprisingly rough. When I started writing this post, I was like, “I’m sure this is easy,” and then I was like, “Aw jeez, basically nothing works.”

Robert Wiblin: What about a bunch of radioactive waste that would be detectable by Geiger counters?

Paul Christiano: Yes, so you can try and do things like that. You have to care about how long these things can last, how easy they are to detect, and how far from the surface they remain detectable, but I think there are options like that that work. [chuckles] I think magnets are also longer-lasting than we might have guessed, and a reasonable bet. I think they could easily be as effective.

Robert Wiblin: You made this point that you can literally have thousands of these sites and you can make sure that in every one, there’s a map of where all the others are, so they only have to find one. Then they can just go out and dig up every single one of them, which definitely improves the odds.

Paul Christiano: Yes. Also, there are some fossils around, so if you think you’ve got a million things that are very prone to being fossilized, then it’s probably not going to work. Yes, I haven’t thought about that in a while. I think, though, that if you just took a person, and that person spent some time really fleshing out these proposals, digging into them, and consulting with experts, they’d probably find something that would work.

Similarly, on the social side, if you thought about it for a really long time, I expect you could find– I have a somewhat more conservative view about whether there’s something to say that would be valuable. The first step would be: do you want to pay someone to spend a bunch of time thinking about those things? Is there someone who’s really excited to spend a bunch of time thinking about those things and nailing down the proposals? Then seeing whether it was a good idea, and if it was, spending the millions or tens of millions of dollars you’d need to actually make it happen.

Robert Wiblin: In terms of how you would encode this information, it seemed like you thought just etching it in rock would probably be a plausible first pass, and would probably be good enough most of the time. You could probably come up with some better material to etch things on that is very likely to last a very long time, at least if it’s buried properly.

Paul Christiano: I think other people have thought more about this aspect of the problem, and in general I have more confidence that something will work out here. I think just etching stuff is already good enough under reasonable conditions. It’s a lot easier to have a small thing that will survive for hundreds of millions of years than to disfigure the Earth in a way that will still be noticeable, and call someone’s attention to it, hundreds of millions of years from now.

Robert Wiblin: Okay, this brings me to the main objection I had, which is that the lizard people probably don’t speak English, so even if we bury Wikipedia, I think they might just find it very confusing. How is it clear that we can communicate any concepts to lizard people in a hundred million years’ time?

Paul Christiano: Yes, I think that’s a pretty interesting question, and one of the things you’d want to think about. I do think that when people have historically engaged in the project of trying to figure out, say, a lost language, or some relics they’re trying to make sense of, they’ve been in a radically worse position than the lizard people would be in with respect to this artifact, since we would have put a lot of information into it and really attempted to be understood. I think we don’t really have examples of humans having encountered a super information-rich thing that’s attempting to be understood.

I guess this is like a game you can try to play amongst humans, and I think humans can win at it very easily, but it’s unclear to what extent that’s because we have all this common context. I think humans do not need anything remotely resembling a shared language to easily win this game: you can easily build up a language of concepts just from simple illustrations, diagrams, and so on.

I think it’d be right to be skeptical, though, because even when it’s not a language, we’re using all of these concepts that we have in common. We’ve thought about things in the same way; we know what we’re aiming at. I’m reasonably optimistic, but it’s pretty unclear. This is also a thing that I guess people have thought about a lot, although in this case I’m a lot less convinced by their thinking than in the ‘writing stuff really small in a durable way’ case.

Robert Wiblin: My understanding was that the people who thought about it a lot seemed very pessimistic about our ability to send messages. Well, I guess, to be honest, the only case I know about is, there was a project to try to figure out what messages should we put at the site where we’re burying really horrible nuclear waste. You’re putting this incredibly toxic thing under the ground and then you’re like, “Wow, we don’t want people in the future to not realize what this is, and then dig it up, and then kill themselves.”

There were quite a lot of people, I guess linguists, sociologists, all these people, who were trying to figure out what signals to put there. Is it signs? Is it pictures? Whatever it is. They settled on a message that I think they drew out in pictures, which I thought was absolutely insanely bad, because I couldn’t see how any future civilization would interpret it as anything other than religious stuff that they would be incredibly curious about, and then would absolutely go and dig up.

I’ll find the exact message that they decided to communicate and potentially read it out here, and people could judge for themselves.

Robert Wiblin: Hey folks, I looked up this message to add in here so you can pass judgement on it. Here it is:

“This place is a message… and part of a system of messages …pay attention to it!
Sending this message was important to us. We considered ourselves to be a powerful culture.
This place is not a place of honor… no highly esteemed deed is commemorated here… nothing valued is here.
What is here was dangerous and repulsive to us. This message is a warning about danger.
The danger is in a particular location… it increases towards a center… the center of danger is here… of a particular size and shape, and below us.
The danger is still present, in your time, as it was in ours.
The danger is to the body, and it can kill.
The form of the danger is an emanation of energy.
The danger is unleashed only if you substantially disturb this place physically. This place is best shunned and left uninhabited.”

As I said, I really think a future civilization, human or otherwise, would be insanely curious about anything attached to a message like that, and would guess that the site was religious in nature. If they hadn’t learned about nuclear radiation themselves already, I think they’d be more likely to dig at that spot than if it were simply left unmarked. Alright, back to the conversation.

Anyway, I think the plan there was actually to write it in tons of languages that exist today, in the hope that at least one of them would survive. That was one of the options.

Paul Christiano: That’s not going to be an option here.

Robert Wiblin: Not an option here.

Paul Christiano: I think it’s quite a different problem. Making a sign that someone who encounters it can read is very different from writing someone a hundred million words when they’re willing to spend a lot of effort on them. If we encountered a message from some civilization that we could tell had technological powers far beyond our own, we’d be like, “Okay, that’s really high up on our list of priorities, even if we don’t know what the hell they’re talking about.” It’s just a very different situation when there’s this huge amount of content. It becomes the most interesting academic project of all; it goes straight to the top of the intellectual priority queue as soon as such a thing is discovered.

I have a lot more confidence that we, or a civilization with abilities similar to ours, could figure something out under those conditions than in a situation where they’re walking around, perhaps still somewhat primitive, and encounter a sign with no idea what’s up with it. A sign also just isn’t much content. If you’re only giving them 10,000 words of content or some pictures, they just don’t have enough traction to possibly figure out what’s going on.

Whereas in this case, we’re not coming in with one proposal for how you could build a shared conceptual language; we’re saying, “We have a hundred proposals and we’re trying them all. Every proposal any fourth-grader came up with? That’s fine. Throw it in there too.” [laughs] Bits are quite cheap, so you can really try a lot of things. I think it’s just a much better position than people normally think.

Robert Wiblin: I think when archaeologists have dug up writing, they’ve sometimes decoded it by analogy to other languages we do have records of. Sometimes they have something like the Rosetta Stone, where it’s like, “Now we’ve got a translation, so we can figure out what that–” I think the same text was written there in three scripts, one of which, Greek, scholars could already read. From that they could work out what the language sounded like, and then, very gradually, what the words meant.

I think there are other cases where they’ve worked it out just from context. They’ve dug up stones and asked, “What is this?”, and it turns out to be a bunch of financial accounts for a company, recording imports and exports from this place, which makes total sense: you can imagine people keeping records like that. Your hope here is that we’ll bury so much content, lots of pictures and lots of words, repeated words, that eventually they’ll be able to decode it.

They’ll work it out from context, I guess. They’ll be flicking through the encyclopedia and find an article about something they can identify, because they also have that thing. Say, trees: “Okay, we’ve got the article about trees, and we still have trees.” Then they work out, “Well, what would I say about trees if I were writing an encyclopedia?” They read the article about trees, guess what those words mean, and then work outward from there.

Paul Christiano: We can make things a lot simpler than encyclopedia articles. You can say, “Here’s a lexicon of a million concepts,” or whatever, 10,000 concepts. For each of them, a hundred pictures, a hundred sentences about it, and a hundred attempts to define it, all organized as well as we can manage.
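To make the “bits are quite cheap” point concrete, here is a rough sizing of the lexicon Christiano describes: 10,000 concepts, each with 100 pictures, 100 sentences and 100 attempted definitions. The per-item byte sizes are hypothetical placeholders chosen for illustration, not figures from the conversation; the sketch only shows how the totals scale.

```python
from dataclasses import dataclass


@dataclass
class LexiconPlan:
    """Rough size of a picture-plus-text lexicon (hypothetical sizing sketch)."""
    concepts: int
    pictures_per_concept: int = 100
    sentences_per_concept: int = 100
    definitions_per_concept: int = 100
    # Hypothetical average sizes in bytes; placeholders, not sourced figures.
    bytes_per_picture: int = 100_000
    bytes_per_sentence: int = 100
    bytes_per_definition: int = 500

    def item_count(self) -> int:
        """Total number of pictures, sentences and definitions."""
        per_concept = (self.pictures_per_concept
                       + self.sentences_per_concept
                       + self.definitions_per_concept)
        return self.concepts * per_concept

    def total_bytes(self) -> int:
        """Total storage in bytes under the placeholder item sizes."""
        per_concept = (self.pictures_per_concept * self.bytes_per_picture
                       + self.sentences_per_concept * self.bytes_per_sentence
                       + self.definitions_per_concept * self.bytes_per_definition)
        return self.concepts * per_concept


if __name__ == "__main__":
    for n in (10_000, 1_000_000):
        plan = LexiconPlan(concepts=n)
        print(f"{n:>9,} concepts: {plan.item_count():,} items, "
              f"{plan.total_bytes() / 1e9:,.1f} GB")
```

With these placeholder sizes, the 10,000-concept version comes to about 100.6 GB, almost all of it pictures; the million-concept version is 100 times larger, roughly 10 TB.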

Robert Wiblin: Yes. Okay, I agree. I think if you went to that level, then probably you could do it. Although some concepts might be extremely hard to illustrate.

Paul Christiano: Yes, I’m more optimistic about like communi– Well, I don’t know. Communicating technology seems easier than–

Robert Wiblin: Just like, “Here’s a picture of a steam engine.” Whereas philosophy, or religion, is maybe a bit trickier. In the blog post, you suggested that this might be pretty good bang for your buck in terms of reducing existential risk. I think you had a budget of $10 million for a minimum viable product. You were thinking, “Yes, this could improve their odds of surviving by one percentage point if we’re very careful about which messages we send them and which we don’t.” Do you still think something like that?

The budget of $10 million seemed incredibly low to me. I guess here we’ve been envisaging something a lot more ambitious than what you were perhaps thinking about at the time.

Paul Christiano: Yes, I think $10 million does seem low. After talking to people about what the actual storage options are, how to make a message, and how people could find it, $100 million seems more realistic, which makes the cost-effectiveness numbers worse.

I think it’s worth pointing out that the phases can be costed separately. Imagine three or four phases of the project: figuring out what to say; somehow making a landmark people can identify; actually including a bunch of information; and then actually writing it, trying to communicate the thing you wanted to say. If any one of those is expensive, you can relatively easily bring the others up to the same cost.

So we’d get to spend millions of dollars on each of those phases. I’m actually imagining the lion’s share of the cost going into leaving a landmark, but that still leaves millions of dollars to spend on the other components, which pays for a few people working full-time for years.
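A quick back-of-envelope version of this budget split, as a sketch. The $100 million total comes from the conversation; the 80% landmark share (standing in for “the lion’s share”) and the $250,000 fully loaded cost per person-year are hypothetical assumptions chosen only to show the arithmetic.

```python
def phase_budgets(total, landmark_share, other_phases):
    """Split a total budget: one landmark phase plus equal shares for the rest."""
    landmark = total * landmark_share
    per_other_phase = (total - landmark) / other_phases
    return landmark, per_other_phase


def person_years(budget, cost_per_person_year):
    """Staff time a budget buys at a given (hypothetical) cost per person-year."""
    return budget / cost_per_person_year


if __name__ == "__main__":
    TOTAL = 100_000_000              # $100M, the figure discussed above
    LANDMARK_SHARE = 0.8             # hypothetical "lion's share"
    COST_PER_PERSON_YEAR = 250_000   # hypothetical fully loaded cost
    landmark, per_phase = phase_budgets(TOTAL, LANDMARK_SHARE, other_phases=3)
    print(f"landmark: ${landmark:,.0f}; each other phase: ${per_phase:,.0f} "
          f"(~{person_years(per_phase, COST_PER_PERSON_YEAR):.0f} person-years)")
```

Under those assumptions each of the three remaining phases gets about $6.7 million, or roughly 27 person-years, which is consistent with “a few people working full-time for years.”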

Robert Wiblin: I would have thought that the most difficult thing would be figuring out what to say and then figuring out how to communicate it. If we’re talking about drawing pictures for every word that we think lizard people would be able to understand, that seems like a lot of homework.

Paul Christiano: I think it’s hard to ballpark the cost of that kind of work. Are we talking a hundred person-years or a thousand person-years? You can think about how many person-years of effort go into reasonable encyclopedias. It’s tricky thinking about the costs. I think at $100 million, I feel good about how thoroughly– again, you’re not going to have a great answer about what to send, but you’re going to have an answer supported by people who’ve thought about it for a few years. I guess if you’re doing this project, you’re probably doing it under a certain set of rules.

This project is already predicated on a bunch of crazy views about the world, so you’re making an all-out bet on those views. When you’re doing the other stages, you’re also just conditioning on those crazy views being correct, about which basic things are important and how things basically work, which I think does in some sense help. You only have to eat the factor for those crazy views being the right ones once. You don’t have to pay it again.

I guess I’ve always imagined that it would take less than a few person-years of effort if I wanted to produce something that could be understood by a future civilization. Maybe I’m just way too optimistic about that. I haven’t engaged with any of the communities that have thought about this problem in detail, so it’s totally possible that I’m way off base.

Anyway, when I imagine people spending 10 years on that, I’m like, “10 years? That seems pretty good. They’re going to have this nailed. They’re going to have tested it a bunch of times. They’re going to have something like six independent proposals, implemented separately, each of them super exhaustive with lots of nice pictures.” Nice pictures are actually a little bit hard, but they’d probably just have all these bits and be like, “What do they do with all the bits?”

Robert Wiblin: Should listeners maybe fund this idea? Has anyone expressed interest in being the team lead on this?

Paul Christiano: Yes, there have been some very brief conversations about the landmarking step. I think that’s probably the first thing I would be curious about: what is the cost like? I don’t think it’s at the stage of being a big project to fund yet, and I don’t think anyone’s really expressed interest in taking it up and running with it. [chuckles] I think the sequence would probably be: first, check whether the landmark idea makes sense and roughly how it would survive. Maybe do a sanity check on all the details, and then spend a few months digging into how you would send things and how good it actually looks. Then six months in, you’d have a sense of whether this is a good deal.

Robert Wiblin: If one of you listeners out there is interested in taking on this project, send me an email because you sound like a kind of fun person.

Do you have any other neglected or crazy sounding ideas that might potentially compare favorably to more traditional options for reducing existential risk?

Paul Christiano: I do think it’s worth caveating that if there’s any way to address AI risk, that’s probably going to be better than this kind of thing, which is related to my comparative advantage seeming to be in AI risk stuff. In terms of weird altruistic schemes, I haven’t thought that much about this kind of thing over the last year, and I don’t have anything that feels both very weird and very attractive.

Robert Wiblin: [laughs] What about anything that’s just attractive? I’ll settle. [chuckles]

Paul Christiano: I remain interested in– There are a few things we discussed last time, maybe very shallowly, or maybe didn’t have a chance to touch on, that I remain excited about. Basic tests of interventions that may affect cognitive performance seem pretty weirdly neglected. Right now, I’m providing some funding to some clinical psychiatrists in Germany to do a test of creatine in vegetarians, which seems pretty exciting. I think the current state of the literature on carbon dioxide and cognition is absurd. I probably complained about this last time I was here. It’s just– [crosstalk]

Robert Wiblin: Let’s dive into this. It was a mistake of mine not to put these questions in. Just to go back to the creatine issue: there have been some studies, one in particular, suggesting that for vegetarians, and potentially for non-vegetarians as well, taking creatine gives you an IQ boost of a couple of points. It was very measurable even with a relatively small sample. That’s a pretty big effect size by the standards of people trying to make people smarter.

Paul Christiano: Small by the standards of people normally looking for effects: something like a third of a standard deviation. That’s respectable, but for this it’s huge; I don’t know of many interventions that are that effective.

Robert Wiblin: Yes. If we could make everyone three IQ points smarter, that would be pretty cool. But there was just not much follow-up on this, even though it seems way better than most of the other options we have for making people smarter, other than, I suppose, improving health and nutrition.

Paul Christiano: Yes, the review evidence is on the effects in omnivores; that’s been better studied. I think it doesn’t look that plausible that it has large effects in omnivores. There’s also been some looking into mechanisms, and in terms of mechanisms, it doesn’t look great. If you look at how creatine– I don’t know much about this area. All these areas we’re listing now are just random shit I’m sometimes speculating about. I’ve got to put that out there. There should be a separate category for my views on AI.

Anyway, yes, looking at mechanisms, it doesn’t look that great. Given what we currently know about biology, it would be surprising for creatine supplementation to have this kind of cognitive effect. But it’s possible, and it’s not ruled out in vegetarians. The evidence in vegetarians is, I think, one inconclusive result and one really positive result. It just seems worth doing a reasonably powered check in vegetarians again.

I would be very surprised if an effect showed up, but I think it’s possible. Some people would be more surprised; some people think it’s obviously nothing. But I’m at something like 5-10%, which seems like a reasonable bet.
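For a sense of what “reasonably powered” means here, a standard normal-approximation formula for comparing two group means gives the per-group sample size as n = 2 (z_(1-α/2) + z_(power))² / d², rounded up. The sketch below applies it, assuming a two-sided test at α = 0.05 with 80% power; those are conventional choices, not details of the study being funded. An effect of a third of a standard deviation corresponds to about 5 points on an IQ scale with a standard deviation of 15.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.8):
    """Approximate participants per group for a two-sided, two-sample test.

    Normal approximation with equal group sizes and equal variances:
    n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2, rounded up.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2)


if __name__ == "__main__":
    for d in (1 / 3, 1 / 9):
        print(f"d = {d:.2f}: about {n_per_group(d)} per group")
```

Detecting a third of a standard deviation takes roughly 140 participants per group. An effect three times smaller would need about nine times as many, roughly 1,270 per group, because the required sample scales with 1/d².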

Robert Wiblin: On the vegetarianism point, when I looked at that paper, it seemed like they had chosen vegetarians mostly because they expected the effect to be larger there, even though creatine supplementation also increases free creatine in the body for meat eaters. Just to explain for listeners who don’t know: meat has some creatine in it, although a lot less than people tend to supplement with. Vegetarians seem to have less because they’re not eating meat, so supplementation presumably has a larger effect.

Paul Christiano: Most likely that was just the choice that study made, and then there was random variation. I’ve definitely updated toward thinking that studies show all sorts of things, and that it’s very, very easy to mess up a study: you get wrong results not just the 5% of the time you’d expect with a significance threshold of p = 0.05, but radically more often than that, for God knows what reason.

Anyway, most likely that’s a study that happened to return a positive result, and it happened to be studying vegetarians because it seemed like the effect should be larger there. Since we’ve gotten negative evidence about the effects in omnivores, it doesn’t seem that likely. Although that evidence would also be consistent with the effect just being three times smaller in omnivores, which would be plausible and compatible with what we know.
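A small simulation illustrates the gap between the nominal 5% and “radically more often.” It is a sketch, not a model of any particular study. It runs many hypothetical two-group experiments with no true effect, tests each at p < 0.05 with a two-sided z-test (unit variance assumed known), and counts how often a “significant” result appears. With a clean design the rate sits near 5%. Adding a hypothetical systematic bias (a small shift in one group’s measurements, standing in for any undetected methodological flaw) pushes it well above that.

```python
import random
from statistics import NormalDist, fmean


def significant_rate(n_studies=2000, n_per_group=50, alpha=0.05,
                     bias=0.0, seed=0):
    """Fraction of simulated null studies that reach p < alpha.

    Both groups are drawn from N(0, 1), so there is no true effect.
    `bias` is a hypothetical systematic shift added to one group's
    measurements, standing in for a flaw in the study design.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided threshold
    se = (2 / n_per_group) ** 0.5                  # SE of a difference in means
    hits = 0
    for _ in range(n_studies):
        treated = [rng.gauss(0, 1) + bias for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    print(f"clean design:  {significant_rate():.3f}")
    print(f"bias = 0.2 SD: {significant_rate(bias=0.2):.3f}")
```

With 50 participants per group, a 0.2 SD bias equals one standard error, so roughly 17% of these no-effect studies come out “significant”; the exact printed values depend on the random seed.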

Robert Wiblin: You were kind of like, “Goddamn, this is really important, but people haven’t put money into it, people haven’t run enough replications of this.” You just decided to–

Paul Christiano: One replication. It’s one pre-registered replication. That’s all I want.

Robert Wiblin: You were like, “I’m going to do it myself.” Talk about that for a minute?

Paul Christiano: Well, I feel like in this case, providing funding is probably not the hard part. I’m happy to do it for stuff like this; I’m very interested in providing funding. I made a Facebook post like, “I’m really interested in providing funding,” and then an EA stepped up and was like, “I know a lab that might be interested in doing this,” and put me in touch with them.

Robert Wiblin: When might they have results?

Paul Christiano: In a year. I don’t know.

Robert Wiblin: Okay. Are you excited to find out?

Paul Christiano: I am. Yes, I’m excited to see how things go.

Robert Wiblin: Yes, talk about the carbon dioxide one for a minute, because this is one that’s also been driving me mad the last few months: carbon dioxide potentially has enormous effects on people’s intelligence, and offices, and lecture halls especially, can have incredibly elevated CO2 levels that are potentially dumbing us all down when we most need to be smart.

Paul Christiano: Yes. I reviewed the literature a few years ago, and I’ve only been paying a little bit of attention since then, but I think the current state of play is this. There was one study with preposterously large effect sizes from carbon dioxide, in which the methodology was to put people in rooms and dump some gas into all the rooms. Some of the gas was very rich in carbon dioxide, and the effect sizes were absurdly large.

If you compare their results to the levels of carbon dioxide that occur in my house, or in the house I just moved out of, the most carbon dioxide-rich bedroom in that house would have had something like a one-standard-deviation effect on Berkeley students on this test, which is absurd. That’s totally absurd. That’s almost certainly–

Robert Wiblin: It’s such a large effect that you’d expect people walking into a room with elevated carbon dioxide levels to just feel like idiots, or at least feel noticeably dumber in their own minds.

Paul Christiano: Yes, you would think that. To be clear, in rooms with levels that high, people do report that it feels stuffy. So part of the reason the papers use the methodology of just dumping in carbon dioxide is that if you make a room that CO2-rich naturally, it’s also going to be obvious that you’re in the intervention group rather than the control.

Although to be fair, at that point even a placebo effect might do something. Still, the result almost certainly seems wrong to me. Although maybe this is not a good thing to be saying publicly on a podcast; there are a bunch of respected researchers on that paper. Anyway, it would be great to see a replication. There was subsequently a replication with exactly the same design, which also had p = 0.0001.

So now we’ve got two studies with precisely the same design, both with p = 0.0001. That’s where we’re at. Also, the effects are stupidly large. So large that you’d really, really need to care about ventilation. This room probably is– this is madness. Well, this building is pretty well ventilated, but still, we’re at least a third of a standard deviation dumber.
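One way to see why two p = 0.0001 results can still leave room for skepticism is to combine them formally. The sketch below uses Fisher’s method, a standard technique for pooling independent p-values that is not itself mentioned in the conversation. Under the joint null, X = -2 Σ ln p_i follows a chi-squared distribution with 2k degrees of freedom, and for even degrees of freedom its tail probability has a closed form. The key assumption is independence: if both studies share the same design flaw, as direct replications of one protocol may, the combined figure overstates the evidence.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) is chi-squared with 2k degrees of freedom under
    the joint null. For even degrees of freedom the survival function is
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    tail_sum = sum(half_x ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_x) * tail_sum


if __name__ == "__main__":
    # The two p = 0.0001 results mentioned above, treated as independent.
    print(f"{fisher_combined_p([1e-4, 1e-4]):.2e}")
```

Treated as independent, the two studies give a combined p of about 2 × 10⁻⁷, so chance alone is an unlikely explanation. If the result is wrong, the likelier culprit is something systematic that both studies share, which this calculation cannot detect.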

Robert Wiblin: Yes, I’m sure, dear listeners, you can hear us getting dumber over the course of this conversation as we fill this room with poison. I guess potentially the worst case would be meeting rooms or boardrooms where people are having very long, prolonged discussions about difficult issues. They’d just be getting progressively dumber as the room fills up with carbon dioxide, and more irritable as well.
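The “room fills up” intuition corresponds to a standard single-zone ventilation model. If the air is well mixed, indoor CO2 rises exponentially toward a steady state set by how fast occupants exhale it and how fast outdoor air replaces room air. The sketch below assumes that model. The occupancy, room volume, air-change rate, 420 ppm outdoor level, and per-person generation rate (about 0.018 m³ of CO2 per hour, a rough figure for a seated adult) are all illustrative assumptions, not numbers from the conversation or the studies.

```python
import math


def co2_ppm(t_hours, occupants, room_m3, air_changes_per_hour,
            outdoor_ppm=420.0, start_ppm=420.0,
            gen_m3_per_hour_per_person=0.018):
    """Indoor CO2 (ppm) after t_hours in a well-mixed room.

    Mass balance: dC/dt = S - ach * (C - C_out), where S is the
    occupants' CO2 generation expressed in ppm per hour. The solution
    approaches the steady state C_out + S / ach exponentially.
    """
    source_ppm_per_hour = occupants * gen_m3_per_hour_per_person / room_m3 * 1e6
    steady_state = outdoor_ppm + source_ppm_per_hour / air_changes_per_hour
    decay = math.exp(-air_changes_per_hour * t_hours)
    return steady_state + (start_ppm - steady_state) * decay


if __name__ == "__main__":
    # Hypothetical meeting: 10 people, 100 m^3 room, 0.5 air changes per hour.
    for hours in (0, 1, 2, 4, 8):
        print(f"{hours} h: {co2_ppm(hours, 10, 100, 0.5):,.0f} ppm")
```

Under these assumptions the room climbs from 420 ppm to roughly 2,700 ppm after two hours and levels off near 4,000 ppm. Doubling the air-change rate halves the steady-state rise above outdoor levels.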

Paul Christiano: Yes, it would be pretty serious. I think people have often cited this in attempts to improve ventilation, but people do not take it nearly as seriously as they would if they believed it. Which I think is right, because the effect is almost certainly not this large. If it were this large, you’d really want to know, and then–

Robert Wiblin: This is like lead poisoning or something?

Paul Christiano: Yes, that’s right.

Robert Wiblin: Well, this has been enough to convince me to keep a window open whenever I’m sleeping. I really don’t like sleeping in a room that has no ventilation or no open door or window. Maybe I just shouldn’t worry because at night who really cares how smart I’m feeling while I’m dreaming?

Paul Christiano: I don’t know what’s up. I also haven’t looked into it as much as maybe I should have. I would really just love to see it checked; it’s not that hard. The effects are large enough, and short-term enough, that it’s extremely easy to check. In some sense, it’s like, “What are you asking for? There’s already been a replication.” Though, I don’t know, the studies use these cognitive batteries that are not great.

If the effects are real, you should be able to detect them with basically any instrument. At some point, I just want to see the effect myself. I want to actually see it happen, and I want to see the people in the rooms.

Robert Wiblin: You’d think there’s a decent academic incentive to do this, because you’d end up famous if you pioneered an issue that turns out to be extraordinarily important and causes buildings to be redesigned. I don’t know, it could just be a big deal. I mean, even if you can’t profit from it financially, wouldn’t you just want the kudos for identifying this massive unrecognized problem?

Paul Christiano: Yes, to be clear, I think a bunch of people work on the problem. What I’m aware of, which is probably out of date now, is the original paper, a direct replication, and a conceptual replication, all with big-looking effects but all with slightly dicey instruments. The conceptual replication was funded by a group that works on ventilation, unsurprisingly.

Robert Wiblin: Oh, that’s interesting.

Paul Christiano: Big Air Quality. Yes, I think the take of academics, insofar as there’s a formal consensus process in academia, would probably be that this is real. It’s just that no one is behaving as if an effect of that size actually existed, and I think they’re right to be skeptical of the process in academia. That does make the situation a little bit complicated in terms of what exactly you get credit for.

I think the people who would get credit, and rightfully so, are the people who’ve been investigating it so far. This would be more like checking it out for the people who are skeptical. Although everyone is implicitly skeptical, given how little they treat high carbon dioxide levels as an emergency.

Robert Wiblin: Yes, including us right now. Well, kudos to you for funding that creatine thing. It would be good if more people took the initiative to really insist on funding replications on issues that seem important but are getting neglected.

Paul Christiano: Yes, I think a lot of it’s great. I feel like there are lots of good things for people to do, and the bottleneck is mostly just people who have the relevant kinds of expertise and interests. This is one category where I feel people could go far, and I’m excited to see how that goes.

Effect of more compute

Robert Wiblin: Last year OpenAI published a blog post that got people really excited, showing that there has been a huge increase in the amount of compute used to train cutting-edge ML systems. I think for the algorithms that have absorbed the most compute, there was a 300,000-fold increase in the amount of compute that had gone into them over six years.

It seemed like that had potentially been a really big driver of more impressive AI capabilities over recent years. Would that imply faster progress going forward? Or do you think it will slow down as the compute scale-up runs its course and it gets harder and harder to throw more processors at these problems?

Paul Christiano: I think it just depends on what your prior perspective was. If you were eyeballing progress in the field and asking, “Does this feel like a lot of progress?”, then in general it should be, well, not bad news, but it should make you think AI is further away. You’re like, “Well, there was a lot of progress, and I had some intuitive sense of how much progress that was.”

“Now I’m learning that that rate of progress can’t be sustained that long, or that a substantial part of it came from this unscalable thing.” You could argue about how much further you could go, but maybe you had a million-X over that period and you can have a further thousand-X, or maybe 10,000-X.
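The growth-rate arithmetic behind this exchange can be sketched in a few lines, assuming steady exponential growth. A 300,000-fold increase over six years, the figures Wiblin cites, implies a doubling time of roughly four months; OpenAI’s post itself reported about 3.4 months over its measurement window. At that pace, the further 1,000-fold to 10,000-fold of headroom Christiano mentions would last only a few years. The function names are illustrative, not from any library.

```python
import math


def doubling_time_months(fold_increase, years):
    """Months per doubling implied by steady exponential growth."""
    return 12 * years / math.log2(fold_increase)


def years_of_headroom(extra_fold, doubling_months):
    """Years until a further `extra_fold` increase is used up at a fixed doubling time."""
    return math.log2(extra_fold) * doubling_months / 12


if __name__ == "__main__":
    dt = doubling_time_months(300_000, 6)
    print(f"implied doubling time: {dt:.1f} months")
    for extra in (1_000, 10_000):
        print(f"further {extra:,}x lasts ~{years_of_headroom(extra, dt):.1f} years")
```

This prints a doubling time of about 4.0 months, and roughly 3.3 and 4.4 years of headroom for a further 1,000x and 10,000x. That is the sense in which the trend “can’t be sustained that long” at its recent pace.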

Robert Wiblin: Well, I suppose processors can only get faster so quickly, and then there’s also just the cost of buying tons of them. People were able to ramp it up because compute was previously only a small fraction of the total cost of their projects, but I guess it’s now getting to be a pretty large fraction of the total cost of these AI projects, just buying enough processors.

Paul Christiano: Yes, a lot of things have a large compute budget. It’s still normally going to be small compared to the staff budget, and you can go a little bit further than that, but it’s getting large. And you should not expect, if you’re at the point where you’re training human-level AI systems, that the compute cost for that training run would be a significant fraction of global output.

You could say maybe this trend could continue until you got up there, but probably not at this pace; it’s going to have to slow down a long time before we’re spending 2% of GDP on computers doing AI training. So if you had the perspective where you were eyeballing progress, then I think this should generally be an update towards longer timelines.

Or maybe your perspective was more like, “Man, it’s really hard to tell.” It’s very hard to eyeball progress and ask, “How impressive is this? How impressive is beating humans at chess, or beating humans at Go, or classifying images as well as humans on this particular image classification task?” I find it very hard to really eyeball that kind of progress and make a projection.

Suppose instead your estimates were coming from some sketchy ways of estimating how much compute might be needed: an analogy with the optimization done by evolution, an extrapolation of training times, or other kinds of arguments about the human brain, which are really anchored to amounts of compute. Then I think you might have a perspective that’s more like, “Well, on paper, these arguments involved using large amounts of compute, and this tells us something about that.”

There’s a lot of engineering effort in that kind of scale-up, and a lot of genuine uncertainty, especially on moderate timelines, about whether that engineering effort will actually be invested and whether that willingness to spend will actually materialize. I think this might move you in the direction of, “Yes, apparently people are putting in the effort, and engineering progress is reasonably brisk.”

Or suppose you were doing an estimate that was really driven by how much compute is available. This is the style of the old estimates futurists made. Moravec made one of the earlier estimates of this flavor, and Kurzweil has a very famous one, where they’re like, “It really matters how much compute you’re throwing at this task.”

If you have that kind of view and then you see that compute spending is rising really rapidly, I guess that’s evidence that it may continue to rise, and therefore that timelines will be shorter than you would have thought.

Robert Wiblin: Some people seem to think that we may be able to create a general artificial intelligence just by using the algorithms we have today and waiting for another decade or two’s worth of processing power to come online: progress in chips and just building out that infrastructure. How realistic do you think that is? Is that a live possibility in your mind?

Paul Christiano: I think it’s really hard to say, but it’s definitely a live possibility. Some people have an intuition that’s very much, “That’s obviously how it’s going to go.” I don’t sympathize with that intuition. Some people on the other side have the intuition that there are obviously really important things we don’t yet understand, which will be difficult; it’s hard to know how long they will take to develop, but it’s going to be much longer than the time required to scale up computing.

I’m not super sympathetic to that either. I kind of feel like it’s really hard to know; it seems possible, and it’s hard to rule out on a priori grounds. Our observations are pretty consistent with things being loosely driven by compute, if you think of it in terms of the trade-off rate between compute and conceptual or algorithmic progress.

I think our observations are pretty compatible with compute being very important, and also with the scale-up of existing approaches eventually getting you there. That’s definitely a view I have: eventually, enough scale-up will almost certainly work. It’s just a question of how much. Is that something we’ll see over the next one or two decades, or is it going to take you far past physical limits? I end up just pretty uncertain. I think a lot of things are possible.

Robert Wiblin: How does this question of the importance of compute relate to Moravec’s paradox? And what is that, for listeners who haven’t heard of it?

Paul Christiano: This is the general observation that there are some tasks humans think of as intellectually difficult, a classic example being playing chess, and other tasks they don’t think of as computationally difficult, like looking at a scene, seeing where the objects are, picking up an object, and bringing it over. It has turned out that the tasks people traditionally think of as intellectually challenging were easier than people suspected, relative to the tasks people thought of as not that intellectually demanding. It’s not super straightforward, because there are still certainly big chunks of intellectual inquiry that people have no idea how to automate, but I think that’s the general pattern.

Robert Wiblin: You mean, for example, humans think of philosophy as difficult, and it’s also hard for computers to do philosophy; they don’t seem to be beating us at that.

Paul Christiano: Or mathematics or science. I guess to humans, doing mathematics might feel similar to playing a really complicated board game, but to a machine, these tasks are not that similar.

Robert Wiblin: The board game is way easier.

Paul Christiano: Board games, it turned out, were very, very easy relative to all the other things. At this point, Go is a reasonable guess for the hardest board game, and it was much easier for humans to automate than other tasks. I think in general, part of what’s going on there is that the reasoning humans have conscious access to is just not that computationally demanding. We have some understanding of that, and it was part of the very early optimism about AI.

We understand that when a human is consciously manipulating numbers or symbols, or directing their attention to anything, they’re just not doing things that fast. A human is lucky to manage 100 operations per second. If a human could multiply numbers at the speed that implies, you’d be like, “Wow, that’s incredible.”

But underneath what a human is doing consciously, there’s this layer that uses vastly, vastly more computation. In fact, especially if you’re in a compute-centric world, when you look at a task and ask, “How hard is that task for humans relative to a machine?”, a lot of the question is really, “How well is a human leveraging all the computational capacity they have when doing that task?”

For any task that involves conscious reasoning, the conscious part, at least, is probably not doing anything computationally interesting. Then you have a further issue for things like board games: a human has not really evolved to play board games well, so they’re not using the compute in their brain very well at all. My best guess would be that if you evolved animals for it, much, much tinier animals could be much, much better at playing board games than humans.

Robert Wiblin: Is it not the case that the human brain just has a ridiculous fraction of itself devoted to visual processing, which has required a ton of compute, and I guess also evolution, to tune that part of the brain well?

Paul Christiano: Yes. I don’t know offhand what the number is, but we’re talking on a log scale, so it just doesn’t even matter that much. Vision uses a reasonable chunk of the brain, and the brain is extremely well optimized for it. When people play board games, they are probably leveraging some very large fraction of their brain. Again, the main problem is that the visual cortex is really optimized for doing vision well; people are really using their brain for all of that.

In the luckiest case, when you’re doing mathematics or playing a game, it somehow makes enough intuitive sense, or maps on well enough intuitively, that you can build up abstractions that leverage the full power of your brain for that task. That’s pretty unusual. This is not obvious a priori [inaudible 01:00:39]; this is just an after-the-fact story. You could imagine that there are people who are actually able to use their entire visual processing machinery to play some board games.

I think that’s actually a live possibility. Take Go, for example, and look at the way we’ve now solved Go. The amount of compute you would need to beat humans at Go using an entirely brute-force strategy, using alpha-beta search or something, is a lot compared to your visual cortex or the visual system more broadly. You can make a plausible case that people are able to use a lot of that machinery in playing Go, and to a slightly lesser extent chess, for position evaluation and intuitions about how to play the game.

Robert Wiblin: You’re saying you think the part of the brain that does visual processing is getting brought online to notice patterns in Go, and is getting co-opted to do the board game work?

Paul Christiano: Yes, at least that’s possible and consistent with our observations of how hard it is to automate the game. We just don’t know very much. Lots of things are consistent with our observations.

Robert Wiblin: Do you hope to find out whether we’re constrained by compute or algorithmic progress?

Paul Christiano: Yes. In some sense it’s not going to be a matter of being constrained by one or the other; there are going to be some marginal returns to each. The question is: what is the rate of substitution between more compute and more algorithmic progress? In general, I think it seems better from a long-term perspective if it takes a lot of algorithmic progress to substitute for a small amount of compute.

The more you’re in that world, the more concentrated different actors’ compute needs are. If they are building really powerful AI systems, everyone who’s building them is going to be using a very large fraction of their computational resources, and any actor who wants to develop very powerful AI will also be using a reasonable fraction of the world’s resources. That means it is much easier to know who is in that game, and much harder for someone to do something unilaterally.

It’s much easier for the players to have a realistic chance of monitoring and enforcement, and also just a realistic chance of getting in a room and talking to each other. Probably not literally a room, but reaching understanding and agreement. That’s one thing. Maybe the other thing is that the harder it is for algorithmic progress to substitute for hardware progress, the slower the subsequent rate of progress is likely to be relative to what we’ve observed historically.

If you’re in a world where it turns out that clever thinking really can drive AI progress extremely rapidly, and the problem is just that we haven’t had that much clever thinking to throw at the problem, you can really imagine that, as one scales up AI and is able to automate all that thinking, you get pretty fast ongoing progress. That might mean there’s less time between the point when long-term alignment problems become obvious, start mattering, and AI can start helping with them, and the point where it’s catastrophic not to have resolved them.

Generally, if clever ideas can shorten that period a lot, that’s a little bit bad. It’s a little less likely that automation, like AI, will have an incredible overnight effect on the rate of hardware progress, though it will presumably also accelerate it. Automation will help there as well, but–

Robert Wiblin: You think if compute is what predominantly matters, then it’s going to be a more gradual process. We’ll have longer between the point when machine learning starts to get used for important things, and we start noticing where it works and where it doesn’t, and the point when a lot of things are getting delegated to machine learning, relative to the algorithmic case, where it seems like you could get really quite abrupt changes in capabilities.

Paul Christiano: Yes, I think a lot of that. This could also change the nature of AI research. A lot of that comes from hardware being this very mature industry, with lots of resources being thrown at it and performance being pretty well understood, so it would be hard to double investment in it. It’s also not that sensitive to weird questions about the quality of human capital or something. You just sort of understand it. You have to do a lot of experimentation. It’s relatively capital-intensive.

Robert Wiblin: There’s quite big lags as well.

Paul Christiano: Yes. It just seems like it would generally be more stable, which sounds like good news. This is one of the reasons one might give for being more excited about faster AI progress now. You might think that probably the biggest reason to be excited is that, if you have faster AI progress now and you reach a frontier where we’re using all the available computation as well as we could, then subsequent progress can be a little more stable.

If you have less AI progress now, and at some point people only really start investing a bunch once it becomes clear they can automate a bunch of human labor, then you have more of a whiplash effect, where you’d have a burst of progress as people really start investing.

Thoughts on Pushmeet episode

Robert Wiblin: A few weeks ago, we published our conversation with Pushmeet Kohli who’s an AI robustness and reliability researcher at DeepMind over in London. I guess to heavily summarize Pushmeet’s views, I think he might’ve made a couple of key claims.

One was that alignment and robustness issues, in his view, appear everywhere throughout the development of machine learning systems, so they require some degree of attention from everyone who’s working in the field. According to Pushmeet, this makes the distinction between safety research and non-safety research somewhat vague and blurry. He thinks people who are working on capabilities are also helping with safety, and that improving reliability also improves capabilities, because then you can actually design algorithms that do what you want.

Secondly, I think he thought that an important part of reliability and robustness is going to be trying to faithfully communicate our desires to machine learning algorithms, and that this is analogous to, although a harder instance of, the challenge of just communicating with other people and getting them to really understand what we mean. Although of course it’s easier to do that with other humans than with animals or machine learning algorithms.

A third point was, I guess, just a general sense of optimism: that DeepMind is working on this issue quite a lot and is keen to hire more people to work on these problems, and I guess a sense that we’re probably going to be able to gradually fix these AI alignment problems as we go along and machine learning algorithms get more influential. I know you haven’t had a chance to listen to the whole interview, but you skimmed over the transcript. Firstly, where do you think Pushmeet is getting things right? Where do you agree?

Paul Christiano: I certainly agree that there’s this tight linkage between getting AI systems to do what we want and making them more capable. I agree with the basic optimism that people will need to address this ‘do what we want’ problem in getting systems to work. I think it is more likely than not that people will have a good solution to that problem. There’s this interesting question of, “Should long-termists be thinking about that problem in order to increase that probability?”

I think even absent the actions of long-termists, there’s a reasonably good chance that everything would just be totally fine. In that sense, I’m definitely on board with those claims. Where I would disagree a little is that I think there is a meaningful distinction between activities whose main effect is to change the date by which various things become possible, and activities whose main effect is to change the trajectory of development.

I think that’s the main distinguishing feature of working on alignment per se. You care about differential progress towards being able to build systems the way we want. From that perspective, the average contribution of AI work is almost by definition zero on that front: if you just increased all AI work by a unit, you’d just be bringing everything forward by one unit.

But that doesn’t mean there isn’t this well-defined theme, which is, “Can we change the trajectory in any way?”, and that’s an important problem to think about. I think there’s also a really important distinction between the failure which is most likely to disrupt the long-term trajectory of civilization, and the failure which is most likely to be an immediate deal-breaker for systems actually being useful or producing money. Maybe one way to get at that distinction is related to the second point you mentioned.

Communicating your goals to an ML system is very similar to communicating with a human. I think there is a hard problem of communicating your goals to an ML system, which we can view as a capabilities problem. Are they able to understand the things people say? Are they able to form the internal model that would let them understand what I want? In some sense, it’s very similar to the problem of predicting what Paul would do, or it’s a little slice of that problem, like predicting under what conditions Paul would be happy with what you’ve done.

That’s most of what we’re dealing with when we’re communicating with someone. If I’m talking with you, I would be completely happy if I just managed to give you a perfect model of me; then the problem is solved. I think that’s a really important difficulty for making AI systems actually useful. But it’s less the kind of thing that could end up pushing us in a bad long-term direction, mostly because we’re concerned about behavior as AI systems become very capable and have a very good understanding of the world around them and of the people they’re interacting with. The really concerning cases are ones where AI systems actually understand quite well what people would do under various conditions, and understand quite well what they want, so it isn’t what we’d think of as a normal communication problem between people: the systems understand what Paul wants but aren’t trying to help Paul get what he wants. I think a lot of the interesting difficulty, especially from a very long-term perspective, is really making sure that no gap opens up there.

Again, on the gap between the problems that are most important from a very long-run perspective and the problems people will most be confronting in order to make AI systems economically valuable: I do think there’s a lot of overlap. There are problems people are working on that both make AI systems more valuable and also help very directly with the long-run outcome.

I think if you’re interested in differentially changing the trajectory, or improving the probability of things going well over the long term, you’re more inclined to focus precisely on those problems which won’t be essential for making AI systems economically useful in the short term. I think that’s really distinctive in terms of what your motivation is, and how you’re picking or prioritizing problems.

Robert Wiblin: One of the bottom lines here for Pushmeet, I guess was that people who want to make sure that AI goes well, they needn’t be especially fussy about whether they’re working on something that’s safety specific or on something that’s just about building a new product that works while using machine learning.

Robert Wiblin: It sounds like you’re a little bit more skeptical of that. Do you think, ideally, people should in the medium term be aiming to work on things that seem to disproportionately push on robustness and reliability?

Paul Christiano: Yes, I think people who are mostly concerned about the long-term trajectory face this dilemma in every domain, if you live in a world where almost all of the most serious challenges to humanity are caused by things humans are doing, and not only by things humans are doing, but by things that we would often think of as part of productive progress, part of the goal.

We’re building new technologies, but those technologies are also the things that pose the main risks. Then you have to be picky if you’re a person who wants to change the long-term trajectory. If I go to work and just do a random thing, work on a random project, make a random product better, I probably am helping address those problems.

I am helping address the kinds of problems we’re concerned about, but at the same time I’m also contributing to bringing those problems closer to us in time. It’s roughly a wash if you’re working on the average product, making the average product work. There are subtle distinctions we could make. I think if you are motivated to make products work well, if you’re like, “Not only do I want to do the thing that’s most economically valuable, I want to put more emphasis on making this product robust,” you’re just generally going to make a bunch of low-level decisions that will be helpful. But I definitely think you’re going to have a pretty big impact by being fussy about which problems you work on.

Robert Wiblin: I guess there’s this open question of whether we should be happy if AI progress across the board just goes faster. What if we could just speed up the whole thing by 20%, both all of the safety work and all of the capabilities work? As far as I understand, there’s no consensus on this. People vary quite a bit in how pleased they’d be to see everything speed up in proportion.

Paul Christiano: Yes. I think that’s right. My take, which is a reasonably common take, is that it doesn’t matter that much from an alignment perspective. Mostly, it will just bring forward the time at which everything happens, and there are some second-order terms that are really hard to reason about, like, “How good is it to have more computing hardware available?” or “How good is it for there to be more or less political change of other kinds happening in the world prior to the development of powerful AI systems?”

There are these higher-order questions where people are very uncertain whether the effect is good or bad, but my take would be that the net effect there is kind of small, and the main thing is that accelerating AI matters much more from a next-100-years perspective. If you care about the welfare of people and animals over the next 100 years, then accelerating AI looks reasonably good.

I think that’s the main upside. The main upside of faster AI progress is that people are going to be happier over the short term. If we care about the long term, I think it is roughly a wash, and people could debate whether it’s slightly positive or slightly negative; mostly it’s just accelerating where we’re going.

Robert Wiblin: Yes, this has been one of the trickier questions that we’ve tried to answer in terms of giving people concrete career advice. It seems to me that if you’re someone who has done a PhD in ML or is very good at ML, but you currently can’t get a position that seems especially safety-focused or that is going to affect safety disproportionately more than capabilities, it is probably still good to take a job that just advances AI in general, mostly because you’ll potentially be right at the cutting edge of what’s going on, improving your career capital a lot and gaining a relevant understanding of the key issues.

The work itself, I guess you think, is close to a wash. It speeds things up a little bit and everything goes in proportion, so it’s not clear whether that’s good or bad, but then you can potentially go on later to work on something that’s more alignment-specific, and that is the dominant term in the equation. Does that seem reasonable?

Paul Christiano: Yes. I think that seems basically right to me. There’s some intuitive hesitation with the family of advice that says, “You should do this thing, which we think is roughly a wash on your values now, but there will be some opportunity in the future where you can make a call.” I think there’s some intuitive hesitation about that, but I think it is roughly right. Imagine if you offered me a choice between two possible worlds.

In one, there are twice as many people working on machine learning and AI, but half of them really care about the long term and about ensuring that AI is developed in a way that’s good for humanity’s long term. That sounds like a good trade. We’d maybe then have less opportunity to do work right now. I think that’s the main negative: there would be less time to think about the alignment problem per se. But on the other hand, it seems really good if a large fraction of the field really cares about making things go well.

I just expect a field with that character to be much more likely to handle issues in a way that’s good for the long term. I think you can scale that down. It’s easiest for me to imagine the case where a significant fraction of the field is like that, but if anything, I think the cost-benefit analysis is probably better for the marginal people at the beginning.

Robert Wiblin: I was suggesting that this would be the thing to do if you couldn’t already get a job that was AI alignment-specific. Say they want to join your team but they’re just not quite good enough yet; they potentially need to learn more. Or there’s only so fast that the team can grow: even though they’re good, you just can’t hire as quickly as people are coming on board. But I suppose that if people go into these roles, which we think are currently roughly neutral but good for improving their skills, you have to make sure they don’t forget that the original plan was at some point to switch to something different.

There’s a bit of a trap. It seems people in general tend to get stuck doing what they’re doing now and convince themselves that whatever they’re doing is actually really useful. So you might think, “Yes, it would be good to go and then switch out,” but you might have some doubts about whether you will in fact follow through on that.

Paul Christiano: Yes, I think that’s right. It would certainly be an even happier world if you took the half of people who might’ve gone into ML and instead moved them all into really thinking deeply about the long term and how to make things go well. That sounds like an even better world still. But if someone really cared about the long term, you really trusted them, and they asked, “What should I do?”, it’s a reasonably good option to just say, “Go do this thing, which is good in the short term and adjacent to an area we think is going to be really important over the long term.”

Robert Wiblin: There’s been this argument over the years that it would just be good in some way that we can’t yet anticipate to have people at the cutting edge of machine learning research who are concerned about the long term and alert to safety issues and alert to alignment issues that could play out or could have effects on the very long term. People have gone back and forth on how useful that actually would be to just be in the room where the decisions are getting made.

It just occurred to me that the machine learning community seems to be really moving in the direction of sharing the views that you and I hold. A lot of people are just becoming concerned about “Will AI be aligned in the long term?” It might be that if you’re particularly concerned about that now, it makes you different from your peers right now, but in 10 or 20 years’ time everyone will have converged on a similar vision, as we get a better idea of what machine learning actually looks like and what the risks are when it’s deployed.

Paul Christiano: Yes, I think that’s an interesting question, or an interesting possible concern with that approach. My take would be that there are some differences, I don’t know if you’d call them values differences or deep empirical or worldview differences, that are relevant here. I think to the extent that we’re currently thinking about problems that are going to become real problems, it’s going to become much, much more obvious that they are real problems.

I think that to the extent that some of the problems we think about over the very long term are already obviously problems, people in the ML community are very interested in them: in problems that are obviously problems, or problems that are affecting the behavior of systems today. Again, if these problems are real, that’s going to become more and more the case over time, and people will become more and more interested in those problems.

But there is still this question of how interested you are in making the long term go well, versus doing your job or pursuing something which has a positive impact over the short term, or which you’re passionate about or interested in for some other, non-long-term reason. I do think there are continuously going to be some calls to be made, some different decisions. The field embodies some set of values.

I think that people’s empirical views are changing more than the set of implicit values they have. I think if everyone who really cares about the long term isn’t going into this area, then the overall orientation of the field will be persistently different.

Robert Wiblin: Do you have any views on the particular technical approaches that Pushmeet mentioned in the episode or that the DeepMind folks have written up on their safety blog?

Paul Christiano: The stuff I’m most familiar with that Pushmeet’s group is working on is verification for robustness to perturbations, some work on verification more broadly, and some work on adversarial training and testing. Maybe those are the three things; I don’t know if there’s something else. I’m happy to go through those in order.

Robert Wiblin: Yes, go through those.

Paul Christiano: I guess I’m generally pretty psyched about adversarial testing and training, and about verification. I think there is this really important problem that sits at the intersection: it matters over the short term, and I think it maybe matters even more over the very long term. You have some AI system, or maybe not just one but a whole bunch of AI systems, that you want to delegate a bunch of work to.

If they fail catastrophically, it would be really irrecoverably bad. You can’t really rule out that case with traditional ML training, because you’re just going to try the system out on a bunch of cases that you’ve generated or experienced so far. You’re really not going to be getting–

Your training process isn’t at all constraining the potential for catastrophic failure in whatever new situation comes up.

We want to change the ML training process so that it has some information about what constitutes a catastrophic failure, and then doesn’t do that. I think that’s a problem the short and long term have in common. It matters a lot in the long term; it’s a little bit hard to say whether it matters more in the long term or the short term, but I care more about the long term.

The three main approaches to that I really think about are adversarial training and testing, verification, and interpretability or transparency. I think it’s valuable for people to get familiar with those techniques, become good at them, think about how you would apply them to richer kinds of specifications, and think about how you grapple with the fundamental limitations, like in adversarial training, where you have to rely on the adversary to think of a kind of case.

The way the technique works in general is like, “I’m concerned about my system failing in the future. I’m going to have an adversary who’s going to generate some possible situations under which the system might fail. Then we’re going to run on those and see if it fails catastrophically.” You have this fundamental limitation where adversaries aren’t going to think of everything.
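To make the adversarial testing loop concrete, here is a minimal Python sketch. Everything in it is hypothetical: `toy_system` stands in for a trained model with a rare failure region, `is_catastrophic` stands in for a specification of catastrophic behavior, and the two adversaries are deliberately crude. It illustrates only the shape of the loop and its built-in limitation; it is not a description of the methods DeepMind or anyone else actually uses.

```python
import random
from typing import Callable, List

# Hypothetical toy system: behaves well on almost every input in [0, 1),
# but fails catastrophically in one narrow region.
def toy_system(x: float) -> str:
    return "unsafe" if 0.7310 <= x < 0.7312 else "safe"

def is_catastrophic(output: str) -> bool:
    # Stand-in for a specification of what counts as a catastrophic failure.
    return output == "unsafe"

def adversarial_test(system: Callable[[float], str], cases: List[float]) -> List[float]:
    """Run the system on every situation the adversary proposed and
    return the ones that led to a catastrophic failure."""
    return [x for x in cases if is_catastrophic(system(x))]

def random_adversary(n: int, seed: int = 0) -> List[float]:
    # Weak adversary: proposes n random situations.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

def grid_adversary(n: int) -> List[float]:
    # Somewhat stronger adversary: proposes an evenly spaced grid of n situations.
    return [i / n for i in range(n)]

if __name__ == "__main__":
    # With 1,000 random cases the 0.0002-wide failure region is usually missed,
    # which is the "adversaries won't think of everything" limitation.
    print("random adversary found:", adversarial_test(toy_system, random_adversary(1_000)))
    print("grid adversary found:", adversarial_test(toy_system, grid_adversary(10_000)))
```

The point to notice is that `adversarial_test` can only report failures on cases the adversary thought to propose; a clean result says nothing about the rest of the input space, which is the gap verification tries to close.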

It’s useful to have people just getting experience with how we grapple with that limitation. In some sense, verification is a response to that limitation. I think it’s productive to have people thinking about both verification and the limits of verification, and testing and the limits of testing. Overall I’m pretty excited about all of that.

Robert Wiblin: Do you share Pushmeet’s general optimism?

Paul Christiano: I don’t know quantitatively exactly how optimistic he is. My guess would be that I’m less optimistic, in the sense that I’m like, “Well, there’s like a 10% chance that we will mess this up and lose the majority of the value of the future.” Whereas, listening to him, that’s not the overall sense I get of where he’s at. It’s a little bit hard to know how to translate between a vibe and an actual level of optimism.

Robert Wiblin: Yes, it is interesting. Someone can think there’s a 20% chance that we’ll totally destroy everything but still just have a kind of cheerful disposition. [laughs] I came across this. Well, things go well. Among people working on existential and global catastrophic risks, and I guess AI in particular, there’s this trade-off between not wanting to do things that other people disagree with or aren’t enthusiastic about, and at the same time not wanting a field that’s so conservative that no experiments are done unless there’s a consensus behind them. Do you think people are too inclined to make ‘unilateralist’s curse’-type mistakes, or to make the mistake of not trying things enough?

Paul Christiano: I think my answer to this probably varies depending on the area. For reference, I think the policy you want to follow is to update on the fact that no one else wants to do this thing, take that really seriously, and engage with it a lot before deciding whether you want to do it. Ideally that’s going to involve engaging with the people who’ve made that decision to understand where they’re coming from.

I don’t have a very strong general sense of whether we’re more likely to make one mistake or the other. I’d expect the world systematically to do too much of the sort of thing that can be done unilaterally, so it gets done. In the context of this field, though, I don’t feel super concerned about either failure mode. I don’t feel that bad about where people are at.

Robert Wiblin: The vibe I get in general from the AI policy and strategy people is that they are pretty cautious, quite cautious about what they say and what they do. I guess that’s been a deliberate decision, but I do sometimes wonder whether that’s swung too far in favor of not speaking out enough about their views?

Paul Christiano: Yes, there’s a diversity in what people do, which I guess is the whole problem. There are definitely people who take a very cautious perspective.

Robert Wiblin: I think that they sometimes get a bit cut out of the public discussion, because they’re just not inclined to speak out, which can be a loss at times.

Paul Christiano: Yes, I definitely think you have a real problem if a big part of your channel for impact is communicating your views, but then you get very hung up on that, or take a strong stance that you shouldn’t communicate your views because of unilateralist concerns. In general, the unilateralist concern I’m probably least sympathetic to is about one intervention: talking seriously about what kinds of mechanisms might be put in place, and how we might respond, if it turned out that AI alignment was hard or AI progress was rapid. That’s probably where I’m least sympathetic overall; I think the cost-benefit looks pretty good on that discussion.

Robert Wiblin: Would you say your default is that, on most issues, even if you’re not taking action, you should express your true views?

Paul Christiano: Or at least, to the extent there’s useful collaborative cognitive work in thinking about what should be done, how we would respond, or what would happen, being willing to engage in that work as a community, rather than people thinking in private. Maybe taking some care: you don’t want to say inflammatory stuff or get people really upset, but you can be reasonable about it. I guess my view is not so much that I don’t care one way or the other. It’s more that I’m ambivalent; I don’t think it’s obvious that there’s a big error one way or the other.


Robert Wiblin: All right. Let’s talk about something pretty different, which is that you recently wrote a post about why divesting from companies that do harmful things could, in moderation, actually be quite an effective way to improve the world. That’s in contrast to what most people who’ve looked into it under the rubric of effective altruism have tended to conclude, which is that it’s actually not that useful, because if you sell shares in a company or don’t lend money to a company, someone else will just take your place, and you haven’t really made any difference.

Do you want to explain the mechanism by which divesting from harmful companies, like cigarette companies, for example, could be useful?

Paul Christiano: Yes, I think there are two important things to say upfront. One is that I’m mostly thinking about the ratio of costs to benefits. For some companies you can end up in a regime where divestment has relatively little effect but is also quite cheap. In general, I think the first epsilon of divestment will tend to be literally free: the costs are second-order in how far you divest, while the benefits are first-order. So it is almost always going to be worth it to divest by at least that epsilon.

That’s the first part of the picture. This can be mostly a story about the costs being very, very low rather than the benefits being large. If the costs are very, very low, then it’s mostly an issue of having to do the analysis and deal with the logistics, in which case you can imagine really getting those costs down if someone both did the research and actually produced a fund.
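The “first epsilon is free” claim can be checked in a standard textbook setting. The sketch below is an illustration under assumptions of my own, not the calculation from the post: a single holding, a mean-variance investor with utility w·μ − (γ/2)·w²·σ², and made-up numbers. `benefit_per_unit` is a hypothetical constant standing in for whatever good each unit of divestment does. Because the investor starts at their optimal weight w*, the utility lost by moving to w* − ε works out to exactly (γσ²/2)·ε², which is second-order, while the benefit is linear in ε.

```python
from dataclasses import dataclass

@dataclass
class Holding:
    excess_return: float  # expected return above the risk-free rate (mu)
    variance: float       # variance of that return (sigma^2)

def utility(weight: float, h: Holding, risk_aversion: float) -> float:
    # Mean-variance utility: w*mu - (gamma/2) * w^2 * sigma^2
    return weight * h.excess_return - 0.5 * risk_aversion * weight ** 2 * h.variance

def optimal_weight(h: Holding, risk_aversion: float) -> float:
    # Setting d(utility)/dw = 0 gives w* = mu / (gamma * sigma^2).
    return h.excess_return / (risk_aversion * h.variance)

def divestment_cost(eps: float, h: Holding, risk_aversion: float) -> float:
    # Utility lost by holding w* - eps instead of w*; algebraically (gamma * sigma^2 / 2) * eps^2.
    w_star = optimal_weight(h, risk_aversion)
    return utility(w_star, h, risk_aversion) - utility(w_star - eps, h, risk_aversion)

def net_benefit(eps: float, h: Holding, risk_aversion: float, benefit_per_unit: float) -> float:
    # Hypothetical linear benefit of divesting eps, minus the quadratic private cost.
    return benefit_per_unit * eps - divestment_cost(eps, h, risk_aversion)

if __name__ == "__main__":
    oil = Holding(excess_return=0.05, variance=0.04)  # made-up numbers
    for eps in (0.01, 0.02, 0.04):
        print(eps, divestment_cost(eps, oil, risk_aversion=2.5))
```

Doubling ε quadruples the cost, so for any positive per-unit benefit there is some small enough ε at which divesting comes out ahead; that is the sense in which the first epsilon is close to free.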

I could imagine myself personally saying, “Sure, I will put 0.1% of my wealth in some fund,” if there were just this roughly market-neutral thing that shorts all the companies whose activities I really don’t like and goes long on the companies most correlated with them. So that’s one thing: it might just be that the costs and benefits are both small, such that it’s not a big deal, and an individual investor maybe shouldn’t spend much time thinking about it. But if someone were willing to produce a product that could be scaled a lot, so that everyone could very quickly and easily buy the fund, then people might do it.

Maybe the second thing is how it could actually be possible, or why it isn’t literally completely offset. I think the rough mechanism is this: let’s suppose I care about oil, and I divest from companies that produce oil. The whole way that does good is by increasing the expected returns to investment in oil companies, an effect that grows as I divest more. The concern is that other investors will just keep putting more money into oil companies until the expected returns have fallen back to market returns, because otherwise why not keep putting more money in? What that simplified picture misses is that there is idiosyncratic risk in the oil industry: as oil becomes a larger and larger part of my portfolio, more and more of my portfolio’s volatility is driven not by what’s going on in the market overall, which is a composite of many sectors, but by volatility in oil in particular.

If people try to go overweight by something like 10% oil (say there was a lot of divestment and people had to go 10% overweight to offset it), they would actually be significantly increasing the riskiness of marginal oil investments, so the returns they’d demand to compensate for that risk would also go up. Now, there are two things here. One is that this does depend a little on how rational I believe investors are. In some sense, the pessimism about divestment already relied on rational investors, so it’s reasonable to say, “Let’s actually dig in, see how rational investors would respond, and do those calculations.” That’s my perspective: I think it’s unusually reasonable to look into that when the pessimism is coming from this homo economicus model.
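To see why absorbing divested shares gets more expensive, here is a sketch under the same hypothetical mean-variance assumption: the extra expected return an investor demands rises in proportion both to how far overweight they go and to the sector’s idiosyncratic variance. The values of `gamma` and `idio_vol` below are placeholders, not estimates for the oil industry.

```python
def required_excess_return(overweight: float, gamma: float, idio_vol: float) -> float:
    """Extra expected return a mean-variance investor needs before holding
    `overweight` more of a sector whose idiosyncratic volatility is `idio_vol`.

    Assumes the sector's idiosyncratic risk is uncorrelated with the rest of the
    portfolio, so the first-order condition gives alpha = gamma * idio_vol**2 * w.
    """
    return gamma * idio_vol ** 2 * overweight


# Illustrative numbers only: risk aversion 3, 25% idiosyncratic volatility.
for overweight in (0.01, 0.05, 0.10):
    alpha = required_excess_return(overweight, gamma=3.0, idio_vol=0.25)
    print(f"{overweight:.0%} overweight -> about {alpha:.2%} extra expected return per year")
```

The linear form is a consequence of the mean-variance assumption; real investors may behave differently, which is the rationality question raised above.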

Once you’re doing that, there are two questions. One is just quantitatively, how large are the effects we’re talking about? That’s something I tried to run through in the blog post, and I was surprised by how large they were when I thought about it a little.
Maybe the second observation is that there’s actually a cancellation that occurs. Roughly speaking, if the oil industry has no idiosyncratic risk, or very low idiosyncratic risk, your divestment will get almost entirely offset, but at the same time it costs you almost nothing, because the industry has almost no excess returns, since those returns would be tied to the idiosyncratic risk.

So you end up with the fact that cost-effectiveness doesn’t depend on this parameter. The parameter that governs how much your divestment gets offset affects costs and benefits equally, so the ratio between costs and benefits doesn’t depend on it. It affects the overall upside of organizing this fund, meaning how much will get offset, but it doesn’t affect how attractive it is for an individual investor. So yes, we could go into the details and talk about how much it makes sense to divest. It might often make sense to divest completely, or maybe even go 100% or 200% short.
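The cancellation can be illustrated in the same toy model. This sketch assumes a divester who owns a fraction `wealth_share` of all investable wealth (normalized to 1) and underweights by `eps`; the remaining investors must absorb the shortfall, which raises the asset’s required return (used here as a proxy for the benefit), while the divester bears the quadratic cost. Both scale with `sigma2`, so their ratio does not. The two quantities are in different units, so only the invariance of the ratio is meaningful.

```python
def divestment_effects(eps: float, wealth_share: float, gamma: float, sigma2: float):
    """Return (benefit, cost) in a simplified mean-variance model.

    benefit: rise in the asset's required excess return when other investors
             absorb an extra weight of wealth_share * eps (alpha = gamma * sigma2 * w).
    cost:    the divester's certainty-equivalent loss from being eps underweight.
    """
    benefit = gamma * sigma2 * wealth_share * eps
    cost = 0.5 * gamma * sigma2 * eps ** 2
    return benefit, cost


# Low, medium and high idiosyncratic variance; the other inputs are illustrative.
for sigma2 in (0.0025, 0.01, 0.0625):
    benefit, cost = divestment_effects(eps=0.01, wealth_share=0.001, gamma=3.0, sigma2=sigma2)
    print(f"sigma2={sigma2}: benefit={benefit:.2e}, cost={cost:.2e}, ratio={benefit / cost:.3f}")
```

Low idiosyncratic variance shrinks both columns together: the divestment is mostly offset, but it also costs almost nothing, so the ratio stays the same.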

In industries you don’t like, it can get even better. An example of divestment that might be particularly cost-effective: suppose there are two companies producing very similar products, and so very correlated, like maybe two companies that both produce poultry, and one of them has substantially worse animal welfare practices.

You might think a lot of the risk is in animal agriculture in general, which both companies will experience equally. So if you’re long the company you like and short the company you dislike, that position has relatively little risk. It still has the idiosyncratic risks specific to those two companies, and there’s a complicated analysis there, but you can end up with relatively little risk compared to how much effect you have on capital availability for the two companies. We can also talk about the mechanism by which this causes less bad stuff to happen in the world; here we’re really just talking about why I’m skeptical of the skepticism.
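A sketch of the pair-trade point, assuming a simple one-factor model (not from the episode) in which each firm’s return is a shared sector factor plus an independent firm-specific term. All volatilities and factor loadings below are hypothetical. Shorting only the worse firm leaves you exposed to the sector factor; pairing it with a long position in the better firm cancels that exposure, leaving just the firm-specific risk.

```python
import math


def portfolio_vol(w_a, w_b, beta_a, beta_b, factor_vol, idio_a, idio_b):
    """Volatility of a position of w_a in company A and w_b in company B under a
    one-factor model r = beta * f + e, with independent idiosyncratic terms e."""
    factor_exposure = w_a * beta_a + w_b * beta_b
    variance = (factor_exposure * factor_vol) ** 2 + (w_a * idio_a) ** 2 + (w_b * idio_b) ** 2
    return math.sqrt(variance)


# Hypothetical inputs: both poultry firms load equally on a shared sector factor.
params = dict(beta_a=1.0, beta_b=1.0, factor_vol=0.20, idio_a=0.10, idio_b=0.10)
short_only = portfolio_vol(w_a=0.0, w_b=-1.0, **params)  # just short the worse firm
long_short = portfolio_vol(w_a=1.0, w_b=-1.0, **params)  # also go long the better firm
print(f"short only: {short_only:.1%} volatility; long-short pair: {long_short:.1%}")
```

With equal loadings the sector factor drops out entirely; if the loadings differed, the long leg could be scaled by `beta_b / beta_a` to neutralize it.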

Robert Wiblin: Yes. Let’s set that aside for a minute. Just to explain this in really simple language: the previous thinking has been that if you sell shares in a company, then someone else who just doesn’t care about the moral issues around raising chickens or producing oil is going to swoop in and buy them at the same price, so the share price won’t change, and the amount of money the company can borrow won’t really change. What that misses is that if a decent number of people, or even a small number, stop buying oil stocks, then rich investment funds that don’t care about the moral issues aren’t unlimitedly willing to buy even more of these fossil fuel companies, because they want to be diversified across all the different assets in the world. In order to buy extra oil shares to make up for the fact that you and I don’t want to own them, they have to reduce their diversification, which is unappealing to them.

So if a bunch of people, or even just a few, short or sell these shares, it probably will suppress the price a little, because buyers have to be compensated for the reduced diversification with a lower share price that makes the shares more appealing to buy. That’s one thing. Also, while that effect might be pretty small in the scheme of things, when it comes to selling those first few shares of the oil companies, it wasn’t important to you to own those specific companies anyway. You only slightly reduce your diversification. You just sell tiny amounts of these companies from your portfolio, which costs you practically nothing. Even though the benefit is quite small, the cost could potentially be even smaller, because it just doesn’t matter that much. So the ratio of benefits to costs could be pretty large, even if this is not the best way to have an impact in the world.

Paul Christiano: Yes. That’s right. If you want to think about the total impact, it’s reasonable to imagine scaling this up to large numbers of investors; a lot of the effects are going to be roughly linear in the relevant range. The total impacts are not that bad. They don’t look great, but if you imagine a change where large fractions of people divested, I think it would meaningfully decrease the amount of oil that gets extracted or the number of chickens raised in captivity, especially in cases where you have… I think the oil case is maybe a little unfavorable in this way compared to the chicken case, where you could really imagine slightly shifting towards practices that are better for animal welfare, or towards different kinds of meat, and so on.

The total effect is probably not that big, but it may still be large enough to justify really getting the logistics sorted out so that it’s very easy to do. From the investor’s perspective, other than the hassle of doing it, I think the first unit is pretty definitely a very good deal.

Robert Wiblin: Well, couldn’t you just buy into an investment fund that doesn’t own oil companies or doesn’t own animal agricultural companies? That seems like the first part, that’s pretty straightforward to do.

Paul Christiano: Yes. Though when I buy a fund, there’s a bunch of things that constrain my choices, and it’s kind of annoying if I now have this extra constraint on top of those. That might be a reason and–

Robert Wiblin: It’s not quite worth it?

Paul Christiano: Yes. Even if it only slightly raises the management fees on that fund. So Vanguard is going to offer me some teensy–

Robert Wiblin: 0.074%.

Paul Christiano: Yes. Now I have to pay 0.1% on the same thing. That’s no good. I would normally imagine my baseline implementation would be a fund that shorts the particular companies you care about and maybe also opens up the offsetting positions. The reason it was bad to sell this company is that we were losing diversification, so the fund can try to do things to offset those costs as part of the same bundle. I would be very interested in just seeing the optimal divestment fund for people who care about animal welfare or whatever: one that mostly just holds really large short positions in the companies that have the worst animal welfare effects.

It would then also construct a portfolio to capture, as much as possible, the diversification benefits those companies would have added to your portfolio. The cost of investing in that can be pretty low. You just do whatever else you would have done in investing, then take 0.1% or 1% of your money, or whatever, and put it in this fund. On average that fund is going to make zero dollars, and it’s going to have some risk. The cost to you is just the risk of this fund, which on average makes no money, and if you’re not investing that much of your money in it, the risk is just not that bad.

Robert Wiblin: To what extent, if at all, is this analogous to the usefulness of not going to work at an evil company?

Paul Christiano: I think it is fairly analogous. There’s a bunch of quantitative parameters that differ, but if you take a certain economic perspective, they’re structurally very analogous. The discussion we’re having about risk, which is important for determining some of the relevant elasticities, is quite different from the analogous discussion in the case of working in a problematic industry, but I think the overall picture is kind of similar. If you don’t work in that industry, what happens overall is that wages go up a little in the industry, and that induces more people to enter. We just have to talk about how much wages go up.

One thing I sort of think about, if we consider, say, animal agriculture, is that it’s also kind of analogous to the discussion of ethical consumption. I think that’s actually a really good comparison point for divestment. You could say, “I want to consume fewer animal products in order to decrease the number of animals that get produced,” and then you have a very similar discussion about what the relative elasticities are like. One way you could think about it: if you decrease demand by 1%, decrease the labor force by 1%, and decrease the availability of capital by 1%, then you would roughly decrease the total amount produced by 1%, under some assumptions about how natural resources work and so on.

The credit for that 1% decrease is somehow divided up across the various factors on the supply side and the demand side, and the elasticities determine how it is divided up. It’s not like 100% goes to consumption or 100% to labor; I think all of those factors participate to a nontrivial extent.
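The “credit is divided by elasticities” point can be made concrete with a textbook log-linear supply-and-demand model. This is a sketch under that assumption, not the analysis discussed in the episode: it lumps labor and capital into a single supply side (splitting credit between them would need further assumptions), and the elasticity values are hypothetical.

```python
from typing import Tuple


def output_response(supply_elasticity: float, demand_elasticity: float) -> Tuple[float, float]:
    """Fraction of a 1% inward shift that shows up as lower equilibrium output.

    Log-linear partial-equilibrium model with elasticities given as magnitudes:
    a 1% inward demand shift cuts output by e_s / (e_s + e_d) percent, and a 1%
    inward supply shift cuts it by e_d / (e_s + e_d) percent. The two shares sum
    to 1, matching the idea that cutting both sides by 1% cuts output by 1%.
    """
    total = supply_elasticity + demand_elasticity
    from_demand = supply_elasticity / total
    from_supply = demand_elasticity / total
    return from_demand, from_supply


demand_share, supply_share = output_response(supply_elasticity=2.0, demand_elasticity=0.5)
print(f"1% less demand -> {demand_share:.0%} less output")
print(f"1% less supply (labor and capital together) -> {supply_share:.0%} less output")
```

The more elastic side of the market gets less of the credit: here supply is elastic, so the demand-side change (ethical consumption) does most of the work, and the split reverses if the elasticities are swapped.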

I think the comparison to ethical consumption looks reasonably good. I haven’t done this analysis really carefully, and I think it would be a really interesting thing to do, and a good motivation if I wanted to put together the animal welfare divestment fund, but I think under pretty plausible assumptions you’re getting a lot more bang for your buck from divestment than from consumption choices. The investment piece would probably still be relatively small compared to your total consumption pattern, so it wouldn’t be replacing your ethical consumption choices. But if ethical consumption was a good idea, then so is at least totally divesting, and maybe even taking 10x leveraged short positions: where you would have bought one dollar of animal agriculture companies, you instead sell $10. I think stuff like that could be justified if you thought that ethical consumption was a good thing.

Robert Wiblin: Do you want to sketch out briefly, for those who are skeptical, how selling shares or bonds in a company reduces that company’s output?

Paul Christiano: Yes. I think the bond case is a little simpler to think about, though I think they’re probably about the same. Let’s talk about the bond case. Say the company Tyson wants to raise a dollar. They’ll go out to investors and say, “Give us a dollar now and we’ll give you some amount of money 10 years from now, assuming we’re still solvent.” That’s their pitch. They’re selling these pieces of paper, which are like IOUs, and the price of those IOUs, or how much the IOUs have to pay out, is set by supply and demand among investors.

What happens when you short the bond? Someone came to this company wanting to lend them a dollar, and you’re saying, “Don’t lend them a dollar. Instead, lend me a dollar, and whatever they pay back to their bondholders, I’ll pay back to you instead.” They’re like, “Fine. I’m just as happy to lend to you as I was to lend to the actual company.” Now the company has one less dollar. The company says, “We still need to raise that dollar if we want to produce this additional marginal chicken.” So the company tries to raise the dollar, but you’ve used up one of the willing buyers, and now they need to find another buyer, someone who’s willing to lend them this dollar.

That person is going to be a little bit less excited because, again, their portfolio is a little bit more overweight in this company, so they’re a little bit more scared about the risk of this company going under. Roughly speaking, that’s the mechanism.
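A toy model of the bond story, with made-up numbers: each hypothetical lender will lend one dollar at or above some reservation yield (a stand-in for how overweight and risk-averse that lender already is), and the company assumed here pays every lender the yield demanded by the marginal lender it needs. When a short seller takes one willing lender’s dollar, the company has to go one lender further up the list.

```python
def clearing_yield(reservation_yields, dollars_needed, lenders_diverted=0):
    """Yield a company must offer to raise `dollars_needed` one-dollar loans.

    Each hypothetical lender lends $1 at or above their reservation yield. Lenders
    who lend to a short seller instead (`lenders_diverted`) are assumed, for
    simplicity, to be the cheapest ones. The marginal lender sets the yield.
    """
    available = sorted(reservation_yields)[lenders_diverted:]
    if dollars_needed > len(available):
        raise ValueError("not enough willing lenders")
    return available[dollars_needed - 1]


lenders = [0.040, 0.041, 0.043, 0.046, 0.050, 0.055]  # made-up reservation yields
before = clearing_yield(lenders, dollars_needed=4)
after = clearing_yield(lenders, dollars_needed=4, lenders_diverted=1)
print(f"yield to raise $4: {before:.1%} before the short, {after:.1%} after")
```

The higher yield is the higher borrowing cost that, in the story above, makes the marginal chicken less worth producing.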

Robert Wiblin: I think it makes sense. Imagine a significant number of people weren’t willing to lend money to a company: that drives up its borrowing costs, and so the company shrinks, because it has to pay a higher interest rate and can’t get as much capital. It makes sense on an intuitive level. Some of this gets a little technical, so we’ll stick up a link to the blog post you wrote with all the equations, explaining how you worked this through and tried to estimate the size of the benefits and the costs.

Paul Christiano: I’m concerned it’s not the most careful or clear analysis. I’m interested in this, and I think at some point I’ll put up a more careful version. It was just a fun exercise for me.

Robert Wiblin: You make some points that I haven’t seen anywhere else, and that might actually shift the conclusion. That seems like probably the most important thing that people need to take on board.

Paul Christiano: It would be super interesting to me if I actually ended up with a divestment fund, a reasonably constructed long-short fund, that was cost-effective. That would be kind of cool. Also, sorry, about the comparison with ethical consumption: one thing I want to stress there is that the work is getting done just because slight changes on the margin are very effective. It’s very similar to vegetarianism. If you just stop eating meat in cases where it was really marginal, that has a lot more bang for your buck than if you go all the way.
It’s the same thing here. It’s not going to be competitive with the first unit of stopping eating meat. It’s going to be competitive with going all the way to the last bits.

Robert Wiblin: It’s going to be a little bit embarrassing if effective altruism-aligned folks have been saying divestment is a waste of time for all these years, and it turns out that we were pretty wrong about that. [laughs] We’re going to have to eat humble pie. I suppose it also looks good that we’re updating our views, so we’re not just stuck with dogmatic positions.

Paul Christiano: I think we most likely end up with some compromise where we say, look, the impacts are a lot smaller than people often implicitly assume when they pitch this, but it is, on balance, a reasonable thing to do, and maybe we shouldn’t have been quite so down on it.

Robert Wiblin: The costs are negligible a lot of the time.

Paul Christiano: Or really the cost is a social thing: the only difficulty in making it happen is people believing they should do it. If there’s a change people can make that costs them almost nothing, it’s a particularly reasonable thing to advocate for.


Robert Wiblin: Let’s talk about s-risks for a minute. Some listeners will know, but some won’t, that “s-risks” is the term people have settled on to describe possible future scenarios that aren’t just neutral, where humans go extinct and then there’s nothing, or merely not very good, where humans stick around but we don’t make the world as good as it could be, but are rather worlds where there are astronomical levels of bad things. I guess the S in this case stands for suffering, because a lot of people tend to be concerned that the future might contain a lot of suffering.

It could also include any future that is large, in the sense that a lot of stuff is going on, but that contains a lot of bad stuff rather than good stuff. Some of the ways people worry this could happen involve artificial intelligence that doesn’t share our goals. What’s your overall take on s-risks as a problem to work on?

Paul Christiano: I think my best guess is that if you go out into the universe and optimize everything for being good, the total level of goodness delivered is commensurate with the total amount of badness that would be delivered if you went out into the universe and optimized it for things being bad. To the extent that one has that view, which is some combination of an empirical view and a moral view about the nature of what is good and what is bad, s-risks are not particularly concerning, because people are so much more likely to be optimizing the universe for good.

In expectation, much more of the stuff in the universe is optimized for exactly what Paul wants than for exactly what Paul doesn’t want. That’s my best-guess view, and on my best-guess view, I think this is not a big concern. I do have considerable moral uncertainty, though. I guess the way I would approach moral uncertainty in general says that it’s hard to compare the expectations of outcomes across these very different moral views. This is one of the cases where the comparison is difficult because of those weird difficulties with intertheoretic utility comparisons.

The way I would normally think about this kind of case is to say I should put reasonable priority, or reasonable interest, in reducing S-risks if I put a reasonable probability on views on which the total amount of possible badness is much larger than the total amount of possible goodness, which is where I’m at.

I think there are unlikely but plausible combinations of empirical and moral views on which they’re very important. That’s my starting point: I take this as a thing I’m not going to put much weight on, because I don’t find that perspective particularly appealing, so it’s not going to be a large fraction of my total concern. But it deserves some concern because it’s a plausible perspective.

Robert Wiblin: The naive take on this might be, “Why would we worry about these scenarios? They seem really outlandish. Why would anyone set out to fill the universe with things that are really bad? That just seems like a very odd thing to do.” Once you’re at the level of sophistication where you can go out, colonize space, and create astronomically massive stuff, why would you fill it with stuff that’s bad?

There’s something to be said for that, but then people try to think about scenarios in which this might happen, which might involve conflicts between different groups where one of them threatens the other that they’re going to do something bad and then follows through on it, or potentially where we don’t realize that we’re creating something that’s bad.

You might create something that has a lot of good in it, but also a bunch of bad in it, and you go out and spread that. We just don’t realize that, as a side effect, we’re also creating a bunch of suffering or some other disvalue. How plausible do you think any of these scenarios are?

Paul Christiano: I guess the one that seems by far the most plausible to me is this model of conflicts, threats, and following through on threats.

Robert Wiblin: Not just moral error potentially?

Paul Christiano: I think it’s hard to make a sufficiently extreme moral error. There might be moral error that’s combined with threats that get followed through on, but on its own, for the risk from moral error to be large, we’d need to get what is good very nearly exactly backwards. It’s not totally impossible to get things exactly backwards; for a wide variety of reasons, it’s more likely than hitting some random point in the space. But I think it’s still a minority of my total concern. Most of it comes from someone wanting to just destroy shit because they wanted to have a threat of destroying value, so that’s what I would mostly be worried about.

Robert Wiblin: How plausible is it? It seems like you think it’s conceivable but pretty unlikely, so it’s like you pay a little bit of attention to it, but it’s not going to be a big focus. Is that kind of the bottom line?

Paul Christiano: Yes. When I was talking before about these comparisons, and how being conceivable means it gets a little bit of priority, that was more with respect to moral views, or aggregation across different values and how much weight I give them.

I think the key question is how much credence you place on views where the worst outcomes are much worse than the best outcomes are good. If the ratio is large enough, those views are basically going to recommend focusing entirely on minimizing the risk of really bad stuff. So I think, regardless of one’s empirical view, it’s worth putting some amount of attention into reducing the risk of really bad stuff.
In terms of how plausible it is, that’s still important for understanding the basic shape of what’s up. I don’t really have a considered view on this. I think the answer is that it’s relatively unlikely for significant amounts of disvalue to be created in this way, but not unlikely at the one-in-a-million level; it’s more like unlikely at the 1% level.

There’s also the question, when things do go badly, of what fraction of the badness is realized compared to the worst possible outcome, and how much of the universe’s resources go into that. That estimate is not very stable. That’s where I’m at.

Robert Wiblin: You made this argument at the start that, naively, you would think it’s as easy to create good things as to create something equivalently bad, and more future beings are going to want to create good things than bad things, so we should expect the future to be positive. How confident are you that it’s actually true that it’s symmetrically easy to create good and bad things?

Paul Christiano: When you say “symmetrically easy to create good and bad things,” I think it’s worth being clear about exactly what that means. Assuming that we’re linear, so that things twice as big are twice as good or bad, the relevant question is just: what is your trade-off?

Suppose you have probability P of the best thing you can do and probability 1 − P of the worst thing you can do. What does P have to be so that you’re indifferent between that and a barren universe? I think most of my probability is distributed between needing somewhere between a 50% and a 99% chance of good things, and then I put some credence on views where that number is a quadrillion times larger or something, in which case it’s definitely going to dominate. A quadrillion is probably too big a number, but very big numbers: numbers easily large enough to swamp the actual probabilities involved. A quadrillion is just way too big. I should have gone with ‘a bajillion’, which was my first…

Anyway, in terms of how confident I am in the 50% versus the 50% to 99%, I think I would put between a half and a third of my weight on exactly 50% or things very close to 50%, and then most of the rest goes on somewhat more than 50% rather than radically more than 50%.
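The trade-off question can be written out as a small calculation. This sketch keeps Christiano’s linearity assumption and additionally normalizes the best outcome to +1, the barren universe to 0, and the worst outcome to `-badness_ratio`; the function names are illustrative. Setting P · 1 + (1 − P) · (−badness_ratio) = 0 gives P = badness_ratio / (1 + badness_ratio), so a required P of 50% corresponds to exact symmetry, and 99% to the worst outcome being 99 times as bad as the best is good.

```python
def indifference_probability(badness_ratio: float) -> float:
    """Smallest chance P of the best outcome that makes the gamble
    'best with probability P, worst otherwise' as good as a barren universe.

    Normalizes best = +1, barren = 0, worst = -badness_ratio, and assumes
    value is linear (twice as big means twice as good or bad).
    """
    return badness_ratio / (1.0 + badness_ratio)


def implied_badness_ratio(p: float) -> float:
    """Inverse: how many times worse the worst outcome is than the best is good."""
    return p / (1.0 - p)


for p in (0.5, 0.9, 0.99):
    print(f"P = {p}: worst is {implied_badness_ratio(p):.0f}x as bad as best is good")
```

On this reading, the 50% to 99% range in the answer spans badness ratios from 1 to 99, while views with ratios of a “bajillion” push the required P arbitrarily close to 1.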

I think the arguments are a little complicated; how do you get at these? To clarify the basic position: the reason you end up concluding that the bad side is worse is that you consult your intuition about how bad the worst thing that can happen to a person is compared to the best thing, and, damn, the worst thing seems pretty bad. Then the first-pass response is to have this debunking understanding: we understand causally how we ended up with this kind of preference with respect to really bad stuff versus really good stuff.

You look at what happens over evolutionary history: what is the range of things that can happen to an organism, and how should an organism trade off the best possible versus the worst possible outcomes? Then you end up asking, well, to what extent is that a debunking explanation, one that explains why humans are asymmetric in their capacity to experience joy and suffering while the underlying reality is symmetric, versus to what extent is this fundamentally reflected in our preferences about good and bad things? I think it’s just a really hard set of questions. I could easily imagine shifting on them with much more deliberation.

Robert Wiblin: Yes. How do you think technical AI research or your focus would change if preventing S-risks became a high priority?

Paul Christiano: I think the biggest thing is better understanding the dynamics that could lead to bad threats being carried through, and understanding how we can arrange things so that it’s less likely for that to happen. I think that’s the natural top priority.

Robert Wiblin: I heard an interesting suggestion for how to do that recently. A concern you might have is that someone would threaten to create the thing you think is really disvaluable. Let’s say I don’t want suffering to exist in the future. That leaves me open to someone threatening to create suffering in order to get me to concede on some other point. But I could potentially avoid that risk by, say, changing myself so that I also disvalue something that was actually not important at all. Let’s say I also really don’t want there to be flying horses, or something like that: something that doesn’t exist.

In that case, if someone wanted to extort or threaten me, then rather than threatening to create suffering, they would have the option of threatening to create flying horses, which currently don’t exist but which they could threaten to create. Potentially, I could change my values such that it’s more efficient to threaten flying horses than to threaten suffering, so that would be the most efficient threat to make against me. It’s kind of like a spillover part of your utility function that protects you from threats about the things you previously cared about. Do you have any reaction to that idea, or things in that vein?
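The logic of the proposal can be caricatured as a threatener picking whichever threat imposes a given amount of disvalue on the target most cheaply. This is a hypothetical toy, not a model from the episode: it ignores credibility, bargaining, and whether threats are ever carried out, and the `ThreatOption` costs and disvalues are invented. Adding a cheap-to-threaten “flying horses” option diverts the threat away from suffering.

```python
from dataclasses import dataclass


@dataclass
class ThreatOption:
    name: str
    cost_per_unit: float      # resources the threatener spends per unit created
    disvalue_per_unit: float  # how much the target dislikes each unit


def cheapest_threat(options, disvalue_needed):
    """Return the option that imposes `disvalue_needed` on the target at least cost."""
    def total_cost(option):
        units = disvalue_needed / option.disvalue_per_unit
        return units * option.cost_per_unit
    return min(options, key=total_cost)


options = [ThreatOption("suffering", cost_per_unit=1.0, disvalue_per_unit=1.0)]
print(cheapest_threat(options, disvalue_needed=100.0).name)

# The target adopts a hypothetical new dislike that is cheaper to threaten.
options.append(ThreatOption("flying horses", cost_per_unit=0.5, disvalue_per_unit=1.0))
print(cheapest_threat(options, disvalue_needed=100.0).name)
```

Whether real threateners would behave like this cost-minimizer is exactly the open question discussed next.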

Paul Christiano: My initial take was that it seemed crazy, and since then I have become significantly more enthusiastic about it; it seems plausible. Actually, I was giving out a prize last year for things that seemed relevant to AI alignment or to AI leading to a good outcome, and Caspar from EAF submitted some proposals along these lines. At that point I thought about it more, and I was somewhat compelled. I think he’s continued to think about it since then, and that seems interesting.

A version of that which I find somewhat more plausible than saying I care a lot about some random thing, like how many flying horses there are, is a perspective that’s kind of like a big bounty. Suppose you were to demonstrate to me convincingly that you could have run this strategy, that it would have had a significant chance of causing extreme disvalue, and that it would have coerced me into doing X, that it would in fact have caused me to do X. Once you’ve persuaded me of that sufficiently convincingly, I’m like, “Hey, fine. You can have whatever outcome you would in fact have achieved,” or an outcome that’s, from your perspective, incrementally better than whatever you would have achieved by carrying through this risky policy.

It’s not clear. I think it’s incredibly complicated. I’ve started to spend a little time thinking about this, and it’s just incredibly complicated to figure out whether it’s a good idea, or rather, whether it really works. It’s a thing I’d be interested in people thinking more about. It’s definitely one of the things that falls under understanding the conditions under which bad threats could be followed through on. But I think it makes less difference than other common-sense interventions, like avoiding situations where people are threatening each other. It’s a lot easier to get traction on those more obvious open questions.

Robert Wiblin: One reason that people work on S-risks is that they are more worried about preventing bad things than about creating good things. Another rationale might be that, even if you’re symmetric on that point, there are more people working on trying to prevent extinction or to make the future go well than there are people worrying about the worst-case scenarios and trying to prevent them, so it’s potentially a more neglected problem right now that deserves more attention than it’s getting. Do you put much, or any, weight on that?

Paul Christiano: Ultimately, I mostly care about neglectedness because of how it translates into tractability, and I don’t feel like this problem is currently more tractable than AI alignment. Maybe they’re in the same ballpark in terms of tractability. Part of it is that it’s a harder problem to deal with. There are a bunch of reasons it’s maybe less tractable on its face than alignment.

Robert Wiblin: Why is that?

Paul Christiano: I think part of the basic source of difficulty is that the threat model for alignment is incredibly clear: you have this nice model in which you can work, and you understand what might go wrong. I mean, it’s absurd to compare alignment to another problem and say it’s incredibly clear and concrete; that basically never happens. But in this one comparison, alignment is unusually clear and concrete, whereas here we’re like, “Geez.” It’s quite a fuzzy kind of difficulty, and the things we’re going to do are all much more like bank shots. I don’t know, I think it’s a messy subject.

Robert Wiblin: Quite a lot of people think that these risks of bad outcomes and threats are more likely in a multipolar scenario, where you have a lot of groups competing for influence over the future, and I guess over the use of artificial intelligence or whatever other technologies end up mattering. Do you share that intuition?

Paul Christiano: I think it’s at least somewhat worse. I don’t know how much worse; maybe twice as bad seems like a plausible first-pass guess. A lot turns on how much people are threatening each other in the world. That seems bad. That’s one major source of threats, and if you have less intense competition among people, you’d expect to have less of that going on. There’s a question of how sensitive the number of threats people make against each other is to the amount of polarity; it seems pretty sensitive. Then there’s the question of what fraction of all threats occur over, say, the next hundred years in that kind of dynamic.

Robert Wiblin: Do you have any thoughts on coordination between people who are mostly focused on making the future contain good things and people who are mostly focused on making sure that it doesn’t have bad things?

Paul Christiano: Mostly, I think the reason they’re going to end up coordinating is that they’re pursuing similar approaches and similar cognitive styles for thinking about the situation, and people should be coordinating. Generically, it’s nice if people can coordinate even if their goals differ on the object level. Even if you have totally orthogonal goals, it’d be good to share resources, talk, and benefit from each other’s resources and so on. Also, people normally aren’t all the way at one end or the other. That’s the main channel for coordination. You’d also hope for some cooperation just through overlapping objectives that happen to serve both, though I think that’s a less important channel here. Both communities can be happier and healthier and get along better if we all care to some extent about these different things, and we therefore should all help each other out.

Meta-ethics and AI

Robert Wiblin: Let’s talk for a second about philosophy and ethics and AI. What role do you think different theories of metaethics play in AI alignment and AI alignment research?

Paul Christiano: I think there are two qualitatively different ways that philosophical progress could affect AI alignment. One is on the object level: thinking that the work we need to do to align an AI involves clarifying some philosophical questions. A different one is that the way you approach alignment depends on your views on some of those philosophical questions. I think it’s worth distinguishing those.

So on the object level, if you thought that you had to understand some process which would ultimately converge to a correct understanding of the good, and that you had to directly impart that into an AI system you built, then you’d be in a really rough position where you either have to solve a bunch of philosophy yourself, or solve what Wei Dai calls metaphilosophy: understanding by what process humans arrive at the truth when doing philosophical inquiry. That seems pretty rough.

Then there’d be this really tight object-level connection, where you might even end up saying these are basically the same problem. I think that’s a perspective that’s maybe closer to where I was six years ago, and I’ve really shifted a lot towards, “Look, the thing you want to be clarifying is this notion of control and course-correction.” You want to say we want the construction of the AI to not make anything worse. We want to end up in a position like the one we’re currently in, where we get to continue going through the same process of deliberation, understanding the ways in which that process goes well or poorly, and correcting them. We want to be making essentially as few ethical commitments as we can at the point we’re constructing AI. And I’ve become much more pessimistic about any approach that essentially involves hard philosophical commitments. I think we’ll still end up making some, and I’m happy to talk about the ones we’re most likely to make, but I think if things go okay, it’s probably because we can dodge most of them.

Robert Wiblin: Why have you become more pessimistic?

Paul Christiano: I think in part it’s from the technical details of the case and just thinking about how different approaches to alignment might play out. By that I mean something like: I think you have to be leaning a lot on this mechanism of course-correction by humans, or deferring to this process of human deliberation, anyway to have any hope, apart from philosophical issues. And once you’re leaning on it in general, you might as well also lean on it for answering these questions. In part, it’s just becoming a lot more optimistic about the prospects for that. You might ask how important it is that the AI understands what you want to happen with the universe as a whole when it goes out and acts on your behalf. I’ve updated a lot towards it being okay if it doesn’t really understand, even in really pessimistic cases where stuff is getting crazy.

Say there are only six minutes between when you start your AI and when the colonization of the universe begins. I think even then it’s basically fine if it doesn’t understand what you want for the universe that much. It just understands, “Look, here’s me.” It puts you in a box somewhere, starts colonizing the universe, and then eventually makes some space for your box to sort out what humanity should do: a little civilization planted somewhere out in the backwoods trying to figure out what is good. And then ultimately the process just remains responsive to the conclusions of that deliberation.

The thing it has to understand is what it means to protect humanity and allow humanity to develop and mature in the way that we want to develop and mature, and then what it means to ultimately be responsive to what that process concludes: to be correctable once humans figure out what it is we want in some domain, and to allow that understanding to ultimately affect the behavior of this scaffolding of automation we’ve built up around us. There’s maybe one last, more technical question that comes up there. You might think philosophy would affect the value of different kinds of resources, but I think you can sort of dodge these kinds of dependence once you’re more careful about these arguments.

Robert Wiblin: How do you feel about the idea of the long reflection: the idea that we don’t really know what’s valuable, and that we might have a better shot if we get our best people to think about it for thousands of years, or just a very long time, until we decide what would be the best thing to do with all of the resources that we can get in the universe? Is that a sound idea?

Paul Christiano: I think viewing it as a step which occurs is probably not quite right. I’m pretty on board with the idea that there’s a process of deliberation, of understanding what is good, that we’re currently engaged in. I see most of the action in AI as allowing that process to continue happening.

I think that process can ultimately be decoupled from most of the economic expansion through the universe. You can imagine a world where humans are living on Earth while out there in space a bunch of crazy shit is going down: AIs are waging wars and building machines and stuff, and the humans are just doing their normal thing on Earth. Sometimes futurists think of that as a sideshow, where all the action is off with the crazy stuff the AIs are doing.

My emphasis has shifted more to: actually, that’s where a lot of the action is, in the overall evolution of our values and what we choose to value. I don’t think it’s like a person going off and thinking for a thousand years; it’s more like there’s a trajectory along which our civilization is developing, and we’re thinking about what that trajectory should be.

I think one of the hopes of AI alignment is to decouple that process of ongoing deliberation from the process of remaining economically competitive. It’s a really hard problem to understand what that deliberation should look like.

That’s another one of the big ways my view has changed. When I said at the beginning that I’d slightly downgraded my overall sense of how important alignment was relative to other parts of the problem, a lot of that has been up-weighting how important it is to make natural deliberation go well. From a long-term perspective, other than alignment, that seems like probably the top priority. At the beginning, when we talked about metaethics, I made this distinction between object-level and meta-level influences.

It’s worth bracketing that. Here we’ve just been diving in on the object level, and I’m happy to keep going with that. But it’s worth saying briefly that at the meta level, I do think your approach to alignment, and how valuable you think different kinds of outcomes are, depends on the answers to some hard ethical questions related to this, like how much do you care about “the lizard people”?

That’s a hard question in moral philosophy, and a similarly hard question is how much you care about this AI system that you’re building. If it wants to do something totally different from humans, how much should you be like, “Well, we have some values, the AI has some values, we’re happy”?

I think we talked about this a little bit on the last podcast. The answers to those kinds of moral questions do have an effect on how you go about alignment, or how you prioritize different aspects of the problem.

Robert Wiblin: What are your views on moral realism and anti-realism? Do those affect what AI alignment work seems most important?

Paul Christiano: I’m definitely pretty anti-realist. I think we get a little bit into the semantic weeds here; I’ve had long discussions with people about it. There’s this question which feels like a realist question: “Is deliberation going in the right direction?”

You could have a version of the anti-realist perspective where it doesn’t matter how you deliberate: you’re going to come to some conclusions and those are fine. I don’t endorse that. There’s another version of this perspective where you shouldn’t deliberate; you should just endorse your current conclusions because that’s what you like.

I don’t endorse that either. I’d say, look, right now there’s some kind of process of deliberation and growth, and I endorse the output of this. In some way I want our values to evolve. In some sense, you could say that what I care about is the endpoint of that deliberative process.

The endpoint of that potentially very long process of evolution and maturation. Philosophically, I don’t think there would necessarily be convergence; I think different processes would arrive at different conclusions.

I think there is a very hard problem of heading in the right direction. It’s a little bit awkward as a non-realist to say what “the right direction” means here. The realist has this nice, easy answer: it’s actually converging to the good.

I think that’s just a linguistic thing where they happen to have a nice... again, the whole thing is semantic differences. It’s just that there are some concepts that are slippery, or hard to talk about and think about, for the non-realist.

I think that’s because they are, in fact, slippery; the realist just pushes that down into the slipperiness and complexity of the actual concept of good. In terms of my overall view on the object-level questions that might affect how to prioritize different parts of alignment, I don’t think there’s that much convergence.

I think it’s quite plausible that an AI would be smart and would do its own thing. That’s somewhere between a barren universe and actually achieving the optimal outcome, and I don’t think it’s very close to barren. I don’t think it’s very close to the optimal outcome either. I lean towards something like a 50/50 prior: I’d treat having some random AI doing some random thing with the universe as half as bad as extinction.


Robert Wiblin: We’ve been here for a while and we should wrap up. Earlier we were talking about how creatine could potentially make people a whole lot smarter, or sorry, three IQ points smarter. [laughs] A third of a standard deviation; it’s as good as it gets, potentially.

Do you have any view on nootropics, and the drugs and life-hacking stuff people use to try to make themselves fitter and more intelligent? Is there much mileage to be gotten out of that?

Paul Christiano: I’m definitely pretty excited about investigating it; it feels like a thing the medical establishment is not just not into, but very not into. I think there’s a reasonable chance that there’s something there. I think there are a few cases where, if you took the current state of the literature at face value, it would seem like... I don’t know. I think maybe Piracetam is in this position, but I’m not totally sure. There are a few other possible candidates. If you actually believed all the studies and just took them at face value, you would think there are reasonable gains from some of these.

I think probably that’s just not the case. Everyone understands that these are older studies, and there have probably been failed replications that haven’t been published. It would be pretty nice to go through and check them all, and I’d be pretty excited about that.

I’m pretty interested in general... I think the thing I would most like to see... I generally have this question of why some simple change to brain chemistry would improve performance and yet not have been made by evolution. You want to see what the countervailing consideration is.

Robert Wiblin: That does potentially explain why it has been pretty hard to find anything that seems to work really well. If it were as easy as that, then evolution would have done it.

Paul Christiano: Yes, so I think you’re going to have to exploit some distributional change, or some cost which is a cost now that wasn’t a cost historically. I think the two best candidates are, basically, one: you can exploit this hypocrisy angle. People have this thing they want to do, like, “I want to make the world better,” but at some level your biology is not at all optimized for making the world better. It’s optimized for having descendants who have descendants. So if you want to pursue what you nominally want to do, you can hope that there are some drugs that just make you better at achieving the thing that you set out to do, even when that thing is not in line with what your body is optimized for achieving. I think that’s a mechanism of action in some cases and seems pretty realistic. It’s something I’ve been really scared of.

Then the other one, the thing that would be most satisfying and excellent, would be that it just burns more energy. In the evolutionary environment, you didn’t want to run your brain hot because it’s kind of a waste and you’re only getting marginal benefits, but if you could just do that now, that would be super great.

Robert Wiblin: Just overclock it because it’s like, “We’ve got so much food now.”

Paul Christiano: So much food. [laughs]

Robert Wiblin: If you told me I’m going to burn an extra 400 calories a day and be marginally smarter, I’m like, “That’s good on both sides.”

Paul Christiano: Yes. [laughs] That would really be the best case, definitely. I’m a little bit confused about where stimulants stand on that. My understanding for caffeine is that if I take caffeine, I’m probably going to drive up blood pressure and energy expenditure, at least at the beginning. Then if I keep taking it, probably within two weeks, at least on the blood pressure side, I’m going to return to baseline. I would like to understand better what the long-term effects are on cognitive performance and energy use.

I’d similarly like to understand those long-run effects. Is it the case that if you take something over and over again, eventually it stops having an impact? Or is it the case that even if you take it once, you have a bounce-back period over the next couple of days, such that the long-term effect is the integral of this thing, and the integral is zero? I don’t know. Anyway, it seems plausible to me there are some wins, probably mostly through those two channels: either getting yourself to do what you think you should do, or are trying to do, or else burning more energy.

I personally have not prioritized experimenting with that kind of thing. In part, it’s because I am really bad at introspection, so I cannot tell even if I’m in a very altered mental state. Partly it’s because I think it’s really hard to get results with n=1. But I’m pretty excited about more experimentation on those things. It’s really hard, probably because the medical establishment really seems to hate this kind of thing. Some of the likely winners are also not legal, which sucks. Prima facie, amphetamines would probably be the most natural candidate.

Robert Wiblin: Do you think they would be good in the long run? It does seem like the body is so good at adapting to stimulants that most of the time they work for the first week or the first month, but then you’re back to baseline and now you’re just taking a thing that returns you to normal.

And while almost all of these cognitive enhancers are legal, with a handful of them like amphetamines, there’s a risk of getting into legal trouble if you take them, or even not being able to get a security clearance in the future if you’ve taken them, which is a really serious downside for anyone who might go into policy careers in future. And people who might really want or need a security clearance at some point make up a large fraction of all our listeners.

So in those cases, it’s clear you should stay away.

But, basically, I don’t recommend that people pay much attention to nootropics, both because for many of them the evidence that they work is weak in the first place, and because the body is so good at adapting to and undoing the effect of almost anything you take after a few weeks.

Paul Christiano: That’s a lot of my concern. I mean, you could imagine... I think some people who are prescribed stimulants, for example, have the sense that under repeated use you do get some lasting advantage, or at least a lasting treatment effect. It would be good to check whether that’s actually the case, or whether it’s just shuffling things around and zero-sum. It does kind of feel like once you’re burning more energy, there’s not really a good reason for it to fail. Whereas if I’m taking some Piracetam and hoping that I think better, there’s sort of a good reason to expect it to fail and lose to this evolutionary argument.

In the case of a stimulant, which is initially causing you to use more energy and think better, there’s not really a great reason to expect it to break down. So there’s hope, at least. I think in my ideal world you’d really be throwing a lot of energy at, “Can we find a win here? Because it sure seems plausible.”

Robert Wiblin: One hope I’ve had is this. It seems like if you take a stimulant, you get a benefit from it, but your body adapts to it, and then you need a period when your body de-adapts before you take it again. If you, say, take it all the time for a week and then don’t take it for a week in order to flush out the adaptation, that doesn’t seem so great. Potentially, you could do this every day, so that you’re a bit more awake during the day and then you de-adapt while you’re sleeping, and that seems potentially good on both sides. I’m not sure whether the de-adaptation over the 12 hours that you’re not taking it is really sufficient to make it worthwhile.

Paul Christiano: Yes, it feels plausible to me that there’s something here that works. It’s plausible to me that you should just run with something dumb like, “I thought about it a little bit on paper; I tried some.” I think it’s hard to do the experiments because of the combination of the medical establishment hating it and the instruments being a bit of a hard thing; you’d have to get over that.

Also, the most high-leverage options probably are not going to be legal, which makes it overall less appealing. You can try this with caffeine, but there are probably fewer wins there.

Robert Wiblin: Well, on that inspiring note, my guest today has been Paul Christiano. Thanks for coming back on the podcast, Paul.

Paul Christiano: Yeah. Thanks for having me.

Robert Wiblin: There are lots of links in the blog post attached to this episode to papers and blog posts where you can learn more about the things we covered over the last two hours.

As I mentioned at the start of the show, in the show notes we link to a 40-minute MP3 where Paul and I have a particularly confusing and bonkers conversation about decision theory, and what it might mean if some non-standard solutions turn out to be right.

I don’t especially recommend listening to it, which is why we cut it from the episode — but people who are interested in decision theory might enjoy it, either to learn how we think about the problem, or as unintentional comedy.

Alright, the 80,000 Hours Podcast is produced by Keiran Harris.

Thanks for joining — talk to you in a week or two.


About the show

The 80,000 Hours Podcast features unusually in-depth conversations about the world's most pressing problems and how you can use your career to solve them. We invite guests pursuing a wide range of career paths — from academics and activists to entrepreneurs and policymakers — to analyse the case for and against working on different issues and which approaches are best for solving them.

The 80,000 Hours Podcast is produced and edited by Keiran Harris. Get in touch with feedback or guest suggestions by email.
