#44 – Paul Christiano on how OpenAI is developing real solutions to the ‘AI alignment problem’, and his vision of how humanity will progressively hand over decision-making to AI systems

Paul Christiano is one of the smartest people I know and this episode has one of the best explanations of why AI alignment matters and how we might solve it. After our first session produced such great material, we decided to do a second recording, resulting in our longest interview so far. While it is challenging at times, I can strongly recommend listening – Paul works on AI himself and has an unusually well thought-through view of how it will change the world. Even though I’m familiar with Paul’s writing, I felt I was learning a great deal and am now in a better position to make a difference to the world.
A few of the topics we cover are:
- Why Paul expects AI to transform the world gradually rather than explosively and what that would look like
- Several concrete methods OpenAI is trying to develop to ensure AI systems do what we want even if they become more competent than us
- Why AI systems will probably be granted legal and property rights
- How an advanced AI that doesn’t share human goals could still have moral value
- Why machine learning might take over science research from humans before it can do most other tasks
- Which decade we should expect human labour to become obsolete, and how this should affect your savings plan
—
Here’s a situation we all regularly confront: you want to answer a difficult question, but aren’t quite smart or informed enough to figure it out for yourself. The good news is you have access to experts who are smart enough to figure it out. The bad news is that they disagree.
If given plenty of time – and enough arguments, counterarguments and counter-counter-arguments between all the experts – should you eventually be able to figure out which is correct? What if one expert were deliberately trying to mislead you? And should the expert with the correct view just tell the whole truth, or will competition force them to throw in persuasive lies in order to have a chance of winning you over?
In other words: does ‘debate’, in principle, lead to truth?
According to Paul Christiano – researcher at the machine learning research lab OpenAI and legendary thinker in the effective altruism and rationality communities – this question is of more than mere philosophical interest. That’s because ‘debate’ is a promising method of keeping artificial intelligence aligned with human goals, even if it becomes much more intelligent and sophisticated than we are.
It’s a method OpenAI is actively trying to develop, because in the long term it wants to train AI systems to make decisions that are too complex for any human to grasp, but without the risks that arise from a complete loss of human oversight.
If AI-1 is free to choose any line of argument in order to attack the ideas of AI-2, and AI-2 always seems to successfully defend them, it suggests that every possible line of argument would have been unsuccessful.
But does that mean that the ideas of AI-2 were actually right? It would be nice if the optimal strategy in debate were to be completely honest, provide good arguments, and respond to counterarguments in a valid way. But we don’t know that’s the case.
According to Paul, it’s clear that if the judge is weak enough, there’s no reason to expect an honest debater to have an advantage. But the hope is that there is some threshold of judge competence above which debates tend to converge on more accurate claims the longer they continue.
Most real world debates are set up under highly suboptimal conditions; judges usually don’t have a lot of time to think about how to best get to the truth, and often have bad incentives themselves. But for AI safety via debate, researchers are free to set things up in the way that gives them the best shot. And if we could understand how to construct systems that converge to truth, we would have a plausible way of training powerful AI systems to stay aligned with our goals.
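To make the reasoning about AI-1 and AI-2 concrete, here is a minimal toy sketch in Python. It is only an illustration under loud assumptions, not OpenAI's method: the debate is reduced to one attack and one reply, and the names `claim_survives`, `careful_judge`, `weak_judge` and the example arguments are all hypothetical. It shows the two points made above: a claim "survives" only if every available attack has some reply the judge accepts, and whether survival tracks truth depends on how competent the judge is.

```python
from typing import Callable, Dict, List

# Hypothetical toy data: each attack AI-1 could choose, mapped to the
# replies AI-2 has available. Real debates are far richer than this.
DebateTree = Dict[str, List[str]]


def claim_survives(tree: DebateTree,
                   judge_accepts: Callable[[str, str], bool]) -> bool:
    """AI-1 may pick any attack, so the claim survives only if EVERY
    attack has at least one reply that the judge accepts."""
    return all(
        any(judge_accepts(attack, reply) for reply in replies)
        for attack, replies in tree.items()
    )


# Assumed example: AI-2 defends a claim that has a real hole (attack_b).
tree: DebateTree = {
    "attack_a": ["sound_reply", "misleading_reply"],
    "attack_b": ["misleading_reply"],
}


def careful_judge(attack: str, reply: str) -> bool:
    # A competent judge is only convinced by sound replies.
    return reply == "sound_reply"


def weak_judge(attack: str, reply: str) -> bool:
    # A weak judge is convinced by any reply, sound or not.
    return True


careful_result = claim_survives(tree, careful_judge)
weak_result = claim_survives(tree, weak_judge)
print("careful judge:", careful_result)
print("weak judge:", weak_result)
```

With the careful judge the flawed claim is defeated, because AI-1 can choose the one attack that has no sound reply. With the weak judge the same flawed claim survives, which is the worry Paul raises about judges below the competence threshold.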
This is our longest interview so far, and for good reason. We cover a fascinating range of topics:
- What could people do to shield themselves financially from potentially losing their jobs to AI?
- How important is it that the best AI safety team ends up in the company with the best ML team?
- What might the world look like if several states or actors developed AI at the same time (aligned or otherwise)?
- Would artificial general intelligence grow in capability quickly or slowly?
- How likely is it that transformative AI is an issue worth worrying about?
- What are the best arguments against being concerned?
- What would cause people to take AI alignment more seriously?
- Concrete ideas for making machine learning safer, such as iterated amplification
- What does it mean to say that a crow-like intelligence could be much better at science than humans?
- What is ‘prosaic AI’?
- How do Paul’s views differ from those of the Machine Intelligence Research Institute?
- The importance of honesty for people and organisations
- What are the most important ways that people in the effective altruism community are approaching AI issues incorrectly?
- When would an ‘unaligned’ AI nonetheless be morally valuable?
- What’s wrong with current sci-fi?
Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app. Or read the transcript below.
The 80,000 Hours podcast is produced by Keiran Harris.
















