#58 – Pushmeet Kohli on DeepMind’s plan to make AI systems robust & reliable, why it’s a core issue in AI design, and how to succeed at AI research

When you’re building a bridge, responsibility for making sure it won’t fall over isn’t handed over to a few ‘bridge not falling down engineers’. Making sure a bridge is safe to use and remains standing in a storm is completely central to the design, and indeed the entire project.

When it comes to artificial intelligence, commentators often distinguish between enhancing the capabilities of machine learning systems and enhancing their safety. But to Pushmeet Kohli, principal scientist and research team leader at DeepMind, research to make AI robust and reliable is no more a side-project in AI design than keeping a bridge standing is a side-project in bridge design.

Far from being an overhead on the ‘real’ work, it’s an essential part of making AI systems work in any sense. We don’t want AI systems to be out of alignment with our intentions, and that consideration must arise throughout their development.

Professor Stuart Russell — co-author of the most popular AI textbook — has gone as far as to suggest that if this view is right, it may be time to retire the term ‘AI safety research’ altogether.

With the goal of designing systems that reliably do what we want, DeepMind have recently published work on important technical challenges for the ML community.

For instance, Pushmeet is looking for efficient ways to test whether a system conforms to the desired specifications, even in peculiar situations, by creating an ‘adversary’ that proactively seeks out the worst failures possible. If the adversary can efficiently identify the worst-case input for a given model, DeepMind can catch rare failure cases before deploying a model in the real world. In the future, single mistakes by autonomous systems may have very large consequences, which will make even small failure probabilities unacceptable.
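To make the idea of an adversary concrete, here is a minimal sketch in Python. It is an illustration only, not DeepMind's method: the model is a hypothetical linear scorer, chosen because its worst-case input inside a small box around `x` can be written down exactly. The weights, the input and the radius `eps` are all made-up values. The specification being tested is "the score stays positive for every input within `eps` of `x`".

```python
import random

def score(w, b, x):
    """Linear model: a positive score means the 'positive' class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def worst_case_input(w, x, eps):
    """For a linear model, the point in the L-infinity ball of radius eps
    around x with the lowest score moves each coordinate by eps against
    the sign of its weight."""
    return [xi - eps * (1 if wi > 0 else -1 if wi < 0 else 0)
            for wi, xi in zip(w, x)]

def random_search(w, b, x, eps, trials, rng):
    """Baseline tester: try random perturbations, keep the lowest score seen."""
    best = score(w, b, x)
    for _ in range(trials):
        candidate = [xi + rng.uniform(-eps, eps) for xi in x]
        best = min(best, score(w, b, candidate))
    return best

# Hypothetical toy model and input (assumptions for illustration only).
dim = 50
w = [1.0 if i % 2 == 0 else -1.0 for i in range(dim)]
b = 0.0
x = [0.1 * wi for wi in w]   # clean score is 50 * 0.1 = 5.0
eps = 0.11

adv_score = score(w, b, worst_case_input(w, x, eps))
rand_score = random_search(w, b, x, eps, 1000, random.Random(0))

print("adversarial score:", adv_score)    # negative: specification violated
print("best random score:", rand_score)   # still positive: failure missed
```

In this toy setup a thousand random tests never come close to the failure, while the adversary finds it in one step. For real neural networks the worst case has no closed form, so the adversary has to search for it, which is where the research effort goes.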

He’s also looking into ‘training specification-consistent models’ and ‘formal verification’, while other researchers at DeepMind working on their AI safety agenda are figuring out how to understand agent incentives, avoid side-effects, and model AI rewards.

In today’s interview, we focus on the convergence between broader AI research and robustness, as well as:

  • DeepMind’s work on the protein folding problem
  • Parallels between ML problems and past challenges in software development and computer security
  • How can you analyse the thinking of a neural network?
  • Unique challenges faced by DeepMind’s technical AGI safety team
  • How do you communicate with a non-human intelligence?
  • How should we conceptualize ML progress?
  • What are the biggest misunderstandings about AI safety and reliability?
  • Are there actually a lot of disagreements within the field?
  • The difficulty of forecasting AI development

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app. Or read the transcript below.

The 80,000 Hours Podcast is produced by Keiran Harris.



As an addendum to the episode, we caught up with some members of the DeepMind team to learn more about roles at the organization beyond research and engineering, and how these contribute to the broader mission of developing AI for positive social impact.

A broad sketch of the kinds of roles listed on the DeepMind website may be helpful for listeners:

  • Program Managers keep the research team moving forward in a coordinated way, enabling and accelerating research.
  • The Ethics & Society team explores the real-world impacts of AI, from both an ethics research and policy perspective.
  • The Public Engagement & Communications team thinks about how to communicate about AI and its implications, engaging with audiences ranging from the AI community to the media to the broader public.
  • The Recruitment team focuses on building out the team in all of these areas, as well as research and engineering, bringing together the diverse and multidisciplinary group of people required to fulfill DeepMind’s ambitious mission.

There are many more listed opportunities across other teams, from Legal to People & Culture to the Office of the CEO, where our listeners may like to get involved.

DeepMind invites applicants from a wide range of backgrounds and skill sets, so interested listeners should take a look at its open positions.


Highlights

If you think about the history of software development, people started off by developing software systems by programming them by hand and sort of specifying exactly how the system should behave. We have now entered an era where we see that instead of specifying how something should be done, we should specify what should be done. For example, this whole paradigm of supervised learning where we show examples to the machine or to the computer, that for this input you should provide this output and for this input you should provide that output. You’re telling the machine what you expect it to do rather than how it should do it.

It has to figure out the best way to do it. But part of the challenge is that this description of what you want it to do is never complete, it’s only partial. This is a partial specification of the behavior that we expect from the machine. So now you have trained this machine with this partial specification, how do you verify that it has really captured what you wanted it to capture, and not just memorized what you just told it? That’s the key question of generalization, does it generalize? Does it behave consistently with what I had in mind when telling it, when giving it 10 correct examples? That is the fundamental challenge that all of machine learning is tackling at the moment.
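A small Python sketch may help show why a handful of examples is only a partial specification. Everything here is hypothetical: the task ("double the input"), the two models, and the helper `consistent` are invented for illustration. Both models agree with all 10 training examples, yet only one of them captured the intended behavior.

```python
# Ten labelled examples: a partial specification of "double the input".
examples = [(n, 2 * n) for n in range(10)]

def memoriser(x, table=dict(examples)):
    """Returns the stored output for seen inputs and 0 for anything else."""
    return table.get(x, 0)

def doubler(x):
    """Implements the rule the examples were meant to convey."""
    return 2 * x

def consistent(model, data):
    """True if the model reproduces every labelled example in data."""
    return all(model(x) == y for x, y in data)

# Inputs the models never saw during 'training'.
held_out = [(n, 2 * n) for n in range(10, 15)]

for name, model in [("memoriser", memoriser), ("doubler", doubler)]:
    print(name,
          "| fits training examples:", consistent(model, examples),
          "| fits held-out examples:", consistent(model, held_out))
```

The training examples alone cannot tell the two models apart. Only inputs outside the examples reveal which one generalizes.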

Suppose you are trying to test a particular individual, you are interviewing them, you ask them a few questions and then you use the answers and how they performed on those questions to get a good sense of who they are. In some sense you are able to do that because you have some expectation of how people think, because you yourself are human.

But when you are reasoning about some other intelligence, like a bird, then it becomes trickier. Even though we might share the same evolutionary building blocks for reasoning and so on, the behavior is different. So that comes to the question now, if there’s a neural network in front of you and you are asking it questions, you can’t make the same assumptions that you were making with the human, and that’s what we see. In ImageNet you ask a human, “What is the label of this image?” And even experts are not able to identify all the different labels because there are a lot of different categories and there are subtle differences. A neural network would basically give you a very high accuracy, yet you slightly perturb that image and suddenly it will basically tell you that a school bus is an ostrich.

We are trying to go beyond that simple, traditional approach of taking a few questions and asking those few questions. What we are thinking about is, can we reason about the overall neural network’s behavior? Can we formally analyze it and see what kinds of answers it can give, and in which cases does the answer change?
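One simple way to reason about every input in a region at once, rather than asking a few questions, is interval arithmetic, sometimes called interval bound propagation. The excerpt does not say which technique Pushmeet has in mind, so treat the Python sketch below as one illustrative option. The tiny two-layer ReLU network, its weights, and the function names are all hypothetical. The code pushes a box of inputs through the network and returns guaranteed lower and upper bounds on the output.

```python
def affine_bounds(W, b, lo, hi):
    """Bounds on y = W x + b when each x[j] lies in [lo[j], hi[j]]."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        # A positive weight attains its minimum at lo and maximum at hi;
        # a negative weight does the opposite.
        out_lo.append(bias + sum(w * (l if w >= 0 else h)
                                 for w, l, h in zip(row, lo, hi)))
        out_hi.append(bias + sum(w * (h if w >= 0 else l)
                                 for w, l, h in zip(row, lo, hi)))
    return out_lo, out_hi

def relu_bounds(lo, hi):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]

def output_bounds(layers, x, eps):
    """Bounds on the network output over all inputs within eps of x."""
    lo = [v - eps for v in x]
    hi = [v + eps for v in x]
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(W, b, lo, hi)
        if i < len(layers) - 1:          # no ReLU after the last layer
            lo, hi = relu_bounds(lo, hi)
    return lo, hi

# Hypothetical network: 2 inputs -> 2 hidden units -> 1 output score.
layers = [
    ([[1.0, -1.0], [0.5, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [-1.0]),
]
x = [2.0, 1.0]

for eps in (0.1, 1.0):
    lo, hi = output_bounds(layers, x, eps)
    verdict = "verified positive" if lo[0] > 0 else "inconclusive"
    print(f"eps={eps}: output in [{lo[0]:.2f}, {hi[0]:.2f}] -> {verdict}")
```

For the small box the lower bound is above zero, which proves the answer cannot flip anywhere in that region. For the large box the lower bound dips below zero. That does not prove a failure exists, because interval bounds can be loose; it only means this method cannot rule one out.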

Optimization is used in a general context for various different problems in operations research and game theory, whatever. Optimization is key to what we do. Optimization is a fundamental technique that we use in safety work to improve how our systems conform to specifications. We always are optimizing the performance of our systems, not to produce specific labels, but to conform to the more general problem or to conform to these general properties that we expect. Not to the simple properties of, well for this input it should be this output. That’s a simple property, but you’re going to have more sophisticated properties. In traditional machine learning you are going to optimize consistency with those simple properties, or reduce your loss or empirical risk, and in our case we are reducing our loss, or reducing the risk of inconsistency with the specifications.
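The contrast Pushmeet draws can be written down as two optimization problems. This is one possible formalization, not notation from the episode: the model $f_\theta$, the per-example loss $\ell$, the set $\mathcal{B}(x_i)$ of inputs the specification covers around $x_i$, and the violation measure $v$ are all assumed symbols introduced for illustration.

```latex
% Traditional training: match the given labels (empirical risk).
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f_\theta(x_i),\, y_i\bigr)

% Specification-driven training: reduce the worst violation of a property
% over every input the specification covers.
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \;
    \max_{x' \in \mathcal{B}(x_i)} v\bigl(f_\theta,\, x'\bigr)
```

The first objective only looks at the labelled points themselves. The second asks how badly the property can be broken anywhere in the covered region, which is why an adversary or a formal bound is needed to evaluate the inner maximum.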

About the show

The 80,000 Hours Podcast features unusually in-depth conversations about the world's most pressing problems and how you can use your career to solve them. We invite guests pursuing a wide range of career paths — from academics and activists to entrepreneurs and policymakers — to analyse the case for and against working on different issues and which approaches are best for solving them.
