#23 – Jan Leike on how to become a machine learning alignment researcher

By Robert Wiblin and Keiran Harris · Published March 16th, 2018

Want to help steer the 21st century’s most transformative technology? First complete an undergrad degree in computer science and mathematics. Prioritize harder courses over easier ones. Publish at least one paper before you apply for a PhD. Find a supervisor who’ll have a lot of time for you. Go to the top conferences and meet your future colleagues. And finally, get yourself hired.

That’s Dr Jan Leike’s advice on how to join him as a Research Scientist at DeepMind, the world’s leading AI team.

Jan is also a Research Associate at the Future of Humanity Institute at the University of Oxford, and his research aims to make machine learning robustly beneficial. His current focus is getting AI systems to learn good objective functions in cases where we can’t easily specify the outcome we actually want.

How might you know you’re a good fit for this kind of research?

Jan says to check whether you get obsessed with puzzles and problems, and find yourself mulling over questions that nobody knows the answer to. To do research in a team you also have to be good at clearly and concisely explaining your new ideas to other people.

We also discuss:

Where do Jan’s views differ from those expressed by Dario Amodei in episode 3?
Why is AGI alignment one of the world’s most pressing problems?
Common misconceptions about artificial intelligence
What are some of the specific things DeepMind is researching?
The ways in which today’s AI systems can fail
What are the best techniques available today for teaching an AI the right objective function?
What’s it like to have some of the world’s greatest minds as coworkers?
Who should do empirical research and who should do theoretical research
What’s the DeepMind application process like?
The importance of researchers being comfortable with the unknown.

The 80,000 Hours podcast is produced by Keiran Harris.

Highlights

In the longer term, the vision for this project is that we’re thinking about when we actually do build AGI, what would be the objective function? The idea here is that this is kind of a small step into the direction of learning what humans value or what you would want, say, a household robot that you buy, what you want them to do, and in a way that you don’t need to be an expert in reinforcement learning, but you can just give feedback in other forms, like really easy to do for humans.
I think, ultimately, we want to take feedback in a way that humans want to give it. Right now, what we do is we have two video clips, and you kind of say is the left better, is the right better, or are they kind of the same. But, I think it will be great if we have better ways of giving feedback.

There’s lots of interesting projects going on in DeepMind, and one of the perks of working at DeepMind is that you kind of get to see them as they unfold. So, last year, there was a lot of stuff going on with AlphaGo. Most of that happened before I was even working there. But, right now, a lot of people are really excited about StarCraft, and we recently released a research environment for that so that other people can also work on that. Yeah, I think there’s lots of other really exciting things going on.

I think, overall, something that is really important is that you should be comfortable with navigating a space that you don’t really understand very well, because researchers kind of necessarily are on the frontier of human knowledge and things that we understand, so you have to be comfortable with the unknown. In some ways, this is in contrast to the skills that an undergraduate degree selects for, where you’re really basically learning about things that we understand well, and then it’s more you need to be able to remember them and you need to be able to understand them quickly rather than dealing with the unknown.

Articles, books, and other media discussed in the show

AI policy and strategy
DeepMind jobs
DeepMind entry on Wikipedia
OpenAI jobs
OpenAI entry on Wikipedia
Deep reinforcement learning from human preferences, a collaboration between DeepMind and OpenAI researchers Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei.
Learning through human feedback
Concrete Problems in AI Safety by Dario Amodei et al. Detailed interview about this paper with the Future of Life Institute.
Two Giants Of AI Team Up To Head Off The Robot Apocalypse – Wired Magazine
Central limit theorem
Eigenvalues and eigenvectors
Conference on Neural Information Processing Systems (NIPS)
International Conference on Machine Learning (ICML)
International Conference on Learning Representations (ICLR)
Montreal Institute for Learning Algorithms (MILA)
Guide to working in AI policy and strategy
Stanford University’s Machine Learning on Coursera

Transcript

Robert Wiblin: Hi listeners, this is the 80,000 Hours Podcast, the show about the world’s most pressing problems and how you can use your career to solve them. I’m Rob Wiblin, Director of Research at 80,000 Hours.

Today’s episode is about how to have a career as a machine learning researcher focussed on ensuring AI systems do what we intend them to do. It builds on and is in part a response to episode 3 with Dr Dario Amodei, a machine learning researcher at OpenAI.

If you haven’t listened to that episode I’d recommend doing so first as it explains the broader issue and will make this interview make a whole lot more sense.

If you have little interest in machine learning or artificial intelligence, you should feel free to skip this episode, as our goal was to get into specifics.

Effective Altruism Global is the main conference for people involved in the discipline of doing as much good as possible. The next event is in San Francisco on the second weekend of June and most of the 80,000 Hours team will be there.

The goal is to increase people’s knowledge, skills and network to enable them to have more social impact. If you like this show you’ll likely enjoy EA Global as well. I’ve seen a lot of people benefit hugely from coming along, so you should definitely think about going.

The organisers want to choose attendees who can get the most out of the event, and so are more likely to accept your application if you’re already familiar with the key ideas of effective altruism and looking to master more complex issues, learn new skills, or get help from the community.

If you’re fairly new to effective altruism it’s usually best to first join a community-hosted EAGx event – this year they’re on in Australia, Europe, and the US east coast.

You can apply at eaglobal.org and if you do so before the 18th of March you can save a bunch of money with early-bird tickets.

Without further ado, I bring you, Dr Jan Leike.

Robert Wiblin: Today, I’m speaking with Jan Leike. Jan is a research scientist at DeepMind in London, and a research associate at the Future of Humanity Institute at the University of Oxford. His research aims to make machine learning robust and beneficial, so he works on questions like: how can we design or learn a good objective function, and how can we make machine learning more robust?

Thanks for coming on the podcast, Jan.

Jan Leike: Hey, Rob. Good to be here.

Robert Wiblin: We’ll get to how people can prepare themselves to do work that’s similar to yours. But, first, you recently started working at DeepMind, what are you helping them with there?

Jan Leike: I’m part of the technical AI safety team at DeepMind. I’m basically looking at questions, technical questions, related to making AGI safe.

Robert Wiblin: So, 80,000 Hours is all about getting people working on the most important, neglected and solvable problems in the world, so why do you think what you’re working on is one of the most pressing problems that humanity faces?

Jan Leike: AI has the potential to be a powerful technology that we can use to make a lot of positive impact in the world. AI and machine learning, in particular, have recently been undergoing a period of rapid improvement, and I expect that they will continue to do so. If this is the case, then we can use it to make progress on lots of problems, pressing global problems like global poverty, animal suffering and others.

But, with any new technology, there’s risks that we should understand beforehand so that we can navigate the space and use the technology wisely. This is what AI safety is about, and I’m working on the technical side of these problems.

Robert Wiblin: What are some of the problems that we might face in the future if we don’t prepare for them ahead of time?

Jan Leike: The classical kind of scenario that people like to describe is that you’re building a very powerful artificial intelligence, and you give it some objective function that is maybe easy to specify, but not actually what you care about. Then, this AI just ends up optimizing really hard, and you get what you specified but not what you wanted. This is one of the things that my research focuses on: how can we get a good objective function into a machine?

Robert Wiblin: So, risks from artificial intelligence and AI safety has been spoken about a lot in the media recently. What are some common misconceptions that people have about the field?

Jan Leike: So, you’re referring to the Zuckerberg/Musk debate and things like that?

Robert Wiblin: Yeah, for example.

Jan Leike: Yeah. I think there’s a lot of unhelpful polarizing going on in the public domain. There’s people on the one hand saying we should really focus on these near-term issues like self-driving cars and unemployment, and on the other side there’s people who really care about the far future, and, in most cases, tend to be very alarmist about it. I think just polarizing the space is kind of unhelpful. I think what we really need is to have well-reflected, sane, and informed debates about it, especially when we think about building AGI and something really powerful.

There’s lots of decisions that we have to make as a society, how do we deal with that, what kind of implications does that have, and in order to really do that productively, I think we also have to give the public a better understanding of what’s really going on in AI.

Robert Wiblin: What do you call the problem within DeepMind? I mean, I refer to the issue as AI safety, but is there another term that actual experts use?

Jan Leike: Yeah. There’s a bunch of terms that are kind of related, that people use. Things like the alignment problem, or AI alignment, things like AI strategy and policy. I think AI safety is a term that we are unfortunately stuck with. I don’t think it’s the greatest term because it has this kind of implicit connotation, that somehow artificial intelligence research is unsafe otherwise, which is not true. But, it’s kind of an established term at this point.

Robert Wiblin: What are some of the details of the specific issues that you and your colleagues are researching at DeepMind?

Jan Leike: We just recently released a paper together with OpenAI called Deep Reinforcement Learning from Human Preferences, where, essentially, we train a neural network to learn a reward function for an agent to maximize. Using our approach, you can basically learn any … You can teach them, the agent, any arbitrary objective function that you have in mind.

In our case, we taught a small robot to do a backflip, which is really hard to do if you have to hand-specify a reward function, if you want to make it look … In our case, all you need to do is really look at a bunch of video clips of the agent’s behavior, and kind of rank them based on how much it looks like a backflip. This, as a human, is kind of a lot easier to do than, say, doing a backflip yourself.

Robert Wiblin: So, what’s the hope about how this process will help?

Jan Leike: In the short term, this is just useful to just solve new problems that were kind of difficult to solve before. I’m here talking about things like the backflip, but that are actually useful. But, in the longer term, the vision for this project is that we’re thinking about when we actually do build AGI, what would be the objective function? The idea here is that this is kind of a small step into the direction of learning what humans value or what you would want, say, a household robot that you buy, what you want them to do, and in a way that you don’t need to be an expert in reinforcement learning, but you can just give feedback in other forms that are like really easy to do for humans.

Robert Wiblin: So, you just up vote it and down vote it, or say this is more like what I want or less like what I want than something else? Is it like being at the optometrist when they’re testing your eyes?

Jan Leike: Right. I mean, I think, ultimately, we want to take feedback in a way that humans want to give it. Right now, what we do is we have two video clips, and you kind of say is the left better, is the right better, or are they kind of the same. But, I think it will be great if we have better ways of giving feedback. Maybe you watch a video and you say, “Oh, this part looks really good. This part looks kind of bad,” but, overall, you don’t have strong views about what happens in the video.

Robert Wiblin: I interviewed Dario Amodei at OpenAI a few months ago, and he mentioned the backflipping noodle and how you’d go on about training it. So, this was a collaboration between DeepMind and OpenAI, right?

Jan Leike: That’s correct.

Robert Wiblin: Yeah. Is this one of many projects that you guys work on together?

Jan Leike: We just started the collaboration on technical AI safety with them, and we’re currently doing more follow up work that we collaborate on. My hope is that we can find more projects in the future that we can collaborate on. Overall, my wish would be to see OpenAI and DeepMind working more closely together.

Robert Wiblin: So, coming back to this reinforcement learning process with the backflipping noodle, what are some ways that even that training system might still fail and the machine learning algorithm might do things that we didn’t intend?

Jan Leike: Yeah. I think that’s a very good question, and this is the kind of thing I’m very interested in because I want to know about all the ways in which things can fail and think about them ahead of time.

In the case of that project, one thing that we noticed was that if we don’t give feedback online, meaning just interactively while the system is learning, you can run into degenerate solutions where, basically, the reward predictor, or the component that learns the reward function, you stop giving it feedback and then, as the agent continues to learn, the state distribution changes and then the reward predictor has to do well on the distributional shift. So, it’s kind of seeing a new input it hasn’t exactly seen before, and then it starts mis-predicting the reward.

What we actually saw is that there were some cases where the agent just learns really weird things that you didn’t intend, but they kind of look good according to the reward predictor because the reward predictor doesn’t really know what it’s doing in the new distribution.

Robert Wiblin: So, if I can try to put that into language that even I would understand, basically, you have people offering feedback to a machine learning algorithm that is then going to try to predict what humans would say in other cases. Is that right?

Jan Leike: Yep.

Robert Wiblin: And what happens is, if humans stop providing information to that process, and then the kinds of noodle movements that it’s trying to score move outside the distribution of what it’s familiar with and what it has experienced with humans rating, then it will just start giving kind of nonsense answers ’cause it no longer has any basis in … It doesn’t have relevant human answers to draw on, and so the answers just start breaking down and you’ll get random changes basically, in the backflipping style.

Jan Leike: Yeah. This is something that neural networks are actually really bad at. They don’t have good confidence intervals, they’re not good at specifying their own uncertainty, so when you throw them into a new problem that isn’t what they’re trained on, it’s not like they back off and say, “Oh, I don’t know what to do here.” They just still give very confident answers.

In our case, you’re still assigning rewards to whatever happens, and you don’t even realize that you shouldn’t be doing this.

Robert Wiblin: So, we described the training process for this backflipping noodle on the Dario podcast, but maybe just to refresh our memories, what’s the novel insight here?

Jan Leike: I just love the term ‘backflipping noodle’, by the way. Yeah, so it’s a three part process. One part is a human looking at video clips and kind of ranking them. The second part is what we call a reward predictor, which basically learns how the human would rank different behavior and turns that into a reward that is assigned to that particular behavior. The third part is just a regular reinforcement learning algorithm that tries to maximize reward, and thus, basically, tries to maximize what the reward predictor thinks the human wants.
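
As a rough illustration of the three-part process described above, here is a minimal toy sketch in Python. Everything in it is an assumption made for illustration: the human is simulated by a hidden linear preference, the reward predictor is a linear model over made-up features rather than a neural network, and the "reinforcement learning" part is reduced to picking the best of a few candidate clips. The names `simulated_human`, `train_reward_predictor` and `choose_behavior` are hypothetical and do not come from the paper's code.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def clip_return(weights, clip):
    """Total reward a linear reward model assigns to a clip (a list of feature vectors)."""
    return sum(sum(w * x for w, x in zip(weights, step)) for step in clip)


def simulated_human(left, right, true_weights):
    """Part 1 (stand-in): 1.0 if the left clip is preferred, otherwise 0.0."""
    better = clip_return(true_weights, left) > clip_return(true_weights, right)
    return 1.0 if better else 0.0


def train_reward_predictor(comparisons, n_features, lr=0.05, epochs=100):
    """Part 2: fit reward weights so that preferred clips get higher total reward."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for left, right, label in comparisons:
            # Predicted probability that the human prefers the left clip.
            p_left = sigmoid(clip_return(weights, left) - clip_return(weights, right))
            for i in range(n_features):
                feature_gap = sum(s[i] for s in left) - sum(s[i] for s in right)
                # Gradient step on the cross-entropy between p_left and the label.
                weights[i] -= lr * (p_left - label) * feature_gap
    return weights


def choose_behavior(candidates, weights):
    """Part 3 (stand-in for RL): pick the clip with the highest predicted reward."""
    return max(candidates, key=lambda clip: clip_return(weights, clip))


def random_clip(rng, n_steps=5, n_features=2):
    """A made-up clip: a few time steps of random features."""
    return [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_steps)]


def demo(seed=0):
    """Train on simulated comparisons, then measure agreement on unseen pairs."""
    rng = random.Random(seed)
    true_weights = [1.0, -0.5]  # hidden preferences of the simulated human
    comparisons = []
    for _ in range(200):
        left, right = random_clip(rng), random_clip(rng)
        comparisons.append((left, right, simulated_human(left, right, true_weights)))
    learned = train_reward_predictor(comparisons, n_features=2)
    fresh = [(random_clip(rng), random_clip(rng)) for _ in range(200)]
    agree = sum(
        simulated_human(l, r, true_weights)
        == (1.0 if clip_return(learned, l) > clip_return(learned, r) else 0.0)
        for l, r in fresh
    )
    return learned, agree / len(fresh)


if __name__ == "__main__":
    learned_weights, agreement = demo()
    print("learned weights:", learned_weights)
    print("agreement with simulated human:", agreement)
```

The point of the sketch is only the division of labour: the human never writes down a reward function, the predictor infers one from comparisons, and the agent optimizes whatever the predictor says.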

Robert Wiblin: So, the concern would be that if we tried using this approach in a real life situation, and humans weren’t called on to give actual scores of how they rate different things that the robot was doing often enough, or if the situation changed such that humans would give different scores, then the robot could end up very confidently doing things that were not at all what we would like because it just doesn’t understand the new situation that it’s found itself in.

Jan Leike: Yeah. This is exactly the sort of questions that we’d love to understand better, like how much feedback do we exactly need? Can we give less feedback over time, which is kind of what we did in the work that we’re talking about. But, it also depends on how the environment changes. How can you know when you should ask for more feedback, or what kind of parts of your behavior should you ask for feedback for?

Robert Wiblin: Isn’t there a way of just getting it to realize that it’s now scoring behaviors that are quite different from what it’s seen before?

Jan Leike: So, I mean, I guess you could insert some kind of anomaly detection mechanism into this, right? Anomaly detection is a problem class that machine learning has thought about for quite a while.

We haven’t tried doing that. I think that might be a good thing to try. I’m not quite clear on how well anomaly detection works right now, and how well it scales.
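
To make the anomaly detection idea concrete, here is one very simple and purely hypothetical way it could look in Python. It is not something from the paper, and as noted above the team had not tried it. The sketch assumes each behavior is summarized as a small feature vector, and it flags any input where some feature lies many standard deviations away from what the reward predictor saw during training. The `NoveltyDetector` class and its threshold are illustrative assumptions.

```python
import statistics


class NoveltyDetector:
    """Flags inputs that look unlike the data the reward predictor was trained on."""

    def __init__(self, training_inputs, threshold=3.0):
        columns = list(zip(*training_inputs))
        self.means = [statistics.mean(c) for c in columns]
        # Fall back to 1.0 when a feature never varied, to avoid dividing by zero.
        self.stdevs = [statistics.pstdev(c) or 1.0 for c in columns]
        self.threshold = threshold

    def score(self, x):
        """Largest per-feature distance from the training mean, in standard deviations."""
        return max(abs(v - m) / s for v, m, s in zip(x, self.means, self.stdevs))

    def is_novel(self, x):
        """True when the input is far enough from the training data to ask a human."""
        return self.score(x) > self.threshold


if __name__ == "__main__":
    seen = [[0.0, 1.0], [0.1, 0.9], [-0.1, 1.1], [0.05, 1.0]]
    detector = NoveltyDetector(seen)
    print(detector.is_novel([0.0, 1.0]))  # close to the training data
    print(detector.is_novel([5.0, 1.0]))  # far outside it
```

A check like this could be used to decide when to request more human feedback, though whether something this simple scales to real observations is exactly the open question raised here.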

Robert Wiblin: So, do you see this as a big step forward, or is it just one of many things that we’re going to have to try out before we figure out how to really make machine learning safe for important applications?

Jan Leike: I see this as a kind of small step into a direction that wasn’t explored very much before. So, I think it’ll be very useful once we do more follow up work and other people start building on this. I think, overall, there are so many other problems we also want to think about and consider, like how can we explore safely? How can we maximize reward while kind of implicitly regularizing with side effects, to encourage your agent to not cause unnecessary side effects, and what does that even mean?

Or, more broadly, there’s lots of other questions that are, I think, really important, that, right now, very few people are really thinking very hard about, and some of them are becoming more popular. Machine learning security is one of them. There’s these very visceral examples where you take an image and you just perturb it very minimally, that you can’t even see it with your own eyes. But, a just regular [inaudible 00:12:39] that was trained to classify the image now just vastly changes the classification that it gives for the image. That doesn’t only apply to image classification, but more generally, if you have a deep neural network, then you only need to perturb the input minimally to change the output in ways that you wouldn’t anticipate.
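
A toy calculation can show why tiny per-pixel changes add up. The Python sketch below uses a linear classifier as a simplified stand-in for a deep network, which is an assumption made purely for illustration. The weights, the 1,000-value "image" and the function name `perturb_against_class` are all made up. Each input value moves by at most 0.02, in the direction given by the sign of the corresponding weight (a fast-gradient-sign-style step), yet the classification flips because the small shifts accumulate across many dimensions.

```python
def score(weights, x):
    """Linear classifier score: positive means class A, negative means class B."""
    return sum(w * v for w, v in zip(weights, x))


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def perturb_against_class(weights, x, epsilon):
    """Move each coordinate by at most epsilon toward the opposite class.

    For a linear model the gradient of the score with respect to the input
    is just the weight vector, so the step direction is sign(weight).
    """
    push = -1 if score(weights, x) > 0 else 1
    return [v + push * epsilon * sign(w) for w, v in zip(weights, x)]


# Hypothetical weights and input, chosen so the arithmetic is easy to follow.
weights = [1.0, -1.0] * 500
image = [0.51, 0.49] * 500
adversarial = perturb_against_class(weights, image, epsilon=0.02)

if __name__ == "__main__":
    print("original score:", score(weights, image))         # positive: class A
    print("perturbed score:", score(weights, adversarial))  # negative: class B
    print("largest change:", max(abs(a - b) for a, b in zip(image, adversarial)))
```

Real attacks on deep networks use the network's gradient rather than fixed weights, but the accumulation effect is the same in spirit.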

Robert Wiblin: Do you have to have a copy of the underlying neural net process to figure out how to create these images that are slightly different, but get classified as a completely different kind?

Jan Leike: This is one of the striking things about this, that you don’t even need that. So, instead, what you could do is you could train a neural network from scratch, on the same data set, possibly using a slightly different architecture, and then attack your own neural network that you just trained. The input perturbation that you get out of that then also transfers to other models. These are what are called black box attacks. These black box attacks work surprisingly well, even if you look at deep reinforcement learning where people train reinforcement learning agents with deep learning, and then you can even train an agent with an entirely different algorithm.

So, say you train DQN to perform some task, and I’m trying to attack your DQN, so I just train another algorithm [inaudible 00:14:02] and attack that, and the input still transfers to your problem setting. I think that’s something that we should really figure out how to fix.

Robert Wiblin: It’s a bit surprising.

Jan Leike: It is. It was very surprising to me.

Robert Wiblin: I saw a paper recently claiming that it would be very difficult to use these kinds of distorted image attacks against self-driving cars because they usually only work from one particular angle or one particular level of zoom, and because cars are moving quite quickly, they get much more of an average of what something looks like from a lot of different positions. Do you know anything about that?

Jan Leike: I think there was some recent research that showed that you don’t have to do much to the stop signs to actually fool a classifier. I mean, given all the other things that I talked about, that kind of came previously, at this point it’s not so surprising. So, the bottom line story of machine learning security at the moment is that it’s really easy to attack, and there’s lots of different ways you could attack. We don’t really have very good defense strategies yet.

It’s kind of reminiscent of early days of the internet or something, where everyone had all their ports open or something, and it was easy to attack software. We’re just slowly getting better at that.

Robert Wiblin: So, what are some approaches that you could take to make deep reinforcement learning more robust?

Jan Leike: I think there’s lots of interesting questions to explore in this space because deep reinforcement learning is really just about general purpose agents interacting with the environment, and you can think of lots of different robustness questions that come up in this context.

For example, safe exploration. How can I explore my environment that I initially don’t really know much about in a safe way, so I don’t make any irreversible decisions or nothing bad happens while I do that? Or, for example, side effects that I mentioned earlier. How can we make sure that while you’re maximizing reward, you also kind of try not to disturb your environment unnecessarily that much? There’s questions on machine learning security that I mentioned earlier, where somebody attacks a deep reinforcement learning agent and tries to get it to do certain things, and how can you defend against that.

Another thing is that deep reinforcement learning algorithms are known for being notoriously unstable, and-

Robert Wiblin: Unstable in what way?

Jan Leike: So, for example, you train your deep RL agent on 10 different random seeds, and the variance of the performance you get out of that, in the end, can be quite large. I think it’ll be good to just make it more stable in the way that you can have more reliable performance.

Robert Wiblin: So, more frequently converges on the same good level of performance?

Jan Leike: Right. So, instead of just trying to get the highest level performance for all the random seeds that you take, instead you want to have an algorithm where the final outcome doesn’t really change that much and it just reliably performs.
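
The kind of check described here can be sketched in a few lines of Python: run the same training procedure under several random seeds and report the spread, not just the best result. In this sketch `train_once` is a hypothetical stand-in that simply draws a noisy score, since a real training run would not fit in an example. Only `summarize_runs` reflects the idea being discussed.

```python
import random
import statistics


def train_once(seed):
    """Hypothetical stand-in for one full training run; returns a final score."""
    rng = random.Random(seed)
    return 100.0 + rng.gauss(0.0, 30.0)


def summarize_runs(scores):
    """Report spread as well as average, so an unreliable algorithm stands out."""
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "worst": min(scores),
        "best": max(scores),
    }


if __name__ == "__main__":
    final_scores = [train_once(seed) for seed in range(10)]
    print(summarize_runs(final_scores))
```

Under this framing, an algorithm with a slightly lower mean but a much smaller standard deviation and a better worst case would count as the more reliable one.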

So, these are all important and interesting questions, where we can do cutting edge research. In a way, I think working on AI safety is exciting because it’s not a very well established field yet. There’s lots of different low hanging fruit that are just really ripe to pick, and if you’re in this field now, you could be one of the people who picks them. It’s all up for grabs.

Robert Wiblin: Are there any other interesting problems that other people at DeepMind are working on, that you’re able to talk about?

Jan Leike: I mean, there’s lots of interesting projects going on in DeepMind, and one of the perks of working at DeepMind is that you kind of get to see them as they unfold. So, last year, there was a lot of stuff going on with AlphaGo. Most of that happened before I was even working there. But, right now, a lot of people are really excited about StarCraft, and we recently released a research environment for that so that other people can also work on that. Yeah, I think there’s lots of other really exciting things going on.

Robert Wiblin: So, what is your work like on a day-to-day basis?

Jan Leike: Lots of different things. I spend a lot of time just reading arXiv papers, sit in meetings, talk to other researchers about their research. I get some time to just sit down and think about problems, and figure out what would be the next good things to work on, like plan that. Talk to research engineers and what they’re working on, on the implementation side, things like that.

Robert Wiblin: I guess your colleagues are insanely smart, right?

Jan Leike: Yeah. It’s really exciting. DeepMind has really a whole bunch of the world experts in various topics, and you get to ask them questions.

Robert Wiblin: Is that a perk of the job, or is it a bit threatening having some of the world’s greatest minds around you, competing with you?

Jan Leike: It’s kind of both. I mean, they’re not exactly competing with me.

Robert Wiblin: Yeah. You’re all working together. I suppose it’s enjoyable to have very smart people to have good conversations with, and know that you’re kind of at the best organization, at the forefront of the research project.

Jan Leike: Yeah, definitely.

Robert Wiblin: So, is DeepMind a fun place to work, or is it just extremely quiet and very studious?

Jan Leike: No. We have this really big open office, shared space, and there’s lots of conversations that kind of naturally arise. So, at my desk, I have two famous professors just sitting right across from me, and you just randomly start interacting with them. There’s people meeting in like micro kitchens and cafes, and then we have, every Friday, there’s like a party with drinks and pizza and everything.

Robert Wiblin: Do you have a lot of other people working with you on the safety topics, or is it just a handful?

Jan Leike: Right now, we don’t have that many people yet. I think there’s really a great opportunity for someone who wants to build their career to focus on these kind of things, these kinds of problems. We collaborate with OpenAI and we collaborate with the Future of Humanity Institute, so there’s lots of collaborating going on. But, yeah, I really wish there would be more people working on these problems.

Robert Wiblin: So, what path in your career did you take to end up working at DeepMind? It sounds like you’re German, right?

Jan Leike: Yes, that’s correct. I did my undergraduate degree in Germany, in Freiburg. I studied math and computer science, I finished a master’s in computer science, and then I went to Australia, to ANU, in order to do a PhD in machine learning.

Robert Wiblin: Did you work with Marcus Hutter? I actually did my undergrad degree at ANU, so I imagine he was your supervisor?

Jan Leike: Yeah, that’s right.

Robert Wiblin: Cool. Did you have a good time in Canberra?

Jan Leike: Canberra is pretty good to make you focused on your productivity, I would say.

Robert Wiblin: That’s a very polite way of putting it. What did you do after ANU? Did you then go to the Future of Humanity Institute?

Jan Leike: Yes, that’s right. I was a postdoc there for about six months before I joined DeepMind.

Robert Wiblin: So, which decisions in your career do you think you made the right calls on?

Jan Leike: I think getting into a machine learning PhD at the time that I did was a very good decision. I really liked working with my supervisor, Marcus Hutter, and I think he taught me a lot of things. The reason for me to go to FHI when I did was that I thought I wanted to focus more on theoretical research, because that was kind of the focus of my PhD. So, now I’ve changed my mind and I think that empirical research is more valuable in this space, and then DeepMind is a better place for me to be.

Robert Wiblin: What changed your mind about that?

Jan Leike: A number of things. It seems, right now, the empirical work is kind of under-explored in this space, and there’s lots of things to do. It seems like just easier to approach it that way. I think people should do theory and figure out AI safety problems using theoretical methods, it just seems harder for … To me, it seems harder to do.

Robert Wiblin: Did you ever consider doing policy work like your previous colleague at the Future of Humanity Institute, Miles Brundage, or was technical research just clearly a better fit for you?

Jan Leike: I think there’s lots of really important and interesting questions to be solved in policy research, things regarding autonomous weapons and autonomous hacking, and all of this, and people should really be thinking about them. I think, in my case, I have a competitive advantage of working on the technical side of things because my background is very technical. I don’t really know anything about policy and international relations.

Robert Wiblin: What was the application process for getting a job at DeepMind like?

Jan Leike: My case is somewhat atypical because, at the time when I applied, the technical safety team was very small, so they were just building it up. I had interviews with all three of the DeepMind founders. Usually, also part of the interview process is the DeepMind quiz, where they just ask you lots of different questions about computer science, math and statistics, and machine learning.

Robert Wiblin: Yeah, as I mentioned, a couple of months ago I spoke with Dario Amodei on the podcast. He’s someone you know and you’re collaborating with at OpenAI. Were there any things that he said that you disagree with or have a different perspective on?

Jan Leike: Yeah. I agree with Dario on almost all of the things. There was one thing where I would have answered the question differently, and this is when you asked Dario how to figure out whether you’re a good fit for research. I think Dario’s answer was that you should just take some recent papers and implement the models, and do that very quickly, and see how quickly you can do it and whether it’s fun to you, and whether you can replicate the results.

I think I would have answered that question in a way that puts more emphasis on other parts of research. Good indicators of whether you’re a good fit for research are kind of that research feels fun and easy to you, and it just kind of happens naturally. So, you just end up thinking about research questions during your down time, the kind of questions that you don’t know the answer to and maybe nobody knows the answer to, and you tend to get obsessed with puzzles and problems. Also, that you’re good at explaining your thoughts, or novel thoughts, to people who don’t understand them yet – clearly and concisely. So, in a way, it’s like a combination of having ideas, reading literature, executing a project, and presenting it to other people.

I think in terms of implementing models … So, at DeepMind, we have research engineers whose job it is to work together with researchers on the implementation side. So, they tend to be really, really good at implementing models, and tweaking them and making them work, and that kind of frees researchers’ time to think about the more high level and conceptual questions. If you’re working on theoretical research, then being really good at implementation is obviously less important for you.

Robert Wiblin: For the second half of the episode, let’s move onto the issue of personal career choice and how listeners can potentially make a contribution to solving this problem themselves.

So, we both think that being an AI safety researcher is one of the highest impact things that people can potentially do, but it’s also potentially very, very difficult work. What’s the lower bound of maths ability that would allow someone to make a useful contribution?

Jan Leike: I think that depends on what kind of approach you’re taking. If you’re doing theoretical work, then you should really be really, really good at math. If you’re doing more empirical work, then you should understand the fundamental math fields like linear algebra, and also statistics, and so on. So, things like what is the central limit theorem, what is an eigenvector, like solve this integral, stuff like that. But, on the empirical side, you usually don’t have to be really good at proving theorems and stuff like that.
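For readers who want to check themselves against the examples Jan lists, here is a minimal sketch of two of them: verifying an eigenvector and watching the central limit theorem at work. It uses only the Python standard library. The helper names `is_eigenvector` and `sample_means`, the example matrix, and the sample sizes are illustrative assumptions, not anything mentioned in the interview.

```python
import random
import statistics


def is_eigenvector(matrix, vector, eigenvalue, tol=1e-9):
    """Return True if matrix @ vector equals eigenvalue * vector (within tol).

    `matrix` is a small dense matrix given as a list of rows (hypothetical helper).
    """
    product = [sum(a * x for a, x in zip(row, vector)) for row in matrix]
    return all(abs(p - eigenvalue * x) <= tol for p, x in zip(product, vector))


def sample_means(n_per_sample, n_samples, seed=0):
    """Return the means of n_samples batches of n_per_sample uniform(0, 1) draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.random() for _ in range(n_per_sample))
        for _ in range(n_samples)
    ]


# Eigenvectors: for A = [[2, 1], [1, 2]], A @ [1, 1] = [3, 3] and A @ [1, -1] = [1, -1].
A = [[2.0, 1.0], [1.0, 2.0]]
print(is_eigenvector(A, [1.0, 1.0], 3.0))
print(is_eigenvector(A, [1.0, -1.0], 1.0))

# Central limit theorem: means of n i.i.d. draws cluster around the true mean (0.5)
# with standard deviation close to sigma / sqrt(n), where sigma^2 = 1/12 for uniform(0, 1).
n = 50
means = sample_means(n_per_sample=n, n_samples=2000)
predicted_std = (1 / 12) ** 0.5 / n ** 0.5
print(statistics.fmean(means), statistics.stdev(means), predicted_std)
```

The last line prints the observed mean and spread of the sample means next to the spread the theorem predicts, so the two can be compared by eye.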

Robert Wiblin: What kind of thinking is most important to be able to engage in?

Jan Leike: The kind of thinking that is really useful is critical thinking. So, say you’re reading a research paper, ask yourself what is good about this paper, what should be done better, how could you extend it, then kind of be able to point to the weaknesses of a particular research output. This is also then useful when you’re writing your own research ’cause you should know where your own research has weaknesses and where you could do better and how you could extend it, so that if you have more time to work on it, you know what to focus on. In a way, doing research is kind of like training a GAN. You have to have a good discriminator to kind of figure out whether what you’re doing is good, and then you can train your generator to generate that stuff.

Another type of thinking is … Of course, it involves a lot of theoretical thinking. You need to have good intuitions about numbers. You have to have intuitions about math and algorithms, like what things are expensive in terms of compute, and you have to be able to code. As a researcher, you don’t necessarily end up coding so much, but you have to be able to do it and understand how to do it.

I think, overall, something that is really important is that you should be comfortable with navigating a space that you don’t really understand very well, because researchers kind of necessarily are on the frontier of human knowledge and things that we understand, so you have to be comfortable with the unknown. In some ways, this is in contrast to the skills that an undergraduate degree selects for, where you’re really basically learning about things that we understand well, and then it’s more you need to be able to remember them and you need to be able to understand them quickly rather than dealing with the unknown.

Robert Wiblin: If someone’s already familiar with machine learning, if they already have some training in it, what kinds of things can people try to see if it’s a good fit for them?

Jan Leike: If you already know how to do research in machine learning, then you should have all the skills that you need to do research in AI safety. Really, these are not like two different things. There’s not AI and AI safety. They are both AI questions, or they’re both machine learning questions. So, you’ll approach them with the same tools, you’ll think about them in the same way. It’s really the same problems.

Robert Wiblin: And if people don’t know much about machine learning, how can they figure out if this is a sensible path for them to go further down?

Jan Leike: So, there’s various aspects with this. There’s the question of how excited are you about machine learning? How excited are you on working on that? Do you want to work more on the research side, or do you want to work more on the implementation side, stuff like research engineering, and how can you figure out whether you have each of these skills?

There’s a lot of resources online that help you go through tutorials and implement various deep learning models, so that’s one thing if you want to work on the implementation side. I think if you want to work on the research side, it’s usually good to get a PhD, or some kind of equivalent experience, in order to really skill up on the research skills.

So, how do you know if you’re a good fit for that? Well, you should just work on a research project with your supervisor at university, or in an internship say, and see how well that goes, like how much fun you have doing that.

Robert Wiblin: So, we’ll talk about the PhD in a minute, but what would be the ideal undergraduate degree for someone who’s thinking about working in AI safety in future?

Jan Leike: I think the perfect undergraduate degree is computer science and mathematics. If your undergraduate degree is in another quantitative subject, like physics or something, that’s also fine. But, really, you should know all the fundamentals, like linear algebra, calculus, coding, algorithms, machine learning, deep learning, reinforcement learning, statistics, and so on.

Generally, it’s good to prioritize harder courses over easier ones. If you have to, say, choose between a math course and an applied course, then I would recommend the math course, even if it doesn’t seem that related to what you actually want to do. I think it’s generally a good idea to start doing research early and try to publish at least one paper before you finish your degree, your master’s degree, because that would put you in a much better position when you actually apply for PhDs, because that’s kind of proof that you’re able to do research, and people can evaluate how good you’ll be based on that.

It’s also good to find a supervisor that is not necessarily the most famous supervisor, ’cause if they’re really famous, they usually don’t end up having much time for you, but somebody who’s just really good at supervising so you can learn a lot from them and get a lot of feedback. That way, you also have an easier time finding out whether you’re a good fit.

Robert Wiblin: Speaking of which, DeepMind doesn’t exactly have open offices that anyone can just walk into and meet everyone. If someone’s, you know, they’ve finished an undergraduate degree and they have an interest in this whole area, is there any way that they can go about meeting people who are working in the field, potentially at, say, a conference on the topic?

Jan Leike: Yeah. I would recommend going to one of the top machine learning conferences. These are ICML, NIPS, ICLR, and so on. Ideally, you can look at the papers that are coming out, you can look at them online, and you can go through them and find the papers that you find interesting. It makes sense to read them in detail beforehand, so when you actually meet these people, you can ask smart questions about them rather than asking the questions that everyone asks.

Robert Wiblin: So, it sounds like you’re pretty enthusiastic about people doing machine learning PhDs. Are there any alternatives to doing a machine learning PhD, that are worth mentioning?

Jan Leike: Yeah. So, a PhD is usually the requirement for someone to hire you as a researcher. There are some exceptions for that, and there’s other routes that give you a similar kind of experience. The Google Brain Residency is one example of that. There’s also people who start their PhD, and then they kind of stop midway through and go work in one of the industry labs. For some positions like research engineering, a PhD is not strictly required. You basically have to be really good at coding and understand machine learning and follow the research.

The reason why I would recommend people to get a machine learning PhD, if they’re in a position to do so, is that this is kind of where we are currently the most talent constrained. So, at DeepMind, and for the technical AI safety team, we’d love to hire more people who have a machine learning PhD or equivalent experience, and just get them to work on AI safety. The problem is that there’s not enough people who have that required background and are also excited about working on AI safety, so that we could hire more.

Robert Wiblin: To what extent do you think people should go down traditional machine learning paths and work with people like you at DeepMind, versus try to do AI safety research on their own or among groups of like minded people outside of an institution like DeepMind?

Jan Leike: Yeah. I think this is quite important, because I think if people try early on to be independent researchers in AI safety, they tend not to be very successful with that. I think going through one of these more established paths to getting the required research skills is a really good idea. So, in a way, a PhD is not actually the thing we care about, but it’s more a package of a number of related skills that are really useful if you want to be a researcher.

Robert Wiblin: Do people face any trade offs in deciding when to switch from the most mainstream machine learning research questions to working on the safety topics that you’re particularly interested in?

Jan Leike: Yeah. That should happen, basically, when you feel confident that you have the required skills. That could be, for example, in the later years of your PhD, if you’re doing a machine learning PhD, or it could be after you’ve finished your PhD and you go somewhere else after that. I think, usually, I see people who overemphasize going into that, into AI safety research, as quickly as they can, over skill building. Sometimes people are really concerned about working on capability research, where that is meant to be something that is not safety research, but I think these concerns are overemphasized and people shouldn’t worry too much about them, and rather really just focus on their career capital.

Robert Wiblin: Where might someone work before they’re ready to apply to DeepMind? Is that a possibility, that someone could do a PhD, but not yet be ready to work at DeepMind? Are there any intermediate steps that they can take to bridge the gap?

Jan Leike: Yeah. I think there’s lots of interesting things to do. There’s machine learning internships. I think a good place to do an internship is MILA in Montreal. There’s various machine learning startups where you might work, you might do a postdoc in industry. I think there’s a lot of options.

Robert Wiblin: If someone skills up in machine learning, but then either can’t find a job in AI safety research or they decide that that’s no longer what they want to do, what other kind of options do they have for either having a good life or doing a lot of good for the world?

Jan Leike: Yeah. I think, right now, if you have a PhD in machine learning, you’re really hot shit. There’s lots of people who just want to throw money at you to be around them, and make cool machine learning things happen. There’s lots of different industry labs that you could join. In particular, if you want to use machine learning in a more applied setting, to do good in the world, there’s lots of projects that you could join, there are lots of companies. There’s stuff going on at DeepMind for using machine learning for health, for example. But, yeah, I think right now, if you have that kind of degree, you wouldn’t have problems to find a really well paying, comfortable job.

Robert Wiblin: Yeah. Thinking a bit outside the box for a minute, are there any options in other areas, like going into politics, or any other ways that people with machine learning experience might be able to help to deal with AI safety issues?

Jan Leike: Yeah. So, there’s a lot of questions around AI that come up in the policy, government space, that I think we need to deal with and face, and having people in these institutions who really understand the technical details of what’s going on in machine learning is going to be really helpful to have really informed discussions about this.

So, I think this is something that people with a technical background should also really seriously consider, because I think there’s a lot of great potential for impact there. I know you had Miles Brundage on the podcast just a while ago, and he’s written this excellent guide on how to do AI policy.

Robert Wiblin: Yeah. We’ll stick up a link to that in the notes on the episode, so people can also find out about the politics and policy and strategy side of AI safety.

Another approach that people might take is trying to earn to give. So, if they have an ability to make quite a lot of money, they might go do that and then try to donate it to people or organizations or projects that they think will help to make artificial intelligence more safe. Do you think that that’s a sensible approach for people to take?

Jan Leike: So, at the moment, I would say that the talent gap in technical AI safety is just so much larger than the funding gap, that I wouldn’t really worry about putting more money into this area. We need people to solve these problems.

Robert Wiblin: Yeah. Are there any other technical approaches to AI safety, other than doing machine learning research?

Jan Leike: Yeah. There are some other approaches, most notably the agenda for highly reliable agent design, that came out of the Machine Intelligence Research Institute. For that kind of work, your background would probably be a PhD in logic, or something like that.

Yeah. My impression is that the talent gap for machine learning PhDs is much more severe, and I expect that, of the technical AI safety researchers who will be working on this in a few years’ time, a lot of them will be machine learning researchers.

Robert Wiblin: What’s the biggest downside of the career path that you’ve taken?

Jan Leike: I have a lot of work to do.

Robert Wiblin: Okay. Right. I guess a lot of responsibility ’cause it’s such an important issue that you’re working on.

Jan Leike: Yeah. It’s like, right now, there’s all these things that need to be done, and there’s not many people to do them. So, I have to prevent myself from really doing all of them, or trying to do all of them and then failing miserably, because it’s really too much to do.

Robert Wiblin: You have a lot of prioritization to do, I guess.

Jan Leike: Yep.

Robert Wiblin: So, there’s some people who are worried about AI safety and interested in contributing, but they are reluctant to get involved ’cause they think, “I’m not one of the smartest mathematicians in the world, I’m not one of the smartest mathematicians in America, so can I really make a difference?” Do you think that’s misguided?

Jan Leike: Yeah. I think you really don’t need to be a math genius to work on these problems. I think it’s really much more important that you have the research mindset, and you can think critically and come up with new ideas.

Robert Wiblin: I guess stick with difficult problems, even though it’s not clear when you’re going to find a solution.

Jan Leike: Exactly. Yeah. Be comfortable with working with the uncertainty and unknowable questions.

Robert Wiblin: So, suppose someone is thinking of doing a PhD in machine learning, what general advice do you have for them in preparing for that, or what they should do while they’re doing their PhD?

Jan Leike: So, while you’re figuring out whether or not to do a PhD and the way to do it, I think there’s a lot of things that you should factor into your decisions, and it’s really helpful to read general advice on how to do your PhD. A lot of that you can find online and in various books. Right now, pursuing a PhD in machine learning is very difficult because a lot of senior people have left for industry, so there’s not many professors around in academia anymore who train students. As a result of that, and also because ML is really exciting at the moment and there’s lots of people trying to move into this space, PhD applications are extremely competitive, especially in the top places.

On the other hand, today we have so many other tools available that you can use to get into this, like online courses. There’s a deep learning course on Coursera right now. There’s deep reinforcement learning courses that you can check out online. There’s tutorials and open source frameworks, like TensorFlow and other things. At the moment, basically, all the machine learning publications are available on arXiv, so you can just read them for free. But, all of these things don’t really teach you research skills. I think, for that, doing something like a PhD is very, very useful.

Robert Wiblin: We’re often a little bit reluctant to tell people to go and do PhDs unless they’re really sure that they want to work in the area, because it can be a really big commitment. How long does a machine learning PhD take?

Jan Leike: Yeah. In the US, they can take five or six years. In Europe, they tend to be shorter. They can be three or four years, but they require you to have a master’s degree when you apply. So, in that way, it is quite a big time commitment.

Robert Wiblin: If you decide that you don’t want to continue halfway through, can you leave with a master’s?

Jan Leike: In the US, you can. I think after two years, or something. There’s also cases where people have started a PhD and then got hired by a top industry lab kind of halfway through.

Robert Wiblin: So, you’re probably not going to be short of options.

Jan Leike: Yeah, I don’t think so. If you do well in a machine learning PhD, I don’t think you will be short of options.

Robert Wiblin: What if someone did their undergraduate degree in something other than maths or computer science, how can they pivot to get into a machine learning PhD?

Jan Leike: Yeah. I think that depends on what your background is and where you are in your career. So, if you, say, have a PhD in physics, maybe what you should do is read up on machine learning and then try to do an internship and get a paper published. If you’re more on a bachelor’s and master’s level, maybe a master’s in machine learning is the right thing to do for you. It really depends on what you are, what you’ve done, how much research experience do you have already, and stuff like that.

Robert Wiblin: Well, it sounds like you could use all of the help that you could get at DeepMind, and you’ll be really interested in hiring people who are qualified to do this kind of research. So, is there any last thing you would like to say to people, to inspire them to try to come and join you?

Jan Leike: Yeah. Yeah. I mean, of course. I think AI will open up all of these amazing opportunities to do good in the world. In particular, AI safety is an especially promising kind of high impact work that you can do because so few other people are doing it, and we’re really trying hard at DeepMind to hire more people to do that. It’s been quite difficult to do that. So, yeah, if you’re really good at machine learning, I would really love for you to come work with us.

Robert Wiblin: My guest today has been Jan Leike. Thanks for coming on the 80,000 Hours Podcast, Jan.

Jan Leike: Thanks so much for having me.

Robert Wiblin: If you enjoyed this episode remember to consider applying for Effective Altruism Global SF at eaglobal.org. Thanks for joining, talk to you next week!

Learn more

Risks from power-seeking AI systems

Machine Learning PhDs

The 80,000 Hours Podcast on Artificial Intelligence

Related episodes

August 7, 2023

#159 – Jan Leike on OpenAI’s massive push to make superintelligence safe in 4 years or less

July 21, 2017

#3 – Dario Amodei on OpenAI and how AI will change the world for good and ill

June 6, 2017

#1 – Miles Brundage on the world’s desperate need for AI strategists and policy experts

October 11, 2017

#10 – Nick Beckstead on how to spend billions of dollars preventing human extinction

About the show

The 80,000 Hours Podcast features unusually in-depth conversations about the world's most pressing problems and how you can use your career to solve them. We invite guests pursuing a wide range of career paths — from academics and activists to entrepreneurs and policymakers — to analyse the case for and against working on different issues and which approaches are best for solving them.

Get in touch with feedback or guest suggestions by emailing [email protected].

Our crash course on transformative AI

We've carefully selected 10 key episodes to help listeners get to grips with the potential upsides and downsides of powerful, transformative AI.

Check out 'The 80,000 Hours Podcast on AI'

If you're new, see the podcast homepage for ideas on where to start, or browse our full episode archive.
