Enjoyed the episode? Want to listen later? Subscribe by searching “80,000 Hours” wherever you get your podcasts, or click one of the buttons below:

Just two years ago OpenAI didn’t exist. It’s now home to one of the most elite groups of machine learning researchers, who are trying to make an AI that’s smarter than humans and have $1b at their disposal.

Even stranger for a Silicon Valley start-up, it’s not a business, but rather a nonprofit founded by Elon Musk and Sam Altman among others, to ensure the benefits of AI are distributed broadly to all of society.

I did a long interview with one of its first machine learning researchers, Dr Dario Amodei, to learn about:

  • OpenAI’s latest plans and research progress.
  • His paper Concrete Problems in AI Safety, which outlines five specific ways machine learning algorithms can act in dangerous ways their designers don’t intend – something OpenAI has to work to avoid.
  • How listeners can best go about pursuing a career in machine learning and AI development themselves.


OpenAI is focused on reinforcement learning and learning across many environments, instead of just focusing on supervised machine learning. Its research outlook is quite similar to Google DeepMind’s, but it’s a smaller team – they’re slower to hire.

OpenAI is not about open source, but rather about distributed benefits from AI. AI gives you enormous leverage to improve the world because many currently unsolvable problems will become much more straightforward with superhuman AI.

Most people at OpenAI think safety is worth considering now. They already confront how AIs can act in ways their creators don’t foresee or desire, and these problems might get worse as AI becomes more powerful. Their third safety researcher is due to start very soon, and they’re hiring – get in touch if you’d like to join. OpenAI is closely cooperating with DeepMind to ensure safety research is shared and there’s no race to deploy technologies before they’re shown to be safe.

Dario thinks we should learn how to make AI safe for the future by working on actionable problems with feedback loops today – solving practical problems in ways that can be extended in future as ML develops.

The most natural way to get into working on AI and AI safety is by doing an ML PhD. You can also switch in from physics, computer science and so on – Dario’s own background is in physics.

To test your fit for working in AI safety or ML, just try implementing lots of models very quickly. Find an ML model from a recent paper, implement it, and try to get it working quickly. If you can do it and enjoy it, AI research might be enjoyable for you. Many people with a quantitative background can try this at home. Even if you decide not to work on safety, the exit opportunities from ML are excellent.
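To make that exercise concrete, here’s a minimal sketch of the kind of “implement a model quickly and see if you enjoy it” test described above. The model and task (a tiny two-layer network learning XOR in plain NumPy) are an invented example for illustration, not something from the interview:

```python
import numpy as np

# Toy "implement a model from scratch" exercise: a two-layer network on XOR.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(p):
    # Binary cross-entropy, averaged over the four examples.
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

lr = 0.5
initial_loss = loss(sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2))
for _ in range(10_000):
    h = np.tanh(X @ W1 + b1)              # hidden layer
    p = sigmoid(h @ W2 + b2)              # predicted probability
    g_out = (p - y) / len(X)              # gradient of loss w.r.t. output logit
    g_h = (g_out @ W2.T) * (1 - h ** 2)   # backprop through tanh (uses old W2)
    W2 -= lr * h.T @ g_out
    b2 -= lr * g_out.sum(axis=0)
    W1 -= lr * X.T @ g_h
    b1 -= lr * g_h.sum(axis=0)

final_loss = loss(sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2))
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

If writing and debugging something like this feels fun rather than tedious, that’s a decent signal about fit.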

Articles, books, and other media discussed in the show


Robert Wiblin: Hi! I’m Robert Wiblin, Director of Research at 80,000 Hours, and welcome to the podcast. If you want to make sure you never miss an episode from us, you can subscribe by searching for 80,000 Hours in whatever app you use to get podcasts. That way you can also speed up the episode, which is how I much prefer to listen to interviews. Next week, I’m scheduled to speak with Alex Gordon-Brown about working in quantitative trading in order to earn to give, which I expect to be very engaging.

Today’s conversation really goes into the weeds and I learned a great deal from it. If you’re looking for personal advice on how to pursue a career in technical AI research, stick around because we get to that in the second half. I apologize for the audio quality on my end. I think we’ll have that fixed up by next time. If you’d like to offer any feedback on the podcast, please do email me at rob at 80000hours dot org. We’re still figuring out how we can best use podcasts to help our readers and I’ll try to respond to everyone. Without further ado, here’s my conversation with Dario Amodei.

Today I’m speaking with Dario Amodei, a research scientist at OpenAI in San Francisco. Prior to working at OpenAI, Dario worked at Google and Baidu and helped to lead the project that developed Deep Speech 2, which was named one of the 10 breakthrough technologies of 2016 by MIT Technology Review. Dario holds a PhD in Physics from Princeton University, where he was awarded the Hertz Foundation Doctoral Thesis Prize. Dario is also the co-lead author of the paper “Concrete Problems in AI Safety”, which lays out in simple terms the problems we face in making AI systems safe today. Thanks for coming on the show, Dario.

Dario Amodei: Hi.

Robert Wiblin: We plan to talk about the motivations behind technical AI safety research, the Concrete Problems paper, and how someone can pursue a career in it for themselves. First, we’re at the OpenAI office here in SF, tell us a bit about OpenAI and how you ended up actually working here.

Dario Amodei: OpenAI is a nonprofit AI research lab. It was originally founded by Elon Musk, Sam Altman and a few other folks. Generally, we’re working on following the gradient to a more general artificial intelligence and making it safe. I joined around, what was it? July of last year, so about a year ago which was a few months after it started. I came here because there were a number of … I thought there were a number of really talented researchers here and it was a good environment in which to think about safety in the context of AI research that’s already being done.

Robert Wiblin: OpenAI was only founded about 18 months ago? Is that right?

Dario Amodei: It was about 18 months ago, yeah.

Robert Wiblin: How many staff does it have now?

Dario Amodei: I think there’s, last I counted, about 55 people here.

Robert Wiblin: Has it been difficult hiring that many people that quickly?

Dario Amodei: I’ve actually never worked at a startup before. Our CTO, Greg Brockman, was previously CTO of a startup called Stripe, which now has around a thousand people or so. It’s definitely hard. He’s really good. It is not something that I’ve been super involved in except on the safety side.

Robert Wiblin: I guess it’s the Bay Area way to explore some growth in organizations. Who’s backing it? What’s the budget like?

Dario Amodei: I don’t know if I can give exact numbers on the budget. The main donors at this point are Elon Musk, Sam Altman and Dustin Moskovitz through Good Ventures.

Robert Wiblin: Is what you do pretty similar to what’s going on at DeepMind? Are there important differences?

Dario Amodei: I would say that the general research agenda at OpenAI is focused on reinforcement learning, on learning across many environments, and on trying to push forward the boundaries of what’s done, instead of just focusing on supervised machine learning. I would say that’s very similar to DeepMind, and it’s probably one thing that sets OpenAI and DeepMind apart from other institutions. We both have a similar focus on safety. We both have safety teams. I would say OpenAI is trying to be a smaller organization that focuses on hiring just the people that we want the most. That’s been one of the big differences. There are probably some differences in culture as well that are a little bit intangible and hard to describe. I think generally our view of how AI works, what to build in AI, and the focus on safety are actually pretty similar between the two organizations.

Robert Wiblin: You studied Physics, right? Had a PhD in Physics? Then you switched into AI, or was your Physics-

Dario Amodei: My Physics work was, in particular, specialized in biophysics. I was thinking about models coming from statistical physics and applying them to models of the brain, and also using techniques from physics and electronics to make measurements to try and validate those models. I come from a physics background, but I’d been thinking about intelligence for quite a while, and how intelligence works. When I began my PhD, I wanted to understand that by understanding the brain. By the time I was done with it, and by the time I’d done a short postdoc, AI was starting to get to the point where it was really working in a way that it hadn’t when I started my PhD.

I felt like maybe the best way to understand intelligence was starting to be to directly work on building parts of it, rather than studying the messiness of the brain. That was what led to the switch.

Robert Wiblin: Do you want to give a quick pitch for why working on artificial intelligence is so important?

Dario Amodei: Yeah. I think you can give a standard argument that a lot of people are familiar with, which is: if you think about any technology that humans have created, from sanitation and flight to medicine, improvements in human health, and improvements in the ability to feed the world, all of this has been generated by our intelligence. And our intelligence is relatively fixed.

If we were able to build something that could match or exceed our intelligence, that would really be upgrading the engine that produces a lot of the great things that we do. Ultimately, and maybe it would take a long time, it would give us much more complete control over our own biology and neuroscience, could make us whoever and whatever we want to be, could end conflict, war, disease, that stuff. That sounds a little utopian, but I think if we push this technology far enough and all goes well, it will lead to that result, either immediately when we build it or over a somewhat longer period of time. I don’t see any reason why those things can’t happen. I think that’s the basic reason to work on AI.

As I’ve written, there are these safety issues where we can imagine situations in which it doesn’t actually go well. To the extent that that’s a risk, it’s also a risk that we can reduce, and we can have leverage by focusing particularly on reducing that risk. On both the positive side and the negative side, it seems like-

Robert Wiblin: Quite a lot of leverage.

Dario Amodei: Yeah, there’s a huge amount of leverage to be had. The previous stuff I was doing was in biology. It’s great, you can help people, you can try and cure some disease, but this feels like it’s more getting to the root of problems.

Robert Wiblin: What does the name “OpenAI” mean? Does that relate to the path the organization is taking?

Dario Amodei: Yeah. I wasn’t actually present at the time that OpenAI was founded or the name was chosen. I wasn’t the one who picked the name. I think there’s been a fair amount of misunderstanding. There’s one group of people who think it’s all about open source, and releasing open tools. There’s another set of people who, and I don’t think many people think this anymore, but who for a while thought that it was about making an AGI without any safety precautions and just giving a copy to everyone, and that this would somehow solve safety problems. These were two early misconceptions that were around long before I joined OpenAI. My understanding is that it’s meant to indicate the idea that OpenAI wants the benefits of AI technology to be widely distributed. Assuming-

Robert Wiblin: Rather than only going to the owners.

Dario Amodei: Assuming that safety and control problems are solved and we build AGI, there’s then a question about who owns it, what happens with it, what world we live in after it’s created. Again, I wasn’t the one who named this or set the specific mission statement, but I think Elon’s intention with it was trying to think ahead: given that we’ve built an AGI and it’s not wildly unsafe, how would its benefits be distributed throughout humanity? I think the openness is intended to indicate the idea that these benefits should accrue to everyone. That’s my understanding.

Robert Wiblin: OpenAI is a nonprofit.

Dario Amodei: It is a nonprofit.

Robert Wiblin: If you developed a really profitable AI, how does that work? Does OpenAI become incredibly rich and then give out the money to everyone?

Dario Amodei: Personally, I’ve no interest in getting rich from AGI. I think it would do so many interesting and wonderful things for humanity that-

Robert Wiblin: The question of getting a larger share is silly.

Dario Amodei: The meaning of money would change quite a lot, and even the psychological motivations that would make me want a larger share are things that could change, and that I might want to change. In many ways, shares in terms of money are maybe not the right way to think about it, but I think there’s all kinds of stuff that could happen when AGI happens. Some of the things I think about are where that could go and what that could mean. The summary is: we don’t know very much, because it’s something we haven’t done yet. A lot of it’s speculation.

Robert Wiblin: What do you research here at OpenAI?

Dario Amodei: I mainly work on safety. We have a safety team that’s, so far, myself and Paul Christiano. Paul was a co-author of mine on the Concrete Problems paper and has also written a lot online on his blog about AI. He’s probably one of the people who I think has done the most to promote clear thinking about the problem and tie it to current AI. We have a third person joining in a few weeks who I’m super excited about. We’re trying to build up a team that focuses on technical safety. We also do a little bit of strategy stuff, which is how we get different organizations that are working on AI to cooperate with each other, and how we cooperate with policy makers on questions like these. We’re thinking a little bit about those issues, but mainly technical safety.

I also do some stuff that’s not strictly technical safety, but is generally done to stay up to date on where AI is currently going. I did some work on transfer learning a while ago that was a little bit safety motivated: trying to make environments that are broad enough that it’s possible to see distributional shift and out-of-distribution problems. That’s the range of stuff I work on.

Robert Wiblin: What was the organizational culture like here? What kind of people does OpenAI attract?

Dario Amodei: I think we’ve generally been very selective in who we pick. Generally it’s people who are very talented machine learning researchers, but also people who, I would say not everyone, but a large fraction of people here, really do think in terms of eventually getting to AGI. At least some people, a significant fraction, are quite interested in, or at least supportive of, safety work related to that or related to what we do. Now, there’s a wide range of beliefs on how to work on safety and how possible it is to work on safety from our current vantage point. There’s a wide distribution of views, but broadly people are pretty supportive.

Robert Wiblin: OpenAI recently moved away from software development, is that right? To focus more on machine learning?

Dario Amodei: That’s not quite right, it’s more … I think what you’re referring to is we had a project called Universe, which I was somewhat involved in on the machine learning side. The idea of that project was to make a lot of environments that agents could learn in. The way we did it was using something called the VNC protocol to connect directly to a browser through pixels, which would allow you to play thousands of flash games and navigate web tasks. I was actually really excited about this because I saw it as a test bed to study safety. If you have a hundred flash racing games, you can train the agent on one flash racing game and then see how it behaves badly when you transfer it to another flash racing game. You can study some of these open-world problems where an agent has a very wide space to explore and a wide range of actions it could take.

Robert Wiblin: This is one thing that AI researchers have been working on: teaching computers to play computer games really well, at a superhuman level.

Dario Amodei: Yeah, DeepMind has worked on this with Atari games. We were taking it to another level with any game you can find on the internet. This ended up being a project that could probably be described as a little bit ahead of its time. It turned out that in order to connect this way, we needed all the different workers that are applying our algorithm to be asynchronous with one another, and for reasons that were complicated, and that we only figured out later, such asynchronous communication was really hard to make play well with ML. It led to a lot of complexity.

We’re, to some extent, de-emphasizing that project now, and trying to move to doing the same thing with a more synchronous environment. Basically the same idea, but in a way that’s more amenable to ML benchmarking and to measuring how well we’re doing, and that doesn’t have this hard-to-interact-with property. It’s more like we made a tool and it was a good first attempt at something ambitious, but it wasn’t quite the right tool, so now we’re working on changing it to a version that’s better. I wouldn’t say we’ve gone away from software engineering so much as we’ve been experimenting with how to produce tools, and it takes a few iterations to get that right.

Robert Wiblin: Turning now to the broader issue of superhuman AI development, what do you see as the potential dangers here? Why should anyone be worried about this?

Dario Amodei: My attitude, to start off with, has always been this. I do think about AGI, which is a term I prefer to superintelligence, because I think no one knows whether a machine will rocket past human level or not; that’s something that could happen or not. AGI is something that I think definitely will eventually happen, so I prefer to talk in terms of that. Even within safety, in Concrete Problems, I explicitly try to think in terms of not how powerful the systems are but conceptually what can go wrong with them.

The same kind of thing could go wrong with an AGI as could go wrong with a very simple agent playing a video game or a robot cleaning your house. If it has the wrong objective function, if you don’t specify its goal correctly, it can do something unpredictable and therefore dangerous. In general, when I talk about safety, I talk about safety generically, whether it’s in powerful systems or very weak systems. All that said, with respect to powerful systems in particular, I think there is a possibility that we either do a bad job specifying the goals of complex systems, or they’re just unreliable in the way that self-driving cars are unreliable.

A self-driving car has to meet a very high standard of safety for us to trust it to drive on the road. For almost a decade now, we’ve had self-driving cars that are 99.9% safe, but that’s not enough; we need them 99.999% safe. AGI is going to take a lot more novel strategies than self-driving cars, and the space in which it operates is a lot broader. If you just transpose that kind of safety testing from self-driving cars to general intelligence, even with all the controls you put on and all the safety standards, it’s clear that at the very least we’re going to have a big challenge in making sure that something doesn’t go wrong. And if something does go wrong, it might be easy for a large amount of harm to be done relatively quickly.

You have your AGI controlling the stock market or the economy or something, and it just doesn’t know how to do it very well yet, and something goes wrong. It takes a long time to unwind that. There’s a long tail of things of varying degrees of badness that could happen. I think at the extreme end is the Nick Bostrom style of fear that an AGI could destroy humanity. I can’t see any reason in principle why that couldn’t happen.

Robert Wiblin: If it was sufficiently powerful or sufficiently good at accomplishing its goals.

Dario Amodei: If it was sufficiently powerful and safety had been handled sufficiently badly, that is definitely something that could happen. I think there are folks at places like MIRI who say that this is the default outcome, or this is likely to happen, or there’s almost no way to avoid it, or you have to solve some incredibly hard math problem to avoid it. I don’t generally agree with any of those things, but I think this is a possible outcome, and at the very least as a tail risk we should take it seriously. Another thing I’m worried about is that even if we manage to make a superhuman AI or an AGI safe, it might be used for the wrong ends.

Robert Wiblin: Deliberately.

Dario Amodei: Deliberately used for the wrong ends by a disturbed individual or an organization whose views are not aligned with humanity or nation state whose views are not aligned with humanity. That’s in my mind, the range of risks.

Robert Wiblin: Do you think there’s much of a chance that the risks are being overblown here and in fact it’s just going to end up delaying something that could be incredibly useful and make life a lot better?

Dario Amodei: It may very well turn out, maybe it’s more than a 50% chance, that as we get closer and closer to AGI it becomes clear how to make it safe. Maybe the goals can be specified in a way that’s cordoned off from the tasks that are done. There are certain problems of nature, like scanning brains or something, that we need AIs to solve for us in order to gain control over our biology or control over resources, and then there are human values, and maybe there can be an efficient division of labor where there isn’t much confusion. Or maybe safety problems are just a corner of machine learning research where we haven’t done much yet because we haven’t tried. I can think of lots of ways, maybe it’s even the most probable outcome, where things turn out totally fine. But I wouldn’t-

Robert Wiblin: You don’t want to count on it.

Dario Amodei: I wouldn’t in any of those cases say the risk was overblown. Suppose you have a fire alarm, and someone’s cooking a barbecue and it smokes. You wouldn’t call installing the fire alarm overblown. It’s just that sometimes you’ll have a fire and sometimes you won’t.

Robert Wiblin: You want to take precautions.

Dario Amodei: Installing the fire alarm is the right course of action. I think of this as a precaution. I don’t think of anything I do as slowing down the rate of AI progress, or at least I’m not trying to do that. I think of it as broadening the scope of AI progress and thinking about AI in a more interactive and human-centered way. If anything, maybe it accelerates progress a little bit, although that’s probably a minor effect. If people are worried about progress being slowed down, I don’t believe anything I do is close to that.

Robert Wiblin: Close to that, yeah. How much of OpenAI’s work is focused on these kinds of problems? It sounded like 5% of the staff, but I guess other people are worried about it too?

Dario Amodei: I think, broadly, most people at OpenAI are worried about these issues, or at least think they’re worth thinking about. That’s different from who is actively doing technical work on them. I would say it’s three or four people now, and I’m hoping that grows somewhat. We’re actively looking for really talented people. I think OpenAI as an institution has the general idea that in order to work on AI safety, you have to be at the forefront of AI. Also, if you are at the forefront of AI, you have a better ability to implement AI safety in the final system that’s built.

Many people are interested in safety in the long run, but I think until recently, and even now, many people here didn’t know if there was a way to work on safety right now. They’re skeptical that you’re able to work on safety right now with concrete work. I’ve been trying to change that with Concrete Problems and with this recent paper that Paul and I wrote on learning complex human preferences. We’re trying to show that there’s concrete work that can be done, and that’s had a variety of reactions. Some people say, yes, this is exactly what you meant by safety work, now I see how it can be done. Some people say, well, that’s machine learning work, I don’t actually see how it connects to AGI, and so then we’ll try and write another paper and say, “Okay, this is the line we’re drawing and this is how we think it gets us there.” It could actually turn out that this is mostly just ML work and the final systems we build are different enough that, for whatever reason, this ends up not being relevant to safety.

Again, I’m pretty happy in that world. If there was nothing concrete that it was possible to work on in safety, and I instead ended up pursuing a different direction in machine learning, then that ends up being fine. It will turn out we couldn’t have worked on safety until later, and then we’ll work on safety later. Whereas in the world where it does matter, it’s really great and really impactful to get a head start on it.

Robert Wiblin: I’m curious to get your view on this debate that I’ve seen online. You have this contrast: some people, and perhaps Bostrom in the book Superintelligence could be a case of this, talk as though once we have a superhuman AI, it will get very much smarter very quickly and could potentially just solve all of these problems: it could solve aging, solve all of our health issues. Then there are people criticizing this online, saying you just think this because you’re a bunch of nerds and you think that thinking is the way to change the world, the way that everything gets done, but it’s not going to be so simple: even if you had a very intelligent machine, it wouldn’t necessarily be able to solve those problems. Do you have a view on that debate?

Dario Amodei: I’d rephrase the debate a little bit. I think there’s an interesting technical question, which is: let’s say I built an artificial general intelligence tomorrow, and because it’s software, let’s say I made a hundred thousand copies of it. How much does that fundamentally change our society and our technological capability? A lot of it is just, you can look at individuals throughout history who managed to discover a lot more than other individuals. You look at-

Robert Wiblin: Von Neumann.

Dario Amodei: Von Neumann or Einstein or one of these figures who just managed to be leaps and bounds ahead of others. The question is what’s the ceiling on that. If we invented AGI tomorrow, would it take a couple of days to scan all of our brains into software, upgrade us, give us indefinite life extension? Or would it just be like, “Oh, it’s more humans to talk to”? I think it’s actually complicated. Some people act like it’s obvious one way or another, but it’s not really something I have a lot of certainty on, in part because I think modern science has experienced a lot of diminishing-

Robert Wiblin: Diminishing returns.

Dario Amodei: Diminishing returns, like depletion of low-hanging fruit. It could turn out that solving biology is just this exponentially complicated combinatorial problem, or that it’s limited by data and experiment. Of course, maybe the machines will allow us to do the experiments much, much faster, but then there’s some limit on the physical reaction time of the biological systems. When you put it all together, do we zoom to something much, much faster than we ever could have done, or do we get just some mild acceleration of what humans can be doing? I feel like many people act as if the answer’s obvious, but as someone with a background in biology, even thinking about all the directions in which machines could optimize it, my guess is machines could probably make things happen pretty fast. But I think there’s huge uncertainty here, and I don’t really think anyone knows what they’re talking about on this question.

Robert Wiblin: My background is in economics. I imagine if you had an incredibly smart AI and it was trying to figure out macroeconomics, like understanding recessions and boom-and-bust cycles, I suppose it could have lots of conceptual breakthroughs, but you can only take the measurements so quickly and you can’t really run experiments. So you could end up in a situation where the processing of the data we get is extremely good and very fast, but the data only comes in so quickly and there’s only so much you can actually learn.

Dario Amodei: There’s some simple stuff, which is, I wouldn’t be surprised if, for example, a really powerful AI wasn’t able to understand our macroeconomic systems because of this data issue, but was able to design a better macroeconomic system. It’s weird. There’s some stuff where I feel like you just redesign it and do it much better. There’s other stuff that’s just really difficult. I find this puzzling. I’m pretty agnostic on it. I don’t really have a good answer on the “nerds think AI can solve everything” question. I think there are some deep-set problems in human nature, and so just solving resource constraints isn’t going to solve war; we’ve probably in some ways already solved resource constraints. But maybe having true AGI will allow us to redefine what it means to be human and we’ll ultimately-

Robert Wiblin: Elevate ourselves above conflict.

Dario Amodei: Will elevate ourselves above our petty human bickering or maybe the petty human bickering will prevent us from being able to elevate ourselves so we’ll be stuck. I don’t know about that either.

Robert Wiblin: We’ll reach superhuman levels of petty bickering perhaps.

Dario Amodei: I don’t actually know. It’s actually very hard to know.

Robert Wiblin: We’ve mentioned a few times this paper, “Concrete Problems in AI Safety”, let’s dive into that. Before we discuss those problems, what was your impetus for writing it?

Dario Amodei: I had been aware of the work of the AI safety community for a while, but had in general felt that I wasn’t particularly happy with the way they were phrasing things. It didn’t seem like what they were describing was actionable, and there weren’t a lot of ties to current work. AGI was generally discussed in very abstract terms, like having a utility function and having incentives to do this or that. Discussing things at this very abstract level, I couldn’t help but feel there were a lot of implicit assumptions that were not really being discussed.

At the same time, in the mainstream machine learning community, which I’d been a part of for about a year and a half, having a lot of experience with speech recognition systems, one thing that I found about neural nets is that they’re very powerful but they’re very unevenly powerful. The key example I gave early on was: you can train a speech recognition system on 10,000 hours of American-accented data, and for someone with an American accent it gets it perfectly. Then you give it someone with a British accent or an Indian accent or something, and it just does terribly. Of course, if you train it on enough diversity of accents, then it starts generalizing better. But generally, when we build engineering systems, that kind of silent, random failure is not something that we see as a desirable property, particularly in safety-critical systems.
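The accent example above is an instance of distributional shift. As a toy illustration (the data and classifier here are invented for this sketch, not anything from OpenAI’s or Baidu’s speech systems), a model fit on one distribution can silently degrade on a shifted one:

```python
import numpy as np

# Toy distributional shift: fit a threshold classifier on one "accent"
# (data distribution), then evaluate on a shifted one. Numbers are invented.
rng = np.random.default_rng(1)

def make_data(shift, n=500):
    # Two classes: 1-D Gaussian blobs centered at -1 and +1, moved by `shift`.
    x0 = rng.normal(loc=-1 + shift, scale=0.5, size=n)
    x1 = rng.normal(loc=+1 + shift, scale=0.5, size=n)
    X = np.concatenate([x0, x1])[:, None]
    y = np.concatenate([np.zeros(n), np.ones(n)])
    return X, y

X_train, y_train = make_data(shift=0.0)  # "American accent" training set
# Learn a decision threshold at the midpoint of the two class means.
threshold = (X_train[y_train == 0].mean() + X_train[y_train == 1].mean()) / 2

def accuracy(X, y):
    return float(((X[:, 0] > threshold) == y).mean())

in_dist = accuracy(*make_data(shift=0.0))  # same distribution: high accuracy
shifted = accuracy(*make_data(shift=1.5))  # "British accent": shifted inputs

print(f"in-distribution accuracy: {in_dist:.2f}, shifted: {shifted:.2f}")
```

Nothing in the model signals that the shifted inputs are out of distribution; it simply answers confidently and is much more often wrong, which is the “silent, random failure” being described.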

The idea was that fixing those problems shouldn’t just be a one-by-one thing where we say, “Oops! We’re using a neural net in this self-driving car, let’s apply whatever statistical tests we can,” or “We’re using a neural net now in a drone, let’s make sure it doesn’t shoot someone.” We could have principles behind what gives us guarantees on the behavior of a system, or at least what gives us statistical guarantees. That seemed super interesting to me, and it seemed like very few people were actually working on it.

Me and some of my colleagues (Chris Olah at Google; Paul Christiano, who’s now at OpenAI; Jacob Steinhardt at Stanford; John Schulman here; and Dan Mané, another Googler) had all thought a little bit about this problem. We decided to get together and write down all of our ideas in a paper that would lay out an agenda for why we think this is a thing. In particular, I felt that the machine learning community as a whole was a little bit confused. I think they largely thought AI safety was about fears that AIs would malevolently rise up and attack their creators. Even when they didn’t think it was about that, they worried that the people who talk about AI safety would feed into fears that it’s about that.

I felt like this was a silly state of affairs and that of course we can do research on making systems safer and more reliable that doesn’t prey on these fears. In particular, we can even do research that ultimately points towards AGI. I think the important thing is that we shouldn’t go around with every other word we say being AGI; in particular, the research itself shouldn’t be specific to AGI. You can’t really research AGI now because we can’t build an AGI. A very standard technique when doing research on a topic that’s abstract or in the future is to come up with a short-term bridge to it that lets you think about something conceptually similar in a way you can empirically test now. That was the general philosophy behind the paper, and behind the follow-ups that we and others have done to implement the research agenda described in the paper.

Robert Wiblin: What are some of the concrete problems? Do you want to tackle one or two here?

Dario Amodei: Yeah, sure. I can go into them briefly. We made a distinction between problems that relate to what happens if you don’t have the right objective function, and what happens if you do have the right objective function but something goes wrong in the process of learning or training the system. Having the wrong objective function, the extreme version of that is what’s talked about in the classical AGI safety stuff, which is that you want to specify a goal and, for whatever reason, you have some simple instantiation of the goal and it ends up not quite being the right thing. We call that-

Robert Wiblin: The genie problem?

Dario Amodei: Yeah. We call that “reward hacking”. A few months ago, using an environment from the now de-emphasized Universe program, I had an example of a boat race where the boat is supposed to go around for a few laps, and what it’s trying to do is finish the race as fast as possible. The only way to get points (and you can’t change this, because it’s the way the game is programmed) is that you get points as you pass targets along the way. But it turns out there is this little lagoon with all these targets, and the targets also give you turbo, so they make you go faster and faster. You can just loop around in this tiny lagoon and never finish the race. In one sense, you shouldn’t be surprised: it’s the correct solution, it’s how you get the most points. The idea is that the mapping from “this is the reward function” to “this is the behavior it leads to” is a very twisted mapping, and so the point is that it’s-

Robert Wiblin: It’s not what you would have intended for it to maximize.

Dario Amodei: It’s not what you would have intended it to be. The lesson is that it’s very easy to make small changes in the reward space and have that lead to big differences in the behavior space, and also for the mapping to be very opaque: you look at a reward function, think you know what it means, and in actuality it leads to something very different from what you would have expected. We call that generalized reward hacking. Then there was another problem called negative side effects, which is a little related: if your reward function relates to only a few things in your environment and your environment is very big, then there are a lot of ways for you to do destructive things. It’s one particular way in which it’s easy to specify the wrong reward function.
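
The boat-race failure can be reduced to a toy calculation (all point values here are hypothetical, not the actual game’s): a reward-maximizing agent compares the return from racing honestly against the return from looping through the respawning targets, and the degenerate loop simply scores higher.

```python
# Toy sketch of the boat-race reward hack. Reward comes only from
# passing targets, so a policy that circles a cluster of respawning
# targets outscores one that actually finishes the race.

def episode_return(targets_passed, points_per_target=10):
    """Proxy reward: the game only counts targets, not finishing."""
    return targets_passed * points_per_target

# Policy A: race properly, passing each of the ~20 course targets once.
finish_the_race = episode_return(targets_passed=20)

# Policy B: loop in the little lagoon where targets respawn; over the
# same episode length it passes far more targets and never finishes.
loop_in_lagoon = episode_return(targets_passed=75)

# Under the reward as specified, the "correct" solution is the hack.
assert loop_in_lagoon > finish_the_race
```

Nothing here is a mistake from the agent’s point of view: a small misspecification in the reward (counting targets rather than race progress) translates into qualitatively wrong behavior.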

Robert Wiblin: Because you haven’t put in side constraints?

Dario Amodei: Yeah. You haven’t explicitly put in the 10,000 other things.

Robert Wiblin: All the things you care about.

Dario Amodei: 10,000 other things you care about. Then there was this thing called scalable supervision, which is: if you’re a human trying to specify a goal to a machine learning system, even if you have a clear idea of what needs to be done, you don’t have enough time to control or give feedback on every action an AI system takes, and therefore limits to your ability to control and supervise can lead to a system behaving in a way you hadn’t intended because it interpolates in the wrong way. Those are the classical AI-safety-type problems: you somehow gave the system the wrong goal in a way that was hard to understand. Then there are the problems where your system was trying to do the right thing but something went wrong. Those are things more like this thing we called distributional shift, which is when your training set is different from your testing set.
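
The train-versus-test mismatch can be shown in a few lines. Here is a minimal numpy sketch (synthetic one-dimensional data, not a real speech or vision system) where a classifier that is near-perfect on its training distribution fails once the inputs shift:

```python
import numpy as np

rng = np.random.default_rng(0)

# Train: class 0 ~ N(0,1), class 1 ~ N(4,1). A simple midpoint
# threshold separates them almost perfectly.
x0 = rng.normal(0.0, 1.0, 1000)
x1 = rng.normal(4.0, 1.0, 1000)
threshold = (x0.mean() + x1.mean()) / 2   # ~2.0

def predict(x):
    return (x > threshold).astype(int)

train_acc = np.mean(np.concatenate([predict(x0) == 0, predict(x1) == 1]))

# Test: class 0's distribution has shifted to N(6,1) -- think a new
# accent, new lighting, a demographic absent from training. The
# classifier fails silently: shifted inputs land far from the decision
# boundary, so nothing in its own output signals trouble.
x0_shifted = rng.normal(6.0, 1.0, 1000)
shifted_acc = np.mean(predict(x0_shifted) == 0)

assert train_acc > 0.95
assert shifted_acc < 0.05
```

The failure is “silent” in exactly the sense Dario means: the model is maximally confident on precisely the inputs it was never trained to handle.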

The classical example of this is from when I was at Google. There was an incident where Google had a photo captioning system that was trained on a lot of photos. It turned out that the photos were statistically biased towards photos of Caucasian people, and there were also a lot of animals and monkeys in the data. Unfortunately, when a black person took a picture of themselves, the system tagged them as a gorilla, because it had only seen humans with white skin. This was, of course, incredibly offensive, and Google had to apologize for it. They even thought about this a little bit ahead of time, but the neural net ended up being so screwed up that it didn’t even warn them that it was in a region of the state space that was dangerous.

Robert Wiblin: Because the algorithm has no concept of what’s offensive and what was not.

Dario Amodei: Yeah, the algorithm, it’s just-

Robert Wiblin: But it can produce a pretty horrifying outcome.

Dario Amodei: Exactly. It’s just a statistical learning system. It doesn’t know about racism, it doesn’t know about racial slurs, it doesn’t know what’s offensive. It’s just a learning algorithm and it learns from the data it was given. There turned out to be some problems with the data it was given and some problems with the algorithm, and it just innocently produced this extremely offensive result. The world of neural nets is full of this. Something related to distributional shift is adversarial examples, which my colleague Ian Goodfellow works on a lot: when you intentionally, adversarially try to disrupt an input to a machine learning system, making a very small change to it that causes something bad to happen.

The two are a little complementary. An adversarial example is a small but carefully chosen perturbation, whereas a shift of distribution is a holistic perturbation. Resistance against those two is separate: you’re talking about two orthogonal directions in the perturbation space. These are all issues with making sure that when you train something, it behaves in a new environment the way you would intend it to behave, or if it goes wrong, it fails gracefully. We have put some work into this area, and we cite a lot of papers in the Concrete Problems paper, but relative to the stampede of work in mainline AI, I’d like to see more of this stuff.
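
For intuition about the “small but carefully chosen perturbation”, here is a minimal numpy sketch of the linear case (a random linear scorer standing in for a trained network; the fast-gradient-sign construction is the one from Goodfellow’s work):

```python
import numpy as np

rng = np.random.default_rng(1)

# A linear scorer on 784-dim inputs (think flattened 28x28 images):
# predict class 1 if w @ x > 0, class 0 otherwise.
d = 784
w = rng.normal(size=d)

# An input the model classifies confidently (score rescaled to +5.0).
x = rng.normal(size=d)
x = x * (5.0 / (w @ x))
score = w @ x  # +5.0

# Fast-gradient-sign-style attack: nudge every coordinate by at most
# eps in the direction that most decreases the score.
eps = 0.1
x_adv = x - eps * np.sign(w)

# Per coordinate the change is tiny...
assert np.max(np.abs(x_adv - x)) <= eps + 1e-12
# ...but the score drops by eps * sum(|w|), roughly 60 here, which
# flips the predicted class.
assert score > 0 and (w @ x_adv) < 0
```

The point is that a perturbation negligible in any single dimension can have a large effect, because it is aligned with the model’s weights across many dimensions at once.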

Robert Wiblin: I think you did an interview with the Future of Life Institute where you talk about this paper for about half an hour, so people who are interested can go and listen to that and get more details on each of those five problems. How do these problems tie the long term concerns together with the short term ones that we have today?

Dario Amodei: I think the attempt was to come up with some conceptual problems that relate to both, that have long term and short term versions. With something like distributional shift, the short term version of it is something like the gorilla incident. The long term version would be something like: well, I’ve trained an AGI in a simulation and then I put it in the real world and a lot of things are different. Does it break a lot of stuff without meaning to? The superintelligent version of it is like, whatever, it’s like the marks-

Robert Wiblin: It’s just the same but too extreme.

Dario Amodei: More extreme: it’s building a Dyson sphere, and it’s never built a Dyson sphere before, so what if something goes wrong, or whatever outlandish thing you can think of. I think the point, and the explicit strategy, was that people often contrast long term versus short term approaches as if working on short term safety and long term safety are different topics and they trade off against each other. What I’d rather do is have a thread running from long term to short term, where you identify what the fundamental problems are, then you work on them as short term problems, and then as the systems get more powerful, you update your techniques. It creates this more symbiotic relationship where you’re following along.

I think safety shouldn’t be anything different from reinforcement learning. Reinforcement learning is a general paradigm for learning systems: you can do something as simple as walking across a grid, all the way up to playing Go, all the way up to perhaps building a system that’s as intelligent as humans. I probably wouldn’t literally use reinforcement learning for that, but reinforcement learning is a general paradigm that runs from things that are very simple to things that are very complicated. The idea was to do the same thing for safety: come up with some general principles that will carry across to very powerful systems. I wouldn’t say these problems tell you everything that could go wrong with powerful systems. I think there are almost certainly things that are very specific to powerful systems.

My general view is that I’m much less confident in our ability to identify those problems. Maybe we can, and some people are trying, but my view is just that it seems like there’s a lot left on the table. Let’s identify the problems we can identify and work on them, and then whatever is left, we either have to work on very late in the process or maybe someone can identify it earlier, but that seems like the higher-hanging fruit.

Robert Wiblin: The hope is that in order to solve the long term problems, you want to find cases that are similar today where you can get feedback on whether it’s actually helping.

Dario Amodei: Yeah, exactly. I think there’s a magic to empiricism, because it’s very easy to engage in long chains of reasoning about a topic that don’t get tied back to reality. Of course, the risk of working on short term stuff is that it doesn’t matter or doesn’t generalize. The compromise I’ve come up with is to try to think of things that are conceptually general and then tie them into empirics.

Robert Wiblin: To that end, has OpenAI made any noticeable progress on these problems or other problems?

Dario Amodei: Yeah. About three weeks ago, Paul Christiano and I and Tom Brown here, and three people at DeepMind, Jan Leike, Miljan Martic and Shane Legg, came out with a paper called “Deep Reinforcement Learning from Human Preferences”. This works on the reward hacking and scalable supervision side of things. Normally you have a reinforcement learning algorithm that has a goal or a reward function, and the agent acts to maximize that reward function. This works pretty well for something like chess or Go, where the behaviors are incredibly complicated but evaluating the goal is pretty easy. With Go, it’s: are you in a winning position, do you have more territory? With chess, it’s: have you checkmated the king or have you been checkmated?

It’s really easy to evaluate these simple goals with a script, so you can run the algorithm through millions or even hundreds of millions of games and the goal evaluation is easy. For most of the stuff we do in real life, the goal is complicated. It’s like: carry on a conversation, or be an effective personal assistant to a human, which means scheduling things for them and making their life easier, but not emailing all their private information to their boss or whatever. There’s a lot of context-sensitive stuff, which is part of what leads to safety problems. If I take a complicated set of goals like that and try to force it into the framework of a hard-coded reward function, it’s going to lead to something that makes everyone unhappy, because the two things don’t fit together.

Robert Wiblin: You’re saying that it maximizes on one dimension and then fails on all the others?

Dario Amodei: Yeah. Or just that the intrinsic number of bits of complexity in something like “hold a good dialogue” is very high. If I try to program that in, I might be programming for a very long time, in which case I’ll probably make an error. Or, if I try to make what I program simple, then there just aren’t going to be enough bits of information to fit the actual complex nature of the goal. I’m either going to be very error-prone or I’m just not going to be capable of learning what I need to learn. That’s why people talk about strategies for absorbing values and things.

What our paper basically does to address this is replace the fixed reward function with a neural-net-based model of the human’s reward. The idea is that you have a reinforcement learning agent that’s learning, and in the beginning it starts acting randomly, and every once in a while it gives some examples of its behavior to a human. It will come out with two video clips; the human looks at the video clips and says whether the left is better or the right is better. If it’s playing Pong or something, then if in the left clip a point got scored on you and in the right clip you scored a point, the human will say the right is better. Then, the agent builds a model of what reward function would lie behind the human’s expressed preferences. The reward function becomes something implicit and learned, inferred from the human’s behavior.

Then the RL agent gets to work saying, “Yup! This is what I think the human’s goal is. I’m going to go and try to maximize this.” But then it comes back and gives you more examples of behavior, and the human decides on those. Over time, the human is given more and more subtly different examples of behavior. The reward predictor in response learns to discriminate them and gets a more refined understanding of what the human prefers, and the RL algorithm then tries to maximize that. The consequences of its behavior are then given back to the human. It’s this kind of-

Robert Wiblin: It’s kind of three steps: the human, then an AI that’s trying to figure out what the human is optimizing for, and then the thing that does the task. But then, most of the time it’s asking the intermediate AI, is that right?

Dario Amodei: Yeah.

Robert Wiblin: The point is to have an intermediate model of what the human wants.

Dario Amodei: Yeah, you have three parts. You have a model of what the human wants. You have the RL algorithm that’s maximizing that model, and you have the human that feeds-

Robert Wiblin: That trains the-

Dario Amodei: That trains the model. But also the RL algorithm feeds back to the human: whatever the RL algorithm has learned to do goes back to the human. It basically says, “Okay, is this what you wanted? Of the things I’m now doing, which do you want more?”

It’s this gradual preference elicitation, which helps get around the problem where, if you get things wrong by a little, you get the wrong behavior. It’s unfolding behavior in real time and incrementally showing you the consequences of the behavior that you’re seeing. By no means does this solve all safety problems. It’s just one little bit of progress on one.

Robert Wiblin: One brick in the wall.

Dario Amodei: Yeah, one safety problem. This is an example of the thing I’m talking about. We use this both to solve ML tasks that you couldn’t solve before because the reward functions were too hard to specify, and the impact on safety is obvious because it allows us to specify goals more easily. There are all kinds of other problems you can have with it: it has to scale, and there are other safety problems, like you don’t want AI systems tricking you. There’s so much. This is an example of what we did, and I think we’re going to try to do a lot more of it.
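
A stripped-down sketch of the reward-modeling half of what Dario describes (a linear reward model in place of the paper’s neural net, and a synthetic “human” whose clicks are simulated from a hidden true reward; everything here is illustrative, not the paper’s actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# States (or clip summaries) are feature vectors; the human's latent
# reward is a hidden linear function of them.
d = 5
w_true = rng.normal(size=d)

# Simulated "human clicks": shown two clips a and b, the human prefers
# the one with higher latent reward.
pairs = []
for _ in range(500):
    a, b = rng.normal(size=d), rng.normal(size=d)
    pairs.append((a, b, 1.0 if a @ w_true > b @ w_true else 0.0))

# Fit a reward model by maximizing the Bradley-Terry likelihood:
# P(a preferred over b) = sigmoid(r(a) - r(b)), with r(s) = s @ w.
w = np.zeros(d)
lr = 0.5
for _ in range(200):
    grad = np.zeros(d)
    for a, b, label in pairs:
        p = 1.0 / (1.0 + np.exp(-(a @ w - b @ w)))
        grad += (label - p) * (a - b)
    w += lr * grad / len(pairs)

# The learned reward ranks fresh states the way the "human" would; an
# RL agent would then maximize r(s) = s @ w instead of a hand-coded
# reward function.
test_states = rng.normal(size=(200, d))
agreement = np.mean((test_states @ w > 0) == (test_states @ w_true > 0))
assert agreement > 0.8
```

In the real system the comparisons come from actual human judgments over video clips and the RL agent trains against the learned reward concurrently; the sketch only shows why pairwise preferences are enough to pin down a usable reward function.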

Robert Wiblin: How many times do you have to get feedback from the human to solve this problem? Is it a reasonable number?

Dario Amodei: Yeah. It depends on the task. On some of these Atari games, which take about 10 million timesteps to learn, usually a human has to give feedback a few thousand times, so the human actually has to pay attention to less than 1%, or even a tenth of a percent, of the timesteps. We managed to train this simulated little neural robot to do a backflip with a few hundred pieces of feedback; it’s the human clicking for about 30 minutes or so. We’re trying to get that number down because-

Robert Wiblin: This is the learning from human preferences paper?

Dario Amodei: Yes.

Robert Wiblin: We’ll put up a link to that. You can take a look at this little worm thing here that learns to jump, progressing from flailing around to pushing off the ground.

Dario Amodei: It got a little bit of media coverage. My favorite headline was “What this back flipping noodle can teach you about AI safety”.

Robert Wiblin: It seems quite a bit.

Dario Amodei: I think it was “Here’s what this back flipping noodle can teach you about AI safety”.

Robert Wiblin: That’s some good clickbait. Apart from those five issues that you talk about in the paper, what do you think are some other important open problems in the field?

Dario Amodei: One thing we didn’t discuss in the paper is the issue of transparency of neural nets. This is trying to figure out why a neural net does what it does, which you could eventually extend to why a reinforcement learning system takes the actions it takes. It just has a policy: it’s in a situation, it runs a bunch of things through its neural net, and it says, “I’m going to move left”, or “I’m going to bend my joint”. It doesn’t really have much explanation for what it does. If we could explain why, break down the decisions made by neural nets, then that could help with feedback, could help with making sure that systems do what we want them to do and that they’re not doing the right thing for the wrong reasons, which might mean they would do the wrong thing in another circumstance.
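
One concrete transparency technique is activation maximization: gradient ascent on the input (not the weights) to find what most excites a chosen unit. A minimal numpy sketch, with a tiny random network standing in for a trained vision model:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny random two-layer net; we pick one output unit and ask:
# what input most activates it?
d, h = 64, 16
W1 = rng.normal(size=(h, d)) / np.sqrt(d)
w2 = rng.normal(size=h)

def unit_activation(x):
    return w2 @ np.maximum(W1 @ x, 0.0)

def grad_wrt_input(x):
    mask = (W1 @ x > 0).astype(float)   # ReLU subgradient
    return (w2 * mask) @ W1

# Gradient ascent on the *input*, keeping its norm bounded.
x = rng.normal(size=d) * 0.01
for _ in range(200):
    x = x + 0.1 * grad_wrt_input(x)
    x = x / max(np.linalg.norm(x), 1.0)

# The optimized input excites the unit well above typical random
# inputs of the same norm.
random_acts = np.array([unit_activation(rng.normal(size=d) / np.sqrt(d))
                        for _ in range(100)])
assert unit_activation(x) > random_acts.mean() + random_acts.std()
```

On a trained image model the same procedure produces the dream-like feature visualizations: the optimized “input image” is a picture of what the neuron has learned to detect.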

I think that’s a pretty important problem. My co-author on the paper, Chris Olah, did a lot of work in that area with Deep Dream, the back-propagated images generated by neural nets, which was originally designed as a way to visualize what maximally activates a given neuron within a neural net. It was initially a transparency technique, and that’s an area Chris is very excited about. That’s another area I think we should work on. I mentioned adversarial examples before; that’s an area that’s already getting a decent amount of attention, but it probably should get more, like everything in safety. It also has short term safety implications: someone could sabotage a self-driving car with adversarial examples. We certainly wouldn’t want that to happen.

Robert Wiblin: We can’t have that.

Dario Amodei: Yeah.

Robert Wiblin: Interesting. Is that a problem for the rollout of self-driving cars now? That someone might put up a weird sign that confuses them?

Dario Amodei: I’m not the expert on it. I definitely don’t want to give anyone any ideas about how to do that.

Robert Wiblin: I guess it would certainly end up being criminal to do that, I would think, in the same way as hacking a computer system.

Dario Amodei: It’d be extremely illegal. I don’t actually know the details of whether that’s feasible or not and wouldn’t discuss them if I did.

Robert Wiblin: Of course, yeah. As we progressively work towards being able to control the AIs we’re developing, do you think it’s going to be possible for people to understand the solutions that we’ve developed? You’ve described this three-step process by which you train a reinforcement learning algorithm to understand what humans want, and that then trains the machine learning algorithm on the other side; I can understand that. There are other big breakthroughs in history that you can grasp, like quantum physics: it’s a particle and also a wave, and you can partly grasp it. Do you think it’s going to look like that? Or will it just be impossible technical details that-

Dario Amodei: The AI itself or the safety?

Robert Wiblin: I guess the way that we’re going to get machine learning or like other AI technologies to do what we want rather than flip out in some way we don’t expect.

Dario Amodei: I guess there are two possible questions. One is: are we going to understand at a very granular level every decision that’s made? Then there’s: are we going to understand the principles by which the system operates? I think we had better understand the principles by which the system operates. If we don’t understand those, I don’t know how we can build these systems, and if we did build them, I would definitely worry about their safety. I think it’s realistic to understand the basic principles on which something is built. But then there’s a question of at what level of abstraction we understand it. The principles on which a visual neural net is built are very simple: it’s back propagation and alternating linear and non-linear components. That’s pretty much all there is to understand. Then the question is how much we know about what goes on inside the neural net. That’s the question of transparency.
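
Those “alternating linear and non-linear components” really are the whole forward pass. A two-layer toy version, with random (untrained) weights purely to show the structure:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)          # a tiny "image"
W1 = rng.normal(size=(16, 8))   # first linear component
W2 = rng.normal(size=(4, 16))   # second linear component

h = np.maximum(W1 @ x, 0.0)     # linear map, then ReLU non-linearity
scores = W2 @ h                 # another linear map: e.g. class scores

assert scores.shape == (4,)
```

Backpropagation then adjusts `W1` and `W2` to reduce a loss on `scores`; the principle is simple even though what the trained weights end up computing is not.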

I’m optimistic that we’ll gain a better understanding of what goes on inside neural nets. The question is how that actually helps us with safety, how we actually use it. There’s a lot going on inside neural nets even if we could individually understand every piece of them. There are more units than I can read and understand, so I have to have some way of translating that into something actionable, like correcting bad behaviors. Somehow, that component has to fall into place as well. I don’t know yet how that’s going to happen. I don’t know if it’s possible. I think it’s an urgently important research area.

Robert Wiblin: Let’s turn now to how someone might be actually able to pursue a career in AI safety. What are the natural paths to getting a job at OpenAI or other similar organizations?

Dario Amodei: I think my advice is going to be focused on the kind of AI safety work that I’m excited about. For example, MIRI does some safety work that’s more based on mathematics and formal logic; if you wanted to do that, you’d need a different background. For the safety work that I’m most excited about, it sounds obvious, but the two things you most need are an extremely strong background in machine learning and a real, deep interest in AI safety. To break those down: on the first one, certainly at OpenAI we try hard to have a really high bar for hiring people. Just because someone wants to work on safety doesn’t mean that we lower the machine learning bar at all.

We have a lot of people here who are very good, so go and get a PhD in machine learning, trying to work with the best people you can work with and doing the most groundbreaking work. There’s no ceiling to how much this helps. My sense has been that people who have a deeper understanding of machine learning, if they’re interested in AI safety, also tend to really grasp AI safety issues better, provided they think about them. That’s the second component: I want people who really have a deep interest in safety. Not just “it would be good if self-driving cars didn’t crash”, but a broad view of where we’re going with AI, which could be totally different from my vision and might not involve AGI, but this general idea that we want to build machines that do what humans want and carry out the human will. I think that idea is a broad one, and I want people working on safety to have a broad view of that issue.

In the AI community, I don’t think the second one is lacking. There are many people who are passionate about it. I think the limiting factor is just very strong machine learning talent.

Robert Wiblin: We just wrote a career review of doing a machine learning PhD, which we’ll put up a link to so you can have a read. Is it machine learning or bust? Are there other PhDs that people could do that could be relevant, like computer science or philosophy or data science?

Dario Amodei: My PhD wasn’t in machine learning. We have a number of people here who have backgrounds in neuroscience or other areas of computer science or mathematics or physics. It’s entirely possible, if you happen to be educated in another area, to go into this field. But going forward, if you’re a young student, I don’t particularly see a case for doing a PhD in another field if what you want to do is machine learning.

Robert Wiblin: Grab the bull by the horns.

Dario Amodei: I guess I’m saying it’s pretty easy to convert skills from related areas, and sometimes that gives you perspectives that others don’t have. But if you want to do machine learning, you should get a PhD in machine learning. Another thing I’d add is that we do have some people working here who don’t have PhDs. My co-author, Chris Olah, actually never even went to college. He went straight to Google. He had to do a lot to prove himself. The level of technical ability you need to show is not lowered, it’s even higher, when you don’t have the educational background, but it’s totally possible.

I would say the most important thing is just being able to do a lot of impressive and creative machine learning work. I would even go so far as to say, though it’s not my expertise, that even for people doing safety work that doesn’t involve machine learning, I get pretty nervous when they don’t have a strong background in machine learning, because even if they think that a machine learning system can’t be made safe, they should know enough to understand why they think that’s the case and what they think the alternatives are.

Robert Wiblin: That includes, I guess, people doing mathematical research or philosophy research?

Dario Amodei: If it relates to AI safety, yes. Even for those people, I would encourage them to learn as much machine learning as possible, if only because they should understand the approaches they’re departing from.

Robert Wiblin: Is it fair to say that you think the approach you’re taking, where you study machine learning and try to actually improve AGIs, is the best way to make AGI safe? That you’d rather see someone do that than go into these other adjacent areas?

Dario Amodei: It’s a little complicated, because I think that as systems get more complicated, there may be ways in which we combine neural nets with formal reasoning. There’s been some work by my friend Geoffrey Irving and some of his colleagues on theorem proving: basically, using neural nets to select the lemmas to be used for the next theorem. If you take that far enough, you can imagine versions of reasoning systems that traverse some well-defined reasoning graph. They make logical conclusions that are tractable, but it’s all driven at the bottom by neural-net-driven intuition. The neural nets decide what conclusions you draw and where your thinking goes.

I think this is how humans do symbolic tasks like physics or math. We’re neural nets at the bottom, and then we have a layer on top of that where we use those neural nets to represent symbolic reasoning. A computer could probably do that even better, because it can make sure it never makes a mistake in the symbolic reasoning. The symbolic reasoning engine is there, and you can imagine having formal guarantees on that formal reasoning. I think when we get to that, it’ll look different from the way things are currently being done. I’m really not against using formal reasoning methods and mathematics, but I think it’ll be possible to do that work more productively once we understand how it fits in with current systems.
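
The division of labor Dario sketches, fuzzy statistical guidance on top of exact symbolic steps, can be illustrated with a toy search. Here a hand-written heuristic stands in for the neural net, and trivial arithmetic rewrite rules stand in for lemma application (this is an illustration of the idea, not Irving’s actual system):

```python
import heapq

# Sound symbolic rules: the only way to move between states.
RULES = {"+1": lambda n: n + 1, "*2": lambda n: n * 2}

def heuristic(state, goal):
    # Stand-in for a learned model scoring how "promising" a state is.
    return abs(goal - state)

def guided_search(start, goal):
    """Best-first search: the heuristic picks what to try next,
    the rules guarantee every step is valid."""
    frontier = [(heuristic(start, goal), start, [])]
    seen = set()
    while frontier:
        _, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        if state in seen or state > goal * 2:
            continue
        seen.add(state)
        for name, rule in RULES.items():
            nxt = rule(state)
            heapq.heappush(frontier, (heuristic(nxt, goal), nxt, path + [name]))
    return None

path = guided_search(1, 100)

# Symbolic replay: each step is an exact rule application, so the
# result is certain even though the guidance was only heuristic.
n = 1
for step in path:
    n = RULES[step](n)
assert n == 100
```

The heuristic can be arbitrarily wrong without compromising correctness: it only affects which states get explored, while validity comes from the rules themselves. That is the sense in which the symbolic layer can carry formal guarantees even when the guidance is a neural net.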

I actually don’t know. Maybe the stuff that’s being done now is productive, but I’m pretty suspicious of anything where you can’t get that type of empirical feedback loop, because I think it’s really easy for people to fool themselves.

Robert Wiblin: It sounds like the key piece of advice is: do a PhD in machine learning. What universities can people go to? What supervisors should they look for? Do you have any suggestions?

Dario Amodei: I do want to repeat again that there are paths that don’t go through a full PhD. In particular, a number of people do a PhD for a year or two, then do an internship here or at DeepMind or somewhere, and then get hired, so a partial PhD can work. That said, I think the usual suspects are places like Stanford, Berkeley, Cambridge or Oxford in the UK, and Montreal, where Yoshua Bengio’s group is pretty well-known for doing a lot of good stuff. Then there are a number of other places. Those PhD pools are definitely where we hire a lot of our folks, because we know a lot of the relevant professors.

Again, we have some people here who didn’t do a PhD. I think the most important thing is being able to keep up with the literature and make creative, original discoveries that are novel and that stay on pace with what everyone else is doing. If you can do that, then that’s the best thing. When I talk to people who want to switch into machine learning from another field, the advice I always give them is just to implement every possible model you can. If you’re trying to learn supervised learning, take all the ImageNet models and implement them yourself: read the paper, implement it, read the paper, implement it. Same for generative models. You just get this knack for it after you’ve done it for a while. It’s really just practical hands-on experience; you get a sense for these things once you’ve done them for a while. You also find out quickly how good you are.

Robert Wiblin: I might be showing my naivete by asking this, but machine learning isn’t the only way you can approach AI, right? There are other paradigms for how you produce artificial intelligence, is that right?

Dario Amodei: Yeah, that has historically been the case, though it’s gotten a little complicated. In the old days, we had things like expert systems that were based purely on logical reasoning. Generally, people found that those systems were very, very brittle, because they couldn’t represent the high-dimensional space that we see. For example, a vision system based purely on rules is difficult, because if I’m trying to identify a face or an object, I’m trying to identify these blobs in distribution space. In some sense, these problems are inherently statistical, and the rule-based systems just don’t end up working all that well. Historically, there were statistical systems and there were rule-based systems.
I think we can say that the statistical systems have now pretty decisively won. I sometimes hear people say things like, “I don’t think AGI could be built using machine learning”, or “I don’t think AGI could be made safe using machine learning”. I think when people say that, they aren’t really thinking carefully about the alternatives. I’m quite sure that a pure rule-based system is just not going to work, because of the thing I said: you have to ground what you’re doing in sensory information, and the sensory information is just inherently statistical and fuzzy. Pure rule-based systems, I think, are not going to work. What could happen is the thing I described before, where you have machine learning systems, deep neural nets, being used to drive logical reasoning systems.

That would be a hybrid of the two, but people are working on that, and it would be considered within the field of machine learning. I think that in the future, machine learning may often include logical reasoning processes, but they’ll be at a higher layer. What ends up happening is that it will involve reasoning in the same sense that Go-playing systems involve a goal, but it won’t really be like those rule-based systems that we had before. You can’t be 100% sure of this, but I think the basic argument is that percepts are these statistical blobs, so you have to use a statistical system, at least at the beginning, to measure them. Then whatever concepts you draw from them end up being fuzzy statistical concepts. If you want to bring those back to logical reasoning, the reasoning has to exist on a plane that’s abstracted from that.

Robert Wiblin: So there’s not some other AI paradigm that people should be doing if deep studying?

Dario Amodei: Again, in the history of AI, there were a lot of rule-based systems, and then there were critiques written of them. I forget the guy’s name, Hubert Dreyfus or something. He was this continental philosopher who wrote a critique that people found really hard to understand, but what it was really saying was just that percepts are these statistical blobs. If you make a rule-based system, you’re always going to make mistakes and your system is always going to be brittle and wrong a lot.

Robert Wiblin: Assuming someone has been studying machine learning or one of these other related areas as a potential path in, is there a natural path from studying to actually getting a job at OpenAI or another organization? Are there intermediate steps that people have to take?

Dario Amodei: Yeah. I think, again, PhD students who do impressive first-author work on papers are people we are generally very excited to interview. If you’re in a good PhD program and you do some good work, then I’d definitely encourage you to apply here. For people who are relatively early in their careers or come from another background, there is a program at Google called the Brain Residency program that allows you to study machine learning with the experts for a year and train up your skills. We’ve had a number of residents who have applied here or elsewhere. That ends up being a good route.

Robert Wiblin: Speaking of which, there’s a bunch of different organizations, right? There’s OpenAI, there’s Google DeepMind, Google Brain, Vicarious is another company, or maybe more in the past. There’s the human-compatible AI group at Berkeley. Can you go through a couple of these that you might recommend working at?

Dario Amodei: As I said, OpenAI and DeepMind are probably the most focused on reinforcement learning. They probably spend the most time thinking about AGI. Not everyone here does, but it’s a focus here more than it is elsewhere. Google Brain is where I was before coming here. That was the original research group at Google. I would say it’s a more decentralized group that works on a wider set of topics. Chris Olah there thinks about safety. You mentioned the Center for Human-Compatible AI, which is Stuart Russell’s group. We collaborate with them some; we have some interns from there come here. I think Stuart’s been someone who’s been thinking about safety for a while, so that’s another good place to work on it. Vicarious, to my knowledge, doesn’t think about safety, though I could be wrong.

Robert Wiblin: Are there other groups that we’ve missed here? Are there any government research projects? Anything in China?

Dario Amodei: Not that I’m aware of. Of course, there’s MIRI and FHI.

Robert Wiblin: Right, the Future of Humanity Institute and the Machine Intelligence Research Institute (MIRI). They’re doing less machine learning and more of the strategic side, and-

Dario Amodei: Their focus is less on machine learning, although I think Stuart Armstrong at FHI collaborated a bit with DeepMind on something that I think was broadly machine learning related. That was studying interruptibility and corrigibility or something like that.

Robert Wiblin: At 80,000 Hours, we talk to people reasonably often who would be interested in doing a job like yours. Is there any way that they can get indicators early on about whether that’s possible, or whether they’re just wasting their time and should be looking at other options? Because it might not be a good fit: maybe they don’t have the machine learning chops, or culturally they’re not going to be a good fit, or some other reason.

Dario Amodei: I think the thing I mentioned about implementing lots of models very quickly. If you want to know whether you’re good, a good proxy for how well you’ll do in grad school, as well as for the tests we give when people apply here, is to find a machine learning model that’s described in a recent paper, implement it, and try to get it to work quickly. If this is a painful process for you and you really don’t like doing it, then you aren’t going to like the research, either on AI safety or on other AI stuff.

If you find you can do this quickly, and/or you really, really like doing it and find it addictive, then that’s an indicator that this might be something you really want to do. I wouldn’t worry about the cultural stuff. If you’re skilled in this area and passionate about this area, I don’t think you’ll have-

Robert Wiblin: That’s not going to be a barrier.

Dario Amodei: I don’t think it will be a barrier. I don’t think you’ll have any problems. You try and be ask open and welcoming as we can, we don’t have the luxury of selecting people and, anything other than their.

Robert Wiblin: You don’t like our favorite TV shows.

Dario Amodei: Yeah, I mean, that’s just wasteful and pointless.

Robert Wiblin: Yeah, absolutely. How early can people do that? Can they do it as an undergraduate? Is it more like a PhD-level thing?

Dario Amodei: People can do it in high school. If you meet a 17-year-old and ask how to get them into machine learning, I’m like, just go home and implement these models. You actually don’t need any kind of formal education. You probably need a thousand dollars to buy a GPU. I’ve considered at various times whether we should have a grant program where, if you’re 17 years old and you want to get into machine learning, I’ll just buy you a GPU.

Robert Wiblin: Yeah, why not?

Dario Amodei: If you’re interested in AI safety, that was a thousand dollars. Most adults living in the developed world can afford a thousand dollars but most 17-year-olds might not be able to. If they don’t already have access to one it might be a good way to get people started early.

Robert Wiblin: If there’s a 17-year-old listening here who wants to go and build their machine learning model, what should they Google for, what should they start reading?

Dario Amodei: Because I’m from OpenAI/DeepMind direction research, thinking about reinforcement learning, trying to-

Robert Wiblin: What’s the difference between Reinforcement learning and Machine learning? Do you want to-?

Dario Amodei: Machine learning is the broader topic, and within it there are several different areas. There is supervised learning, which is where you try to predict some data that’s been labeled. An example of a supervised learning problem would be: you’re given images and the objects they correspond to. This is an image of a dog, this is an image of a cat, this is an image of a computer. You train the network on lots of pairs of “here’s the image, here’s what it is.” Then it learns over time to map the two to each other.
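
[Editor’s note: the supervised setup Dario describes, learning a mapping from labelled examples to labels, can be sketched in a few lines of Python. This is a toy illustration only: a nearest-neighbour classifier stands in for a neural network, and the 2-D “image” features and labels are made up.]

```python
# Toy supervised learning: store labelled (input, label) pairs, then
# predict the label of whichever stored example is closest to a new input.
# A 1-nearest-neighbour classifier stands in for a trained neural net.

def train(examples):
    """'Training' for nearest-neighbour is just storing the labelled pairs."""
    return list(examples)

def predict(model, x):
    """Predict using the label of the stored example nearest to x."""
    def dist(a, b):
        # squared Euclidean distance in the toy 2-D feature space
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = min(model, key=lambda pair: dist(pair[0], x))
    return nearest[1]

# made-up 'images' as 2-D feature vectors with labels
data = [((0.0, 0.0), "cat"), ((0.1, 0.2), "cat"),
        ((1.0, 1.0), "dog"), ((0.9, 1.1), "dog")]
model = train(data)
print(predict(model, (0.05, 0.1)))  # -> cat
print(predict(model, (0.95, 1.0)))  # -> dog
```

The “train on pairs, then map new inputs to labels” shape is the same whether the model is this toy or a deep network; only the model class and the fitting procedure change.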

Supervised learning has this static quality where it’s like a one-off: you’re trying to predict one thing from another. Reinforcement learning is more of a setup where you’re interacting in a more intertwined way with an environment. The game of Go is like this. You make a move, and then your opponent or your environment makes a move, and then you make a move again. Overall you’re trying to win the game, and the reward, figuring out whether you’ve won the game or how well you’re doing, can be delayed by a long time. The reason I focus a lot on reinforcement learning, and why OpenAI focuses a lot on it, is that reinforcement learning and things like it, the extended versions of reinforcement learning, seem like a better fit for what intelligent agents do in general.

Often, I have very long-range goals: I’m trying to get an education, trying to get a PhD, trying to have a career, trying to start a family. These are all things that unfold over years and involve interacting with my environment in a very complicated way. Reinforcement learning is the only paradigm we have that comes even close to capturing this.
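
[Editor’s note: the interaction loop and delayed reward Dario describes can be made concrete with a small sketch. This is tabular Q-learning on a hypothetical five-state corridor where the only reward arrives at the far end, so the value of early actions has to be learned by propagating that delayed reward backwards. All names and numbers are illustrative, not anything from OpenAI.]

```python
import random

N_STATES = 5      # states 0..4; state 4 is the goal
ACTIONS = [0, 1]  # 0 = move left, 1 = move right

def step(state, action):
    """Environment: move along the corridor; reward arrives only at the goal."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def q_learn(episodes=1000, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a purely random exploration policy (off-policy)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS)  # agent acts, environment responds
            s2, r, done = step(s, a)
            # this update propagates the delayed end-of-episode reward
            # backwards through the states and actions that led to it
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learn()
# after training, 'right' is valued above 'left' in every non-goal state
print(all(q[s][1] > q[s][0] for s in range(N_STATES - 1)))  # -> True
```

The contrast with the supervised setting is in the loop itself: there is no labelled answer per step, only an eventual reward that the update rule spreads back over the whole trajectory.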

Robert Wiblin: Sorry, I cut you off. We were figuring out what the 17-year-old should read to get their foot in the door.

Dario Amodei: There are lots of papers in reinforcement learning. I’d read about what’s called DQN. It’s Googleable; it’s now a common acronym for deep Q-learning. This was a paper done by DeepMind in 2013. Then policy gradients, in particular A3C, and just follow the trail of recent reinforcement learning things that have shown up on arXiv. Go to /r/MachineLearning on Reddit and look at some recent papers in the deep neural net literature. Look at them, try to re-implement them, and see if you can get results as good as the results that others get. It’s really pretty self-contained and you don’t need that much help. If you’re having trouble getting started implementing them from scratch, then you can start with popular papers like DQN: find an existing implementation, start with that, and try to fiddle with it to see if you can make it better.

Robert Wiblin: What kind of program are you running? Because I assume you’re not doing this in Excel. What’s the-

Dario Amodei: No. It’s typically, you’ll use a, the typical is a Python with Tensorflow. Tensorflow is this tool that Google Brain team made for doing general computations but in particular deep neural net computations. You’ll find a large fraction of this office implemented in Tensorflow or some similar framework so python’s pretty easy to learn, Tensorflow is pretty easy to learn. Read some Tensorflow code as some stuff that’s been implemented, learn Tensorflow and implement some stuff yourself.

Robert Wiblin: What’s the range of roles available? How do they vary?

Dario Amodei: You’re talking about within machine learning or within safety-related stuff?

Robert Wiblin: I guess mostly within safety I think.

Dario Amodei: Okay. I can talk about what’s being done here at OpenAI. I would say there are two main directions: there’s the technical safety stuff, and there’s the policy side of things. On the technical safety side, there aren’t a lot of people working on it yet, but the Human Preferences paper that I showed you is a good example of it, and a lot of the papers we cite in Concrete Problems are good examples of this work. DeepMind has some recent good examples of safety work too. The skill set, as I’ve said, is very similar to the typical machine learning skill set. You should also be willing to work in a field that has a relatively sparse literature, which means coming up with your own ideas or working very closely with someone who’s one of the people generating the ideas in the field.

That has some downsides, in that you have to set more of your own direction, but also some upsides, in that you can be one of the first people in a totally new field. That’s what excited me about writing Concrete Problems. I could work on something that 200 other people are working on, or I could try and set a new direction. Maybe it won’t be exciting at all and, you know, “Oh well! At least I did something interesting.” Maybe it turns out to be really exciting. That’s a bet that I’m perfectly happy to take.

Robert Wiblin: What would you recommend to someone who is considering entering the AI safety industry doing the machine learning work, but who’s worried that they’re not going to have such great long-term career options elsewhere, especially compared to doing machine learning work in a more commercial way with less of a safety focus, or just going into whatever pays the most or has the best career prospects?

Dario Amodei: I think ML work is so hot right now that for anyone who goes into it, particularly on the fundamental research side, it’s easy to transition to applications. The kind of safety work that we’re doing uses many of the same skills as any other area in machine learning, even though the subject matter is very different. I think someone who does it is going to be in a very strong position to do very well in the future. Even if you’re going into it for altruistic reasons, it probably also happens to be one of the most secure, high-paying, financially stable career areas. We recently had someone leave OpenAI who became head of AI at Tesla, head of all of AI, reporting directly to Elon Musk.

I myself want to stay at OpenAI and work on safety. I want to keep working on the research end all the way until we get AGI, whenever that is. But if I wanted to leave, there are plenty of wonderful things that I could do, and the same would be true of other people who come here.

Robert Wiblin: Is it more of a concern for people who would be working at MIRI doing non-machine-learning safety research?

Dario Amodei: Yeah, I think most of the people there are smart people who either had or could have really great careers as software engineers, so they probably have great options as well. I generally get the sense that people who go to MIRI are really passionate about MIRI’s mission and tend to worry about this less. The amount of buzz and hype is definitely not as high as it is for the machine learning folks, though.

Robert Wiblin: Pretty often, we talk to people who are, say, in their mid-20s, and they did a fairly quantitative degree, maybe economics or logic, but no machine learning in particular. Is it possible for them to get into this? Or is it all just over for them at 25?

Dario Amodei: My own example is that until I was 28 or 29 or something, I hadn’t done any machine learning. It’s definitely possible to do this. My main advice is the same advice that I’d give to the 17-year-old we talked about earlier, which is just implement as many models as you can, as quickly as you can, just to see if you have the knack for it and whether you really enjoy doing it, because this is going to be greater than 50% of what your job consists of: having the real intuition to implement neural net models and have them work, and knowing how to put together new architectures that do new things.

First implement these papers, then tweak them. That’s a really cheap way to figure out whether this is a career that you’re good at and that you’ll enjoy. I wouldn’t recommend going back and doing another PhD in machine learning. There are some positions where, for instance, Google wants you to have a PhD when they hire you, but they don’t really care what area it’s in. They do want to know that you’re committed to the new area you want to go into. Even at the places that care about whether you have a PhD, and we don’t care that much, it’s more important that you have a PhD at all than that you have it in some particular field.

Robert Wiblin: If you already have a PhD in philosophy, then should you go and learn ML directly, or do an internship somewhere?

Dario Amodei: Yeah. I would say learn ML, implement a bunch of models, then go do an internship or the Brain Residency program at Google, or come do an internship with us or at DeepMind. All of these are viable options, and each step gives you a better idea of whether this career path is really for you.

Robert Wiblin: Really, it’s just a case of someone who did an undergraduate degree in Economics can jump in and try to learn machine learning, try to train machine learning on my computer.

Dario Amodei: Yeah, I think there’s … if you know how to program Python and you can learn Tensorflow quickly that it’s a very empirical field. Of course there’s lots of hidden knowledge that researchers know that they tell each other but that it’s hard to express in the papers. You won’t pick up on everything but you can certainly get started this way. Then talking to people about models, you’ve implemented talking to professional researchers to get a sense of what’s an exciting to work on next. That’s enough to get you started.

Robert Wiblin: How does working at OpenAI or Google compare to a machine learning role in academia?

Dario Amodei: I’m a bit biased, but I generally tend to give people the advice to come to the industrial labs. By industrial labs I mean places like OpenAI, even though it’s a nonprofit: the large non-academic research centers. One reason is that they tend to have more resources and more compute, and in part because of this, I think they’ve been winning the talent war recently.

I think it still makes sense to go do a PhD. As for staying in academia your whole life, I guess if you become a professor then it becomes a lot easier to collaborate with the industrial labs. Both we and DeepMind have people who are professors and spend part of their time at their university and part of their time here. It’s all very feasible and there’s a lot of mobility between the two. In general, and many people will disagree with me, I’ve felt that the most groundbreaking work has tended to come out of the industrial labs, over the last couple of years at least.

Robert Wiblin: Yeah, interesting. Is there any effort to change that? Are universities trying to catch up, or is it just too expensive to run the research projects?

Dario Amodei: Yeah, there’s Yoshua Bengio’s group in Montreal as you know is quite larger, it’s one of the few major figures in deep learning who’s resisted the pressure to go into the industrial world. His lab does a lot of great work, Pieter Abbeel at Berkeley, Percy Liang at Stanford, and just a number of others including folks who do work that’s not necessarily related to deep learning. There’s a lot of interesting work everywhere. At least the kind of safety that I work on tends to play best with the cutting edge of ML work and explicitly tries to keep up with the cutting edge of ML work.

Robert Wiblin: That reminds me, given the cost, how is OpenAI funded? Is it just donations, or are you also selling products at this point?

Dario Amodei: No, we’re nonprofit. I think I mentioned earlier the major donors are Elon Musk, Sam Altman and Dustin Moskovitz at this point.

Robert Wiblin: It’s just the donations.

Dario Amodei: At this point it’s just donations, yes.

Robert Wiblin: Okay, interesting. Would it be possible for you to sell things to get extra computational power if you needed it, like starting to sell services? Or is that a legal issue, outside your area?

Dario Amodei: Yeah, I’m not an expert in this. I think … I’m not sure you can sell stuff if you’re nonprofit.

Robert Wiblin: Is the work frustrating because you’re not sure whether solutions actually exist, and you beat your head against the wall for quite a while before you figure out that maybe there isn’t a way of solving it the way you thought?

Dario Amodei: I think that’s actually the case in any area of machine learning where you’re trying to do original research. If you’re trying to do something worthwhile, then you don’t already know if it can be done, and you have to try stuff that seems crazy. It’s true of any area, and especially true of an area that’s very new, like AI safety. I definitely agree that one of the trade-offs of working on AI safety is that on one hand you have this exciting ability to work in a new field that’s just starting, which could be very impactful, but at the same time no one’s defined what successful work looks like.

We’re still laying out what the problems are and what the work is that needs to be done. I think it definitely requires an attitude of being willing to do more to define problems yourself. You need to be more creative instead of doing something that’s an incremental improvement on the thing, the thing that was done last. To me that’s a good property.

Robert Wiblin: Is it a good role for someone who has a lot of grit and is willing to persist with things despite adversity, pioneering in a new area?

Dario Amodei: I think that quality is useful in any important or original work and it is here as well.

Robert Wiblin: Turning now to non-machine-learning approaches to tackling the problems in AI safety: what kind of non-technical approaches do you see as promising? I interviewed Miles Brundage of the Future of Humanity Institute recently. Do you have a view on any of the AI policy topics that we spoke about?

Dario Amodei: Yeah. I don’t know precisely what you guys spoke about. I spend a little bit of my time speaking about the relevant policy issues. I think if the humanity at some point builds AGI then we’re going to have to think about both how to handle safety issues as we’re building it. Some of the coordination issues that going to come up with respect to safety and also the question of who uses it, what it’s used for. One example is this, you can imagine that maybe if it’s possible, a really good way to build again would be to build an AGI, and instead of doing anything with it in the world, try and, if it’s possible, first developed the develop the capability to have it advise you on this situation you put humanity in by building it. Look, we just opened this can of worms by creating you. Can you analyze our strategic situation and say what we should do because we’re aware that if we don’t use you in the right way, or we hand you to the wrong person, then it could be really bad for humanity.

If we’re able to turn the problem in on itself that way, they would be really good. That’s partially-

Robert Wiblin: Get the AI to make the world safe for AI.

Dario Amodei: Yeah, it’s partially a technical question and it’s partially a policy question, which is how do we get ourselves in a situation where we can do that. I think that there’s a lot of players. There’s going to be more AI organizations, government actors, will someday have something to say about Ai. They already have something to say about AI. Someday they’ll have something to say about AGI. When we’re more in the world where AGI is going to happen. What strategies should we take towards all of the actors. How do we make sure that when everything is put together, it leads to a good outcome. Is there anything we can do today to deal with these distant problems.

Those are the set of policy issues that we tend to think about. There’s also some thought on shorter-term policy issues: how can we get people to think about more mundane issues of safety, should the government regulate things, what should policies be on self-driving cars, what should policies be on automation and job creation. We do some of that stuff, but a lot of people think about that, so we tend to focus more on the long-range stuff. It’s less actionable, but there aren’t that many people thinking about it, so we might as well do whatever thinking we can on it. It might be that nothing can actionably be done, but we want to at least consider it.

Robert Wiblin: Do you have any thoughts on how we can ensure all of the players cooperate and avoid an arms race, where they just try to improve their machine learning techniques really quickly without regard to safety? It seems like you’re collaborating a great deal with DeepMind.

Dario Amodei: Yeah, that was in part motivated by the idea of the orgs working together. It helps that myself and some of the founders of DeepMind have known each other for a while. We all think about AGI and think that safety issues are important. When people at the major organizations are friends with each other and work to actively collaborate, that reduces the probability of any kind of conflict, because people know each other: there isn’t fear or uncertainty, and if there’s a disagreement, we can work it out. Then the question is, how does that scale to there being a lot of organizations? How does it scale to others who get involved once they see how powerful AI is? Can we make them cooperate as well? My hope is that we can, but it’s not an easy thing.

Robert Wiblin: Do you think it would be a good or a bad thing if AI were developed sooner? There’s been this explosion of investment in machine learning and improvement. Is that something we should be pleased about or concerned about, or just neutral because we’re not sure whether it’s good?

Dario Amodei: It’s hard to say, the obvious bad thing is if you’re like Really afraid that there’ll be safety problems with AGI then you might think it was a bad thing. A lot of people think it’s bad thing for that reason. My view is we’re relatively early in the game and I think there’s a substantial probability that the gloomy analyses are really misunderstanding the safety problem and how a safety problem works. Some counter wrist to worry about that something bad happens to the world in the meantime while we’re trying to develop AGI, or that AGI is used in a bad way. I guess a couple of years ago, I often made the argument that we were in a relatively peaceful geopolitical time so it would be good that AGI will be built. I’m starting to wonder that in the last year we’re not in such a peaceful geopolitical state.

Robert Wiblin: It’s a little bit less clear.

Dario Amodei: Yeah, in the last year, maybe we’re not in such a peaceful geopolitical state.

Robert Wiblin: As we’re recording everyone is flipping out about North Korea developing intercontinental ballistic missiles.

Dario Amodei: No, these things are pretty deeply concerning. There’s been a lot of political instability in the Western world in the last year. Aside from the usual reasons why this might make me unhappy, it’s made me unhappy because it creates a less stable political environment in which AGI would happen. I don’t know. I will say, I think we’re better off if AGI is developed in a stable political environment with leaders who are intelligent and have reasonable views. I’d like that to happen. I no longer know whether that means AGI should happen soon or a long time from now. I guess it depends on whether the trends we’ve seen in the last year continue or are only a blip. If they’re only a blip, then it doesn’t matter; in a few years, we’re back to where we were before. But if we’re on a general trend in a bad direction, then maybe it’s bad to wait too long.

Robert Wiblin: I guess that’s the difficult thing to time.

Dario Amodei: I think it’s pretty complicated. I think pure safety considerations tell us that it’s always good to have more time although at the same time, some of the hardest safety problems to solve maybe problems we can’t solve until the last couple of years, until we build AGI. In that case, delay doesn’t really help us. It just delays the crunch period.

Robert Wiblin: It’s like someone trying to finish an essay by a particular deadline. If they know they’re only going to do it the night before, then it doesn’t much matter when the deadline is.

Dario Amodei: It doesn’t much matter when the deadline is. I think it’s a complicated question. It’s not a variable that I have a lot of control over. It’s happening at the field level. I prefer to try to control variables that I have some control over. One thing I have control over is that it seems like there’s at least some safety work that can be done now and so I’d like to do it. It seems like there are some ways that different AI organizations are not collaborating now that we can encourage them to collaborate. I’ve also been working on that. I think those efforts have been successful and so I feel like it’s been good to cause things to happen that wouldn’t have caused otherwise. Then there are all these other things that I feel like I have no control over whatsoever.

Robert Wiblin: I’ve taken out an awful lot of your time here. I’m sure you have to get back to your research, hit these deadlines. Anything you’d like to say to people who are considering following your example and doing this research before we finish?

Dario Amodei: We’re of course hiring for very talented, machine-learning people who care a lot about AI safety. We welcome applications at OpenAI. We collaborated a lot with the DeepMind safety people. I’m always, as part of this collaborative spirit, I think that’s a really great team as well and people should apply there as well. It’s convenient to have a place that’s in Europe and a place that’s in the US. I think there’s a lot of good work going on at several different places.

Robert Wiblin: I’ll just add 80,000 hours has been doing a whole lot of research into his question of how can we positively shape the development of artificial intelligence and we’re coaching some people to try to help them get jobs at places like OpenAI. If you feel like you’re in a really good position to do that, then fill out the application on our website. We think it’s one of the most high impact roles that someone could take if they’re able to do it, which is one of the reasons why we’ve looked into it so much. Hopefully, over the next few years, we’ll see quite a lot more people going into this field and it wouldn’t be so neglected. But it’s been fantastic to have you on the show, Dario.

Dario Amodei: Yes, thanks for having me.

Robert Wiblin: Hopefully, we can check back in a couple of years and find out what OpenAI has been up to, and hopefully you’ve found lots of new talented people to work in the area.

Dario Amodei: Hopefully.

Robert Wiblin: Fantastic. All right. Thanks so much.

About the show

The 80,000 Hours Podcast features unusually in-depth conversations about the world's most pressing problems and how you can use your career to solve them. We invite guests pursuing a wide range of career paths — from academics and activists to entrepreneurs and policymakers — to analyse the case for and against working on different issues and which approaches are best for solving them.

The 80,000 Hours Podcast is produced and edited by Keiran Harris. Get in touch with feedback or guest suggestions by emailing [email protected].
