#78 – Danny Hernandez on forecasting and the drivers of AI progress

By Arden Koehler, Robert Wiblin and Keiran Harris · Published May 22nd, 2020

Companies use about 300,000 times more computation training the best AI systems today than they did in 2012 and algorithmic innovations have also made them 25 times more efficient at the same tasks.

These are the headline results of two recent papers — AI and Compute and AI and Efficiency — from the Foresight Team at OpenAI. In today’s episode I spoke with one of the authors, Danny Hernandez, who joined OpenAI after helping develop better forecasting methods at Twitch and Open Philanthropy.

Danny and I talk about how to understand his team’s results and what they mean (and don’t mean) for how we should think about progress in AI going forward.

Debates around the future of AI can sometimes be pretty abstract and theoretical. Danny hopes that providing rigorous measurements of some of the inputs to AI progress so far can help us better understand what causes that progress, as well as ground debates about the future of AI in a better shared understanding of the field.

If this research sounds appealing, you might be interested in applying to join OpenAI’s Foresight team — they’re currently hiring research engineers.

In the interview, Danny and I (Arden Koehler) also discuss a range of other topics, including:

The question of which experts to believe
Danny’s journey to working at OpenAI
The usefulness of “decision boundaries”
The importance of Moore’s law for people who care about the long-term future
What OpenAI’s Foresight Team’s findings might imply for policy
The question of whether progress in the performance of AI systems is linear
The safety teams at OpenAI and who they’re looking to hire
One idea for finding someone to guide your learning
The importance of hardware expertise for making a positive impact

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app. Or read the transcript below.

Producer: Keiran Harris.
Audio mastering: Ben Cordell.
Transcriptions: Zakee Ulhaq.

Highlights

The question of which experts to believe

You can think about understanding the different experts as model uncertainty. You don’t know what experts are right in the world. If you could just choose which experts to listen to, as a leader, that would solve all of your problems. If you’re like, “Which experts do I listen to in different times”, you’ve solved your entire problem of leadership. And so evaluating experts is this critical problem. And if you can explain their arguments, then you’ve kind of internalized it and you’ve avoided this failure mode, whereas you could imagine that there were some experts, they made some arguments to you, you couldn’t really explain them back to them, meaning you didn’t really understand them, and so later you’ll have regret because you’ll make a decision that you wouldn’t have made if you actually understood their arguments.

Moore's law

The long-term trend is kind of Moore’s law, whatever happens there. And so that’s kind of what I think about more often and what longtermists should be more interested in. If you’re like a longtermist, then Moore’s law’s like really big over the next 20 to 30 years, whatever happens. Even if its exponent goes down some, you really care what the new exponent is, or if there’s no exponent.
And so it could be that when you just zoom out in the history of humanity a hundred years from now, our current thing is an aberration and Moore’s law just goes back to its old speed or speeds up or whatever. But if you think about what’s the most interesting compute trend, it’s definitely Moore’s law and that’s the longtermist most interesting compute trend, and much of what’s happened in compute kind of follows from that. If you’re in the sixties and you know Moore’s law is going to go on for a long time, you’re ready to predict the internet and you’re ready to predict smartphones, and you’re ready to make 30 and 40 year long investments in basic science from just knowing this one kind of fact. And you could still be kind of in that position, if you think you know what’s going to happen with Moore’s law.

The Foresight team at OpenAI

The Foresight team tries to understand the science underlying machine learning and macro trends in ML. And you could think of it as trying to inform decision-making around this. This should inform research agendas. It should inform how people think about how it’s informative to policymakers. It’s informative to people who are thinking about working on AI or not. It’s informative to people in industry. But you could think of it as just trying to be really rigorous, which is like another way of thinking about it. Like it’s mostly ex-physicists and physicists just want to understand things.

Hardware expertise

I think that hardware expertise is worth quite a bit. […] So, for instance, the kind of person who I’d be most interested in trying to make good forecasts about Moore’s law and other trends, is somebody who has been building chips for a while or has worked in building chips for a while. I think there aren’t that many of those people. I haven’t seen somebody from that background that is working in policy yet, but my guess is that they could be very useful at some time and that it’d be reasonable to try starting now with that kind of thing in mind. But that’s pretty speculative. I know less about that than the forecasting type thing. I think hardware forecasting is very interesting.

Getting precise about your beliefs

If you believe AI progress is fast, what would progress look like that would convince you it’s slow? Paint a picture of that five years from now. What does slow progress look like to you? And now you’re like, “Oh yeah, progress is actually slow”. And what could have happened that would convince you that it’s actually fast. But you can make what would update you clear to yourself and others and that for big decisions, this is generally worthwhile.

Articles, books, and other media discussed in the show

Blog posts and papers from OpenAI

Danny’s OpenAI posts
AI and Compute by Dario Amodei, Danny Hernandez, Girish Sastry, Jack Clark, Greg Brockman & Ilya Sutskever (2019)
AI and Efficiency by Danny Hernandez & Tom Brown (2020)
How AI Training Scales by Sam McCandlish, Jared Kaplan & Dario Amodei (2018)
OpenAI Microscope by Ludwig Schubert et al. (2020)
Scaling Laws for Neural Language Models by Jared Kaplan et al. (2020)
Improving Verifiability in AI Development by Miles Brundage et al. (2020)
OpenAI LP (2019)

Everything else

How to Measure Anything: Finding the Value of Intangibles in Business by Douglas W. Hubbard (2014)
Superforecasting: The Art and Science of Prediction by Philip Tetlock and Dan Gardner (2016)
The Windfall Clause: Distributing the Benefits of AI for the Common Good from the Future of Humanity Institute (2020)
Is Science Stagnant? by Patrick Collison & Michael Nielsen (2018)
DeepMind’s AlphaFold: Using AI for scientific discovery by Andrew Senior, John Jumper, Demis Hassabis & Pushmeet Kohli (2020)
DeepMind’s WaveNet: A generative model for audio by Aäron van den Oord & Sander Dieleman (2016)
AlexNet
MNIST database
Human-level concept learning through probabilistic program induction by Brenden M. Lake, Ruslan Salakhutdinov & Joshua B. Tenenbaum (2015)
Spectre
Center for Applied Rationality
OpenAI jobs
Become an expert in AI hardware by 80,000 Hours

Transcript

Table of Contents

1 Rob’s intro [00:00:00]
2 The interview begins [00:01:29]
3 Forecasting [00:07:11]
4 Improving the public conversation around AI [00:14:41]
5 Danny’s path to OpenAI [00:24:08]
6 Calibration training [00:27:18]
7 AI and Compute [00:45:22]
8 AI and Efficiency [01:09:22]
9 Safety teams at OpenAI [01:39:03]
10 Careers [01:49:46]
11 AI hardware as a possible path to impact [01:55:57]
12 Triggers for people’s major decisions [02:08:44]
13 Rob’s outro [02:11:09]

Rob’s intro [00:00:00]

Robert Wiblin: Hi listeners, this is the 80,000 Hours Podcast, where each week we have an unusually in-depth conversation about one of the world’s most pressing problems and how you can use your career to solve it. I’m Rob Wiblin, Director of Research at 80,000 Hours.

This episode returns us to focusing on one of the most important trends in the world with Danny Hernandez, a research scientist on the Foresight team at OpenAI, one of the world’s top AI research labs.

His work focuses on measuring progress in artificial intelligence and the factors that drive it. This helps us make better predictions about where the field is going, and when and where machine learning will develop important capacities.

If you think, like me, that breakthroughs in the abilities of thinking machines could have a huge influence on society, then forecasting when those breakthroughs will happen seems very valuable.

Today Danny’s speaking with my colleague Arden Koehler, who’ll be familiar to regular listeners.

Among others she’s been a co-host on two of our best-received episodes of all time: #67 – David Chalmers on the nature and ethics of consciousness and #72 – Toby Ord on the precipice and humanity’s potential futures.

People have emailed us to say they’ve loved Arden’s contributions to the show so far, especially her perceptive follow-up questions, and I’m sure that will be the case for this episode as well.

Just a quick reminder that applications are still open for the Effective Altruism Global X Virtual conference, and will stay open until the start of the conference on the 12th of June.

If you’re interested, go to www.eaglobal.org and click through to the application.

Alright, without further ado, here’s my colleague Arden Koehler interviewing Danny Hernandez.

The interview begins [00:01:29]

Arden Koehler: Today, I’m speaking with Danny Hernandez. Danny is a research scientist on the Foresight team at OpenAI, one of the safety teams. Danny’s work is primarily focused on measuring progress in artificial intelligence and the factors that go into it in order to better inform experts’ understanding and predictions about where the field is going. Before he was at OpenAI, Danny consulted for Open Philanthropy on calibration training and attaching forecasts to grants. Before that, Danny was an early data scientist at Twitch, where he made popular prediction training and transitioned to helping the company better manage planning after reading Phil Tetlock’s book “Superforecasting” and “How to Measure Anything” by Douglas W. Hubbard. Thanks for coming on the podcast, Danny.

Danny Hernandez: Hey Arden, great to be here.

Arden Koehler: Today, I hope to discuss forecasting, both in order to improve decision-making within organizations and in broader public conversations, and two recent papers from OpenAI: “AI and Compute” and “AI and Efficiency”, as well as progress in AI and the safety teams at OpenAI, and working in or researching AI hardware as a possible path to impact. But first, what are you working on right now and why do you think it’s important work?

Danny Hernandez: Yeah, so I’m working on measuring AI progress and I think it’s really important because a lot of people are giving a lot of attention to AI and trying to decide like, “Okay, is this something I should work on? Is this something I should think about?” You know, politicians, researchers… And I’d like to ground the conversation in some measurements and evidence, because otherwise it seems like people can talk past each other and it’s quite difficult for someone who’s not an expert to evaluate the claims people are making about how important AI is and how much attention it should be getting.

Arden Koehler: Do you have an example of people talking past each other or a claim that’s opaque without some of this evidence that you’re gathering?

Danny Hernandez: Some people are saying that AI is the most important technology that people are working on, and that’s kind of a claim that people are making and that kind of everybody should be giving it lots of attention. I think that’s kind of this generic claim. And then there’s also a group of people that are trying to ground the conversation in what AI can currently do and don’t like speculation. They think of speculation and/or forecasting really as unscientific; the norms of science are around engaging with existing evidence, not around speculating about the future. And so they’re trying to talk about what AI can currently do, and other people are talking about what AI might be able to do in the future.

Arden Koehler: And those people are usually the ones that are saying that AI is going to be one of the most important technologies to work on, are the ones that are talking about the future applications.

Danny Hernandez: Yeah, that’s right. So I think that’s how they talk and can disagree as they then talk past each other as they’re talking about two different things and problems: what should AI be able to do in the future and what can AI do now? And they’re related, but forecasting has its own kind of strange expertise that’s not that common and requires different norms and a different way of thinking about evidence. It requires thinking about weak evidence a lot. And so it’s quite different than science in this way.

Arden Koehler: So would you say that working on forecasting AI and measuring progress in AI is trying to ground some of the claims that people are going to make about future AI progress in something a bit more tangible that might be a bit more respectable to the people who are interested in talking about what AI is doing now?

Danny Hernandez: I think so, yeah. I think everybody’s interested. Well, measurement gets you what AI is doing now, but in a way that can sometimes be extrapolated. One of the most reliable ways to try to forecast the future is to just have some line and be like, “This line continues” or a curve and be like, “This curve continues”. And to have some kind of intuition for what would stop the curve. This is a more straightforward way to try to forecast things, and more understandable and generally accepted than something like, “Get a bunch of superforecasters to make a forecast on that thing”. A bunch of superforecasters making a forecast on something, that’s compelling to me, but that’s not what people are usually looking for when they’re trying to set their expectations about the future.

Arden Koehler: Yeah. Also, I mean just extrapolating a trend is one thing that can go into a forecast made by superforecasters. I’m sure they would be extremely excited to have that piece of research.

Danny Hernandez: Yeah. And I think measurement of AI progress is a particularly difficult measurement problem. I’ve just thought about lots of different kinds of measurement: qualitative surveys and lots of quantitative measurement at Twitch and yeah, it just feels like one of the fuzzier things I’ve ever tried to measure.

Arden Koehler: Interesting. Yeah. So we’re going to get into a couple of measures of AI progress that you’ve worked on at OpenAI. Do you want to say anything quickly about just why it feels fuzzy to you in comparison with other things?

Danny Hernandez: I guess I’ll just describe what a crisp measurement looks like? You could think about a corporation: it’s trying to maximize profits. It measures its revenue and it measures its costs. You could think of accountants as the first people who are focused on measurements all the time, and they knew all the interesting measures and data at their company and that’s what really, really good measurement looks like. And then imagine that you had that level of measurement for AI progress. We’re really far away from that in terms of having something that is that connected with what progress actually looks like.

Arden Koehler: So then you come up with these proxy measures.

Danny Hernandez: Yeah, a problem can look more like that. Like sometimes you come into something and you’re like, “Okay, there’s just this straightforward measure that is very connected to the thing that I care about and it’s trivial to find and everybody agrees on”. And other times it can be its own research project to find a measure that’s interesting.

Forecasting [00:07:11]

Arden Koehler: Okay. Well we’re going to return to these topics when we get to the papers that OpenAI has put out. But let’s talk about forecasting first. So there are two different kinds of forecasting that you have tried to improve in your work as far as I can tell. So there’s forecasting at the individual level, like improving individual people’s forecasts or even the forecasts of organizations where that means just trying to put a probability on some event happening in some timeframe. And then there’s improving the public conversation around certain kinds of events where that feels a little bit fuzzier to me. And forecasting is one thing that might go into that. Both kinds of forecasting seem sort of extra topical right now because of COVID-19. A lot of people are talking about needing to forecast the progression of the disease in order to do adequate policy work and figure out, for instance, when to open certain kinds of businesses.

Arden Koehler: It also feels like it’s made forecasting more salient at organizations like 80,000 Hours. One thing that we’re sort of struggling with is this question right now of, “Well, does COVID-19 and the economic impacts mean that our advice should change in some way or our strategy should change in some big way”, and like trying to forecast the impact of it so that we can be ahead of that change. So improving those kinds of techniques, I mean, they always feel important but I feel like it’s especially obvious how it’s important right now. So I’m excited that we’re having this conversation.

We’ve talked about some techniques for the first kind of forecasting in our episodes with Phil Tetlock. That’s episodes 15 and 60, which listeners should go check out if they’re interested. So those are things like getting quantitative, where that means instead of saying, “I think this will probably happen”, you say, “I think there’s a 60% chance that this will happen” or “a 70% chance that this will happen by a certain time”, and forecasting in ways that allow you to get lots of feedback so you can figure out how you’re doing and sort of hone your skills. Do you want to add anything or highlight anything as especially important for that sort of forecasting?
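As an illustration of the "lots of feedback" point, here is a minimal sketch of scoring a record of probability forecasts once the outcomes are known. It uses the Brier score, which is one common scoring rule and is not named in the conversation; the function name and the track record below are made up for the example.

```python
def brier_score(forecasts, outcomes):
    """Mean squared gap between stated probabilities and what happened.

    forecasts: probabilities between 0 and 1 (e.g. 0.6 for "60% chance").
    outcomes:  1 if the event happened, 0 if it did not.
    Lower is better; always answering 0.5 scores exactly 0.25.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("forecasts and outcomes must be non-empty and the same length")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical track record: four forecasts and whether each event occurred.
my_forecasts = [0.6, 0.7, 0.9, 0.2]
what_happened = [1, 0, 1, 0]

print(round(brier_score(my_forecasts, what_happened), 3))
```

On this invented record the score works out to 0.175, better than the 0.25 you would get by always saying 50%. Saying "probably" instead of a number gives you nothing to score, which is the practical case for getting quantitative.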

Danny Hernandez: Yeah, so I think making good forecasts starts with understanding. As far as things that Tetlock talks about that resonate with me are kind of breaking things down into parts and getting quantitative. Something that I think about as more an individual trying to make forecasts in one domain for a long period of time is being able to pass the ideological Turing test of the experts that I’m talking to. So what I want is to understand their viewpoint, to be able to summarize it, and have them be like, “Yeah, that’s basically right”, and to keep going.

Arden Koehler: I think when I’ve heard of the ideological Turing test, it’s been in the context of how do you know that you don’t have a totally crazy view of your ideological opponent’s view, like a total straw man. Well, if you can explain it to them in a way that they would say, “Yeah, that’s somebody who believes the same thing I do”, then you’ve passed the Turing test.

Danny Hernandez: Yeah. That’s kind of what it feels like where, I think with scientific views, people can have that same sort of thing where the belief is very important to them. They have a different feeling about you if they’re like, “Okay, this person gets it”. Even if you don’t totally agree, they’re just like, “Okay, this person understands it enough to put me at ease” I think is kind of what’s happening and why I think there still is something that feels a little ideological about it. Being able to explain to somebody something in their terms.

Arden Koehler: So this is important for forecasting because you want to be able to basically represent lots of different people’s positions in a way that they would agree with. Because that just shows that you understand their positions relatively well.

Danny Hernandez: Well, you can think about understanding the different experts as model uncertainty. You don’t know what experts are right in the world. If you could just choose which experts to listen to, as a leader, that would solve all of your problems. If you’re like, “Which experts do I listen to in different times”, you’ve solved your entire problem of leadership. And so evaluating experts is this critical problem. And if you can explain their arguments, then you’ve kind of internalized it and you’ve avoided this failure mode, whereas you could imagine that there were some experts, they made some arguments to you, you couldn’t really explain them back to them, meaning you didn’t really understand them, and so later you’ll have regret because you’ll make a decision that you wouldn’t have made if you actually understood their arguments. So you could try to avoid all of those: this regret minimization. If you want understanding, it’s kind of this path towards regret minimization. We could tie it back to COVID where you’re just trying to figure out “Which epidemiologists do I want to defer to here”?

Arden Koehler: Yeah. There’s certainly been lots of disagreement among experts about what we should do.

Danny Hernandez: Yeah. And so I can imagine in some distant future… at least some of these experts have built up a track record. One expert could just kind of stand out and look kind of like Nate Silver in elections, where they had just predicted the trajectories in many cities for a long time, had made lots of predictions, had the best write-ups of lots of stuff additionally. Maybe they’re listened to by now. Maybe they weren’t listened to initially, but they just have this track record that looks very different to them than other people’s and is much easier to understand and so then they would have made themselves an easy expert for some people to be drawn to.

Arden Koehler: Okay. So there’s this problem of which experts we should listen to and obviously that’ll help a lot with forecasting because they are going to be making forecasts. Is that the main way it helps with forecasting or is it just to help you have an understanding of the problem area so you can then make forecasts yourself?

Danny Hernandez: Yeah, I mean I think it’s helping you with lots of things. I think both effects are important. Okay. Say you have a difficult problem and you’re not really sure. Like in COVID, you had just lots of model uncertainty and one way of representing that would be like different experts could be right about the world. And so a good way to make an overall forecast is to put some weight on each expert’s view being right approximately. And to do that you want to understand that view so you can evaluate it yourself. At least someone like that. That would be ideal, if the overall forecast is yours. If you have a long time, you could just measure everybody over a long horizon and then figure out who’s right. But as it is, if you’re the decision maker or you’re helping the decision maker, then I think yeah, you could measure their track record.

Danny Hernandez: But you don’t have that long because usually they’re urgent decisions that you care about. And so that kind of goes into another thing into making forecasts in a domain, which is, I think you want to get to at least a new higher level in a domain that you want to make forecasts in. For me that’s been product and engineering and AI research. This also just gets you a lot of understanding and it makes it worth the person’s time. The best experts are hard to talk to. They’re busy and so if you want to produce good forecasts, you need this valuable resource ‘expert time’ and it needs to be worthwhile for you to engage. That’s one way. Having some expertise in their domain helps. You get to ask them good questions. When you ask good questions, people recognize it and they’re like, “Okay, that’s a good question”. They just wanted to think about it anyway and they’re pretty happy and so that’s like the rare raw input that I think about often is experts, and I think it’s particularly the case in AI that it’s quite difficult to get much expert time. And right now with epidemiologists, it’s very hard to talk to epidemiologists right now, and so you really have to make it very clear to them it’s worth their time for them to want to engage.
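The idea of putting some weight on each expert’s view being right can be sketched as a weighted average of their forecasts. This is only one simple way to combine views (a linear pool), assumed here for illustration rather than taken from the interview; the function name, weights and probabilities are all hypothetical.

```python
def pooled_forecast(expert_views):
    """Combine expert forecasts into one overall probability.

    expert_views: list of (weight, probability) pairs, where weight is how
    much credence you give to that expert's model of the world being right
    and probability is that expert's forecast for the event.
    Weights are normalised, so they do not need to sum to 1.
    """
    total_weight = sum(weight for weight, _ in expert_views)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weight * prob for weight, prob in expert_views) / total_weight


# Hypothetical example: three experts who disagree about the same event.
views = [
    (0.5, 0.10),  # the expert you trust most thinks it is unlikely
    (0.3, 0.40),
    (0.2, 0.70),  # the expert you trust least thinks it is likely
]

print(round(pooled_forecast(views), 2))
```

With these invented numbers the overall forecast is 0.31. The hard part, as discussed above, is choosing the weights, which is where understanding each expert’s argument or having their track record comes in.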

Improving the public conversation around AI [00:14:41]

Arden Koehler: Do you want to just talk about the various mechanisms by which you think the work at OpenAI can help inform and improve the public conversation around AI?

Danny Hernandez: Maybe I could describe some ways the communication is notable or something. Like there’s some aspects of it that seem intentional. One is that there are blog posts that are targeted towards just a broad audience that would include any kind of decision-maker. Like all decision-makers in the world, they are part of this audience. So that could be CEOs, that could be politicians, that could be political analysts, other researchers and actually, the blog post is often what we as researchers want first also. It takes a while to in depth read a paper and before I decide if I read a paper, I would rather read the blog post. So it’s really just targeted at everybody. And I think another thing is that once you have this, you can take it to the people that you wanted to talk to and wanted to see it.

Danny Hernandez: So Greg Brockman and Jack Clark have both testified in front of congressional committees on AI. Once you’ve created something that should be part of the conversation, then you can go all the way to taking it to the last mile to the people that you wanted to see it. I think those are two notable ways. And then amongst researchers, I think it’s just a more productive debate to look at a piece of evidence. So I think that’s the main point. But you could also think of it as you could try to be constructing… So you have five or 10 things that are two to four pages long that you would just be like, “Okay, somebody who knows literally nothing about AI, what should they read”? You could think of these as beginner materials, which I think… 80K actually puts quite a bit of effort into this kind of thing.

Arden Koehler: Having accessible materials?

Danny Hernandez: Yeah, I think accessible materials are high impact and maybe slightly less exciting than novel work to people. So you have to be motivated by impact, I think, to want to make accessible materials.

Arden Koehler: Yeah. Listeners should definitely go and check out the blog posts for the papers we’re going to discuss. You definitely do get the main point in the first two sentences and the graph. That’s great. So, how do you figure out what research questions are most valuable to work on?

Danny Hernandez: There’s lots of inputs. But one input is forecasting related. At some high level, it might be a Peter Thiel type thing: what are the most interesting things that I think are true that people disagree with? What are the most important such things? I might start with those. What are just the most important problems in the world? What are the most important problems in my field? What are the things in there that are the most uncertain where there’s the most information that’s the most tractable to get? And then, where’s my fit? So it’s all pretty abstract, but I think there’s other things where I’ll assume that there’s one big true thing and that I need a simple model of the world and I need to just double down really hard on whatever seems most important and is working. I think that’s another way that I’ll try and do stuff.

Arden Koehler: And then you try to test that big true thing?

Danny Hernandez: Try to test it and try not to get bored.

Arden Koehler: This seems like there’s often a tension between questions where their answers are really interesting and are really gonna push the field forward and questions that are really well-defined and tractable. So, for instance, in the first camp you might think “If I could have the answer to any question that’s going to help me understand the progress of AI”, like you might just say, “Well, what will transformative AI do”? These kinds of things. I mean it’s obviously a totally intractable question. It’s ill-defined, and then the tractable questions and the more well-defined questions are sometimes farther away from what you ultimately want to know. Like “How much money was invested in AI companies between 2010 and 2020” or something like that. And I’m just wondering if you have any techniques for finding the sweet spot of things that are the most informative but also still tractable?

Danny Hernandez: In the same way I described choosing experts as almost the entire problem of leadership. I feel like long-term planning is a similarly difficult question or like your research agenda, and that figuring out what should be on your research agenda, there’s lots of generically good advice like “Talk to people that will improve your research agenda”. You know, “Think about how important things are and how tractable things are”. I think it’s hard to add something novel to the question of how to pick your research agenda. And I think for me, I think the novel thing I might be able to add is that I try to make the gnarly forecast of the thing I actually care about that seems totally intractable. I’m like “What do I believe currently”? And then I think about where does that come from and what does that make me want to look into? And what it makes me want to look into are the things that are the most tractable that would be the most informative.

Arden Koehler: Okay. But it’s like you figured those out by first trying to tackle the really gnarly question. Even if you don’t end up being able to answer it. So I’m curious whether you have any tips in particular for thinking well and forecasting well very unlikely events that are super impactful. Listeners will know we’re especially interested in those sorts of events at 80,000 Hours and it seems like even things that aren’t existential risks but are still very terrible, like COVID-19 show that we don’t quite know how to think about these as a society. It seems like we got burned by not being able to really appreciate how bad something could be, even if it was pretty unlikely. Do you have any specific tips about how to think about these tail risks?

Danny Hernandez: You could try to make better institutions. You could try to make a more numerate public. When I think about numeracy here, I think about understanding probabilities. Thinking they exist in the world and are real and are things you can say and believe. And understanding exponentials. And there’s not that much else in my list of “Be numerate” besides that. But those aren’t things that are particularly emphasized. We have like 12 years to make people numerate and we don’t really try that hard to make them understand either of these two things.

Arden Koehler: Yeah, I guess the thing that seems especially challenging about forecasting or trying to figure out, “Okay how probable is some very improbable event? Is it like there’s a 1% chance, or a 1.5% chance”: it really matters if the thing is a really big deal, but it’s just really hard to get very fine-grained in our forecast and also often, we’re talking about events that have no precedent so we can’t do reference class forecasting as easily, and I’m just curious if there’s any techniques that you’ve come across that might help us get better at trying to get accurate beliefs about those kinds of events?

Danny Hernandez: I think there is something that’s often extremely helpful and neglected, which is to try and find a decision boundary. The conversation was how likely is this thing? It was not at all like how likely would this thing need to be for us to take different actions? And if it makes sense to take a bunch of actions, if there’s a 1% chance of this thing, then everybody can agree that you do that. If there’s a bunch of other actions that you can agree on 5% and 20%, it’s like you have this very complicated decision with a lot of uncertainty.

Danny Hernandez: The other thing, right, like what would we have to believe? We have perfect information about that. We know what we would have to believe, or like we get to decide. It’s like the decision-maker gets to decide what they would need to believe in order to do that. They just have to introspect.
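One way to make the decision-boundary idea concrete is a small expected-value sketch. This is an illustrative assumption rather than anything stated in the interview: it supposes an action has a known cost and would avert a known loss if the event happens, so acting is worthwhile once the probability reaches the cost divided by the loss averted. The function names and the numbers are hypothetical.

```python
def decision_boundary(cost_of_action: float, loss_averted: float) -> float:
    """Probability at which acting starts to beat not acting, under a simple
    expected-value model (an assumption made for illustration)."""
    if cost_of_action < 0 or loss_averted <= 0:
        raise ValueError("cost must be non-negative and loss_averted positive")
    return min(1.0, cost_of_action / loss_averted)


def should_act(probability: float, cost_of_action: float, loss_averted: float) -> bool:
    """True if the forecast probability is at or above the boundary."""
    return probability >= decision_boundary(cost_of_action, loss_averted)


# Hypothetical numbers: a precaution costing 1 unit that would avert 100 units of loss.
boundary = decision_boundary(cost_of_action=1.0, loss_averted=100.0)

# Someone who says 2% and someone who says 20% disagree about the forecast,
# but both are above the boundary, so they agree on the action.
agree = should_act(0.02, 1.0, 100.0) and should_act(0.20, 1.0, 100.0)
```

The point of the sketch is that the boundary depends only on inputs the decision-maker chooses or estimates, so it can be worked out before anyone settles the harder question of how likely the event is.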

Arden Koehler: So that would be something like “If there’s this chance that this many people will die in a global pandemic, then we should institute this policy” and we get to decide what that threshold is. I guess it feels like it’s not always clear that we get to decide that because you might think, “Well, what is the impact of that many people dying”? There’s lots of further questions. How costly is it to implement the policy that you know you’re putting a threshold on? You might not know the answer to that question, so it feels like there’s still other unknowns that might go into the setting of the threshold.

Danny Hernandez: Yeah, there’s still questions and I haven’t thought about this decision boundary thing as much in COVID. I think if we go back to AI, you could do something like “What chance would I have to put on a transformative science capability”? Like when I think about transformative science, I think that a lot of science comes out of great scientists like Einstein, Turing, and what if at some point AI was making it so that it was like there were more such scientists? At one level, it could actually just be capable of being that scientist entirely on its own, but it could also just be making such scientists. So given that kind of setup, it’s like, what chance would you need to believe to be interested in AI or to want to work on AI?

Danny Hernandez: Is that a 1% chance in 10 years? Is that like a 10% chance in 10 years? Like what is the threshold? And people have very different horizons that they’re interested in and probabilities that are meaningful to them and they can actually make a lot of progress on that part of the question in terms of thinking about whether or not they want to work on AI like quite quickly. And then it is like their intuition again. Like when do they feel motivated and excited, or like philosophically excited or something, right? Like they get to decide this threshold.

Danny’s path to OpenAI [00:24:08]

Arden Koehler: Let’s talk a little bit about just your path to where you are right now. So you’re on the Foresight team at OpenAI. Before that you were at Open Philanthropy. Before that you were at Twitch. Can you just talk us through that transition?

Danny Hernandez: Yeah, so I can start off with Twitch. At Twitch I started off as an analyst, just understanding what people did and why. That was the question that really motivated me and made me want to work on the internet.

Arden Koehler: Twitch’s users?

Danny Hernandez: Yeah, Twitch’s users. Twitch is where people watch each other play video games on the internet and it’s bigger than most people think it is who don’t know what it is. Okay. So yeah, I was there. I did a bunch of different things. I led the mobile team for a little while. I did project management. I did engineering management. I learned to be an engineer. I didn’t show up knowing that much software engineering which was kind of strange. Then I became a data scientist, the kind of data scientist that produces evidence. I would look at the database, I’d be like, “Okay, all the users, they show up on our website, they do a bunch of stuff, we track all of these things”. I would turn that into evidence as to “What should we do”? Who are these people, what are their problems? Where are they getting stuck? What things are succeeding, how much are they succeeding?

Arden Koehler: I see. So you were kind of working on strategy questions which are sort of related.

Danny Hernandez: Yeah. So that’s like one kind of data scientist. Yeah, you could think of it like analysts just negotiated for a better title when there were better tools that made them higher powered. And that’s like one side of data science. And sometimes I did some machine learning things, but a lot of the time, the thing to do was something very simple and then I read “How to Measure Anything” and “Superforecasting” and I went to a CFAR workshop and kind of all of these things got me.

Arden Koehler: That’s the Center for Applied Rationality?

Danny Hernandez: Yeah. And so when I read “How to Measure Anything” and “Superforecasting”, I realized that these were the questions that I wanted to ask the product managers and other people at Twitch. I wanted to know what they expected to happen. This was the thing I most wanted to ask them and understand. And I was like, this book gave me permission to ask the question that I wanted, which was quantitatively “What do you expect to happen? What probability would you put on the thing you just said would happen”?

Arden Koehler: At Twitch?

Danny Hernandez: Yeah, at Twitch. This feature would increase revenue by 10%: how likely do you think that actually is? And like, let’s see what happens. That was a question I wanted to ask them, and it didn’t feel normal, but it’s how I thought of the world. And so then I just started pairing with people on trying to help them make forecasts that were useful to them. And I got really excited about it because I guess the way I think about it is my old meetings, I would have meetings with people and I would often feel like I didn’t succeed in what I wanted to happen or something like that. Only 50% of these meetings were good. And then when I started having these meetings around forecasting where I tried to help them, they seemed very happy at the end. I thought the meeting was super interesting and now I was like, “Okay, 90% of my meetings are good and now I can have as many good meetings as I want and I have to figure out some better way to scale this thing”. Like there’s something here. That’s kind of how I started.

Calibration training [00:27:18]

Arden Koehler: So these are people who are coming out of these with personal forecasts about various things happening at Twitch?

Danny Hernandez: Yeah. Then I made this training… In “How to Measure Anything”, he has this calibration training curve and so I was like, “Okay, I’ll try and make this training”. And in a day—

Arden Koehler: Sorry, we’ve talked about this with Professor Tetlock, but just for people who haven’t listened to that episode, do you want to just say what calibration training is?

Danny Hernandez: Yeah. So that thing I just described before where somebody launches a product and then they see what happens, they could have made a prediction. You could imagine that all being a very tight loop where somebody makes a prediction about something that is already known in the world and then they get the answer immediately back. And so they just keep doing this. And so if you do this over two or three hours, most people become calibrated so that their probabilities are reliable. Things they think have an 80% chance of happening happen approximately 80% of the time. And so I was excited about this and I made this training where as a data scientist, I just collected all the most interesting numbers at Twitch for the entire history of Twitch.
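As a rough sketch of what checking calibration can look like: group answers by the probability the person stated, then compare that with how often those answers turned out to be true. The data below is made up for illustration, and the function name is hypothetical; real calibration training tools may score things differently.

```python
from collections import defaultdict


def calibration_table(forecasts):
    """Group (stated_probability, came_true) pairs by stated probability and
    return {stated_probability: (observed_frequency, count)}."""
    buckets = defaultdict(list)
    for stated, came_true in forecasts:
        # Round to one decimal place so that 0.8 and 0.80001 share a bucket.
        buckets[round(stated, 1)].append(1 if came_true else 0)
    return {
        stated: (sum(hits) / len(hits), len(hits))
        for stated, hits in sorted(buckets.items())
    }


# Made-up answers from one training session: (stated probability, was it true?)
session = (
    [(0.8, True)] * 8 + [(0.8, False)] * 2
    + [(0.6, True)] * 3 + [(0.6, False)] * 2
)
table = calibration_table(session)
for stated, (observed, n) in table.items():
    print(f"said {stated:.0%}, right {observed:.0%} of the time over {n} questions")
```

A well-calibrated person's observed frequencies sit close to their stated probabilities in every bucket, as they do in this invented session.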

Danny Hernandez: I just had them all in my head and I could’ve written them up in a document or put them in a presentation, but both of those would have been kind of uninteresting to people. They would have bored people. Like they’d already seen these numbers before in past analysis. But I had them try to predict it. I had them try to predict the numbers I thought were most impactful in describing Twitch as a business, and they were super interested in this thing. I made this thing in a day because I just had all these things cached. I had this unique knowledge. And I sent it out and maybe there’s like a thousand people at Twitch and maybe a hundred of them do it in a day. And yeah, I had one question at the end where I was like, “Would you recommend this to a colleague”? Kind of like “Would you recommend this to a friend”, but the business version. And 97% of people said “Yes”. And in that moment I was like… I’d read all these books about minimum viable products and product-market fit. And I was like, “Okay, this calibration training has this product-market fit” where it’s like you’ve made it so that it’s questions the person was interested in instead of it being trivia.

Arden Koehler: Yeah. So how useful do you think this calibration training was for making people better forecasters?

Danny Hernandez: Well, I think it took them from totally uninterested in forecasting to able to make forecasts somewhat comfortably. Some of them were good at it. It made it so that the ones that were good at it and/or interested in it, like the other people would listen to them or accept that as a reasonable form of communication. One way of thinking about it is I think it was more useful for almost all of these people than talking to me for two hours.

Arden Koehler: That does seem much more efficient. Do you think it makes any difference that these weren’t properly forecasts, right? These were guesses at things that were already the case, things in the past, as opposed to forecasts, of course, where we’re talking about guessing at what’s going to happen in the future. Do you think that makes any difference or is that just cosmetic?

Danny Hernandez: I think the thing that matters more is that it’s in the domain you care about. So I wanted people to be able to make these kinds of forecasts. I also just want people to remember these numbers and be able to cite them without looking at them as a range, even. Stuff where somebody’s like “How much did we grow in the last year”? And to be able to give a range, like “It was between 40 and 42%” and to have them be right and not have to look it up so the conversation could keep going.

Arden Koehler: That’s a pretty narrow range.

Danny Hernandez: Yeah, that’s the kind of range I could have given, I think, when I was there. Like I would have known how much we grew by that much. I would have looked at it. And then we wouldn’t have to look it up and we could have kept going.

Danny Hernandez: But I think most of the time it was like if you are forecasting on something that you just don’t know about, like some of the things where product launches that happened before they joined that they never heard about, and so I think that there are beliefs that are just hidden from you and the world, like information that’s sufficiently hidden that you could predict on in most domains and would be just as good, in terms of training you, as if you were making future predictions. It just would take a lot less time and it’d be a lot of work to connect the scenarios. Like you can imagine, I don’t know… Have you seen these Harvard business case studies?

Arden Koehler: I’ve heard of such things.

Danny Hernandez: It’s like a pretty big thing in the business world. I think if those were good, they would look kind of like this. It would be like something in the past, and you would guess what happened. They wouldn’t tell you what happened. You’d just guess what happened in a bunch of ways that were interesting, and I think that would be something really good because it’s just old stuff that those people are usually not familiar with. But I think you can predict on the past stuff and learn.

Arden Koehler: So how did this turn into consulting for Open Philanthropy?

Danny Hernandez: Yeah, this is kind of a standard story of “Talk to people who might be able to help, and talk to people that are interested in the thing you’re interested in”.

Arden Koehler: But this calibration training tool that you made was a helpful experience for being able to do what you did at Open Philanthropy.

Danny Hernandez: Yeah, so they wanted people to be calibrated there. They wanted to make a calibration training for their staff. And we just talked about what that should look like and how it would be good. And then I helped some of their program officers make some of their early and first forecasts to get a feel for it and see that it was a useful exercise to them. Like what it could be like to be a useful exercise. And then at some point they were particularly interested in forecasting AI and they’re like, for somebody like me, they thought that was a particularly impactful way to go. And I thought it was just really interesting. I was always motivated by AI. I remember reading a lot of sci-fi and always thought that AI would eventually be really impactful. And I remember reading arguments around how AI would be impactful and just being like, “Yeah, I agree with these”. I just didn’t have any way to act on it before.

Arden Koehler: So you ended up transitioning to working on informing forecasts about AI in particular. Did that happen before you started working at OpenAI?

Danny Hernandez: Yeah. I started kind of thinking about forecasting AI progress while working with Open Phil and then transitioned to doing it at OpenAI. Earlier I was talking about this rare resource of you’re trying to forecast something, you want to talk to experts, and the better the experts, the more access you have to them the better. It’s like maybe one of your most important resources. So I think OpenAI or an AI research lab generally is just the best place to do this kind of thing.

Arden Koehler: Cool. So how important do you think it is for lots of people at an organization to be able to make good predictions in order for that organization to be successful? Should it just be the leadership when they’re making their strategy or is it actually pretty helpful for everybody to be able to make these kinds of predictions?

Danny Hernandez: I think lots of organizations can be very successful without making any predictions, to be clear. I think there are returns to everybody being able to do it, but it’s a numeracy rigor cultural thing. In a numerate, rigorous culture, I think you want everybody to be able to do it. I think what you’re doing is you’re training everybody to be a bit of an executive. I’m going to talk more abstractly about future organizations that could exist. So you could imagine an organization in which… I’ll talk about a startup because I understand such organizations a little bit better, where all of the people that were product managers are kind of like the mini-executive generally, but like there’s other people like engineers and designers that are all working on a project. I kind of want them all to make predictions because it could be that some of them have really good executive judgment as to what will succeed and fail, but they just have this other expertise and that’s what they double down on because they can clearly succeed in that domain or not.

Danny Hernandez: And this executive domain takes a while to observe. It’s kind of stressful and it often requires managing people and all this other stuff rather than just figuring out what will work and what won’t work. Like this judgment around that. And what I’d want is just measurements of how good everybody’s judgment is on everything. That’s really important. And so that you can talk to those people and listen to those people on those questions. And so then I’d want it everywhere. And you could imagine a company that’s less political because it has a better measurement of people’s judgment. And so yeah, the more measurable success is, the less political things become. Like the more obvious it is that somebody is succeeding.

Arden Koehler: I see. So where political here means something like there’s some sort of power game going on?

Danny Hernandez: You make allies. You don’t actually believe that the thing’s important. You’re just trading favors and stuff and you could think of this as like some overhead tax on a company. That like some companies, you know, like politics or something. It might be like 90% of the effort is going into political things possibly or 80% or 70%. I don’t know, but it’s like a high fraction versus a startup company with two people. They’re just trying to succeed. There isn’t a game to play. They just have split things up or however they’re going to split up, and now it’s really just time to do stuff. And this overhead kind of goes up. Like the bigger you are, the less clear it is what’s working.

Arden Koehler: Is the second thing because you can convince people that you are the expert even if you’re not, because there’s no clear measure of whether you are or not, and so then you have to convince them through other means like being very persuasive or making allies?

Danny Hernandez: Yeah.

Arden Koehler: Interesting. So one thing that came to mind when you’ve been talking about the importance of forecasting in organizations including businesses, is that I’m sort of surprised that people haven’t been implementing these techniques for a long time. So this is not a new technology that you know was just invented and that explains why people haven’t been doing it. These concepts seem relatively old and learnable by lots of people and it seems like if it really does make it more efficient and more possible to function really well as an organization, why do you think organizations haven’t been adopting these sorts of techniques for a long time?

Danny Hernandez: Yeah, I think that nobody has made the user experience good. You’re a business. The business is the user. The business is thinking about whether or not to adopt these forecasting techniques. What’s that whole experience like? Is it obviously good? Is it obviously good the whole time? Does it have quick returns? Is it easy for someone at this business to implement this thing? Have people think of that as a success and like have the thing grow or not–

Arden Koehler: Why hasn’t that happened?

Danny Hernandez: Well one I think just like not very much effort has gone into forecasting at all, total. Tetlock’s work is great. There could be a lot more people. I think this domain looks very promising and seems very important to the world. Not that many people are in it. It has long horizons. It takes a while to produce work that’s meaningful. I also think that the place to do it is with new businesses. The place to make it part of their culture. So, I mean also I think I thought about trying to do it and it looked like a long, slow thing.

Arden Koehler: It helps businesses?

Danny Hernandez: Yeah.

Arden Koehler: But you ended up doing some of this for Open Philanthropy, right?

Danny Hernandez: Yeah. So I guess my overall metric was I wanted to influence decision-making positively at the highest level. It’s kind of how I’ve thought about working on improving forecasting generally to AI. And when I looked at AI forecasting, it was just obviously working and people were very interested and I was like, “Okay, I think this is just a lot more tractable and more impactful”. I thought it was like both. And so yeah, when I look at how hard I think it is to make civilization as a whole have more foresight broadly, then it’s a big hard task that I think somebody would have to have a lot of different domain knowledge to have. I think that the way that academics have approached it is they’re kind of doing what I’m doing now, which is they’ll show up and they’ll help people make forecasts while they’re in the room.

Danny Hernandez: And then you could think of the superforecasters as like, “Okay, now I’ve given you a business interface, give me $10,000 and I’ll give you forecasts in a domain that you’re interested in”. I was like “That’s more scalable”. They don’t have to be in the room anymore and that can keep growing. So I think that model kind of makes sense and that people might use more of them in the future. But this other thing of just “Make it just obviously valuable to be rigorous and quantitative in lots of cultures in this way”, I think that that would take something more like calibration training that was targeted at startups that led to some startups to grow into the next Airbnbs and Googles. Then they have this as part of their culture and then like Fortune 500 people start copying them or Google starts copying them, but they run with it and have shown that it’s good, and they’re getting real advantages out of it. I think it’s more likely to happen through this bespoke new organization thing because you’re going to try and show that some executives aren’t competent.

Arden Koehler: Yeah, so the reason I’m pursuing this line of questioning is basically this seems just obviously good to me. Like we’ll have better beliefs about what’s going to happen that makes you better at acting in the world. And I’m trying to think of reasons why that might not be true. And the thing that seems the most obvious to me is like, well, if it was true, then people would have done it more. So I think in analogy with management training, right? So there’s tons of literature and tons of academic work on what makes a good management structure and supposedly this is because some of this allows firms to perform much better. Even though it involves having to go in and change some stuff. This seems like it should be like that. Forecasting seems like in a similar genre and it seems like it should be similarly popular but it’s not, which makes me wonder, “Okay, what’s going on”? Maybe it’s not as helpful to put these exact probabilities on things as I would have thought?

Danny Hernandez: I think part of what I meant by the UX not being good is I agree that lots of people come at forecasting and they see that. They’re like, “Yeah, this is how decisions should work”. And other people come but more are like, I don’t know, 90% of people, you explain it to them and why it should be a good idea, and they just are inherently skeptical. Probabilities don’t seem real to them. They think it’s going to be like a lot of work and a lot of rigor. Maybe they’re scared. But their initial stance, like calibration training that’s on trivia, that’s really only appealing to people that already just kind of believe or intuitively think that this whole line of forecasting is a good idea. And other people, they show up and their initial reaction is negative and it never gets overcome.

Danny Hernandez: I think calibration training did overcome it at Twitch for people. It was calibration training about a thing that they were interested in and that flipped the switch towards like, okay, this makes sense and they want to communicate in this way sometimes. But I think for the average person, the user experience of starting forecasting is pretty bad. You make some forecasts. You have no idea if you did a good job. You find out in a long time. You’re worried you’re going to be wrong. You’re really uncomfortable. It’s a pretty bad experience.

Arden Koehler: So I’ve done the calibration training that Open Phil released, and it did like a little bit just make me feel like an idiot, so I can see why maybe some people are not super excited about doing it just because of probabilities being sort of way off the mark and being sort of shocked by how far off the mark they were. So maybe that’s a really common experience.

Danny Hernandez: Yeah, I think calibration training is a better experience than just trying to start forecasting on your projects. Like it could easily take you two years to get that same level of signal of “I’m just overconfident all the time” on all the things you really care about. And at some point, you’re just going to stop because that sounds pretty unpleasant to just always be wrong and to not see how and for it to be that slow.

Arden Koehler: So before we leave this topic, what do you think you need more? More research on new techniques to make us better at forecasting, or even like other aspects of decision-making, or just more implementation of what we already know is relatively good?

Danny Hernandez: I think I’m most excited about people trying to leverage calibration training generally because that’s the thing with the fast feedback loop. I think we have a solid grounding of research but that we could use more entrepreneurs in that space that have this goal of improving civilization’s foresight. I think that’s the main reason you would pursue this. Not because you think it’s a good way to make the most money. I do think there is research to be done. The research question that I would have encouraged people to do is to see to what degree calibration training generalizes. You could imagine a setup where you had people take calibration training on like 10 different topics or something, and in different orders, and try to figure out what percentage of people are generally calibrated and what does being generally calibrated look like? If I’m calibrated in three domains, does that mean I’m just almost certainly calibrated in all domains, even ones I haven’t seen before? Because almost all of the studies are on trivia.

Arden Koehler: Yeah. I guess this sort of relates to the question I asked earlier about does it matter that the things that people are being trained on are things that already happened? You could imagine that it’s much easier to be calibrated on some kinds of things. Perhaps things that have already happened than things that will happen in the future. I mean, I’m not saying that’s necessarily true, but this is a question of generalization, and you want to see more research on that.

Danny Hernandez: Yeah, I’d like to see more research on that. I also think it would be nice to just be certified as generally calibrated.

AI and Compute [00:45:22]

Arden Koehler: So let’s move on to talking about these two papers that have come out of the Foresight team at OpenAI in the last year. So AI and Compute and AI and Efficiency. Let’s start with AI and Compute. Can you just tell us exactly what was measured and what was found?

Danny Hernandez: Yeah, so AI and Compute, what was measured was the amount of computation in terms of floating point operations that have gone into training the largest neural networks, where floating point operations are like additions and multiplications and subtractions. And what was shown is that between 2012 and 2018, this went up by 10X per year, 300,000X total. And AI and Compute was the first thing I worked on at OpenAI, and it was joint work with Dario Amodei.

Arden Koehler: So there was also an addendum released that studied compute before 2012, and I feel like that’s really useful for getting a sense of what’s really going on. So can you describe that?

Danny Hernandez: Yeah. So if you look at this same kind of question, how much compute went into learning and training systems from the sixties to 2012, then this compute has just been approximately following Moore’s law over this long period of time. And so you’re going from 2X every two years to 10X per year. So it’s a nice graph because there’s just a clear kink and it’s like, “Okay, whatever is happening in this domain is different from what was happening before”.
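To get a feel for how different those two growth rates are, here is a small arithmetic sketch using only the round numbers quoted in the conversation (2X every two years, 10X per year, 300,000X total). The helper names are hypothetical, and steady exponential growth is assumed throughout.

```python
import math


def doubling_time_months(growth_factor: float, period_years: float) -> float:
    """Months needed to double, given growth by `growth_factor` every `period_years`."""
    return 12.0 * period_years * math.log(2) / math.log(growth_factor)


def years_to_reach(total_factor: float, growth_factor: float, period_years: float) -> float:
    """Years of steady growth needed to multiply by `total_factor`."""
    return period_years * math.log(total_factor) / math.log(growth_factor)


moore_doubling = doubling_time_months(2, 2)            # 2X every two years
post_2012_doubling = doubling_time_months(10, 1)       # 10X per year
years_for_300k = years_to_reach(300_000, 10, 1)        # at 10X per year
moore_years_for_300k = years_to_reach(300_000, 2, 2)   # at 2X every two years

print(f"Doubling time at 2X per two years: {moore_doubling:.1f} months")
print(f"Doubling time at 10X per year: {post_2012_doubling:.1f} months")
print(f"Years to reach 300,000X at 10X per year: {years_for_300k:.1f}")
print(f"Years to reach 300,000X at 2X per two years: {moore_years_for_300k:.1f}")
```

Under these round numbers, 10X per year works out to a doubling roughly every 3.6 months, and a 300,000X increase takes about five and a half years. At 2X every two years the same increase would take about 36 years.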

Arden Koehler: Yeah. So do you have a sense of what did happen in 2012 that caused the big change in the growth rate?

Danny Hernandez: I would credit it to AlexNet, which is this machine learning result where before this, you had handcrafted heuristics assembled by experts to recognize images. And this simpler neural net system that’s a lot simpler to create, took a lot fewer people than most of these other systems, just beat those old systems by a huge amount. Maybe it got 79% accuracy, where old systems probably got 10% less or 15% less accuracy on the same system. And so it just was the first thing in which neural networks were kind of the state of the art system. In the nineties neural networks were like… Well, I’m not sure about how good handwritten digit systems were before, but in the nineties, there were neural nets that could read handwriting or could read numbers and transcribe things. And that’s kind of maybe the most interesting thing that happened before.

Arden Koehler: So we have this really interesting model that is just much more successful using neural networks. And then is the compute growth just a matter of people being like, “Well, this is now worth it for me to invest a lot of resources in order to be able to do a lot more computations because this seems to be actually getting returns and performance”, whereas it didn’t as much before?

Danny Hernandez: Yeah. So it’s like the systems that learned before, they didn’t get returns from. They never had capabilities that made people want to keep ramping up investment. It’s kind of like they never hit product-market fit or something. It was never clear how you were going to be able to leverage them to do something interesting. Just from these vision systems, I think it pretty clearly was true that you were going to start to be able to do economically interesting things with neural nets. As soon as you saw AlexNet, you would know that there would be interesting economic applications.

Arden Koehler: So you say in that blog post that the three major drivers to AI progress are the amount of compute available, algorithmic or conceptual innovation, and the amount of data available. Why do you think people started ramping up the compute in particular around 2012 after this proof of concept?

Danny Hernandez: Well, I think the compute was faster to ramp up than people. Though people have ramped up. But it takes a while to get a PhD, and that’s like some portion of the field, that branch of people who’ve got PhDs and that takes a while.

Arden Koehler: These are the people driving algorithmic innovation, which are researchers, and you just can’t ramp that up very quickly.

Danny Hernandez: Yeah. So I mean more people are going into those programs than before. So it got ramped up somewhat, but there was just a lot of compute in the world doing lots of other stuff. So it’s like capital moves around maybe faster than people.

Arden Koehler: Yeah, sorry, what does that mean? There’s a lot of compute in the world. Like it feels a little abstract to me.

Danny Hernandez: There was a huge amount of computation in the world that wasn’t doing AI stuff that could be moved over quickly.

Arden Koehler: So this is just available computers and chips?

Danny Hernandez: In Google’s Cloud, in Amazon’s Cloud, computers that are on people’s desks that were not doing AI research before that were gaming machines. This is my AI research machine now. Yeah. So there’s just a lot of GPUs that you could buy to do AI research that before were just doing other stuff.

Arden Koehler: So in terms of what this result means, do you have a view about whether compute, just sheer amount of compute is more important or less important than these other factors, or more important for certain kinds of phases of advancement or anything like that?

Danny Hernandez: Well, at least from this work, I’d say the AI and Compute thing didn’t make it clear whether or not compute was more important than other stuff. It just made it clear that it was worth paying some attention to and was very measurable. And I think it makes it so that you can look at a model and try and understand why is this thing better, and get a lot of extra context. So sometimes something is better because it has more data or because it has better algorithms. But other times it seemed like it was mostly just better because it was scaled up for something. And so if you don’t understand the amount of compute that went into the system, then it’s hard to evaluate which of these three things was doing most of making it better, or was it some combination, or to what degree do you want to attribute to each? So to go back to an earlier question when you’re like, “How do you decide what to do research on”? I think this sentence was part of how I was trying to think about my research and I was like, “Okay, that’s a very interesting question. How important are each of these three things and one of them now is well measured, so all the uncertainty is in the other two. What are the best measurements of the other two to try and think about this”?

Arden Koehler: I see. So maybe we can’t draw a lot of conclusions from this, but we now have this measured, so once we can measure the others, maybe we can actually say something about what’s the combination of these factors in the best performing systems out there?

Danny Hernandez: Yeah.

Arden Koehler: Interesting. So it also says in the blog post that you see multiple reasons to believe that the trend in the graph, this exponential increase in compute used in training the most high performing systems, is going to continue. And you cite the fact that there’s a bunch of hardware startups that are developing AI specific chips and that they’re promising to have much more efficient chips that you can do a lot more compute with a lot less money. Is this just a general observation about computing that like, “Look, we’re going to be able to have more of this resource available”, or is this something more specific to AI? Like we’re expecting to see the investment of compute in AI grow to a greater degree than for everything else?

Danny Hernandez: Well, maybe the caveat; I pay attention to hardware. I’m not a hardware expert. Hardware experts make chips. Somebody claims they’re a hardware expert and they haven’t made lots of chips, then I’d be skeptical.

Arden Koehler: Okay. I’ll keep that in mind.

Danny Hernandez: I think maybe what I can say is Moore’s law made it so that most specialized chips were uninteresting for a long time. Like if you try to make a specialized chip, it’s like 2X better or 4X better, but it costs a lot of money and it’s beaten by general purpose processors like two to four years later. So it was never quite worth the expense to make chips. And so GPUs were kind of the second thing that got that. Whereas chips that were made at large scale and that were for video games and for some other applications, but video games seem particularly important–

Arden Koehler: Hence the graphics processing unit.

Danny Hernandez: Yeah. And their computations are a little bit less general than CPUs. It’s like more parallel, right? It’s like they just kind of could rely on the fact that there were always a lot of pixels on your screen and a lot of parallel operations to do.

Danny Hernandez: And that one way in which AI is different is that there are very few kinds of chips that people are trying to make. And one of them is AI chips and so it’s possible that this will lead to a meaningful gain in efficiency. That you have all these people and they see this as like one of the big things they should try to do, is to try to make AI chips. There’s lots of startups and so there’s just some chance that somebody will succeed. There’s lots of startups and the large companies are also quite interested in anything they can do in the AI space in hardware.
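
The earlier remark that a specialized chip which is 2X or 4X better gets beaten two to four years later can be checked with a rough sketch. It assumes general-purpose chips improve about 2X every two years, which is an illustrative rate rather than a measured one, and the function name is hypothetical.

```python
import math

def years_until_overtaken(speedup: float, doubling_years: float = 2.0) -> float:
    """Years for a steadily improving general-purpose chip to catch up with
    a specialized chip that starts out `speedup` times better.

    Assumes a constant doubling time (an illustrative assumption).
    """
    return doubling_years * math.log2(speedup)

for speedup in (2, 4):
    years = years_until_overtaken(speedup)
    print(f"{speedup}X head start lasts about {years:.0f} years")
```

Under that assumed rate, the head start disappears in roughly the two to four years mentioned in the conversation.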

Arden Koehler: So maybe the answer to this question will be too technical to be of interest/understandable to me/our audience, but what makes an AI chip an AI chip?

Danny Hernandez: I mean at the abstract level it should be that it’s designed for what ML researchers are currently doing, and doing in the future. And so it’s this kind of hard target, but it should also be quite flexible because it should be able to do the things that they’re interested in, in the future. I often think that the most interesting hardware question to me is what will happen with Moore’s law rather than the AI hardware chips. Because the AI hardware chips you could think of as mostly a trend on top of that. That they’re some offset. And that could be a large offset, but the long-term trend is kind of Moore’s law, whatever happens there. And so that’s kind of what I think about more often and what longtermists should be more interested in. If you’re like a longtermist, then Moore’s law’s like really big over the next 20 to 30 years, whatever happens. Even if its exponent goes down some, you really care what the new exponent is, or if there’s no exponent.

Arden Koehler: Can you explain what Moore’s law is and what it means to say what’s going to happen to it?

Danny Hernandez: Yeah. So there’s a lot of news articles that say Moore’s law is dead. You would see a lot of those. I guess there’s been a lot of those for a long time and so it’s kind of hard to evaluate what’s going on with it. But if you look at CPUs and how much more efficient they’ve been getting in terms of FLOPS, floating point operations per dollar from 1960 to sometime in 2000, they just very reliably grew. They got like 2X better every two years. 2X cheaper. And then after that, it seemed to have a different exponent in the CPUs, and it’s unclear if it’s going to continue to slow down or go back to its old speed. Like you could have some new kind of hardware regime where it’s underneath Moore’s law, there’s a lot of S curves, a lot of things that died and no longer got better, but then you found some new thing to replace it. In many domains, that’s happened before.

Danny Hernandez: And so it could be that when you just zoom out in the history of humanity a hundred years from now, our current thing is an aberration and Moore’s law just goes back to its old speed or speeds up or whatever. But if you think about what’s the most interesting compute trend, it’s definitely Moore’s law and that’s the longtermist most interesting compute trend, and much of what’s happened in compute kind of follows from that. If you’re in the sixties and you know Moore’s law is going to go on for a long time, you’re ready to predict the internet and you’re ready to predict smartphones, and you’re ready to make 30 and 40 year long investments in basic science from just knowing this one kind of fact. And you could still be kind of in that position, if you think you know what’s going to happen with Moore’s law.
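
To see why the exponent matters so much over 20 to 30 years, here is a small sketch of the compounding involved. The 2X-every-two-years figure comes from the discussion above. The four-year doubling time is a hypothetical slower regime chosen only for comparison, and the function name is an assumption.

```python
def improvement_factor(years: float, doubling_years: float) -> float:
    """Total gain in operations per dollar after `years` of steady doubling."""
    return 2.0 ** (years / doubling_years)

# Historical regime described above: roughly 2X every two years, 1960 to 2000.
print(f"1960-2000 at 2-year doubling: {improvement_factor(40, 2):,.0f}X")

# Looking 30 years ahead under the old rate and a hypothetical slower rate.
for doubling_years in (2, 4):
    factor = improvement_factor(30, doubling_years)
    print(f"30 years at {doubling_years}-year doubling: {factor:,.0f}X")
```

Forty years at the old rate compounds to about a million-fold. Over a 30-year horizon, merely doubling the doubling time changes the outcome from tens of thousands of times better to a couple of hundred times better.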

Arden Koehler: So aren’t you just ready to predict that we’ll be able to do more computations for the same price at a certain rate, or like we’ll be able to do more computations without it being that much more expensive. And I guess that gets you part of the way to the internet and smartphones, but I’m not quite seeing how it gets you all the way?

Danny Hernandez: Well, it got people at DARPA… They were kind of talking about how this is where things were going. It’s like there were people and they kind of could tell. Like the people that were most in the know, they saw this trend and they’re like “This is what this trend means”. And they were right and they didn’t put probabilities on it, but they just were right about the most important thing that happened in the world based on this trend in the past. And so I think we should still respect it and pay a bunch of attention to this trend and be like, “What does this trend mean about our future”?

Arden Koehler: Yeah, definitely. No, I mean, just to be clear, I wasn’t exactly skeptical. I was voicing the sense of not being able to quite connect the dots in my own head between this efficiency and these advances that we’re seeing now.

Danny Hernandez: Yeah. So I guess the way they thought of it was when they looked at computers, they’re like, “Okay, my computer fits in a warehouse”. And they’re like, “Will it ever be on my desk”? Will I ever have enough computation to have anything more than just a cursor in front of me? What if the computer that currently weighs a hundred pounds on my desk fit in my pocket? They could have imagined that. They could draw a curve and be like, “This is when that happens”. And they’re like “Oh, and this is when the computer gets to a price point when everybody’s going to want one, probably”. And maybe that was 10 years from when they started to kind of think about it. But you really care about the exponent because you really care as to whether or not you can fit that phone in your pocket in 20 or 30 years or a thousand or 500 and depending on the exponent, either world seems like it could have been plausible.

Arden Koehler: Okay, cool. So we’re getting ahead of ourselves a little bit because this is more related to AI and efficiency. But just to get back to AI and compute for a second, I guess one question that I would just love to get clear on is why is compute so expensive? So one thing that this huge explosion in computation explains is why it’s so expensive to run these really advanced AI models. Do you have a short answer for why it costs so much money?

Danny Hernandez: I mean, I think the question is like why are people willing to spend that amount of money? Or why are the marginal returns to spending that money worth the cost of spending it? Because initially you could have thought about compute as very cheap. You know AlexNet was approximately somebody’s souped-up gaming rig. It’s like a computer with two GPUs. It’s like a researcher on their desk type thing. And so at some point that compute was super cheap and then what people saw was that they could make their systems better by scaling them up carefully. Like that was one way to make things better. And so they were willing to do that as long as the returns to doing it were better than the costs. So that’s why people are willing to spend a lot currently when they spend a lot of money in the same way the world is giving a lot of attention to AI. The world’s willing to spend a lot of money on AI right now also is how I see it rather than compute being expensive.

Arden Koehler: Before we leave that, do you have a sense of is it just that people are seeing future promise in these AI systems or is it like right now there’s a lot of money to be made in running these really computationally expensive systems?

Danny Hernandez: I think another important point is that AI and compute is about training systems. And that the training of systems is actually a small fraction of their computation. So, for instance, Google had this paper about a model called WaveNet. They were looking into letting people search through voice. Maybe in like 2013 or 2014, they were doing these calculations and they realized that if they got a meaningful amount of voice traffic, maybe like 20% of searches were voice searches, that they’d have to double their data centers. So that was from running the model, not from training the model. And so that made them very excited to try to make a more efficient version to run that model. And if when you’re talking about running a model, you’re talking about Google doubling its data centers, then that’s like a lot of money and you start to be willing to spend a lot of money training something if that’s how much money you’re going to spend running it if it actually is that successful.

Arden Koehler: So what ways does this research update you, if any, on how quickly you think AI will progress in the coming years? So it seems like there might be some reasons to think that having to invest a lot of computation means that progress in AI might actually slow because it’s really expensive and maybe we won’t be willing to do that for that long. And some reasons to think maybe it could speed up as it shows everybody has a lot of interest right now. Do you have a view on what some of the most interesting arguments are on how this should update us on our views about AI progress in the future?

Danny Hernandez: Yeah, I think for me it was a modest update towards more progress, but I think it’s not obvious. I think that I agree that if the trend ends, and the trend was driving a lot of progress, then that seems like that would make progress slow down. I think part of my intuition around transformative AI (TAI)… Like when I was talking about that there are some milestones. There’s some things that could be built that would be very impactful. And I was talking about these scientists earlier. One very simple model of the world is that only one of these three factors: compute, data or algorithms is the main driver of progress. And when I’d only seen the AI and compute results, then the way I thought about it was that at least from the sixties to 2012, it’s like compute is this main driver of progress. And if you had some model of AI progress, it might’ve looked something like the algorithm stuff will just kind of happen and it will be just kind of ready in the background.

Danny Hernandez: We’ve got kind of backprop and convolutional neural nets and all this kind of stuff. All the stuff that you made AlexNet with was just kind of waiting for compute. For 10 or 20 years or something. And so for a while it was compute, and you could have this simple model of the world where you think there’s one limiting reagent and if in your model it’s compute, then you’re like “Okay, at some point we’ll get capabilities” and there’s some capabilities that once we get to some amount of compute, like I was talking about science stuff, something that is faster, that should happen before you can just instantiate a great scientist is you should have some tool like a microscope or something, or a telescope that you can make with AI that just advances some domain of science really quickly that wouldn’t have happened otherwise.

Danny Hernandez: I think biology is a pretty good thing. Like AlphaFold could be such a thing that happens… I think there’s lots of things like that that could happen. And so it could be that that will happen as soon as you should expect that to happen kind of quickly after there’s enough compute to enable that technology. And you just don’t know what that level of compute is. You just have huge amounts of uncertainty. It’s like, “Where are these things”? And one 10X could easily push you over such an edge. When you think about how humans look. Like we have this one capability as humans that it seems like something like a 4X scaling up of chimps. Now people would argue about this–

Arden Koehler: Sorry, what’s the capability?

Danny Hernandez: The capability is human-level intelligence and it seems like it’s chimp-level intelligence scaled up by 4X compute where I’d say that brains… people could disagree. There’s some literature kind of discussing this claim, but from what I can tell, humans seem mostly like scaled up chimp brains rather than an algorithmic improvement. And so you at least have one example of like a 4X amount of compute can be the difference between a much more powerful system than another system. And so yeah, I think we should have quite a bit of uncertainty as to what can happen when you increase compute by a lot.

Arden Koehler: So saying that it’s true that human beings’ brains are basically scaled up chimp brains, where the main difference is compute. And taking the results from “AI and Compute”, one thing I’m not quite following is how any of this shows that compute is a big driver of progress in AI? Is it supposed to be that, “Look, people would not be willing to invest this much compute if it wasn’t having big returns? So the fact that they’re willing to invest so much compute suggests that it’s a really big driver of progress”?

Danny Hernandez: I think that’s an argument that you could make that people see returns to this thing and so then it must have some value. I think the argument I was trying to make was that you have this compute thing, it’s going quickly, and it’s possible that that has a large effect.

Arden Koehler: Okay. Yeah. Well that seems right.

Danny Hernandez: Because we don’t understand… Our model of this system is pretty poor.

Arden Koehler: Yeah. And it seems like in other systems that are somewhat similar, it seems to make a really big difference in how much compute there is.

Danny Hernandez: Yeah. At least, it can make a difference at some point.

Arden Koehler: I mean, well this doesn’t quite seem to actually decide between thinking that these results mean that progress will be faster or slower, right? Because if you thought… Basically it’s just we’re very uncertain and we don’t know how long this trend is going to go.

Danny Hernandez: Yeah. So I think it depends on what you start off kind of believing or something. Then maybe your priors matter quite a bit here and how impressive you think the current systems are and so I don’t think there’s a definitive case to be made. I think that it’s just that it’s like a measurement that should be in your arguments somehow. You should have this measurement in your arguments. It’s not really clear where they go. It is really clear what to do with this measurement if you’re in industry or something, or if you’re in government. Where in government I’m like, “Look, the AI researchers in academia are going to have trouble. There is a growing distance between the amount of compute that industrial labs have and researchers in academia, and that makes it so that when their research forks, it’s like one person can’t verify the research of other people. They get very different research interests as a result”.

Danny Hernandez: And so yeah it’s like I want academics to have particle accelerators too. I don’t want all the particle accelerators to be owned privately. That sounds weird. So I think the government should be looking at giving large amounts of compute to academics and I think that that’s a clear thing you could get out of this. So I think it’s quite hard to get evidence on this question we were talking about before as to like what does something mean about TAI. But I think there’s lots of other ways to draw a clear thing you should do from this measure or if you’re like a CEO of Nvidia or a chip company, you show this graph to your investors and you’re like, “Look, they want more compute. There’s demand for this thing I’m trying to build”. And so it’s really clear what this means to some people and it’s a lot harder to draw what it means here.

Arden Koehler: Yeah, that’s helpful. I guess on the ‘governments providing compute to academics’ point, I suppose we already knew that these private firms had access to a lot more resources. Maybe what this shows is just like one way in which they’re spending their resources and it’s sort of suggestive that this is going to be a way that this particular research should be made available to academics as well. Yeah. Cool.

AI and Efficiency [01:09:22]

Arden Koehler: So let’s move on to talk about the new paper, “AI and Efficiency”, which should be up by the time this episode goes on the air. So the headline result of that paper is that approximately over the same period as “AI and Compute” studied, models now require less compute to do the same sort of thing. So do you want to just talk a little bit about that result and what you think it means?

Danny Hernandez: Yeah, so this is kind of my attempt to measure algorithmic progress quantitatively. Specifically it takes 25 times less compute to train something to AlexNet-level performance, where AlexNet, we talked about earlier, it got everybody excited about neural nets. And there are lots of reasons that I like this. One is that it’s in the same units as the “AI and Compute” result. They’re both about what’s happening with compute and you can kind of make a merged worldview with them.

Arden Koehler: So yeah, I guess it’s a sort of a clever way of measuring algorithmic innovation indirectly by measuring compute, which maybe is more measurable and more comparable. It makes your results more comparable to the other piece of research.

Danny Hernandez: Yeah, and I think the other thing that I like about this… I guess I’ll just explain how algorithmic progress usually gets measured in AI research. What usually happens is you release some new paper like ResNets were a new thing that got released in 2015, and they showed that their top five accuracy, their ability to label, to classify whether a picture was a dog versus a cat versus a tree, that they could get the top five classes with 93% accuracy instead of maybe 92% accuracy which was like the previous state-of-the-art. And so it’s like kind of a bunch of complicated measurements that need to be explained like that that are pretty hard to understand and explain and there’s just like lots of them, right? There’s Go, there’s Atari, there’s all of these things and they all kind of require quite a bit of context to understand how impressive that is, really.
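
For readers unfamiliar with the metric, top five accuracy counts a prediction as correct when the true label is among the model’s five highest-scoring classes. A minimal sketch follows, using made-up scores for a six-class toy problem. The function name and data are illustrative assumptions, not anything from the papers discussed.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    hits = 0
    for class_scores, true_label in zip(scores, labels):
        # Class indices ordered from highest score to lowest.
        ranked = sorted(range(len(class_scores)),
                        key=lambda c: class_scores[c], reverse=True)
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(labels)

# Toy, made-up scores for three images over six classes (not real model output).
scores = [
    [0.05, 0.40, 0.10, 0.20, 0.15, 0.10],  # true class 1 is ranked first
    [0.30, 0.25, 0.20, 0.15, 0.08, 0.02],  # true class 5 is ranked last: a miss
    [0.10, 0.10, 0.35, 0.05, 0.25, 0.15],  # true class 4 is ranked second
]
labels = [1, 5, 4]

print(top_k_accuracy(scores, labels, k=5))  # two of three images count as correct
print(top_k_accuracy(scores, labels, k=1))  # only the first image counts
```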

Arden Koehler: It also seems sort of unclear that it’s necessarily the algorithmic innovation that’s doing the work there, right?

Danny Hernandez: Yeah. It could have been that they scaled up the models. But I guess that’s how people talk about capabilities progress, and they like to tie it to algorithmic innovation because algorithmic innovation is the most prestigious way to have made your thing better. But in normal computer science, when you talk about something like sorting, you talk about its computational complexity. You’re like, “Here’s an algorithm, and if you want to sort a list that’s ‘N’ entries long, it’s going to take this algorithm N times the logarithm of N and this other algorithm will take N squared and this other algorithm will take N cubed”, and all of algorithmic progress in traditional computer science is talked about in terms of computational costs.
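
To give a feel for how far apart those three growth rates are, here is a small sketch that tabulates them for a few list sizes. It counts abstract operations only and ignores constant factors, so the numbers indicate scale rather than real running times. The function name is an assumption.

```python
import math

def operation_counts(n: int) -> dict:
    """Rough operation counts for the three growth rates mentioned above."""
    return {
        "n log n": n * math.log2(n),
        "n^2": n ** 2,
        "n^3": n ** 3,
    }

for n in (10, 1_000, 1_000_000):
    counts = operation_counts(n)
    summary = ", ".join(f"{name}: {value:.3g}" for name, value in counts.items())
    print(f"n = {n:>9,}  ->  {summary}")
```

The gap between the columns widens quickly as the list grows, which is why moving from one complexity class to another counts as progress in traditional computer science.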

Arden Koehler: So getting that computational cost down. The amount of time it takes in this example, but I guess also the number of computations.

Danny Hernandez: Yeah, so you could formulate it as either, but yeah. Usually it’s formulated as operations or time given a consistent amount of compute. But in those domains, it’s more straightforward to do that analysis to come up with that number. But we can use this same lens in AI where we focus on reducing computational costs, given that we’re training to constant performance. Yeah. And so I think this is like a generally good way for researchers to compare. It’s like an axis that gives me a lot of clarity as to how their system is better than previous systems. For instance, some systems make progress on this axis, they’re more efficient and better and they get to a new capability that’s never been reached before. And some just get to a new capability that’s never been used before and they’re less efficient at earlier parts. Like getting to earlier levels of capability than other systems. And so those are just like different ways to make progress that I want to understand.

Arden Koehler: So can I ask how you’re isolating algorithmic innovation here in particular? Because if there are three things that go into progress in AI and capabilities progress (compute, algorithmic innovation and data), how do you know that the models are being run to do this AlexNet thing, how do you know they don’t have access to better data?

Danny Hernandez: Yeah. So they’re trained on ImageNet, so they’re trained on the same data.

Arden Koehler: Okay. So you guys were actually running these experiments?

Danny Hernandez: Yeah. I ran these experiments. I used the same data. And I just count the amount of compute they used at every amount of training.
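
The bookkeeping described here, training on the same data and tracking compute used at each point in training, can be sketched as follows. This is not the experimental code from the paper. The checkpoint values, the target accuracy, and the function name are all made up for illustration.

```python
def compute_to_reach(curve, target_accuracy):
    """Smallest cumulative compute at which accuracy first meets the target.

    `curve` is a list of (cumulative_compute, accuracy) checkpoints in
    training order. Returns None if the target is never reached.
    """
    for compute, accuracy in curve:
        if accuracy >= target_accuracy:
            return compute
    return None

# Made-up checkpoints in arbitrary compute units; not figures from the paper.
older_model = [(1.0, 0.50), (2.0, 0.70), (4.0, 0.79), (8.0, 0.80)]
newer_model = [(0.125, 0.60), (0.25, 0.75), (0.5, 0.81), (1.0, 0.85)]

target = 0.79  # hypothetical stand-in for "AlexNet-level" accuracy
gain = compute_to_reach(older_model, target) / compute_to_reach(newer_model, target)
print(f"Efficiency gain at fixed performance: {gain:.0f}X")
```

Holding the dataset and the target accuracy fixed is what lets the ratio be read as algorithmic efficiency rather than a change in data or in the task.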

Arden Koehler: So one thing I want to get clear on. So when I first read this I thought, “Well, this seems to suggest that performance must be rising a lot because we’re using exponentially more compute and we’re 25 times more efficient with the compute that we’re using”. So that sort of seems to undermine any concern that we’re getting diminishing marginal returns on the amount of compute that we’re using. Unless a bunch of it is somehow being wasted, which seems unlikely. But then I wasn’t really sure if I really could conclude anything about how much performance has been rising from the observation that we’re using a lot more compute and we’re getting more efficient with using it. Can we draw any kind of conclusions about that?

Danny Hernandez: I think so. So one of the graphs I have is about this concept of effective compute. So we’re trying to compare kind of what we could do now to what we could do in the past and in the past what we could have done is we could have done some simple things to try and scale up this model somehow. Like there are just known ways to just try and do that and maybe you’ll get returns out of it. And so you can imagine trying to scale up these past models and compare them to what you could do right now. It took me quite a while after I measured this trend to figure out what I thought it meant. More like how to merge it with AI and compute in some way that was clean.

Danny Hernandez: I think that for the intuition part, when we consider that we have more compute, and that each unit of compute can do more, that it becomes clear that somehow these two trends multiply. You have to kind of carefully do that. It’s not clear how to multiply them and how to think about that multiplication, but that they should multiply and that the conception we found most useful is that if we imagine how much more efficient it is to train models of interest in 2018 than it would have been to just scale up training of 2012 models until they got to current capability levels. Like if we’d just given past models a lot of compute, we might have more parameters, more data, some tuning, but like nothing clever. Nothing more clever than just kind of this obvious kind of stuff.

Danny Hernandez: Then how much more compute do we have now than we had in the past? And I would argue that now, in this kind of framework, that we can multiply them and think about us having 25 times the 300,000 amount of compute available in the largest experiments in 2018 compared to 2012 and that this is actually an underestimate of what’s happened because the 25X doesn’t measure the contribution of new capabilities. It didn’t measure AlexNet. AlexNet didn’t show up here in this estimation of progress between 2012 and 2018 even though it represented a lot of progress. And so when you unlock a new capability, you might’ve made it a hundred times cheaper to do it than it was before by doing that new thing or a thousand, and so a lot of the gains in a domain are actually when the thing is unlocked, not when the thing keeps getting more efficient.
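
The multiplication described above is simple to write down. The sketch below uses the two headline figures quoted in this episode and treats them as independent factors, which is the simplifying framing offered in the conversation rather than a precise model. The function name is hypothetical.

```python
import math

def effective_compute_growth(compute_growth: float, efficiency_gain: float) -> float:
    """Combine growth in raw compute with gains in algorithmic efficiency."""
    return compute_growth * efficiency_gain

compute_growth = 300_000   # growth in compute for the largest training runs, 2012 to 2018
efficiency_gain = 25       # reduction in compute needed for AlexNet-level performance

combined = effective_compute_growth(compute_growth, efficiency_gain)
print(f"Effective compute growth: {combined:,.0f}X")
print(f"Equivalent number of doublings: {math.log2(combined):.1f}")
```

That comes to a 7.5 million-fold increase in effective compute, and per the argument above the 25X factor is a floor, so the combined figure would be an underestimate.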

Arden Koehler: So this is more like a floor?

Danny Hernandez: This is more like a floor. And like a floor for how much algorithmic progress there is and the floor is still surprisingly large. When I talk to AI researchers, they kind of expect it to be more like 10X or something there. It’s still more, and it’s more in some other domains. Like in translation, it’s 60X over three years. People are particularly excited about language models right now and maybe the progress in language models is faster.

Arden Koehler: Sorry, is that also a floor, because using the same sort of method?

Danny Hernandez: I think I would argue about the 25X being the floor and the 60X is more an argument as to why is the 25X a floor.

Arden Koehler: Oh, I see. I thought you were saying, “Well, if we were to do this instead of with AlexNet with some language model, we would have gotten the 60X”.

Danny Hernandez: Which is true. So yeah, I think you’re interpreting some aspect of it correctly, but maybe less like the claim that I’m trying to make with it. Whereas I guess the claim I’m trying to make is there’s more progress in some other domains of interest, and so this AlexNet 25X is lower than some of the other domains that were observed.

Arden Koehler: I see. Yeah. So if you’re talking about progress in AI in general, this is another argument for this being a really conservative claim.

Danny Hernandez: Yeah. But then if you were trying to talk about a specific domain, and you’re particularly interested in language, then maybe you focus on trying to measure the language thing better and then yeah, also like this number hasn’t really been optimized yet. Like nobody’s tried to optimize this number, including me.

Arden Koehler: Wait, sorry, what do you mean optimize the number? Like optimize how much efficiency we are seeing?

Danny Hernandez: Not that exactly. More like optimize how much compute it takes to get to AlexNet level accuracy. AI researchers are trying to make the most performant systems for the most part. Sometimes they’re trying to make more efficient systems, but when they’re trying to make more efficient systems, it’s usually at runtime, not in this early training phase. And so we have this, I don’t know, like a Goodhart’s law style thing where it’s like nobody’s tried to optimize this thing yet, so maybe it’s a more reliable measure right now because this is just what happened without people trying to make this better.

Arden Koehler: Yeah. Okay. So you’ve given some theoretical reasons to expect performance to have increased a lot. Is that what we see? I mean, I think it’s a bit harder to measure perhaps than the amount of compute. But can we say anything sketchy about whether performance is increasing the amount that we might expect given the increase in these inputs?

Danny Hernandez: I think there’s some things we can say about that. Like I think that kind of the amount of value that AI or neural nets are creating is a relevant way to think about that. What that trend looks like over time.

Arden Koehler: The amount of economic value?

Danny Hernandez: Yeah, economic value. Economic value is downstream. If you wanted to get more upstream, you could try to get to implied economic value or implied cash flows or something where when people are investing more, they’re showing that they expect there to be more future cash flows. And so that’s why investment is kind of interesting. And so yeah, I think both of those are ways you could try to talk about AI progress in a way that’s still quantitative. But I think a lot of what people are generally talking about is how impressed they are by some new capabilities. They list all the capabilities over time and how impressed they are by these, and then they argue something about whether or not that is what that means, and those arguments are quite hard, I think, to be made convincing, though I think they still can be quite meaningful.

Arden Koehler: Is that because people are just impressed by different things and there’s no good arguments that this is more impressive? That one sort of capability is more impressive than another?

Danny Hernandez: I think that it requires a lot of expertise. It’s almost like evaluating those arguments is kind of equivalent to making a research agenda over many years on AI on its difficulty, because evaluating those arguments well implies that you could do that.

Arden Koehler: I don’t understand, what do you mean that you could do what?

Danny Hernandez: So say I’m a research lead, and I have to direct other people to do AI research over multiple years and figure out what’s relatively promising. That’s similarly difficult to reading these trends and making such a qualitative argument, or like you need the same skill set to do both. Well I think it’s just extremely difficult to evaluate such arguments is kind of the claim I’m trying to make.

Arden Koehler: I definitely buy that claim. But what’s the skill set involved that you’re referring to?

Danny Hernandez: It’s like the skill set that an AI research lead has to have because they have to pick out what’s promising to work on given what we’ve observed.

Arden Koehler: I was thinking maybe you were going to say, “Well, it’s really understanding how difficult certain tasks are and that kind of involves understanding how they are done”.

Danny Hernandez: I think that’s also true. You have to know what’s promising and what’s tractable and yeah, I think that’s the kind of thing that it is. And like the research leads are all gonna pursue different paths. And you don’t actually want them to agree because you want them to pursue different paths and so it’s not surprising that they disagree as to how to evaluate the arguments. So mostly I was just trying to talk about or suggest that the economic stuff might be more informative because it’s understandable at a broad level.

Arden Koehler: I guess one small worry that comes to mind, maybe it’s not small or maybe it’s totally misguided about the economic measure is just “Well, investment is also almost equivalent to what we’re measuring when we measure the amount of compute going into something”. So then I get a little worried if the point is to say like, “Well how do the trends in the input compare to trends in the output”? That seems like a really interesting question. I don’t want to use the same thing for both.

Danny Hernandez: That sounds right for that lens. Yeah. When I think about economic progress, I mostly think about what industry is doing with AI systems. I think one of the most interesting deployments of AI systems is that Google talks about maybe handling like 15% of search cases or something with a new language model helping with that. It seems like one of the biggest improvements to Google that’s happened in a long time and improving Google searches has a lot of economic value. There’s nothing in 2012 that neural nets were doing that was as cool or interesting as improving Google search, or producing as much value. And I think you can just try to chase down where the economic value is being created at different points in time in a very broad way and be like, “Okay, these are orders of magnitude different”.

Danny Hernandez: And I think it’s harder to say a lot. It’s hard for me to say a lot more than that now, but I think that that’s like a way somebody could do a better job of this thing. I think the investment data is harder to read because it just kind of shot up and then investment data is usually spiky. It usually shoots up kind of too much in the beginning and then flattens out and maybe the summary of investment is people still think AI is promising. They still think it’s one of the things that they should consider investing in and are most interested in investing in. Investment hasn’t been going down as far as I can tell. Private investment in AI startups went from like 7 billion in 2012 to 40 billion in 2018. That’s kind of indicative of the amount of money I see changing.

Danny Hernandez: It went up by 5X or something. The investors expect more future cash flows. I think it’s hard to get much past that in the investment domain. In the capture domain all I have is like a couple of examples of where do I think AI is generating the most money right now? I think businesses use it to detect cracks in industrial processes. That’s what I’ve heard from people who sell AI systems like big enterprises. Like what are they buying? And it’s that. And that’s kind of like ImageNet technology. Like you could have done that with ImageNet or a slightly better ImageNet probably. I mean you’re probably getting better at it, but that’s maybe when it started to be useful. And industrial processes really care about increasing their reliability. So there’s some amount of money being made there. I’m not really sure. But it’s a lot more money than there was in 2012, I think, or 2013. That’s how I start going about it.
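
The investment figures quoted above (roughly 7 billion in 2012 and 40 billion in 2018) can be turned into a growth multiple and an implied yearly growth rate. The following is only a back-of-the-envelope sketch in Python: the two dollar figures are the approximate ones from the conversation, and the function names are made up for illustration.

```python
# Approximate figures quoted in the conversation, in billions of US dollars.
investment_2012 = 7.0
investment_2018 = 40.0
years = 2018 - 2012


def growth_multiple(start, end):
    """How many times larger the end value is than the start value."""
    return end / start


def annualized_growth(start, end, n_years):
    """Constant yearly growth rate that would take start to end in n_years."""
    return (end / start) ** (1.0 / n_years) - 1.0


multiple = growth_multiple(investment_2012, investment_2018)
rate = annualized_growth(investment_2012, investment_2018, years)

print(f"growth multiple: {multiple:.1f}x")
print(f"implied yearly growth: {rate:.0%}")
```

The multiple comes out a little above the "5X or something" mentioned in the conversation, which is consistent with it being a rough recollection rather than a precise statistic.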

Arden Koehler: So one thing I wanted to ask you, which now I think maybe is not going to be an answerable question, but I’m going to give it a shot anyway and see what happens. So we see these inputs going up a lot. In the case of compute, it’s exponential. If we thought that progress in capabilities was linear, then we might draw a pessimistic conclusion about our ability to increase progress or see really fast progress in the future because it would resemble this phenomenon in science where we’re investing more and more but seem to have linear outputs. And I was going to ask, do you think progress is linear? And maybe that’s a really hard question to answer.

Danny Hernandez: I mean, I think it’s a hard question to give a convincing answer to. I think what I was trying to set up a little bit before is I think in terms of economic captured value. I think we see something exponential in captured value, like returns that corporations or somebody else makes from neural nets. I think that I get the feeling of an exponential thing, not a linear thing there–

Arden Koehler: From looking at the examples?

Danny Hernandez: From looking at the examples. Because in like 2008, it basically rounds to zero as far as I can tell. And it seems like making Google better is worth a lot. Say it made Google 0.1% better or 1% better. That’s like a trillion dollars times 1% or 0.1%. That’s a lot more than the zero I rounded down to earlier. I think there’s more than that, but I think you only get from where we were at in 2010 to that with an exponential in terms of value being created. I think that’s what we’d see if we measured it. I think that perceived impressiveness, it’s like at different scales, so I think what people are mostly saying when they say that perceived impressiveness feels linear, there’s a couple of different things that they could be saying.

Danny Hernandez: One is that they could be saying that they don’t find what’s happening surprising. That given the past, the present doesn’t look surprising, and that’s kind of a different claim to engage on, but it’s not about linearity. It’s really hard to say whether or not something’s linear without defining the units crisply. And so I think that that makes it harder. I think another thing that people could be saying is say you have some frequency of big insights. Like people are very impressed by the Transformer. And the thing I mentioned before like how language models… The Transformer is this large result that’s used very broadly and that was a 60X improvement over the first translation system with neural networks that was well-known, and it was a 60X improvement over that two to three years later.

Danny Hernandez: And so one way you could argue for a linear thing is to be like, how often do we get something like this? Do we get something like this at the same frequency all the time? Or are we getting these things more frequently? The things that are at a given size. But then you have to be like, “Okay, here’s a result. Like how impressive is this result compared to other results?” But I think that could be measured. Michael Nielsen and Patrick Collison tried to measure “Is progress slowing down or not?” in physics and a bunch of other domains by having people pairwise compare the impressiveness of results because I think that’s an operation people can do pretty easily. And I think you could try and do something like that in ML. To try and be like is progress speeding up or slowing down or staying the same and then I think you could kind of try to argue for linearity if the most impressive results seem equivalent to the most impressive results two or three years ago. That’d be another way you could argue for linearity. I think that way I’d be reasonably likely to come to that being constant.

Arden Koehler: I would read a blog post that did that study. Okay. So before we leave progress in AI, are there any other developments that you think are particularly interesting and worth talking about?

Danny Hernandez: Well, I guess I have a different feeling as to how important algorithmic progress is versus compute progress after doing this AI and Efficiency thing. I think a first order model is kind of doomed. Like a model that only takes into account compute or algorithms. I think that the effects are kind of close enough to each other in magnitude and they multiply. And so because of this, you really need to represent both to understand what’s happening. At least they multiply in the space I’m interested in.

Arden Koehler: Sorry, when you say the effect is close enough to each other. Wasn’t one 25 times over this time period and the other one 300,000 times? Is that not the thing you’re saying is close together?

Danny Hernandez: Well, but the 25 is like a lower bound, and I made some arguments as to how it could be a lower bound by a hundred or a 1000X some of the time.

Arden Koehler: I see. I don’t think I realized that it was that much of a lower bound.

Danny Hernandez: Yeah. So a way you could think about it is when you scale up past systems, sometimes they just smoothly scale up and keep giving you returns. But other times they just fall off a cliff and you could put astronomical amounts of compute into them and they wouldn’t do anything to get to some certain level of capability. And so I think sometimes algorithmic progress makes something possible with a hundred or a thousand times less compute than would have been needed before and I don’t think you could ignore such events being possible in your model, but even just to step back and make a slightly weaker claim or like a lot more defensible, just the fact that they multiply means that you care about any 2Xs and any 4Xs from either one. And so just realizing that they multiply in this space: I care quite a bit about both and about measuring progress in both.
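
The point that the two effects multiply can be made concrete with a small sketch in Python. The 300,000x and 25x figures are the ones quoted in this conversation; the 100x and 1,000x cases are the hypothetical larger algorithmic gains Danny mentions, and treating "effective compute" as a simple product of the two factors is an assumption made here for illustration, not a formula from either paper.

```python
# Figures quoted in the conversation.
COMPUTE_GROWTH = 300_000  # increase in training compute since 2012
ALGO_GAIN_LOWER_BOUND = 25  # algorithmic efficiency gain, described as a lower bound


def effective_compute_gain(compute_growth, algorithmic_gain):
    """Combined gain, assuming the two effects simply multiply."""
    return compute_growth * algorithmic_gain


# 25x is the quoted lower bound; 100x and 1,000x are hypothetical scenarios.
for algo_gain in (ALGO_GAIN_LOWER_BOUND, 100, 1_000):
    combined = effective_compute_gain(COMPUTE_GROWTH, algo_gain)
    print(f"algorithmic gain {algo_gain:>5}x -> combined gain {combined:,}x")

# Because the factors multiply, a 2x from either side doubles the total.
base = effective_compute_gain(COMPUTE_GROWTH, ALGO_GAIN_LOWER_BOUND)
assert effective_compute_gain(2 * COMPUTE_GROWTH, ALGO_GAIN_LOWER_BOUND) == 2 * base
assert effective_compute_gain(COMPUTE_GROWTH, 2 * ALGO_GAIN_LOWER_BOUND) == 2 * base
```

Under this simplification, a model that tracks only one of the two factors misses a multiplier of at least 25 in the combined figure.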

Arden Koehler: Wouldn’t we have known that they multiply before measuring compute and measuring algorithmic innovation, because isn’t the argument that they multiply just that one is how much you can do with the other?

Danny Hernandez: Yeah, so I think that if one of them multiplied but was basically static. So say that the amount of algorithmic progress was more like it went down by like 2X, I don’t know, over this like five year period. It starts to feel more like a thing you could just ignore and be like “We could add it, but that’s like a detail of the model”. It’s not like this key component of the model that is the second thing you add.

Arden Koehler: Okay. That makes sense.

Danny Hernandez: Whereas people reading “AI and Compute”, they might infer a pretty compute-centric viewpoint from it some of the time. I think of them both as very important effects to understand if you’re trying to think about progress and yeah, I think that another thing that’s maybe interesting that we didn’t talk about is the slow AI progress viewpoint. Some people look at what’s been happening and they’re just like, “This looks slow to me”. Like people are giving this thing too much attention.

Arden Koehler: They look at the performance and think this looks slow?

Danny Hernandez: No, they just look at AI. They look at AI capabilities progress over the last six or seven years and they can say, “We’re not solving the most important problems. We haven’t made progress on the problems that matter. We are not getting breakthroughs that are meaningful with any real frequency. We’re not getting a new transformer every six months. This is a kind of thing that we get every several years”. And so yeah, in the interest of giving people some overall take as to how I think about AI progress, I think this other viewpoint is worth bringing up: the best argument that AI progress might indeed be slow is that there are problems that we care about and we’re mostly measuring progress on the problems that aren’t the most important ones.

Danny Hernandez: And that when we look at how quickly we make progress on these most important ones, and just project how many of those chunks of progress we need to really get to something really impactful. Maybe it makes progress look quite far away, and I gave kind of a compute-centric view and I would also give this other view and encourage people to have an ensemble of both of these views and to take new evidence into how much it supports each view and how does it change each of these views, because I think they both have merit and should be taken seriously.

Arden Koehler: And how do you think the combination that’s informed by these observations in compute and algorithmic efficiency, how do you think that does impact how we should think about the slow AI view?

Danny Hernandez: When I look at 80,000 Hours, and what you guys talk about in EA, you talk about AI a lot and I’m just like, “Yeah, it seems I agree. That’s about how much focus and attention AI should get”. But these are the views that I use to get there where I’m just like, “Yeah, I have like a bunch of uncertainty”. People kind of expect different things. I put some weight on each and try to understand the arguments for both because I think what can happen… Like another way you can kind of–

Arden Koehler: Sorry, both the view that AI is going to be slower and less impactful in the near future than some people think and the view that it’s going to be very impactful quite soon?

Danny Hernandez: Yeah. And so different people make each of these arguments and I just would encourage people to like steelman each of them and give some weight to each as to how they try to hold an overall view of AI progress in their head to have model uncertainty about what’s going on in AI at the level of what some of the major different viewpoints people have. And I’m sharing the best argument for things being slow because I think that that’s an important viewpoint also.

Arden Koehler: Is there anything else that you would look at in order to update your view on AI progress that’s worth mentioning?

Danny Hernandez: I think one of the most interesting things to keep watching and thinking about is what happens with sample efficiency over time. As far as important problems that people would say we’re not making progress on, sample efficiency gets brought up a lot.

Arden Koehler: What’s sample efficiency?

Danny Hernandez: So to play a game like Go, or Defense of the Ancients (Dota) or StarCraft, or any game, AI systems will often play like 10,000 lifetimes of games or something. They’d play just a lot more than a person would have ever had to play to learn that game to a professional level. And so making progress on sample efficiency is often brought up as one of the ways in which maybe we’re not actually making progress. And so I think that improvements in sample efficiency are worth paying quite a bit of attention to.

Arden Koehler: Sorry, can you just say a little bit as to why you think it’s so important?

Danny Hernandez: Yeah, I mean I could start with an economic argument for why it’s important. Like for some things, there’s only 10 or a hundred data points in the world. It’s just really expensive to get data. And so if your system only works when there’s millions of data points, then you just can’t use your system. Also, some systems’ data can be generated. So say I show up, I’m your new coworker and you can teach me to do stuff, but you’re just gonna have to show me a thousand times. That’s pretty frustrating. That’s a much less useful system than a system that can be shown fewer times and learn more. But maybe this is one of the most interesting ways you can make progress in data is not making larger data sets or more data sets. It’s needing less data. If you need less data, then there’s just way more problems you can solve. And so yeah, I think that making progress on sample efficiency is quite interesting.

Arden Koehler: Are there any tasks that AI has human-level sample efficiency at or even better than human-level sample efficiency at right now?

Danny Hernandez: I think some of the more interesting work there is a paper from Josh Tenenbaum’s group where they learned to generate characters in an alphabet similarly quickly as humans. I think it wasn’t entirely based on neural nets. I think there were some other things going on in there, but I’m not super familiar with the details. But yeah, so I think there has been some work, but I think that there’s kind of two ways you could try to measure it. You could try to be like, “What can we do at human-level sample efficiency and how much does the amount of samples we need on old problems that we use, how much is that going down over time”? I think both of those ways of looking at it are interesting.

Arden Koehler: Have there been any milestones on either of those metrics?

Danny Hernandez: You could think of AlphaGo as a milestone in this, in the sense that before it required a bunch of human-labeled games and knowledge about Go and then it didn’t require that and it just moved from… So it still required a bunch of data like rollouts of simulations, but it did require a different kind of data than it would’ve required otherwise. And so maybe I think that’s probably the most interesting progress that happened. But you can also imagine later systems of AlphaGo that required less compute and just fewer games, and that would be an improvement in sample efficiency. Sometimes sample efficiency is an improvement in computers and other times it’s not. So yeah, I have less interesting trends here to point to and more to say like here’s an interesting way of thinking about data and I think it’s good to say what kinds of things we should be looking at to update us.

Safety teams at OpenAI [01:39:03]

Arden Koehler: So let’s move on and talk about the safety teams at OpenAI. Can you start by describing the Foresight team and what you do?

Danny Hernandez: Yeah. Let’s see. So the Foresight team tries to understand the science underlying machine learning and macro trends in ML. And you could think of it as trying to inform decision-making around this. This should inform research agendas. It should inform how people think about how it’s informative to policymakers. It’s informative to people who are thinking about working on AI or not. It’s informative to people in industry. But you could think of it as just trying to be really rigorous, which is like another way of thinking about it. Like it’s mostly ex-physicists and physicists just want to understand things. So ML is mostly driven by trying to achieve state-of-the-art performance, which is a fine thing for a field to be driven by. It’s very measurable. It’s clear when you do it and it’s economically useful but it’s less understanding, right? Whereas here you have a team who just really wants to understand.

Arden Koehler: So the name almost makes it sound like what you’re trying to do is forecast AI. It’s the Foresight team. But it sounds like a lot of what you do (I mean, maybe forecasting is part of the end goal) is produce this somewhat backward-looking research on what’s been going on and driving progress so far and explaining what’s happening in the field.

Danny Hernandez: Yeah, I mean we’ve talked mostly about macro ML trends. I think another way of explaining what the Foresight team does is to talk about the two other papers that it’s released.

Arden Koehler: Yeah. Do you want to just briefly describe those other two papers?

Danny Hernandez: Yeah. So there’s this one blog post I’d recommend people look at, “How AI Training Scales”, and they talk about this parameter they found. They call it the gradient noise scale. There’s just this aspect of a system that they can take a measurement on that you can measure any ML system on, and that will predict the parallelizability of that task. Like how many computers can I use at once to train the system? And we’ve learned that complex tasks… They looked at a bunch of tasks, and complex tasks empirically just are more parallelizable and that’s clear when you look at this measurement of them. And that might vary by like 10 to the fifth or something like Dota, like a very complex video game versus MNIST. There’s like a difference of a thousand in how parallelizable they are.
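
To give a feel for what a "gradient noise scale" measures, here is a minimal sketch in Python. It assumes the "simple" form of the noise scale, the trace of the per-example gradient covariance divided by the squared norm of the mean gradient, and uses a naive plug-in estimate on a handful of hand-written toy gradients. The function name and the toy numbers are hypothetical, and this is not the estimator used in the work Danny describes.

```python
def simple_noise_scale(per_example_grads):
    """Naive plug-in estimate of tr(Sigma) / |G|^2 from per-example gradients."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])

    # Mean gradient G across examples.
    mean_grad = [sum(g[i] for g in per_example_grads) / n for i in range(dim)]

    # Trace of the covariance = sum of the per-coordinate sample variances.
    trace_cov = sum(
        sum((g[i] - mean_grad[i]) ** 2 for g in per_example_grads) / (n - 1)
        for i in range(dim)
    )

    grad_sq_norm = sum(m * m for m in mean_grad)
    return trace_cov / grad_sq_norm


# Hypothetical toy gradients with the same mean but different spread.
low_noise_grads = [[1.0, 0.0], [3.0, 0.0], [2.0, 1.0], [2.0, -1.0]]
high_noise_grads = [[10.0, 0.0], [-6.0, 0.0], [2.0, 9.0], [2.0, -9.0]]

low = simple_noise_scale(low_noise_grads)
high = simple_noise_scale(high_noise_grads)
print(f"low-noise task:  {low:.3f}")
print(f"high-noise task: {high:.3f}")
```

The intuition is that when individual examples disagree a lot relative to the average gradient, averaging over a larger batch keeps helping, so more machines can usefully work in parallel.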

Arden Koehler: And what’s MNIST?

Danny Hernandez: MNIST is a handwriting recognition task. It’s like the thing that people did in the nineties.

Arden Koehler: Interesting. And what’s the other one?

Danny Hernandez: The other one is called “Scaling Laws for Neural Language Models”. And you could think of that as very related to the AI and Efficiency work where they scaled up systems and saw they could kind of predict how a bigger version of a system would perform on a metric that matters in machine learning. And they were trying to predict text and they found this trend where models were just very smoothly getting better as they got scaled up, which is the main thing that they did and this kind of is related to the AI and Efficiency thing I brought up earlier in that this is a technique that you could use to try and understand progress in a more complete way, which is to look at old systems we have, and try and scale them up. See what that implies about what would have happened if we scaled them up and then understand how much better our current systems are more rigorously as a result.
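
The idea of predicting how a bigger version of a system would perform can be sketched as fitting a smooth trend to small models and extrapolating. The Python below assumes the loss follows a power law in model size and fits it with ordinary least squares in log-log space. The constants and the data are synthetic and made up for illustration; they are not results from the paper.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss = a * size ** (-alpha) by linear regression on logs."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic "small model" results generated from a made-up power law.
TRUE_A, TRUE_ALPHA = 10.0, 0.08
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [TRUE_A * s ** (-TRUE_ALPHA) for s in sizes]

a, alpha = fit_power_law(sizes, losses)

# Extrapolate to a model ten times larger than any in the fitted data.
predicted_loss = a * (1e10) ** (-alpha)
print(f"fitted exponent: {alpha:.3f}")
print(f"predicted loss at 1e10 parameters: {predicted_loss:.3f}")
```

With real measurements the points would not sit exactly on the line, and how far such an extrapolation can be trusted is an empirical question.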

Arden Koehler: Cool. So it seems like OpenAI has a bunch of different safety teams and the Foresight team is only one of them. Do you want to talk just briefly about the other teams and what they do?

Danny Hernandez: Yeah, so there’s a team called Reflection and that’s the team that Paul Christiano leads who’s been on this podcast. They think about problems like how do you make it so that a system takes human feedback well and think about the long-term problem of having systems be alignable with what people really want or with what some person or some group of people really wants, rather than kind of go off the rails. And so I think that’s like a thing people generally think when they think of AI safety. And that problem seems very important, but it doesn’t resonate with everybody because it’s kind of abstract, but it’s not the only safety thing happening. In addition to Foresight and Reflection, there’s the clarity team. They try to make AI systems interpretable. If you’re running a system and say you’re going to give some system quite a bit of power at some point in time, then the more you can understand what it’s actually doing and why and can reason about what it’s doing and why. That team feels kind of like Foresight to me, in that it’s about rigor and understanding, but in kind of a different way.

Arden Koehler: Yeah. It seems like it’s about understanding the systems themselves and the Foresight team seems to be more about understanding the field or something like that.

Danny Hernandez: Yeah. Where it’s going. So they recently released this paper called “Microscope” and their motivation for this thing was there was a time where we made a microscope and a lot of science happened. We just started looking at cells. It felt very different. You could just be like, “Hey, I looked at a bunch of cells and I figured a bunch of stuff out and what if we did that kind of thing with machine learning systems where you just made it very cheap and easy for people to look at them and try to understand them and zoom in on them at the individual neuron level”. They recently did something in that and I think that’s quite cool.

Arden Koehler: You mean like actually building a tool, or sketching out what such a tool would look like or just talking about what the impact of such a tool would be?

Danny Hernandez: Well that’s like their motivation but yeah, they released a collection of visualizations for every layer of several well-known models so that it’s easy to figure out what individual neurons are doing in a bunch of well-known machine learning models. Yeah, that’s what they did. So they call it Microscope and that’s what they feel like it is and that resonates with me. There’s another team that focuses on safe exploration. It’s harder to talk about their work without getting more technical, I think. But you could think of it as this problem, where maybe a classic example of a problem that happens is say I’m a robot house cleaner and I’m trying to clean up and I don’t want to break any vases. I prefer to never break a vase. How do I explore my world? Or I’m like a baby. How do I explore my world without accidentally doing a lot of damage? It’s like an open hard question.

Arden Koehler: So how integrated is the work of the safety teams with the work of other teams? So, for instance, do the safety and policy teams at OpenAI work together a lot? How closely is safety understood as part of capabilities research? How do they all sort of fit together?

Danny Hernandez: Well first I can talk about how Foresight works with policy. One of the main points Jack Clark has been making, who’s the Head of Policy at OpenAI, is that measuring progress is important and that’s a thing that governments could do that would just be strictly good for them. It’d just get them better information and get them more capabilities to understand and make better decisions in the future while helping them build connections in industry that they want to build. And so the Foresight team, we try to produce these good examples of measurement that we hope will be useful and that will just lead to ongoing things that get measured that are of interest.

Arden Koehler: You’re hoping actors in government will be like, “Oh, we’re going to do some of that research too. We see now what it can look like”.

Danny Hernandez: Yeah. To like start tracking some of the things we think they should track and also to do some of the measurement that requires a lot of expertise to do. It would be hard to get the government to start maybe doing this kind of AI research on macro AI progress. That sounds hard to get them to do, but we can do that and give it to them and so yeah, that’s a way we interact with them. And then you could think that when you’re just kind of trying to do good policy work, that you’re going to have technical questions lots of the time and that the people on the safety team are just quite likely to answer your questions for them if they can. And so that’s one of the ways we collaborate quite a bit, is we just try to make sure that they understand whatever technical questions they need to understand to do their jobs.

Arden Koehler: Cool. That makes sense. The reason I asked about whether safety is conceptualized as part of capabilities research is I just sometimes hear people argue about whether safety is sort of properly thought of as separate or part of capabilities. Like you wouldn’t want to make a system that isn’t safe because then that’s a way in which it’s not capable. I’m just curious how it’s thought of at OpenAI in your experience.

Danny Hernandez: It’s interesting. Well maybe I could just start with Foresight. I think being able to better predict what will happen both puts you into a safer space and a more capable space. And so I think that maybe the way to think about it is not that “Are safety and capabilities totally divorced”, but how much optimization power are you putting into two different desirable aspects of the system? So it could be that you want to make a system cheap. You could want to make a system safe. You could want to make it capable. You could put different amounts of power into each of these things and that kind of the more optimal or more of your optimization power you’re putting into trying to make sure that it can be safe then just the more likely you should expect it will actually in fact be safe. And so the more you are motivated by that kind of concern, by making sure the system goes well, the more likely it is to go well. And I think it’s hard to disentangle it more than that.

Careers [01:49:46]

Arden Koehler: That makes sense. So what should somebody do if they want to do safety work at OpenAI? First of all, are you guys hiring and second of all what sort of steps should they take? What kind of skills and knowledge do they need?

Danny Hernandez: Yeah, we are definitely hiring. I expect we’ll always be hiring. And I know in particular that the Foresight team, we really need research engineers. And there’s kind of a spectrum between research scientists and engineering. But you know, you could kind of caricature it that people that look more like research scientists are more likely to have gotten PhDs and are more likely to do work that’s a bit more theoretical, whereas engineers are more likely to try to make infrastructure that makes it so that you can run a lot of experiments quickly or reimplement things well and reliably. And they specialize in that a little bit more. And so, for instance, there’s three physicists on the Foresight team now and two of them were theoretical physicists and one of them was an experimental physicist.

Danny Hernandez: And the experimental physicist is the one that’s more the research engineer.

Arden Koehler: I see (laughs).

Danny Hernandez: Tom Henighan who got excited to apply to OpenAI because of an 80K podcast.

Arden Koehler: That’s awesome.

Danny Hernandez: And so yeah, I think research engineers look more like people who were in some other domain, that they often were computer scientists or software engineers somewhere else. Or were kind of like math and ML curious and just kind of taught themselves and then at some point made a jump, and so I think yeah, I think people start in different places. I think you guys had a podcast with Daniel Ziegler and he talked about his path to becoming a research engineer. I think that his path was what a lot of research engineers’ paths might look like where you just kind of start reimplementing models. Find someone to guide your learning.

Danny Hernandez: Maybe a novel take on any career is… What being an AI researcher looks like… Like one way you could model it is there’s somebody who wants to spend at least an hour a week talking to you and overseeing your research and thinks that research would be interesting. And that that’s kind of where you’re trying to get to and that you could make progress towards that. If there was somebody who thinks that what you can produce… What you might be able to produce three months from now might be interesting and they’re willing to talk to you for an hour or half an hour to help you figure out like what should my next three months of learning look like? And then they help guide your learning to make sure it’s useful for them, or in the direction that’d be useful to them.

Danny Hernandez: And then at some point they’re like, “Well, here’s some vague ideas that if you did any of these things, that’d be interesting”. Then you come back like a month later and you did something that’s interesting to them and then eventually it’s two weeks and then one week and then they’re like, well you kind of work for me now because they keep being willing to invest more time, but you really need this guidance too. I think this guidance is immensely valuable to you. Like if you didn’t get them to tell you what to learn, you could have learned very different things and then never gotten to a position where anybody was trying to oversee your work.

Arden Koehler: So has this happened with people at OpenAI where some person who doesn’t work for OpenAI asks them for an hour of their time every three months at first and then sort of slowly it becomes the case that they work there?

Danny Hernandez: I’ve seen that happen, yeah. I’m trying to describe it as this smooth transition, but yeah. I think my friend, Tom Brown, who was part of how I got interested in AI. He made me take Coursera. He wanted me to take Coursera classes with him and convinced me to read Superintelligence a long time ago. He got advice on what to learn from Greg Brockman, one of the cofounders of OpenAI and then eventually got to a place where he was working at OpenAI full time. And AI and Efficiency was joint work with Tom Brown.

Arden Koehler: Yeah. It seems like it might not be quite the first step to have that meeting with somebody at OpenAI because you want to first be able to not have it be a waste of everyone’s time.

Danny Hernandez: Yeah. So I guess I’m trying to describe what should you be asking in that first meeting? And I think what you should be asking is… And it doesn’t have to be a meeting, it can be an email or something, but it’s like what should I go learn in the next three months or next month?

Arden Koehler: And do you feel like people are usually pretty receptive to those sorts of questions?

Danny Hernandez: I think people are open to that question, especially if they somewhat know you. Like I think you could walk your social graph to find who is the person who can give me this advice who I already know or who knows somebody, and I’m just trying to structure the question you should be asking them, which is like what’s worth learning given my eventual goal and how far away do I seem from that goal? And you have this very limited resource that’s expert time and you’re trying to leverage it for as much as possible and so you’re trying to use it to guide your learning process. I think that’s a way to kind of learn. It’s a general way to learn a lot of things inside of an organization, a new career, lots of things. And to conceptualize it as smooth.

Arden Koehler: Yeah, that’s really useful. At OpenAI, is there a more formal process that some people should take?

Danny Hernandez: Oh yeah, there’s a Fellows program and you could think of that as like the residency at Google if people are familiar with that. But it’s often people that had technical careers that, like if you got a PhD in something else and have been learning machine learning on your own, for maybe a couple months or something or longer, but that’s not what your PhD is in, then those people often apply to be fellows. That’s kind of like the entry point.

AI hardware as a possible path to impact [01:55:57]

Arden Koehler: So let’s move on to talking about AI hardware as a possible path to impact. This is a little bit more speculative, but do you think that working in AI hardware, developing chips, or researching AI hardware progress might be helpful for positively shaping the development of AI and if so, how?

Danny Hernandez: Yeah. I think there’s quite a bit of potential in the domain. I think a very concrete thing is… Recently OpenAI just had this blog post and paper around trust and how AI organizations might be able to make verifiable claims in the future about their work and build trust between organizations. Broadly people might be used to thinking about coordination being hard and you want institutions to coordinate and this is kind of discussing that and I think there’s some places that hardware falls in there that people might mostly be motivated by from a safety perspective. One is they talk about secure hardware for machine learning. So you could think of this as this hybrid which you guys have talked about before, that maybe security’s a quite interesting thing for people to go into. Maybe one part of security that might be particularly interesting to AI is having secure hardware that people trust.

Arden Koehler: So what does that mean?

Danny Hernandez: Yeah, there’s this thing that if you care about security, you might’ve seen… It was called Spectre and it was a vulnerability on a bunch of different processors and a bunch of different CPUs. And you could think of it as it’s lower than software. If your hardware has a security vulnerability, it doesn’t matter what software you’re running. Everybody has a CPU. And if it’s insecure, then all of them are insecure. And so yeah, that’s kind of how I would talk about hardware security. It’s a subfield of security that maybe you’d be more motivated by if you were motivated by trying to make AI go well. That’s very concrete, to be like, “Okay, there’s at least some paths that you would expect would be neglected otherwise because you kind of have to have this motivation of making things go well to be interested in them and they’re just starting to be kind of pointed out.” So that’s another reason to think they might be neglected.

Arden Koehler: I guess I wouldn’t be that surprised if there were pretty big economic incentives in having secure hardware as well.

Danny Hernandez: I think there are pretty big economic incentives to having secure hardware. So yeah, I’m not trying to make a super strong argument. I’m just trying to be like where might I nudge somebody if they are interested in that kind of thing and point them towards that. I think that if you try to get back towards AI hardware generally, I could see a world in which hardware is… Like in the worlds where AI is influential, AI hardware might also be influential. So you have this very abstract argument.

Arden Koehler: Be in the places that are influential in the worlds where it matters.

Danny Hernandez: Yeah, in the worlds where it matters. I think it’s hard to get a lot better than that with AI hardware. But yeah, I think that in those worlds, hardware might be quite influential. If you think about it from an economics point of view, there’s some chance that’s where we’ll capture quite a bit of the value or more of the value than software.

Arden Koehler: Is that basically because if really amazing progress is made in AI, and it becomes extraordinarily profitable, then the demand for chips that can lead to really efficient computation will be super high, and so whoever’s making those chips gets super rich. Is that basically the story?

Danny Hernandez: Yeah, that’s a world in which AI hardware has a lot of impact.

Arden Koehler: Yeah, I mean, so I also just had this sort of inchoate sense of “Well, it seems like that’s an important thing so maybe people should go toward it”. But I’m not exactly sure of the mechanisms whereby you could actually make sure that things go relatively well in those worlds. I guess by maybe influencing your organization. Maybe trying to get them to take the right cautions or adopt certain kinds of business practices that make AI more broadly beneficial. So I know that there’s this idea coming out of the Center for Governance of AI at the Future of Humanity Institute called “The Windfall Clause”, where the basic idea is that firms precommit to sharing a bunch of their windfall if things go a certain way. And I guess that’s one way that people working at hardware companies could try to make AI more broadly beneficial?

Danny Hernandez: Yeah, I think if hardware companies signed up for a windfall clause, that’d be really amazing progress. I mean, you could think of OpenAI’s LP as an implementation of the windfall clause.

Arden Koehler: Do you want to describe that?

Danny Hernandez: Yeah. So at a high level, after some amount of returns are made to people that hold a kind of equity in this limited partnership, then all future profits or returns will go to the nonprofit.

Arden Koehler: Yeah, sort of its own windfall clause.

Danny Hernandez: Yeah. So that’s OpenAI’s LP. And I think it’d be quite good if more organizations had such a thing. I think it takes a lot of influence within an organization to get them to commit to something like that. So if you imagine somebody who’s founded a hardware company, they get to do that if they want. Probably they’re going to have to negotiate it with their investors. But I think that there is somewhere in the tails of payoff of influence that you could have. Like any executive might also have a chance. Anybody that was early at this hardware company that has a close relationship to this executive, whoever can actually make the decision to do this thing. So yeah, I think that that’s another reasonable example of how things could go quite well by influencing hardware. I think maybe another argument for it is just some amount of diversified portfolio. There’s probably some people that have a particular fit for hardware. They’re particularly interested in it. They think it’s a particularly interesting problem. They already have strong connections in it or something. I think that it’s got potential.

Arden Koehler: Yeah. I guess one worry about this is just in order to get into a place where you can have that kind of influence, you’re going to have to have a lot of success probably speeding up AI hardware progress, and it seems like not totally clear that that would be a good thing because it could give us less time to research ways to make AI safe and beneficial if AI hardware is getting better and better. So we can use more compute more and more quickly.

Danny Hernandez: Yeah. I think a similar argument could be made against doing most AI research. Like at least qualitatively. Maybe there’s an ad absurdum argument against it. I’m not sure. I think a related point, people sometimes talk about trying to coordinate to slow down AI progress and like “Wouldn’t that be good?” And I guess my response to that is that having the level of coordination in the world where we could do that kind of thing sounds amazing. Just having that level of coordination, and that seems like the good part, but we seem so far away from that, that it’s just outside of the considerations that I generally think about. And that’s kind of my reflexive response.

Arden Koehler: Yeah. So two things on this. So one thing is that it seems like when you’re doing some kinds of AI research, you might be actually able to make it more safe by designing it in a certain way. And that at least, I’m not sure, but it doesn’t seem obvious that there are ways of doing that in AI hardware.

Danny Hernandez: Yeah, I think it’s a research question.

Arden Koehler: It’s a research question I would be interested in. Not researching it myself of course, because I’d be totally unqualified, but somebody doing it.

Danny Hernandez: Yeah, I think there’s some portion of it that if you want to be quite careful, there are probably things you can think about that are pure safety or security things that aren’t really about capabilities. But I think that may be what I was trying to paint, was like one path towards a lot of impact. And paths towards lots of impact have risks and other stuff in them. I think it’s quite hard to get to. There’s like lots of risk of lots of different kinds along this path towards a lot of impact of getting a windfall clause at a hardware company. So I think there’s risks going down such a thing.

Arden Koehler: Yeah. I guess just trying to figure out whether the risks are worth it.

Danny Hernandez: Yeah, I agree that is the question. I’m not that particularly opinionated on it. My opinionated thing is mostly just think about the tail outcomes and how they would actually happen and that’s kind of why I brought it up. It might require a lot of influence inside of a company to get them to adopt a windfall clause.

Arden Koehler: That makes sense. I guess related to this, it seems like there could be a path that is a bit less risky. Something like “AI hardware policy”, which is not really a named field right now, but not so much going into hardware companies, but trying to work with governments or other sorts of bodies that might be able to regulate AI hardware or perhaps create the kinds of incentives that would make an advance at the right times and the right places. Does that seem like a promising path to you?

Danny Hernandez: I think that hardware expertise is worth quite a bit. It’s worth that both to policy and to forecasting. So, for instance, the kind of person who I’d be most interested in trying to make good forecasts about Moore’s law and other trends, is somebody who has been building chips for a while or has worked in building chips for a while. I think there aren’t that many of those people. I haven’t seen somebody from that background that is working in policy yet, but my guess is that they could be very useful at some time and that it’d be reasonable to try starting now with that kind of thing in mind. But that’s pretty speculative. I know less about that than the forecasting type thing. I think hardware forecasting is very interesting.

Arden Koehler: Cool. So maybe somebody who has experience in hardware but wants to sort of convert that expertise into trying to make safe and beneficial AI more likely, they could go down a research route looking into Moore’s law and what we might expect to happen to it and maybe they could go into a policy position or they’d probably be very valuable there. I don’t know at the moment how receptive policymakers would be to that kind of thing. But it seems like it could be valuable.

Danny Hernandez: Yeah, I mean right now they’d have to be the kind of person that… For instance, there was never a job description of my job at OpenAI. There was never a job req. I just kind of started trying to do this job. And–

Arden Koehler: So did you take the route that you discussed before?

Danny Hernandez: Yeah, I met somebody at OpenAI and then started talking to them more frequently and then they became my manager at OpenAI. That was the path for me.

Arden Koehler: That’s probably much less likely to happen in more formal domains like in the policy world, I would think.

Danny Hernandez: But so I think then what you look for is you get your first job at an informal place where you can show up informally. Like some places you can’t. It’s hard to do that in the government or something. But there’s some places that vary on that. So you just apply to the informal places first and you walk up the chain. Sometimes there’s a way to get some minimum credential. I think like a public policy masters or something is kind of one way where people get a credential quite quickly that makes them seem reasonable. So it’s like you could be somebody that has one of those and has a background in hardware and then all of a sudden you’re like one of the most credentialed people there is. It could happen pretty quickly.

Arden Koehler: That sounds like a really interesting route. I’m curious if any listeners will be able to do something like it.

Triggers for people’s major decisions [02:08:44]

Arden Koehler: So I think we should probably wrap up, but before we do, is there any sort of underappreciated idea that you think our listeners would benefit from hearing?

Danny Hernandez: Yeah, I mean something that you guys talked about, well that Allan Dafoe talked about on one of your podcasts, was that he had decided that he was going to work on AI policy once we beat Go. That was going to be his trigger. He could kind of ignore AI until that happened, and then he would work on it. And I think this is a very good way of viewing things that there isn’t a lot of evidence for right now. Or that the evidence seems too murky or too small amounts of it to make a decision, is to just be like when would there be enough evidence for me to pay attention to this thing or to start working on it, and to think about those kinds of triggers for people’s major decisions around AI or around other things.

Arden Koehler: I mean it must be kind of hard to pick a trigger, right? And then not reconsider it? I could imagine taking the Go trigger but then you get there and you’re like, “Oh, but I was misguided. I didn’t realize how unimpressive that was or this is going to be” or something like that.

Danny Hernandez: Yeah, I think there’s some downside. I think what happens otherwise is that it’s like reality is unsurprising and you just think that you expected that. So I think there’s trade-offs.

Arden Koehler: At least you know that there was a time when you didn’t expect that to be business as usual.

Danny Hernandez: I think commitments like that you should maybe think of as like 80 or 90% commitments or something. Like you haven’t literally bound your hands, but this is your strong intention as to what would convince you and I think that that’s a thing to do. I think another way you could phrase it is if you believe AI progress is fast, what would progress look like that would convince you it’s slow? Paint a picture of that five years from now. What does slow progress look like to you? And now you’re like, “Oh yeah, progress is actually slow”. And what could have happened that would convince you that it’s actually fast? You can make what would update you clear to yourself and others, and for big decisions, this is generally worthwhile. It’s like a lot of rigor to do for smaller things. Yeah, I like to think about this.

Arden Koehler: Yeah, I guess it’s another version of getting precise about your beliefs.

Danny Hernandez: Yeah, that’s my style.

Arden Koehler: Yeah. Cool, well this has been really interesting and I feel like I’ve learned a lot from this interview, so thank you so much.

Danny Hernandez: Thanks Arden. It’s been great.

Rob’s outro [02:11:09]

I hope you enjoyed that conversation between Arden and Danny.

If you’d like to hear more of Arden, you can find that in episode #67 with David Chalmers, episode #72 with Toby Ord, #66 with Peter Singer, and episode #75 with Michelle Hutchinson.

There’s also a bonus conversation between me and her on demandingness, work-life balance and injustice that came out back on February 25th.

The 80,000 Hours Podcast is produced by Keiran Harris. Audio mastering by Ben Cordell. Transcripts by Zakee Ulhaq.

Thanks for joining, talk to you again soon.

Learn more

Working in US AI policy

Improving decision making (especially in important institutions)

Related episodes

August 21, 2018

#40 – Katja Grace on forecasting future technology & how much we should trust expert predictions.

June 28, 2019

#60 – Prof Tetlock on why accurate forecasting matters for everything, and how you can do it better

November 20, 2017

#15 – Phil Tetlock on how chimps beat Berkeley undergrads and when it's wise to defer to the wise

June 6, 2017

#1 – Miles Brundage on the world's desperate need for AI strategists and policy experts

October 2, 2018

#44 – Paul Christiano on how OpenAI is developing real solutions to the 'AI alignment problem', and his vision of how humanity will progressively hand over decision-making to AI systems

July 21, 2017

#3 – Dario Amodei on OpenAI and how AI will change the world for good and ill

