#223 – Neel Nanda on leading a Google DeepMind team at 26 – and advice if you want to work at an AI company (part 2)

At 26, Neel Nanda leads an AI safety team at Google DeepMind, has published dozens of influential papers, and mentored 50 junior researchers — seven of whom now work at major AI companies. His secret? “It’s mostly luck,” he says, but “another part is what I think of as maximising my luck surface area.”

This means creating as many opportunities as possible for surprisingly good things to happen:

  • Write publicly.
  • Reach out to researchers whose work you admire.
  • Say yes to unusual projects that seem a little scary.

Nanda’s own path illustrates this perfectly. He started a challenge to write one blog post per day for a month to overcome perfectionist paralysis. Those posts helped seed the field of mechanistic interpretability and, incidentally, led to meeting his partner of four years.

His YouTube channel features unedited three-hour videos of him reading through famous papers and sharing thoughts. One has 30,000 views. “People were into it,” he shrugs.

Most remarkably, he ended up running DeepMind’s mechanistic interpretability team. He’d joined expecting to be an individual contributor, but when the team lead stepped down, he stepped up despite having no management experience. “I did not know if I was going to be good at this. I think it’s gone reasonably well.”

His core lesson: “You can just do things.” This sounds trite but is a useful reminder all the same. Doing things is a skill that improves with practice. Most people overestimate the risks and underestimate their ability to recover from failures. And as Neel explains, junior researchers today have a superpower previous generations lacked: large language models that can dramatically accelerate learning and research.

In this extended conversation, Neel discusses all that and some other hot takes from his four years at Google DeepMind. (And be sure to check out part one of Rob and Neel’s conversation!)

This episode was recorded on July 21.

Video editing: Simon Monsour and Luke Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: Ben Cordell
Camera operator: Jeremy Chevillotte
Coordination, transcriptions, and web: Katy Moore

Highlights

How Neel uses LLMs to get much more done

Rob Wiblin: To what extent do you think junior people can in practice use LLMs or use AI to skill up and get closer to the frontier of knowledge or research ability more quickly today than they could four years ago when these tools were not really that useful?

Neel Nanda: Oh, so much. If you’re trying to get into a field nowadays and you’re not using LLMs, you’re making a mistake. This doesn’t look like using LLMs blindly for everything; it looks like understanding them as a tool, their strengths and weaknesses, and where they can help.

I think this has actually changed quite a lot over the last six or 12 months. I used to not really use LLMs much in my day-to-day life, and then a few months ago I started a quest to become an LLM power user, and now I’ll randomly just work into conversation, “Have you considered using an LLM like this for your problem?”

So how should people think about this? One of the things that LLMs are actually very good at is lowering barriers to entry to a field. They’re quite bad at achieving expert performance in a domain, but they’re pretty good at junior-level performance in a domain. They aren’t necessarily perfectly reliable, but neither are junior people, so this is the wrong bar to have.

[…]

I think people are often too passive. They’ll give the LLM a paper and ask it to summarise the paper, but it’s very hard to tell if you have successfully understood a thing. You’re much better off asking the LLM to give you a bunch of exercises and questions to test comprehension and then feedback on what you got right and what you missed. I often find it helpful to just try to summarise the entire paper in my own words to the LLM — voice dictation and typing both work fine — and then get feedback from it.
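To make that workflow concrete, here is a minimal sketch of the kind of prompt it implies, written as a small Python template. The function name and the exact wording are illustrative assumptions rather than a prompt Neel actually uses; paste the result into whichever chat model you prefer.

    def comprehension_check_prompt(paper_text: str, my_summary: str) -> str:
        # Rather than asking for a summary, ask the model to test *you*:
        # generate exercises from the paper, then grade your own summary,
        # flagging what you got right and what you missed.
        return (
            "Here is a paper I am trying to understand:\n\n"
            f"{paper_text}\n\n"
            "First, write five exercises or questions that test whether I have "
            "understood the key claims and methods.\n"
            "Then grade the summary I wrote below against the paper: tell me "
            "what I got right, what I got wrong, and what I missed entirely.\n\n"
            f"My summary:\n{my_summary}"
        )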

One problem with trying to get feedback from an LLM is that they can be fairly sycophantic: they’ll hold back from criticising the user. The trick for this is to use anti-sycophancy prompts, where you make it so the sycophantic thing to do is to be critical — like, “Some moron wrote this thing, and I find this really annoying. Please write me a brutal but truthful response.” Or, “My friend sent me this thing, and I know they really like brutal and honest feedback and will be really sad if they think I’m holding back. Please give me the most brutally honest, constructive criticism you can.”

Fair warning: if you do this on things that you are emotionally invested in, LLMs can be brutal. I once got feedback on a blog post with sections like “Slight Smugness” and “Air of Arrogance.” But it’s very effective.

Rob Wiblin: Yeah, yeah. I was going to say I’ve found it quite hard to get them to not be sycophantic, but now I’m worrying this might be a little bit too effective, and perhaps I have to back off a little bit.

Neel Nanda: You can tone it down. Like, “I hate this guy” is more extreme than, “my friend’s asking me to be brutally honest,” which is more extreme than, “I want to be sensitive to their feelings, but I also want to help them grow and be really nice and sensitive. Please draft me a response.”

Rob Wiblin: Yeah, I think I might go for that one.

Neel Nanda: You can try all of the above. One particularly fun feature that I don’t see people use much: there’s a website from Google called AI Studio, an alternative interface for using Gemini that I personally prefer. It has this really nice compare feature, where you can give Gemini a prompt and then get two different responses, either from different models or from the same model. And you can also change the prompt, so you could have one half of the screen showing the brutal prompt and the other half showing the lighter prompt, and see if you get interesting new feedback from the first one.

Another thing that people don’t seem to do as much is think about how they can put in more effort to get a really good response from the LLM. For example, if you have a question whose answer you care about, give it to all of the current best language models out there, then give one of them all of the responses and say, “Please assess the strengths and weaknesses of each and then combine them into one response.” This generally gets you a moderately better result than asking just once. And you can even iterate this if you want to.
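As a rough illustration of that “ask every model, then have one of them synthesise” loop, here is a short Python sketch. The call_llm helper is a hypothetical stand-in for whichever provider’s chat API you use; it is not any specific product’s interface.

    def call_llm(model: str, prompt: str) -> str:
        """Hypothetical helper: send `prompt` to `model` and return the text reply.
        Replace the body with your provider's actual chat-completion call."""
        raise NotImplementedError

    def ensemble_answer(question: str, models: list[str], judge: str) -> str:
        # 1. Ask each frontier model the same question independently.
        drafts = {name: call_llm(name, question) for name in models}

        # 2. Hand every draft to one model and ask it to assess and merge them.
        combined = "\n\n".join(
            f"--- Response from {name} ---\n{text}" for name, text in drafts.items()
        )
        synthesis_prompt = (
            f"Question: {question}\n\n"
            f"Here are several independent answers:\n\n{combined}\n\n"
            "Please assess the strengths and weaknesses of each, "
            "then combine them into one response."
        )
        return call_llm(judge, synthesis_prompt)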

Or in your original prompt you can say, “Please give me a response and then please critique the response,” or, “Ask me clarifying questions and then make your best guess for those clarifying questions and redraft it” — and then only ever read the second thing.
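And the single-prompt version of that “critique and redraft, then only read the second thing” trick might look something like this (again, the wording is just an illustrative guess):

    def draft_then_redraft_prompt(task: str) -> str:
        # Pack the whole draft -> critique -> redraft loop into one request,
        # then only read the final section of the reply.
        return (
            f"{task}\n\n"
            "First, write a draft response.\n"
            "Then list the clarifying questions you would want to ask me, "
            "make your best guess at the answers, and critique your own draft.\n"
            "Finally, write a revised response under the heading 'REVISED'. "
            "I will only read that final section."
        )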

Rob Wiblin: That’s fascinating. I’ve never tried that. I guess you probably would only put in that effort for something that you really cared about.

Neel Nanda: I have some saved prompts that just do this. I use them for writing a lot. I’ll give them a really long voice memo and then have some saved prompts like this.

I also recommend that people use an app that lets you save text snippets. For example, Alfred on Mac is my tool of choice. You can write a really long prompt that you’ll sometimes want to use, with all of this elaborate “…and then critique yourself like this: blah, blah” — maybe you get an LLM to write it — and make it so it expands when you type a shortcut. I have mine set up so that if I type “>debug prompt” then I get my really long debugging prompt. And this means that you actually use this stuff, because it’s really low friction.

An AI PhD – with these timelines?!

Rob Wiblin: So you don’t have a PhD yourself. You potentially saved many years of study that way. As an outsider, it’s hard for me to imagine that the best move in the AI industry right now would be to go away and spend possibly four or five or six years writing a thesis and doing a full PhD — at least if you could get a job in the industry doing some sort of ML research some other way.

Is that kind of the right intuition? That this is no moment to be taking that much time away from advancing your career directly?

Neel Nanda: It’s complicated. I think one common mistake people make is assuming you need to finish a PhD. This is completely false. You should just view PhDs as an environment to learn and gain skills. And if a better opportunity comes along, it’s often quite easy to take a year’s leave of absence. Sometimes you should just drop out. If you’ve got the option of a job doing research at a serious organisation that you expect to go better than your PhD, that was kind of the point of doing the PhD. You’re done. You’re done early. Leave. I don’t know.

Someone I recently hired, Josh Engels, was doing a PhD in interpretability, had done some really fantastic work. And I was just already convinced he was excellent and managed to convince him to drop out a few years early to come join my team. I know I’m very biased, but I think this is the right call.

That aside, should people do them at all? I think that one of the most useful ways to get into research is mentorship. You can get mentorship from PhD supervisors or other people in the lab. You can get it from colleagues and your manager if you join an industry research team. But different places vary a lot in what you’ll learn and how valuable an experience it will be.

I think something people often don’t realise is that the managerial skill of the person you’ll be working with — your manager in industry, or your supervisor in a PhD — is incredibly important, and it is a very different skill from being a good researcher. Often the most famous PhD supervisors are the worst, because they are really busy and don’t have time for you, or because being a good researcher just doesn’t translate into being a patient supervisor who cares about nurturing people and makes time for them. And the same applies to industry roles.

I like to think I make an environment where people on my team can learn, have some research autonomy, and grow as researchers. Because if nothing else, my life’s a lot easier if people on my team are great researchers. The incentives are very aligned.

But there are some teams where you get much less autonomy and they might want you to just be an engineer or something. Or you might be able to do research, but on a very specific agenda. And I do think engineering is a real skill set. It’s often hard to learn in a PhD and easy to learn in industry, but so is doing research.

One thing I will particularly call out is I think that we do still need people who can set new research agendas, come up with creative new ideas. Often PhDs are better environments for this, because you can just do whatever you want and no one can really stop you. And often supervisors are fine with this. Managers in industry less so.

Some of the people I respect who’ve done PhDs and recommend it emphasise this as one of the crucial reasons: that we really want more people who can lead new research agendas. And being kind of thrown into the wilderness on your own for a while to figure this out can be one of the best ways to learn this.

Navigating towards impact at a frontier AI company

Rob Wiblin: What’s something interesting you’ve learned about trying to have an impact in a frontier AI company, especially a large organisation like Google DeepMind?

Neel Nanda: So I definitely learned a lot about how organisations work, in my time actually working in real companies. But maybe to begin, I just want to talk about what I’ve learned about large organisations in general — nothing to do with DeepMind specifically — which is that it’s very easy to think of them as a monolith: there’s some entity with a single decision maker who is acting according to some objective.

But this is just fundamentally not a practical way to run an organisation of more than, say, 1,000 people. In a startup, everyone can kind of know what’s going on, know each other, have context. But once you’ve got enough people, you need structure and bureaucracy. There are a bunch of people to whom decision-making power is delegated. There are a bunch of stakeholders responsible for protecting different things important to the org, who will represent those interests.

These decision makers are busy people, and they will often have advisors or sub-decision makers they listen to. And sometimes decisions get made pretty far down the tree, but if they’re important, or if there’s enough disagreement, they go to more and more senior people until you have someone who’s able to just make a decision. But this means that if you go into things expecting any large organisation to be acting like a single, perfectly coherent entity, you’ll just make incorrect predictions.

Rob Wiblin: Like what? Does it mean that they end up making conflicting decisions? That one group over here might be pushing in this direction, another group over there might be pushing in another direction, and until something has escalated to a manager who has oversight over all of it, you can just have substantial incoherency?

Neel Nanda: So that kind of thing can happen. But maybe the thing that I found most striking is I think of it as these companies are not efficient markets internally.

Unpacking what I mean by that: when I’m considering trading in the stock market, I generally don’t, on the grounds that if there were money to be made, someone else would have already made it. And I kind of had a similar intuition of, “If there’s a thing to do that will make the company money, someone else is probably making it happen; if it won’t make the company money, probably no one will let it happen. Therefore, I don’t really know how much impact a safety team could have.”

I now think this is incredibly wrong in many ways. Even just within the finance analogy, markets are not actually perfectly efficient: hedge funds sometimes make a lot of money because they have experts who know more and can spot things people are missing. As the AGI safety team, we can spot AGI-safety-relevant opportunities that other people are missing.

But more importantly, financial markets just have a tonne of people whose sole job is spotting inefficiencies and fixing them. Companies generally do not have people whose sole job is looking over the company as a whole and spotting and fixing inefficiencies — especially when it comes to ways you could add value safety-wise. People are often busy. People need to care about many things.

There are often some cultural factors that lead to me prioritising different things. For example, a pretty common mindset in machine learning people is: focus on the problems that are there today. Focus on the things that you’re very confident are issues now, and fix those. It’s really hard to predict future problems. Don’t even bother; just prioritise noticing problems and fixing them.

In many contexts I think this is actually extremely reasonable. I think that it’s more difficult with safety because there can be more subtle problems, and it can take longer to fix the problems, so you need to start early on. And this means that if we can identify opportunities like that, there’s just a lot of things where no one minds the thing happening; people might actively be pro the thing happening — but if safety teams don’t make them happen, it will take a really long time, or won’t happen at all.

I want to emphasise that, by and large, I don’t think this is people being negligent or unreasonable or anything like that. It’s just people with different perspectives, different philosophies of how you prioritise solving problems, what they view as their job. And as someone who’s very concerned about AGI safety, I can help kind of nudge this in a better direction.

Is a special skill set needed to guide large companies?

Neel Nanda: Sometimes I see people who just say, “I care about safety, so I should go work at a frontier AI company and this will make things good.” This is probably better than nothing in general, but I think you can do much better by having a plan.

One type of plan is the kind of thing I just outlined: either “be very good at navigating an org” or “be a widely respected technical expert.” But the vast majority of people having an impact on safety within DeepMind are not doing that. Instead, it’s more like there are a few senior people who interface with the rest of the org — and a lot of the impact that, say, the people on my team can have comes from just doing good research to support those people who are pushing for safer changes.

Because good engineers and good researchers can do things like show that a technique works, invent a new technique, reduce costs, build a good evidence base for the effectiveness of the technique and that it doesn’t have big side effects or costs, or just implement the technique. It’s a lot easier for decision makers to say, “Yes, we will accept the code you have already written for us.” And this is a lot more time consuming in some senses than being the person who navigates the org.

And I think that if there are people that people respect in the org — and I have a lot of respect for some of DeepMind’s safety leadership, like Rohin Shah and Anca Dragan — just trying to empower those people can be great.

I also think that trying to empower a team can be very impactful. Recently on the AGI safety team we’ve started a small engineering team whose job is just trying to accelerate everyone’s research. And you might think that this is a mature tech company, so surely there’s nothing that could be improved. But there’s just a lot of ways you can improve the research iteration speed of a small team with fairly idiosyncratic use cases. And I think this team has been super impactful.

[…]

Rob Wiblin: Do you think that someone who is reasonably safety focused and has strong machine learning chops is likely to have more impact by going and working at a company that is already quite safety focused, like Google DeepMind? Or could they perhaps have even more impact by going to work at a company (or companies that will remain nameless) where there’s maybe less of a safety culture and less safety investment, and they might be one of relatively few people who have that as a significant personal priority? Should people seriously consider going to the less safety-focused labs?

Neel Nanda: I think it’s somewhere in the middle. I care a lot about every lab making frontier AI systems having an excellent safety team, whatever the leadership philosophy of that lab is. But in the kinds of labs where it’s harder to get this kind of thing through and there’s less interest, I think there are certain kinds of people who can have a big impact, but that many people will largely not.

But how could a listener tell if this might be a fit for them? I think the kind of person who would be good at this is:

  • Someone who has experience working in and navigating organisations effectively.
  • Someone who kind of has agency, who’s able to notice and make opportunities for themselves.
  • Someone who’s comfortable with the idea of working with a bunch of people who they disagree with on maybe quite important things — where ideally, it doesn’t feel like you’re gritting your teeth going into work every day. It’s just, “This is a job. These are people. I disagree with them. I can leave that aside.” I think that’s a much healthier attitude, less likely to lead to burnout, and it will also just make you more effective at diplomacy.
  • And finally, people who are good at thinking independently. I personally find it a bit hard to maintain a very different perspective from the people around me. I can, but it requires active effort. Some people are very good at this, and some people are much worse than me at this. If you are my level or worse, probably you shouldn’t do this.

"If your safety work doesn't advance capabilities, it's probably bad safety work"

Rob Wiblin: One provocative take you had in your notes for the episode is if your safety work doesn’t advance capabilities a bit, it’s probably bad safety work. Can you explain and defend that?

Neel Nanda: Yeah. I often see people in the safety community criticising safety work because they’re like, “But isn’t this capabilities work?” In the sense of, “Doesn’t this make the model better?”

I think this just doesn’t make sense as a critique, because the goal of safety is to have models that do what we want. Even related things like control are about trying to have models that don’t do the things we don’t want. This is incredibly commercially useful. We’re already seeing safety issues like reward hacking make systems less commercially applicable. Things like hallucinations, jailbreaks. And I expect as time goes on, the more important safety issues from an AGI safety perspective will also start to matter.

This means that criticising work simply for being useful just doesn’t really make sense. In fact, I would criticise work that doesn’t have a path to being useful: either it’s something like evals, which has a somewhat different theory of impact, or it sure sounds like you don’t have a plan for making systems better at doing what we want with this technique — in which case, what’s the point?

I want to emphasise that I’m not just saying you should do whatever you want. I think that there’s work that differentially advances safety over just general capabilities and work that doesn’t. But I just think it’s pretty nuanced, and I think that people might have counterarguments like, “But if it helps the system be a more useful product, won’t companies do it?” And I just don’t think this is a realistic model of how this kind of thing works.

Rob Wiblin: Yeah, I was going to raise that issue: if your safety work is doing a lot to make the model more useful, and especially more commercially useful, then plausibly the company will invest in it, regardless of whether you’re involved or not — because it’s just on the critical path of actually making the product that they can sell. But I guess you just think things are a bit more scrappy and ad hoc than that. It’s not necessarily the case that AI companies are always doing all of the things that would make their products better. Not by any means.

Neel Nanda: Kind of. The way I’d think about it is it’s kind of a depth and prep time thing. So if something becomes an issue, people will try to fix it — but there’ll be a lot of urgency, and there won’t be as much room to research more creative solutions; there won’t be enough room to try to do the thing properly. Often if there’s multiple approaches to a thing, and I think some are better than others, there won’t be interest in trying the less proven thing.

For example, the standard way to fix a safety issue is to just add more fine-tuning data addressing the issue. This often works. But I think that for things like deception, it would not work. And I want to make sure there are other tools — like chain-of-thought monitoring or evaluations with interpretability — that people see as realistic options and an important part of the process. And I think that if you just wait for commercial incentives to take over, you don’t really get this.

I think that there’s work that makes models better without really getting safety benefits, and there’s work that is centrally about making the thing do what we want more. I think people should try to do the second kind of thing. But the question is: “Does this differentially advance safety?” Not, “Does this advance capabilities at all?” And if it advanced both a lot, that might be excellent.

Remember: You can just do things

Neel Nanda: I think one of the most important lessons I’ve learned is that you can just do things. This may sound kind of trite, but I think there’s a bunch of non-obvious things I’ve had to learn here.

The first one is that doing things is a skill. I’m a perfectionist. I often don’t want to do things. I’m like, “This seems risky. This could go wrong.” And the way I broke out of this was challenging myself to write a blog post a day for a month. And that’s how I met my partner of the past four years.

Rob Wiblin: Wonderful.

Neel Nanda: And it also helped me produce a bunch of the public output that’s helped build the field of mech interp.

Another part of this is what I think of as maximising your luck surface area. You want to create as many chances as possible for good opportunities to come your way. You want to know people. You want to be someone who sometimes says yes, so people bring things to you. You want to just get involved in a bunch of things.

You also want to be willing to do things that are a bit weird or unusual. One of the most popular videos on my YouTube channel, with like 30,000 views, was one where I read through one of the famous mech interp papers, “A Mathematical Framework for Transformer Circuits,” for three hours and just gave my takes. I did no editing, and put it on YouTube. People were into this.

And maybe as a final example, I kind of ended up running the DeepMind team by accident. I joined DeepMind expecting to be an individual researcher. Then, unexpectedly, the lead decided to step down a few months after I joined, and in the months that followed I ended up stepping into their place. I did not know if I was going to be good at this. I think it’s gone reasonably well. To me, this is an example both of the importance of having luck surface area — being in a situation where opportunities like that can arise — and of the value of just saying yes to things, even if they seem kind of scary and you’re not confident they’ll go well, so long as the downside is pretty low.

And in the worst case, I’d just have done a bad job leading the team, stepped down, and we’d have had to pick someone else.

Rob Wiblin: Seems to have gone pretty well.
