Neel Nanda

Researcher at DeepMind, with a focus on mechanistic interpretability for AI safety

When it came to the end of his undergraduate maths degree at Cambridge, Neel was considering two options: quantitative finance, or continuing on to a master’s.

He’d spent his summers doing internships in quantitative finance, and enjoyed them a lot. At the same time, his degree was going well, and continuing into a master’s felt like the default option for people doing well at his university.

But for years he’d been reading about the possible risks of an AI-related catastrophe. Ultimately, he wanted to help reduce those risks, but wasn’t sure how to use his career to do so, or whether he actually wanted to.

He spoke to our team, and we were able to connect him to people doing real research in technical AI safety. Neel says that the call got him thinking about AI safety in a much more concrete and coherent way, and helped him feel like it was a real option worth taking action on.

So, instead of going into quantitative finance or doing a master’s in maths, he decided to spend a year interning at labs that take a variety of approaches to AI safety. We helped Neel find an internship at the Future of Humanity Institute at the University of Oxford, and he also did internships at DeepMind and the Center for Human-Compatible AI at UC Berkeley. These internships helped Neel feel like he could really see himself pursuing a longer-term career path as an AI safety researcher. He also became more convinced that AI safety was important: he was sceptical of longtermism, but became convinced that the risk was pressing enough to be a high-impact way to save lives today.

Neel received an offer to work on mechanistic interpretability with Chris Olah at Anthropic, and took it.

In hindsight, Neel thinks this was a great decision, but made for the wrong reasons. He significantly underweighted the importance of being mentored by fantastic researchers like Chris, and he underweighted the fact that he was more intrinsically excited about mechanistic interpretability than other areas of machine learning. And he overestimated the importance of factors like being unsure if he wanted to pursue mechanistic interpretability long term: switching isn’t that expensive. Neel ended up leaving after a few months, but thinks he learnt and grew a lot while he was there.

He decided to spend some time doing independent work, which went very well (though Neel thinks he got lucky, and does not recommend this for everyone!). Neel published a spotlight paper on mechanistic interpretability at ICLR, a top machine learning conference.

In 2022, Neel joined DeepMind’s mechanistic interpretability team as a researcher.

To learn more about Neel’s work and AI safety more generally, check out:

Do you want to receive one-on-one advice too?

Neel has spoken to our team three times over the course of his career. If you need help deciding what to do next, our team might be able to help.

We’re a nonprofit, so everything we provide, including our one-on-one advice, is free.

We can help you compare options, make connections, and possibly even find jobs or funding opportunities.