Ethan Perez

Ethan is a research scientist at Anthropic who first became interested in AI safety after coming across 80,000 Hours’ advice.

Ethan Perez came across 80,000 Hours as an undergraduate studying computer science. The page detailing the world’s most pressing problems intrigued him, and he started thinking about the challenges posed to humanity by the development of advanced artificial intelligence. In addition to reading the 80,000 Hours website, he decided to read Superintelligence: Paths, Dangers, Strategies by Nick Bostrom. The book’s argument that developments in AI could have extremely large impacts over the long term struck him as compelling.

He became convinced that AI safety was worth investigating more deeply. At the same time, he was reexamining his faith, exploring important philosophical questions, and trying to figure out what he really valued.

In one-on-one advising with 80,000 Hours, he got to discuss challenging questions about the long-term impacts of AI and whether such considerations should influence his career trajectory. He was particularly inspired by proposals for ‘AI safety via debate,’ a technique that pits two AI systems against each other in a debate, with a human judging who wins. So he took an internship at Mila, an AI research institute in Montreal.

A while later, Ethan was considering whether he should leave his internship and postpone his plans for a PhD to start a robotics startup with a friend from his lab. After a last-minute follow-up call with one of our advisors to talk through this decision, he decided that he should probably focus on gaining more research experience and getting a PhD.

He later earned his PhD in computer science at New York University, writing his thesis on fixing undesirable behaviour in language models. In his efforts to contribute to the field of AI safety, he has written papers on empirically testing debate-based safety techniques, finding safety failures in language models, improving methods for training AI systems from human feedback, and refining question-answering capabilities.

In 2022, Ethan took a role as a research scientist at Anthropic, a frontier AI company whose stated mission is “to ensure transformative AI helps people and society flourish.”

He currently leads the adversarial robustness team, which does research aimed at reducing the risk of catastrophic outcomes from advanced machine learning systems. His recent work has identified vulnerabilities in state-of-the-art AI safety training techniques, showing, for example, that LLMs can be trained as ‘sleeper agents’ whose deceptive behaviour persists through standard safety training, which we discussed in this episode of The 80,000 Hours Podcast. Research like this can hopefully help steer the development of AI models in safer directions.