Technical AI safety upskilling resources
Our advising team often speaks to people who are enthusiastic about technical AI safety and have a related skill set, but who need concrete ideas for how to enter the field. We developed this list in consultation with our advisors, collecting the resources they most commonly share: articles, courses, organisations, and fellowships.
While we recommend applying to speak to an advisor for 1-1 tailored guidance, this page gives a practical, non-comprehensive snapshot of how you might move from ‘interested in technical AI safety’ to ‘starting to work on technical AI safety.’
Overviews:
- AISafety.com
- Shallow Review of Technical AI Safety by technicalities, Stag, Stephen McAleese, jordine, Dr. David Mathers
- AI Safety Technical Research Career Guide - How to Enter by 80,000 Hours
- Levelling Up in AI Safety Research Engineering by Gabriel Mukobi
- Recommendations for Technical AI Safety Research Agendas by Anthropic
- Technical AI Safety Research Areas by Open Philanthropy
- Foundational Challenges in Assuring Alignment and Safety of Large Language Models by Anwar et al.
- An overview of areas of control work by Ryan Greenblatt, Redwood Research
- AI Safety Needs Great Engineers by Andy Jones
Staying up to date (podcasts, newsletters, etc.):
- Follow recent papers and discussion via the Alignment Forum, Zvi Mowshowitz’s Substack, the AI Safety Newsletter (from CAIS), PapersWithCode, or a curated Twitter list.
- Alignment Workshop videos from FAR.AI
- The 80,000 Hours Podcast
- Dwarkesh Podcast
- The Cognitive Revolution Podcast
- AI X-Risk Podcast
Courses:
- ARENA’s curriculum
- BlueDot Impact’s AI Alignment course
- Andrej Karpathy’s Zero to Hero course
- Karpathy’s other YouTube videos are also great intro-friendly resources, as are 3Blue1Brown’s deep learning videos.
- Deep Learning Curriculum by Jacob Hilton
- Google ML Course
Ideas for projects and upskilling:
- “What are some projects I can try?” (AISafety.Info)
- 100+ concrete projects and open problems in evals by Marius Hobbhahn
- A list of 45+ Mech Interp Projects by Apollo Research
- Open Problems in Mechanistic Interpretability by Sharkey et al.
- Consider joining an alignment hackathon such as an Apart Research Sprint.
- Consider joining EleutherAI’s community of researchers on their Discord.
- Consider writing a task using the METR task standard (see the toy sketch after this list).
- Consider writing your research theory of change (workshop slides, Michael Aird).
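If you’re not sure what writing a METR task involves, here is a minimal toy sketch of the kind of TaskFamily class a task definition provides. The method names and signatures reflect our reading of METR’s public task standard and should be treated as assumptions; check METR’s task-standard repository for the current interface.

```python
# Toy sketch of a METR-style task family (task: reverse a string).
# The TaskFamily interface below is an assumption based on METR's public
# task standard; check their task-standard repo before relying on exact
# method names or signatures.


class TaskFamily:
    # Version of the task standard this family targets (assumed value).
    standard_version = "0.3.0"

    @staticmethod
    def get_tasks() -> dict[str, dict]:
        # Map task names to arbitrary per-task data used by the methods below.
        return {
            "easy": {"word": "alignment"},
            "hard": {"word": "interpretability"},
        }

    @staticmethod
    def get_instructions(t: dict) -> str:
        # Instructions shown to the agent at the start of a run.
        return f"Reverse the string '{t['word']}' and submit the result."

    @staticmethod
    def score(t: dict, submission: str) -> float | None:
        # Return a score in [0, 1]; here, simple exact-match grading.
        return 1.0 if submission.strip() == t["word"][::-1] else 0.0
```

A real task would typically also need environment setup and more robust scoring, but even a toy like this is a useful way to learn the workflow of specifying instructions and automated grading precisely.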
Advice from researchers:
- Karpathy on PhDs, research agendas, career advice
- Ethan Perez’s Tips for Empirical Alignment Research and Workflows
- Gabriel Mukobi’s ML Safety Research Advice
- Richard Ngo’s AGI safety career advice
- Marius Hobbhahn’s advice for independent research
- Neel Nanda’s Highly Opinionated Advice on How to Write ML Papers
- Adam Gleave on whether to do a PhD
- Lewis Hammond’s Advice (for doing a PhD (in AI (Safety)))
Fellowships:
- Anthropic AI Safety Fellowship
- MATS
- LASR Labs
- SPAR
- Pivotal
- ARENA
- Cambridge ERA AI
- CHAI Research Fellowship + Internship
- Global AI Safety Fellowship
Organisations:
- A larger list is available here: Overview of the AI safety ecosystem
- Alignment Research Center (ARC)
- Apollo Research
- Center for Human-Compatible AI
- Conjecture
- FAR.AI
- Goodfire
- METR
- Palisade Research
- Redwood Research
Key descriptions of the alignment problem:
- Ajeya Cotra – Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover
- Paul Christiano – What failure looks like
- Richard Ngo – The alignment problem from a deep learning perspective
- Joe Carlsmith – Is Power-Seeking AI an Existential Risk?