Reading list for understanding AI and how it could be dangerous
Want to get up to speed on the state of AI development and the risks it poses? Our site provides an overview of key topics in this area, but obviously there’s a lot more to learn.
We recommend starting with the following blog posts and research papers. (Note: we don’t necessarily agree with all the claims the authors make, but still think they’re great resources.)
Key blog posts
Scaling up: how increasing inputs has made artificial intelligence more capable by Veronika Samborska at Our World in Data
The article concisely explains how AI has gotten better in recent years primarily by scaling up existing systems rather than by making more fundamental scientific advances.
How we could stumble into AI catastrophe by Holden Karnofsky on Cold Takes
Holden Karnofsky makes the case that if transformative AI is developed relatively soon, it could result in global catastrophe.
AI could defeat all of us combined by Holden Karnofsky on Cold Takes
Read this to understand why it’s plausible that AI systems could pose a threat to humanity, if they were powerful enough and doing so would further their goals.
Machines of loving grace — How AI could transform the world for the better by Anthropic CEO Dario Amodei
It’s important to understand why there’s enthusiasm for building powerful AI systems, despite the risks. This post from an AI company CEO paints a positive vision for powerful AI.
Computing power and the governance of AI by Lennart Heim et al. at the Centre for the Governance of AI
Experts in AI policy argue that governing computational power could be a key intervention for reducing risks, though it also raises risks of its own.
Why AI alignment could be hard with modern deep learning by Ajeya Cotra on Cold Takes
This piece explains why existing AI techniques may make it hard to create powerful AI systems that remain under human control over the long term.
The most important graph in AI right now: time horizon by Benjamin Todd
How would we know if AI is really on track to make big changes in society? Benjamin Todd argues that the length of tasks AI can do is the most important metric to look at.
Key research papers
Preparing for the intelligence explosion by William MacAskill and Fin Moorhouse at Forethought Research
These authors argue that an intelligence explosion could compress a century of technological progress into a decade, creating numerous grand challenges beyond just AI alignment that humanity must prepare for now.
Can scaling continue to 2030? by Jaime Sevilla et al. at Epoch AI
Available data suggests AI companies can continue scaling their systems through 2030, with power availability and chip manufacturing capacity as the main constraints.
Is power-seeking AI an existential risk? by Joe Carlsmith
This is one of the central papers putting together the argument that extremely powerful AI systems could pose an existential threat to humanity.
Scheming AIs: Will AIs fake alignment during training in order to get power? by Joe Carlsmith
Here’s an in-depth argument that it may be hard to create AI systems without incentivising them to deceive us.
AI 2027 by Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, and Romeo Dean
This speculative scenario depicts how superhuman AI might be developed and deployed in the near future.
Gradual disempowerment by Jan Kulveit, Raymond Douglas, Nora Ammann, Deger Turan, David Krueger, and David Duvenaud
Even if we avoid the risks of power-seeking and scheming AIs, there may be other ways AI systems could disempower humanity.
AI tools for existential security by Lizka Vaintrob and Owen Cotton-Barratt
While AI systems may pose existential risks, these authors argue that we may also be able to develop AI tools that reduce those risks.
Taking AI welfare seriously by Robert Long, Jeff Sebo, et al.
This paper makes a thorough case that we shouldn’t only worry about the risks AI poses to humanity: we may also need to consider the interests of future AI systems themselves.