Note from the author: At its core, this problem profile tries to predict the future of technology. This is a notoriously difficult thing to do. In addition, there has been much less rigorous research into the risks from AI than into the other risks 80,000 Hours writes about (like pandemics or climate change).1 That said, there is a growing field of research into the topic, which I’ve tried to reflect. For this article I’ve leaned especially on this report by Joseph Carlsmith at Open Philanthropy (also available as a narration), as it’s the most rigorous overview of the risk that I could find. I’ve also had the article reviewed by over 30 people with different expertise and opinions on the topic. (Almost all are concerned about advanced AI’s potential impact.)
If you have any feedback on this article — whether there’s something technical we’ve got wrong, some wording we could improve, or just that you did or didn’t like reading it — we’d really appreciate it if you could tell us what you think using this form.
Why do we think that reducing risks from AI is one of the most pressing issues of our time? In short, our reasons are:
We’re going to cover each of these in turn, then consider some of the best counterarguments, explain concrete things you can do to help, and finally outline some of the best resources for learning more about this area.
In May 2023, hundreds of prominent AI scientists (and other notable figures) signed a statement saying that mitigating the risk of extinction from AI should be a global priority.
So it’s pretty clear that at least some experts are concerned.
But how concerned are they? And is this just a fringe view?
We looked at four surveys of AI researchers who published at NeurIPS and ICML (two of the most prestigious machine learning conferences) from 2016, 2019, 2022 and 2023.3
It’s important to note that there could be considerable selection bias on surveys like this. For example, you might think researchers who go to the top AI conferences are more likely to be optimistic about AI, because they have been selected to think that AI research is doing good. Alternatively, you might think that researchers who are already concerned about AI are more likely to respond to a survey asking about these concerns.4
All that said, here’s what we found:
In all four surveys, the median researcher thought that the chance that AI would be “extremely good” was reasonably high: 20% in the 2016 survey, 20% in 2019, 10% in 2022, and 10% in 2023.5
Indeed, AI systems are already having substantial positive effects — for example, in medical care or academic research.
But in all four surveys, the median researcher also estimated a small, but certainly not negligible, chance that AI would be “extremely bad (e.g. human extinction)”: 5% in the 2016 survey, 2% in 2019, 5% in 2022 and 5% in 2023.6
In the 2022 survey, participants were specifically asked about the chances of existential catastrophe caused by future AI advances, and again, over half of researchers thought the chance of an existential catastrophe was greater than 5%.7
So experts disagree on the degree to which AI poses an existential risk — a kind of threat we’ve argued deserves serious moral weight.
This fits with our understanding of the state of the research field. Three of the leading labs developing AI — DeepMind, Anthropic and OpenAI — also have teams dedicated to figuring out how to solve technical safety issues that we believe could, for reasons we discuss at length below, lead to an existential threat to humanity.8
There are also several academic research groups (including at MIT, Oxford, Cambridge, Carnegie Mellon University, and UC Berkeley) focusing on these same technical AI safety problems.9
It’s hard to know exactly what to take from all this, but we’re confident that it’s not a fringe position in the field to think that there is a material risk of outcomes as bad as an existential catastrophe. Some experts in the field maintain, though, that the risks are overblown.
Still, why do we side with those who are more concerned? In short, it’s because there are arguments we’ve found persuasive that AI could pose such an existential threat — arguments we will go through step by step below.
It’s important to recognise that many experts acknowledging a problem doesn’t mean that everything’s OK because the experts have got it covered. Overall, we think this problem remains highly neglected (more on this below), especially as billions of dollars a year are spent to make AI more advanced.10
Before we try to figure out what the future of AI might look like, it’s helpful to take a look at what AI can already do.
Modern AI techniques involve machine learning (ML): models that improve automatically through data input. The most common form of this technique used today is known as deep learning.
Probably the most well-known ML-based product is ChatGPT. OpenAI’s commercialisation system — where you can pay for a much more powerful version of the product — led to revenue of over $2 billion by the end of 2023, making OpenAI one of the fastest growing startups ever.
If you’ve used ChatGPT, you may have been a bit underwhelmed. After all — while it’s great at some tasks, like coding and data analysis — it makes lots of mistakes. (Though note that the paid version tends to perform better than the free version.)
But we shouldn’t expect the frontier of AI to remain at the level of ChatGPT. There has been huge progress in what can be achieved with ML in only the last few years. Here are a few examples (from less recent to more recent):
If you’re anything like us, you found the complexity and breadth of the tasks these systems can carry out surprising.
And if the technology keeps advancing at this pace, it seems clear there will be major effects on society. At the very least, automating tasks makes carrying out those tasks cheaper. As a result, we may see rapid increases in economic growth (perhaps even to the level we saw during the Industrial Revolution).
If we’re able to partially or fully automate scientific advancement, we may see more transformative changes to society and technology.13
That could be just the beginning. We may be able to get computers to eventually automate anything humans can do. This seems like it has to be possible — at least in principle. This is because it seems that, with enough power and complexity, a computer should be able to simulate the human brain. This would itself be a way of automating anything humans can do (if not the most efficient method of doing so).
And as we’ll see in the next section, there are some indications that extensive automation may well be possible through scaling up existing techniques.
There are three things that are crucial to building AI through machine learning: computational power (compute), algorithms, and data.
Epoch is a team of scientists investigating trends in the development of advanced AI — in particular, how these three inputs are changing over time.
They found that the amount of compute used for training the largest AI models has been rising exponentially — doubling on average every six months since 2010.
That means the amount of computational power used to train our largest machine learning models has grown by over one billion times.
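As a rough check on what steady doubling implies, here is a minimal sketch of the arithmetic. The function names are our own, and the only input taken from the text is the six-month doubling time; treat it as an illustration rather than a reproduction of Epoch's analysis.

```python
import math

def growth_factor(years: float, doubling_time_years: float) -> float:
    """Multiplicative growth after `years` of steady doubling."""
    return 2.0 ** (years / doubling_time_years)

def years_to_reach(factor: float, doubling_time_years: float) -> float:
    """Years of steady doubling needed to grow by `factor`."""
    return math.log2(factor) * doubling_time_years

# Doubling every six months means two doublings (a factor of 4) per year.
print(growth_factor(1, 0.5))               # 4.0

# A billion-fold increase is about 30 doublings, so roughly 15 years at this pace.
print(round(years_to_reach(1e9, 0.5), 1))
```

The point of the sketch is how quickly the factor compounds: each additional year at this pace multiplies training compute by four.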
Epoch also looked at how much compute has been needed to train a neural network to have the same performance on ImageNet (a well-known test data set for computer vision).
They found that the amount of compute required for the same performance has been falling exponentially — halving every 10 months.
So since 2012, the amount of compute required for the same level of performance has fallen by over 10,000 times. Combined with the increased compute used for training, that’s a lot of growth.
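To see how the two trends could combine, here is a small sketch. It assumes (our simplification, not a claim from Epoch) that gains in training compute and gains in algorithmic efficiency simply multiply, giving a notional "effective compute". The function names are hypothetical.

```python
def efficiency_gain(months: float, halving_time_months: float = 10.0) -> float:
    """Factor by which the compute needed for fixed performance has fallen."""
    return 2.0 ** (months / halving_time_months)

def combined_doubling_time(compute_doubling_months: float,
                           efficiency_halving_months: float) -> float:
    """Doubling time of 'effective compute', ASSUMING the two trends multiply."""
    # Growth rates (in doublings per month) add when the factors multiply.
    rate = 1.0 / compute_doubling_months + 1.0 / efficiency_halving_months
    return 1.0 / rate

# Ten years of halving every 10 months is 12 halvings.
print(efficiency_gain(120))                       # 4096.0

# Compute doubling every 6 months plus requirements halving every 10 months.
print(round(combined_doubling_time(6, 10), 2))    # about 3.75 months
```

Under that assumption, the effective doubling time is noticeably shorter than either trend alone would suggest.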
Finally, they found that the size of the data sets used to train the largest language models has been doubling roughly once a year since 2010.
It’s hard to say whether these trends will continue, but they speak to incredible gains over the past decade in what it’s possible to do with machine learning.
Indeed, it looks like increasing the size of models (and the amount of compute used to train them) introduces ever more sophisticated behaviour. This is how things like GPT-4 are able to perform tasks they weren’t specifically trained for.
These observations have led to the scaling hypothesis: that we can simply build bigger and bigger neural networks, and as a result we will end up with more and more powerful artificial intelligence, and that this trend of increasing capabilities may continue to human-level AI and beyond.
If this is true, we can attempt to predict how the capabilities of AI technology will increase over time simply by looking at how quickly we are increasing the amount of compute available to train models.
But as we’ll see, it’s not just the scaling hypothesis that suggests we could end up with extremely powerful AI relatively soon — other methods of predicting AI progress come to similar conclusions.
It’s difficult to predict exactly when we will develop AI that we expect to be hugely transformative for society (for better or for worse) — for example, by automating all human work or drastically changing the structure of society.14 But here we’ll go through a few approaches.
One option is to survey experts. Data from the 2023 survey of 3000 AI experts implies a 33% probability of human-level machine intelligence (which would plausibly be transformative in this sense) by 2036, a 50% probability by 2047, and an 80% probability by 2100.15 There are a lot of reasons to be suspicious of these estimates,4 but we take them as one data point.
Ajeya Cotra (a researcher at Open Philanthropy) attempted to forecast transformative AI by comparing modern deep learning to the human brain. Deep learning involves using a huge amount of compute to train a model, before that model is able to perform some task. There’s also a relationship between the amount of compute used to train a model and the amount used by the model when it’s run. And — if the scaling hypothesis is true — we should expect the performance of a model to predictably improve as the computational power used increases. So Cotra used a variety of approaches (including, for example, estimating how much compute the human brain uses on a variety of tasks) to estimate how much compute might be needed to train a model that, when run, could carry out the hardest tasks humans can do. She then estimated when using that much compute would be affordable.
Cotra’s 2022 update on her report’s conclusions estimates that there is a 35% probability of transformative AI by 2036, 50% by 2040, and 60% by 2050 — noting that these guesses are not stable.16
Tom Davidson (also a researcher at Open Philanthropy) wrote a report to complement Cotra’s work. He attempted to figure out when we might expect to see transformative AI by looking only at various types of research that developing transformative AI might resemble (e.g. developing technology that’s the ultimate goal of a STEM field, or proving difficult mathematical conjectures), and at how long each of these kinds of research has taken to complete in the past, given some quantity of research funding and effort.
Davidson’s report estimates that, based solely on this information, you’d think that there was an 8% chance of transformative AI by 2036, 13% by 2060, and 20% by 2100. However, Davidson doesn’t consider the actual ways in which AI has progressed since research started in the 1950s, and notes that it seems likely that the amount of effort we put into AI research will increase as AI becomes increasingly relevant to our economy. As a result, Davidson expects these numbers to be underestimates.
Holden Karnofsky, co-CEO of Open Philanthropy, attempted to sum up the findings of others’ forecasts. He guessed in 2021 there was more than a 10% chance we’d see transformative AI by 2036, 50% by 2060, and 66% by 2100. And these guesses might be conservative, since they didn’t incorporate what we see as faster-than-expected progress since the earlier estimates were made.
Method | Chance of transformative AI by 2036 | Chance of transformative AI by 2060 | Chance of transformative AI by 2100 |
---|---|---|---|
Expert survey (Grace et al., 2024) | 33% | 50% (by 2047) | 80% |
Expert survey (Zhang et al., 2022) | 20% | 50% | 85% |
Biological anchors (Cotra, 2022) | 35% | 60% (by 2050) | 80% (according to the 2020 report) |
Semi-informative priors (Davidson, 2021) | 8% | 13% | 20% |
Overall guess (Karnofsky, 2021) | 10% | 50% | 66% |
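To make the spread between methods easier to see, the sketch below loads the figures from the table and reports the minimum, median, and maximum for each column. This is only a way of reading the table, not a method for aggregating forecasts: the approaches are not independent, two middle-column entries refer to 2047 and 2050 rather than 2060, and Karnofsky's 2036 figure is a lower bound ("more than 10%").

```python
from statistics import median

# Percentages copied from the table above (columns: 2036, ~2060, 2100).
forecasts = {
    "Expert survey (Grace et al., 2024)": (33, 50, 80),        # middle value is for 2047
    "Expert survey (Zhang et al., 2022)": (20, 50, 85),
    "Biological anchors (Cotra, 2022)": (35, 60, 80),          # middle value is for 2050
    "Semi-informative priors (Davidson, 2021)": (8, 13, 20),
    "Overall guess (Karnofsky, 2021)": (10, 50, 66),           # 2036 value is "more than 10%"
}
horizons = ("2036", "~2060", "2100")

def summarise(column: int):
    """Return (min, median, max) of one column of the table."""
    values = [row[column] for row in forecasts.values()]
    return min(values), median(values), max(values)

for i, horizon in enumerate(horizons):
    low, mid, high = summarise(i)
    print(f"by {horizon}: min {low}%, median {mid}%, max {high}%")
```

The semi-informative priors estimate is the clear outlier on the low side in every column, which fits Davidson's own expectation that those numbers are underestimates.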
All in all, AI seems to be advancing rapidly. More money and talent is going into the field every year, and models are getting bigger and more efficient.
Even if AI were advancing more slowly, we’d be concerned about it — most of the arguments about the risks from AI (that we’ll get to below) do not depend on this rapid progress.
However, the speed of these recent advances increases the urgency of the issue.
(It’s totally possible that these estimates are wrong. Below, we discuss how the possibility that we might have a lot of time to work on this problem is one of the best arguments against this problem being pressing.)
We’ve argued so far that we expect AI to be an important — and potentially transformative — new technology.
We’ve also seen reason to think that such transformative AI systems could be built this century.
Now we’ll turn to the core question: why do we think this matters so much?
There could be a lot of reasons. If advanced AI is as transformative as it seems like it’ll be, there will be many important consequences. But here we are going to explain the issue that seems most concerning to us: AI systems could pose risks by seeking and gaining power.
We’ll argue that:
Thinking through each step, I think there’s something like a 1% chance of an existential catastrophe resulting from power-seeking AI systems this century. This is my all-things-considered guess at the risk, incorporating considerations of the argument in favour of the risk (which is itself probabilistic), as well as reasons why this argument might be wrong (some of which I discuss below). This puts me on the less worried end of 80,000 Hours staff, whose views on our last staff survey ranged from 1–55%, with a median of 15%.
We’re going to argue that future systems with the following three properties might pose a particularly important threat to humanity:17
They have goals and are good at making plans.
Not all AI systems have goals or make plans to achieve those goals. But some systems (like some chess-playing AI systems) can be thought of in this way. When discussing power-seeking AI, we’re considering planning systems that are relatively advanced, with plans that are in pursuit of some goal(s), and that are capable of carrying out those plans.
They have excellent strategic awareness.
A particularly good planning system would have a good enough understanding of the world to notice obstacles and opportunities that may help or hinder its plans, and respond to these accordingly. Following Carlsmith, we’ll call this strategic awareness, since it allows systems to strategise in a more sophisticated way.
They have highly advanced capabilities relative to today’s systems.
For these systems to actually affect the world, we need them to not just make plans, but also be good at all the specific tasks required to execute those plans.
Since we’re worried about systems attempting to take power from humanity, we are particularly concerned about AI systems that might be better than humans on one or more tasks that grant people significant power when carried out well in today’s world.
For example, people who are very good at persuasion and/or manipulation are often able to gain power — so an AI being good at these things might also be able to gain power. Other examples might include hacking into other systems, tasks within scientific and engineering research, as well as business, military, or political strategy.
As we saw above, we’ve already produced systems that are very good at carrying out specific tasks.
We’ve also already produced rudimentary planning systems, like AlphaStar, which skilfully plays the strategy game Starcraft, and MuZero, which plays chess, shogi, and Go.18
We’re not sure whether these systems are producing plans in pursuit of goals per se, because we’re not sure exactly what it means to “have goals.” However, since they consistently plan in ways that achieve goals, it seems like they have goals in some sense.
Moreover, some existing systems seem to actually represent goals as part of their neural networks.19
That said, planning in the real world (instead of games) is much more complex, and to date we’re not aware of any unambiguous examples of goal-directed planning systems, or systems that exhibit high degrees of strategic awareness.
But as we’ve discussed, we expect to see further advances within this century. And we think these advances are likely to produce systems with all three of the above properties.
That’s because we think that there are particularly strong incentives (like profit) to develop these kinds of systems. In short, being able to plan to achieve a goal, and execute that plan, seems like a particularly powerful and general way of affecting the world.
Getting things done — whether that’s a company selling products, a person buying a house, or a government developing policy — almost always seems to require these skills. One example would be assigning a powerful system a goal and expecting the system to achieve it — rather than having to guide it every step of the way. So planning systems seem likely to be (economically and politically) extremely useful.20
And if systems are extremely useful, there are likely to be big incentives to build them. For example, an AI that could plan the actions of a company by being given the goal to increase its profits (that is, an AI CEO) would likely provide significant wealth for the people involved — a direct incentive to produce such an AI.
As a result, if we can build systems with these properties (and from what we know, it seems like we will be able to), it seems like we are likely to do so.21
There are reasons to think that these kinds of advanced planning AI systems will be misaligned. That is, they will aim to do things that we don’t want them to do.22
There are many reasons why systems might not be aiming to do exactly what we want them to do. For one thing, we don’t know how, using modern ML techniques, to give systems the precise goals we want (more here).23
We’re going to focus specifically on some reasons why systems might by default be misaligned in such a way that they develop plans that pose risks to humanity’s ability to influence the world — even when we don’t want that influence to be lost.24
What do we mean by “by default”? Essentially, unless we actively find solutions to some (potentially quite difficult) problems, then it seems like we’ll create dangerously misaligned AI. (There are reasons this might be wrong — which we discuss later.)
It’s worth noting that misalignment isn’t a purely theoretical possibility (or specific to AI) — we see misaligned goals in humans and institutions all the time, and have also seen examples of misalignment in AI systems.25
The democratic political framework is intended to ensure that politicians make decisions that benefit society. But what political systems actually reward is winning elections, so that’s what many politicians end up aiming for.
This is a decent proxy goal — if you have a plan to improve people’s lives, they’re probably more likely to vote for you — but it isn’t perfect. As a result, politicians do things that aren’t clearly the best way of running a country, like raising taxes at the start of their term and cutting them right before elections.
That is to say, the things the system does are at least a little different from what we would, in a perfect world, want it to do: the system is misaligned.
Companies have profit-making incentives. By producing more, and therefore helping people obtain goods and services at cheaper prices, companies make more money.
This is sometimes a decent proxy for making the world better, but profit isn’t actually the same as the good of all of humanity (bold claim, we know). As a result, there are negative externalities: for example, companies will pollute to make money despite this being worse for society overall.
Again, we have a misaligned system, where the things the system does are at least a little different from what we would want it to do.
DeepMind has documented examples of specification gaming: an AI doing well according to its specified reward function (which encodes our intentions for the system), but not doing what researchers intended.
In one example, a robot arm was asked to grasp a ball. But the reward was specified in terms of whether humans thought the robot had been successful. As a result, the arm learned to hover between the ball and the camera, fooling the humans into thinking that it had grasped the ball.26
So we know it’s possible to create a misaligned AI system.
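The robot arm case can be caricatured in a few lines. This is a toy illustration only: the actions and scores are invented for the sketch and are not taken from the DeepMind examples. It shows how an optimiser that only sees a proxy ("does it look successful to the observer?") can prefer an action that scores zero on the thing we actually care about.

```python
# Toy illustration: all numbers below are made up for this sketch.
ACTIONS = {
    # action: (how successful it looks to a human observer, whether the ball is grasped)
    "grasp the ball": (0.9, 1.0),
    "hover between ball and camera": (1.0, 0.0),
    "do nothing": (0.0, 0.0),
}

def proxy_reward(action: str) -> float:
    """What the system is actually rewarded for: looking successful."""
    return ACTIONS[action][0]

def true_objective(action: str) -> float:
    """What the researchers wanted: the ball actually being grasped."""
    return ACTIONS[action][1]

best_for_proxy = max(ACTIONS, key=proxy_reward)
best_for_truth = max(ACTIONS, key=true_objective)

print("Optimising the proxy picks:", best_for_proxy)
print("Optimising the true objective picks:", best_for_truth)
```

In this toy setup the proxy and the true objective agree on most comparisons (grasping beats doing nothing under both), and they only come apart at the top, which is exactly where an optimiser ends up.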
Here’s the core argument of this article. We’ll use all three properties from earlier: planning ability, strategic awareness, and advanced capabilities.
To start, we should realise that a planning system that has a goal will also develop ‘instrumental goals’: things that, if they occur, will make it easier to achieve an overall goal.
We use instrumental goals in plans all the time. For example, a high schooler planning their career might think that getting into university will be helpful for their future job prospects. In this case, “getting into university” would be an instrumental goal.
A sufficiently advanced AI planning system would also include instrumental goals in its overall plans.
If a planning AI system also has enough strategic awareness, it will be able to identify facts about the real world (including potential things that would be obstacles to any plans), and plan in light of them. Crucially, these facts would include that access to resources (e.g. money, compute, influence) and greater capabilities — that is, forms of power — open up new, more effective ways of achieving goals.
This means that, by default, advanced planning AI systems would have some worrying instrumental goals:
Crucially, one clear way in which the AI can ensure that it will continue to exist (and not be turned off), and that its objectives will never be changed, would be to gain power over the humans who might affect it (we talk here about how AI systems might actually be able to do that).
What’s more, the AI systems we’re considering have advanced capabilities, meaning they can do one or more tasks that grant people significant power when carried out well in today’s world. With such advanced capabilities, these instrumental goals will not be out of reach, and as a result, it seems like the AI system would use its advanced capabilities to get power as part of the plan’s execution. If we don’t want the AI systems we create to take power away from us, this would be a particularly dangerous form of misalignment.
In the most extreme scenarios, a planning AI system with sufficiently advanced capabilities could successfully disempower us completely.
As a (very non-rigorous) intuitive check on this argument, let’s try to apply it to humans.
Humans have a variety of goals. For many of these goals, some form of power-seeking is advantageous: though not everyone seeks power, many people do (in the form of wealth or social or political status), because it’s useful for getting what they want. This is not catastrophic (usually!) because, as human beings:
(We discuss whether humans are truly power-seeking later.)
A sufficiently advanced AI wouldn’t have those limitations.
The point of all this isn’t to say that any advanced planning AI system will necessarily attempt to seek power. Instead, it’s to point out that, unless we find a way to design systems that don’t have this flaw, we’ll face significant risk.
It seems more than plausible that we could create an AI system that isn’t misaligned in this way, and thereby prevent any disempowerment. Here are some strategies we might take (plus, unfortunately, some reasons why they might be difficult in practice):27
Control the objectives of the AI system. We may be able to design systems that simply don’t have objectives to which the above argument applies — and thus don’t incentivise power-seeking behaviour. For example, we could find ways to explicitly instruct AI systems not to harm humans, or find ways to reward AI systems (in training environments) for not engaging in specific kinds of power-seeking behaviour (and also find ways to ensure that this behaviour continues outside the training environment).
Carlsmith gives two reasons why doing this seems particularly hard.
First, for modern ML systems, we don’t get to explicitly state a system’s objectives — instead we reward (or punish) a system in a training environment so that it learns on its own. This raises a number of difficulties, one of which is goal misgeneralisation. Researchers have uncovered real examples of systems that appear to have learned to pursue a goal in the training environment, but then fail to generalise that goal when they operate in a new environment. This raises the possibility that we could think we’ve successfully trained an AI system not to seek power — but that the system would seek power anyway when deployed in the real world.28
Second, when we specify a goal to an AI system (or, when we can’t explicitly do that, when we find ways to reward or punish a system during training), we usually do this by giving the system a proxy by which outcomes can be measured (e.g. positive human feedback on a system’s achievement). But often those proxies don’t quite work.29 In general, we might expect that even if a proxy appears to correlate well with successful outcomes, it might not do so when that proxy is optimised for. (The examples above of politicians, companies, and the robot arm failing to grasp a ball are illustrations of this.) We’ll look at a more specific example of how problems with proxies could lead to an existential catastrophe here.
For more on the specific difficulty of controlling the objectives given to deep neural networks trained using self-supervised learning and reinforcement learning, we recommend OpenAI governance researcher Richard Ngo’s discussion of how realistic training processes lead to the development of misaligned goals.
Control the inputs into the AI system. AI systems will only develop plans to seek power if they have enough information about the world to realise that seeking power is indeed a way to achieve their goals.
Control the capabilities of the AI system. AI systems will likely only be able to carry out plans to seek power if they have sufficiently advanced capabilities in skills that grant people significant power in today’s world.
But to make any strategy work, it will need to both:
Continue to work as the planning ability and strategic awareness of systems improve over time. Some seemingly simple solutions (for example, trying to give a system a long list of things it isn’t allowed to do, like stealing money or physically harming humans) break down as the planning abilities of the systems increase. This is because the more capable a system is at developing plans, the more likely it is to identify loopholes or failures in the safety strategy, and as a result, the more likely the system is to develop a plan that involves power-seeking.
Ultimately, by looking at the state of the research on this topic, and speaking to experts in the field, we think that there are currently no known ways of building aligned AI systems that seem likely to fulfil both these criteria.
So: that’s the core argument. There are many variants of this argument. Some have argued that AI systems might gradually shape our future via subtler forms of influence that nonetheless could amount to an existential catastrophe; others argue that the most likely form of disempowerment is in fact just killing everyone. We’re not sure how a catastrophe would be most likely to play out, but have tried to articulate the heart of the argument, as we see it: that AI presents an existential risk.
There are definitely reasons this argument might not be right! We go through some of the reasons that seem strongest to us below. But overall it seems possible that, for at least some kinds of advanced planning AI systems, it will be harder to build systems that don’t seek power in this dangerous way than to build systems that do.
At this point, you may have questions like:
We think there are good responses to all these questions, so we’ve added a long list of arguments against working on AI risk — and our responses — for these (and other) questions below.
When we say we’re concerned about existential catastrophes, we’re not just concerned about risks of extinction. This is because the source of our concern is rooted in longtermism: the idea that the lives of all future generations matter, and so it’s extremely important to protect their interests.
This means that any event that could prevent all future generations from living lives full of whatever you think makes life valuable (whether that’s happiness, justice, beauty, or general flourishing) counts as an existential catastrophe.
It seems extremely unlikely that we’d be able to regain power over a system that successfully disempowers humanity. And as a result, the entirety of the future — everything that happens for Earth-originating life, for the rest of time — would be determined by the goals of systems that, although built by us, are not aligned with us. Perhaps those goals will create a long and flourishing future, but we see little reason for confidence.30
This isn’t to say that we don’t think AI also poses a risk of human extinction. Indeed, we think making humans extinct is one highly plausible way in which an AI system could completely and permanently ensure that we are never able to regain power.
Surely no one would actually build or use a misaligned AI if they knew it could have such terrible consequences, right?
Unfortunately, there are at least two reasons people might create and then deploy misaligned AI — which we’ll go through one at a time:31
Imagine there’s a group of researchers trying to tell, in a test environment, whether a system they’ve built is aligned. We’ve argued that an intelligent planning AI will want to improve its abilities to effect changes in pursuit of its objective, and it’s almost always easier to do that if it’s deployed in the real world, where a much wider range of actions are available. As a result, any misaligned AI that’s sophisticated enough will try to understand what the researchers want it to do and at least pretend to be doing that, deceiving the researchers into thinking it’s aligned. (For example, a reinforcement learning system might be rewarded for certain apparent behaviour during training, regardless of what it’s actually doing.)
Hopefully, we’ll be aware of this sort of behaviour and be able to detect it. But catching a sufficiently advanced AI in deception seems potentially harder than catching a human in a lie, which isn’t always easy. For example, a sufficiently intelligent deceptive AI system may be able to deceive us into thinking we’ve solved the problem of AI deception, even if we haven’t.
If AI systems are good at deception, and have sufficiently advanced capabilities, a reasonable strategy for such a system could be to deceive humans completely until the system has a way to guarantee it can overcome any resistance to its goals.
We might also expect some people with the ability to deploy a misaligned AI to charge ahead despite any warning signs of misalignment that do come up, because of race dynamics — where people developing AI want to do so before anyone else.
For example, if you’re developing an AI to improve military or political strategy, it’s much more useful if none of your rivals have a similarly powerful AI.
These incentives apply even to people attempting to build an AI in the hopes of using it to make the world a better place.
For example, say you’ve spent years and years researching and developing a powerful AI system, and all you want is to use it to make the world a better place. Simplifying things a lot, say there are two possibilities:
Let’s say you think there’s a 90% chance that you’ve succeeded in building an aligned AI. But technology often develops at similar speeds across society, so there’s a good chance that someone else will soon also develop a powerful AI. And you think they’re less cautious, or less altruistic, so you think their AI will only have an 80% chance of being aligned with good goals, and pose a 20% chance of existential catastrophe. And only if you get there first can your more beneficial AI be dominant. As a result, you might decide to go ahead with deploying your AI, accepting the 10% risk.
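The comparison in this example can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming (as the simplified scenario does) that whichever system is deployed first becomes dominant and that a dominant misaligned system means catastrophe. The function and variable names are hypothetical.

```python
def catastrophe_risk(p_aligned: float) -> float:
    """Chance of existential catastrophe if this system becomes dominant.

    Simplifying assumption taken from the example: a dominant system is
    either aligned (no catastrophe) or misaligned (catastrophe).
    """
    return 1.0 - p_aligned


# Figures from the example above.
risk_if_you_deploy_first = catastrophe_risk(0.90)     # your own system
risk_if_rival_deploys_first = catastrophe_risk(0.80)  # the less cautious rival

# On these numbers deploying first looks better, even though it still
# means accepting a sizeable risk.
deploy_now = risk_if_you_deploy_first < risk_if_rival_deploys_first

print(f"You deploy first:  {risk_if_you_deploy_first:.0%} risk")
print(f"Rival deploys first: {risk_if_rival_deploys_first:.0%} risk")
print(f"Deploy now? {deploy_now}")
```

The sketch leaves out everything that makes the real decision hard, such as the chance that the rival never finishes or that waiting would let you raise your own 90%.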
The argument we’ve given so far is very general, and doesn’t really look at the specifics of how an AI that is attempting to seek power might actually do so.
If you’d like to get a better understanding of what an existential catastrophe caused by AI might actually look like, we’ve written a short separate article on that topic. If you’re happy with the high-level abstract arguments so far, feel free to skip to the next section!
What could an existential AI catastrophe actually look like?
So far we’ve described what a large proportion of researchers in the field2 think is the major existential risk from potential advances in AI, which depends crucially on an AI seeking power to achieve its goals.
If we can prevent power-seeking behaviour, we will have reduced existential risk substantially.
But even if we succeed, there are still existential risks that AI could pose.
We’re concerned that great power conflict could also pose a substantial threat to our world, and advances in AI seem likely to change the nature of war — through lethal autonomous weapons32 or through automated decision making.33
In some cases, great power war could pose an existential threat — for example, if the conflict is nuclear. It’s possible that AI could exacerbate risks of nuclear escalation, although there are also reasons to think AI could decrease this risk.34
Finally, if a single actor produces particularly powerful AI systems, this could be seen as giving them a decisive strategic advantage. For example, the US may produce a planning AI that’s intelligent enough to ensure that Russia or China could never successfully launch another nuclear weapon. This could incentivise a first strike from the actor’s rivals before these AI-developed plans can ever be put into action.
We expect that AI systems will help increase the rate of scientific progress.35
While there would be clear benefits to this automation — the rapid development of new medicine, for example — some forms of technological development can pose threats, including existential threats, to humanity. This could be through biotechnology36 (see our article on preventing catastrophic pandemics for more) or through some other form of currently unknown but dangerous technology.37
An AI-enabled authoritarian government could completely automate the monitoring and repression of its citizens, as well as significantly influence the information people see, perhaps making it impossible to coordinate action against such a regime.38
If this became a form of truly stable totalitarianism, this could make people’s lives far worse for extremely long periods of time, making it a particularly scary possible scenario resulting from AI.
We’re also concerned about the following issues, though we know less about them:
This is a really difficult question to answer.
There are no past examples we can use to determine the frequency of AI-related catastrophes.
All we have to go off are arguments (like the ones we’ve given above), and less relevant data like the history of technological advances. And we’re definitely not certain that the arguments we’ve presented are completely correct.
Consider the argument we gave earlier about the dangers of power-seeking AI in particular, based off Carlsmith’s report. At the end of his report, Carlsmith gives some rough guesses of the chances that each stage of his argument is correct (conditional on the previous stage being correct):
Multiplying these numbers together, Carlsmith estimated that there’s a 5% chance that his argument is right and there will be an existential catastrophe from misaligned power-seeking AI by 2070. When we spoke to Carlsmith, he noted that in the year between the writing of his report and the publication of this article, his overall guess at the chance of an existential catastrophe from power-seeking AI by 2070 had increased to >10%.39
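Written out, the multiplication behind the 5% figure is the chain rule for conditional probabilities. As a sketch, let $S_1, \dots, S_n$ stand for the stages of the argument (the symbols are ours, not Carlsmith’s), with each stage’s probability conditional on all the previous stages being correct:

```latex
P(S_1 \cap S_2 \cap \dots \cap S_n) = \prod_{i=1}^{n} P(S_i \mid S_1, \dots, S_{i-1})
```

Because every factor is at most 1, the overall estimate can be much lower than any single stage’s probability, and it is sensitive to small changes in each stage.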
The overall probability of existential catastrophe from AI would, in Carlsmith’s view, be higher than this, because there are other routes to possible catastrophe — like those discussed in the previous section — although our guess is that these other routes are probably a lot less likely to lead to existential catastrophe.
For another estimate, in The Precipice, philosopher and advisor to 80,000 Hours Toby Ord estimated a 1-in-6 risk of existential catastrophe by 2120 (from any cause), and that 60% of this risk comes from misaligned AI — giving a total of a 10% risk of existential catastrophe from misaligned AI by 2120.
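The arithmetic behind the 10% figure attributed to Ord can be checked directly. This is a minimal sketch using exact fractions; the variable names are ours.

```python
from fractions import Fraction

# Figures quoted above from The Precipice.
total_risk_by_2120 = Fraction(1, 6)        # existential catastrophe, any cause
share_from_misaligned_ai = Fraction(60, 100)

risk_from_misaligned_ai = total_risk_by_2120 * share_from_misaligned_ai

print(risk_from_misaligned_ai)         # exact fraction
print(float(risk_from_misaligned_ai))  # as a decimal
```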
A 2021 survey of 44 researchers working on reducing existential risks from AI found the median risk estimate was 32.5% — the highest answer given was 98%, and the lowest was 2%.40 There’s obviously a lot of selection bias here: people choose to work on reducing risks from AI because they think this is unusually important, so we should expect estimates from this survey to be substantially higher than estimates from other sources. But there’s clearly significant uncertainty about how big this risk is, and huge variation in answers.
All these numbers are shockingly, disturbingly high. We’re far from certain that all the arguments are correct. But these are generally the highest guesses for the level of existential risk of any of the issues we’ve examined (like engineered pandemics, great power conflict, climate change, or nuclear war).
That said, I think there are reasons why it’s harder to make guesses about the risks from AI than other risks – and possibly reasons to think that the estimates we’ve quoted above are systematically too high.
If I was forced to put a number on it, I’d say something like 1%. This number includes considerations both in favour and against the argument. I’m less worried than other 80,000 Hours staff — our position as an organisation is that the risk is between 3% and 50%.
All this said, the arguments for such high estimates of the existential risk posed by AI are persuasive — making risks from AI a top contender for the most pressing problem facing humanity.
We think one of the most important things you can do would be to help reduce the gravest risks that AI poses.
This isn’t just because we think these risks are high — it’s also because we think there are real things we can do to reduce these risks.
We know of two broad approaches:
For both of these, there are lots of ways to contribute. We’ll go through them in more detail below, but in this section we want to illustrate the point that there are things we can do to address these risks.
The benefits of transformative AI could be huge, and there are many different actors involved (operating in different countries), which means it will likely be really hard to prevent its development altogether.
(It’s also possible that it wouldn’t even be a good idea if we could — after all, that would mean forgoing the benefits as well as preventing the risks.)
As a result, we think it makes more sense to focus on making sure that this development is safe — meaning that it has a high probability of avoiding all the catastrophic failures listed above.
One way to do this is to try to develop technical solutions to prevent the kind of power-seeking behaviour we discussed earlier — this is generally known as working on technical AI safety, sometimes called just “AI safety” for short.
Read more about technical AI safety research below.
A second strategy for reducing risks from AI is to shape its development through policy, norms-building, and other governance mechanisms.
Good AI governance can help technical safety work, for example by producing safety agreements between corporations, or helping talented safety researchers from around the world move to where they can be most effective. AI governance could also help with other problems that lead to risks, like race dynamics.
But also, as we’ve discussed, even if we successfully manage to make AI do what we want (i.e. we ‘align’ it), we might still end up choosing something bad for it to do! So we need to worry about the incentives not just of the AI systems, but of the human actors using them.
Read more about AI governance research and implementation below.
Here are some more questions you might have:
Again, we think there are strong responses to these questions.
In 2022, we estimated there were around 400 people around the world working directly on reducing the chances of an AI-related existential catastrophe (with a 90% confidence interval ranging between 200 and 1,000). Of these, about three quarters worked on technical AI safety research, with the rest split between strategy (and other governance) research and advocacy.41 We also estimated that there were around 800 people working in complementary roles, but we’re highly uncertain about this figure.42
In The Precipice, Ord estimated that there was between $10 million and $50 million spent on reducing AI risk in 2020.
That might sound like a lot of money, but we’re spending something like 1,000 times that amount10 on speeding up the development of transformative AI via commercial capabilities research and engineering at large AI labs.
To compare the $50 million spent on AI safety in 2020 to other well-known risks, we’re currently spending several hundreds of billions per year on tackling climate change.
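To make the scale concrete, here is a rough sketch of what "something like 1,000 times" the 2020 safety spending comes to. It only multiplies the figures quoted above; the multiplier is approximate in the source, so the result should be read as an order of magnitude, and the variable names are ours.

```python
# Ord's estimated range for spending on reducing AI risk in 2020 (US dollars).
safety_spend_low = 10_000_000
safety_spend_high = 50_000_000

# "Something like 1,000 times that amount" goes to advancing capabilities.
capabilities_multiplier = 1_000

capabilities_low = safety_spend_low * capabilities_multiplier
capabilities_high = safety_spend_high * capabilities_multiplier

print(f"Implied capabilities spending: "
      f"${capabilities_low / 1e9:.0f}bn to ${capabilities_high / 1e9:.0f}bn")
```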
Because this field is so neglected and has such high stakes, we think your impact working on risks from AI could be much higher than working on many other areas — which is why our top two recommended career paths for making a big positive difference in the world are technical AI safety and AI policy research and implementation.
As we said above, we’re not totally sure the arguments we’ve presented for AI representing an existential threat are right. Though we do still think that the chance of catastrophe from AI is high enough to warrant many more people pursuing careers to try to prevent such an outcome, we also want to be honest about the arguments against doing so, so you can more easily make your own call on the question.
Here we’ll cover the strongest reasons (in our opinion) to think this problem isn’t particularly pressing. In the next section we’ll cover some common objections that (in our opinion) hold up less well, and explain why.
The longer we have before transformative AI is developed, the less pressing it is to work now on ways to ensure that it goes well. This is because the work of others in the future could be much better or more relevant than the work we are able to do now.
Also, if it takes us a long time to create transformative AI, we have more time to figure out how to make it safe. The risk seems much higher if AI developers will create transformative AI in the next few decades.
It seems plausible that the first transformative AI won’t be based on current deep learning methods. (AI Impacts have documented arguments that current methods won’t be able to produce AI that has human-level intelligence.) This could mean that some of our current research might not end up being useful (and also — depending on what method ends up being used — could make the arguments for risk less worrying).
Relatedly, we might expect that progress in the development of AI will occur in bursts. Previously, the field has seen AI winters, periods of time with significantly reduced investment, interest and research in AI. It’s unclear how likely it is that we’ll see another AI winter, but this possibility should lengthen our guesses about how long it’ll be before we’ve developed transformative AI. Cotra writes about the possibility of an AI winter in part four of her report forecasting transformative AI. New constraints on the rate of growth of AI capabilities, like the availability of training data, could also mean that there’s more time to work on this. (Cotra discusses this here.)
Thirdly, the estimates about when we’ll get transformative AI from Cotra, Karnofsky and Davidson that we looked at earlier were produced by people who already expected that working on preventing an AI-related catastrophe might be one of the world’s most pressing problems. As a result, there’s selection bias here: people who think transformative AI is coming relatively soon are also the people incentivised to carry out detailed investigations. (That said, if the investigations themselves seem strong, this effect could be pretty small.)
Finally, none of the estimates we discussed earlier were trying to predict when an existential catastrophe might occur. Instead, they were looking at when AI systems might be able to automate all tasks humans can do, or when AI systems might significantly transform the economy. It’s by no means certain that the kinds of AI systems that could transform the economy would be the same advanced planning systems that are core to the argument that AI systems might seek power. Advanced planning systems do seem to be particularly useful, so there is at least some reason to think these might be the sorts of systems that end up being built. But even if the forecasted transformative AI systems are advanced planning systems, it’s unclear how capable such systems would need to be to pose a threat — it’s more than plausible that systems would need to be far more capable to pose a substantial existential threat than they would need to be to transform the economy. This would mean that all the estimates we considered above would be underestimates of how long we have to work on this problem.
All that said, it might be extremely difficult to find technical solutions to prevent power-seeking behaviour — and if that’s the case, focusing on finding those solutions now does seem extremely valuable.
Overall, we think that transformative AI is sufficiently likely in the next 10–80 years that it is well worth it (in expected value terms) to work on this issue now. Perhaps future generations will take care of it, and all the work we’d do now will be in vain — we hope so! But it might not be prudent to take that risk.
If the best AI we have improves gradually over time (rather than AI capabilities remaining fairly low for a while and then suddenly increasing), we’re likely to end up with ‘warning shots’: we’ll notice forms of misaligned behaviour in fairly weak systems, and be able to correct for it before it’s too late.
In such a gradual scenario, we’ll have a better idea about what form powerful AI might take (e.g. whether it will be built using current deep learning techniques, or something else entirely), which could significantly help with safety research. There will also be more focus on this issue by society as a whole, as the risks of AI become clearer.
So if gradual development of AI seems more likely, the risk seems lower.
But it’s very much not certain that AI development will be gradual, or if it is, gradual enough for the risk to be noticeably lower. And even if AI development is gradual, there could still be significant benefits to having plans and technical solutions in place well in advance. So overall we still think it’s extremely valuable to attempt to reduce the risk now.
If you want to learn more, you can read AI Impacts’ work on arguments for and against discontinuous (i.e. non-gradual) progress in AI development, and Toby Ord and Owen Cotton-Barratt on strategic implications of slower AI development.
Making something have goals aligned with human designers’ ultimate objectives and making something useful seem like very related problems. If so, perhaps the need to make AI useful will drive us to produce only aligned AI — in which case the alignment problem is likely to be solved by default.
Ben Garfinkel gave a few examples of this on our podcast:
If we need to solve the problem of alignment anyway to make useful AI systems, this significantly reduces the chances we will have misaligned but still superficially useful AI systems. So the incentive to deploy a misaligned AI would be a lot lower, reducing the risk to society.
That said, there are still reasons to be concerned. For example, it seems like we could still be susceptible to problems of AI deception.
And, as we’ve argued, AI alignment is only part of the overall issue. Solving the alignment problem isn’t the same thing as completely eliminating existential risk from AI, since aligned AI could also be used to bad ends — such as by authoritarian governments.
As with many research projects in their early stages, we don’t know how hard it is to solve the alignment problem, or other AI problems that pose risks. Someone could believe there are major risks from machine intelligence, but be pessimistic about what additional research or policy work will accomplish, and so decide not to focus on it.
This is definitely a reason to potentially work on another issue — the solvability of an issue is a key part of how we try to compare global problems. For example, we’re also very concerned about risks from pandemics, and it may be much easier to solve that issue.
That said, we think that given the stakes, it could make sense for many people to work on reducing AI risk, even if you think the chance of success is low. You’d have to think that it was extremely difficult to reduce risks from AI in order to conclude that it’s better just to let the risks materialise and the chance of catastrophe play out.
At least in our own case at 80,000 Hours, we want to keep trying to help with AI safety — for example, by writing profiles like this one — even if the chance of success seems low (though in fact we’re overall pretty optimistic).
There are some reasons to think that the core argument that any advanced, strategically aware planning system will by default seek power (which we gave here) isn’t totally right.43
We’d love to see a more in-depth analysis of what aspects of planning are economically incentivised, and whether those aspects seem like they’ll be enough for the argument for power-seeking behaviour to work.
Grace has written more about the ambiguity around “how much goal-directedness is needed to bring about disaster.”
It’s possible that only a few goals that AI systems could have would lead to misaligned power-seeking.
Richard Ngo, in his analysis of what people mean by “goals”, points out that you’ll only get power-seeking behaviour if you have goals that mean the system can actually benefit from seeking power. Ngo suggests that these goals need to be “large-scale.” (Some have argued that, by default, we should expect AI systems to have “short-term” goals that won’t lead to power-seeking behaviour.)
But whether an AI system would plan to take power depends on how easy it would be for the system to take power, because the easier it is for a system to take power, the more likely power-seeking plans are to be successful — so a good planning system would be more likely to choose them. This suggests it will be easier to accidentally create a power-seeking AI system as systems’ capabilities increase.
So there still seems to be cause for increased concern, because the capabilities of AI systems do seem to be increasing fast. There are two considerations here: if few goals really lead to power-seeking, even for quite capable AI systems, that significantly reduces the risk and thus the importance of the problem. But it might also increase the solvability of the problem by demonstrating that solutions could be easy to find (e.g. the solution of never giving systems “large-scale” goals) — making this issue more valuable for people to work on.
Earlier we argued that we can expect AI systems to do things that seem generally instrumentally useful to their overall goal, and that as a result it could be hard to prevent AI systems from doing these instrumentally useful things.
But we can find examples where how generally instrumentally useful things would be doesn’t seem to affect how hard it is to prevent these things. Consider an autonomous car that can move around only if its engine is on. For many possible goals (other than, say, turning the car radio on), it seems like it would be useful for the car to be able to move around, so we should expect the car to turn its engine on. But despite that, we might still be able to train the car to keep its engine off: for example, we can give it some negative feedback whenever it turns the engine on, even if we also had given the car some other goals. Now imagine we improve the car so that its top speed is higher — this massively increases the number of possible action sequences that involve, as a first step, turning its engine on. In some sense, this seems to increase the instrumental usefulness of turning the engine on — there are more possible actions the car can take, once its engine is on, because the range of possible speeds it can travel at is higher. (It’s not clear if this sense of “instrumental usefulness” is the same as the one in the argument for the risk, although it does seem somewhat related.) But it doesn’t seem like this increase in the instrumental usefulness of turning on the engine makes it much harder to stop the car turning it on. Simple examples like this cast some doubt on the idea that, just because a particular action is instrumentally useful, we won’t be able to find ways to prevent it. (For more on this example, see page 25 of Garfinkel’s review of Carlsmith’s report.)
Humans are clearly highly intelligent, but it’s unclear they are perfect goal-optimisers. For example, humans often face some kind of existential angst over what their true goals are. And even if we accept humans as an example of a strategically aware agent capable of planning, humans certainly aren’t always power-seeking. We obviously care about having basics like food and shelter, and many people go to great lengths for more money, status, education, or even formal power. But some humans choose not to pursue these goals, and pursuing them doesn’t seem to correlate with intelligence.
However, this doesn’t mean that the argument that there will be an incentive to seek power is wrong. Most people do face and act on incentives to gain forms of influence via wealth, status, promotions, and so on. And we can explain the observation that humans don’t usually seek huge amounts of power by observing that we aren’t usually in circumstances that make the effort worth it.
For example, most people don’t try to start billion-dollar companies — you probably won’t succeed, and it’ll cost you a lot of time and effort.
But you’d still walk across the street to pick up a billion-dollar cheque.
The absence of extreme power-seeking in many humans, along with uncertainties in what it really means to plan to achieve goals, does suggest that the argument we gave above that advanced AI systems will seek power might not be completely correct. These observations also suggest that, if there really is a problem to solve here, alignment research into preventing power-seeking in AIs could in principle succeed.
This is good news! But for the moment — short of hoping we’re wrong about the existence of the problem — we don’t actually know how to prevent this power-seeking behaviour.
We’ve just discussed the major objections to working on AI risk that we think are most persuasive. In this section, we’ll look at objections that we think are less persuasive, and give some reasons why.
People have been saying since the 1950s that artificial intelligence smarter than humans is just around the corner.
But it hasn’t happened yet.
One reason for this could be that it’ll never happen. Some have argued that producing artificial general intelligence is fundamentally impossible. Others think it’s possible, but unlikely to actually happen, especially not with current deep learning methods.
Overall, we think the existence of human intelligence shows it’s possible in principle to create artificial intelligence. And the speed of current advances isn’t something we think would have been predicted by those who thought that we’ll never develop powerful, general AI.
But most importantly, the idea that you need fully general intelligent AI systems for there to be a substantial existential risk is a common misconception.
The argument we gave earlier relied on AI systems being as good as or better than humans in a subset of areas: planning, strategic awareness, and areas related to seeking and keeping power. So as long as you think all these things are possible, the risk remains.
And even if no single AI has all of these properties, there are still ways in which we might end up with systems of ‘narrow’ AI systems that, together, can disempower humanity. For example, we might have a planning AI that develops plans for a company, a separate AI system that measures things about the company, another AI system that attempts to evaluate plans from the first AI by predicting how much profit each will make, and further AI systems that carry out those plans (for example, by automating the building and operation of factories). Considered together, this system as a whole has the capability to form and carry out plans to achieve some goal, and potentially also has advanced capabilities in areas that help it seek power.
It does seem like it will be easier to prevent these ‘narrow’ AI systems from seeking power. This could happen if the skills the AIs have, even when combined, don’t add up to being able to plan to achieve goals, or if the narrowness reduces the risk of systems developing power-seeking plans (e.g. if you build systems that can only produce very short-term plans). It also seems like it gives another point of weakness for humans to intervene if necessary: the coordination of the different systems.
Nevertheless, the risk remains, even from systems of many interacting AIs.
It might just be really, really hard.
Stopping people and computers from running software is already incredibly difficult.
Think about how hard it would be to shut down Google’s web services. Google’s data centres have millions of servers over 34 different locations, many of which are running the same sets of code. And these data centres are absolutely crucial to Google’s bottom line, so even if Google could decide to shut down their entire business, they probably wouldn’t.
Or think about how hard it is to get rid of computer viruses that autonomously spread between computers across the world.
Ultimately, we think any dangerous power-seeking AI system will be looking for ways to not be turned off, which makes it more likely we’ll be in one of these situations, rather than in a case where we can just unplug a single machine.
That said, we absolutely should try to shape the future of AI such that we can ‘unplug’ powerful AI systems.
There may be ways we can develop systems that let us turn them off. But for the moment, we’re not sure how to do that.
Ensuring that we can turn off potentially dangerous AI systems could be a safety measure developed by technical AI safety research, or it could be the result of careful AI governance, such as planning coordinated efforts to stop autonomous software once it’s running.
We could (and should!) definitely try.
If we could successfully ‘sandbox’ an advanced AI — that is, contain it to a training environment with no access to the real world until we were very confident it wouldn’t do harm — that would help our efforts to mitigate AI risks tremendously.
But there are a few things that might make this difficult.
For a start, we might only need one failure — like one person to remove the sandbox, or one security vulnerability in the sandbox we hadn’t noticed — for the AI system to begin affecting the real world.
Moreover, this solution doesn’t scale with the capabilities of the AI system. This is because:
So the more dangerous the AI system, the less likely sandboxing is to be possible. That’s the opposite of what we’d want from a good solution to the risk.
For some definitions of “truly intelligent” — for example, if true intelligence includes a deep understanding of morality and a desire to be moral — this would probably be the case.
But if that’s your definition of truly intelligent, then it’s not truly intelligent systems that pose a risk. As we argued earlier, it’s advanced systems that can plan and have strategic awareness that pose risks to humanity.
With sufficiently advanced strategic awareness, an AI system’s excellent understanding of the world may well encompass an excellent understanding of people’s moral beliefs. But that’s not a strong reason to think that such a system would act morally.
For example, when we learn about other cultures or moral systems, that doesn’t necessarily create a desire to follow their morality. A scholar of the Antebellum South might have a very good understanding of how 19th century slave owners justified themselves as moral, but would be very unlikely to defend slavery.
AI systems with excellent understandings of human morality could be even more dangerous than AIs without such understanding: the AI system could act morally at first as a way to deceive us into thinking that it is safe.
There are definitely dangers from current artificial intelligence.
For example, data used to train neural networks often contains hidden biases. This means that AI systems can learn these biases — and this can lead to racist and sexist behaviour.
There are other dangers too. Our earlier discussion on nuclear war explains a threat which doesn’t require AI systems to have particularly advanced capabilities.
But we don’t think the fact that there are also risks from current systems is a reason not to prioritise reducing existential threats from AI, if they are sufficiently severe.
As we’ve discussed, future systems — not necessarily superintelligence or totally general intelligence, but systems advanced in their planning and power-seeking capabilities — seem like they could pose threats to the existence of the entirety of humanity. And it also seems somewhat likely that we’ll produce such systems this century.
What’s more, lots of technical AI safety research is also relevant to solving problems with existing AI systems. For example, some research focuses on ensuring that ML models do what we want them to, and will still do this as their size and capabilities increase; other research tries to work out how and why existing models are making the decisions and taking the actions that they do.
As a result, at least in the case of technical research, the choice between working on current threats and future risks may look more like a choice between only ensuring that current models are safe, or instead finding ways to ensure that current models are safe that will also continue to work as AI systems become more complex and more intelligent.
Ultimately, we have limited time in our careers, so choosing which problem to work on could be a huge way of increasing your impact. When there are such substantial threats, it seems reasonable for many people to focus on addressing these worst-case possibilities.
Yes, it can.
AI systems are already improving healthcare, putting driverless cars on the roads, and automating household chores.
And if we’re able to automate advancements in science and technology, we could see truly incredible economic and scientific progress. AI could likely help solve many of the world’s most pressing problems.
But, just because something can do a lot of good, that doesn’t mean it can’t also do a lot of harm. AI is an example of a dual-use technology — a technology that can be used for both dangerous and beneficial purposes. For example, researchers were able to get an AI model that was trained to develop medical drugs to instead generate designs for bioweapons.
We are excited and hopeful about seeing large benefits from AI. But we also want to work hard to minimise the enormous risks advanced AI systems pose.
It’s undoubtedly true that some people are drawn to thinking about AI safety because they like computers and science fiction — as with any other issue, there are people working on it not because they think it’s important, but because they think it’s cool.
But, for many people, working on AI safety comes with huge reluctance.
For me, and many of us at 80,000 Hours, spending our limited time and resources working on any cause that affects the long-run future — and therefore not spending that time on the terrible problems in the world today — is an incredibly emotionally difficult thing to do.
But we’ve gradually investigated these arguments (in the course of trying to figure out how we can do the most good), and over time both gained more expertise about AI and became more concerned about the risk.
We think scepticism is healthy, and are far from certain that these arguments completely work. So while this suspicion is definitely a reason to dig a little deeper, we hope that, ultimately, this worry won’t be treated as a reason to deprioritise what may well be the most important problem of our time.
That something sounds like science fiction isn’t a reason in itself to dismiss it outright. There are loads of examples of things first mentioned in sci-fi that then went on to actually happen (this list of inventions in science fiction contains plenty of examples).
There are even a few such cases involving technology that are real existential threats today:
Moreover, there are top academics and researchers working on preventing these risks from AI — at MIT, Cambridge, Oxford, UC Berkeley, and elsewhere. Two of the world’s top AI labs (DeepMind and OpenAI) have teams explicitly dedicated to working on technical AI safety. Researchers from these places helped us with this article.
It’s totally possible all these people are wrong to be worried, but the fact that so many people take this threat seriously undermines the idea that this is merely science fiction.
It’s reasonable when you hear something that sounds like science fiction to want to investigate it thoroughly before acting on it. But having investigated it, if the arguments seem solid, then simply sounding like science fiction is not a reason to dismiss them.
We never know for sure what’s going to happen in the future. So, unfortunately for us, if we’re trying to have a positive impact on the world, that means we’re always having to deal with at least some degree of uncertainty.
We also think there’s an important distinction between guaranteeing that you’ve achieved some amount of good and doing the very best you can. To achieve the former, you can’t take any risks at all — and that could mean missing out on the best opportunities to do good.
When you’re dealing with uncertainty, it makes sense to roughly think about the expected value of your actions: the sum of all the good and bad potential consequences of your actions, weighted by their probability.
Given that the stakes are so high and the risks from AI aren’t that low, the expected value of helping with this problem is high.
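As a rough illustration of the expected value idea described above, here is a minimal Python sketch. The two options and all of their probabilities and values are made-up placeholders chosen for easy arithmetic; they are not estimates from this article.

```python
from fractions import Fraction


def expected_value(outcomes):
    """Return the sum of each outcome's value weighted by its probability.

    `outcomes` is a list of (probability, value) pairs whose
    probabilities must sum to 1.
    """
    total_probability = sum(p for p, _ in outcomes)
    if total_probability != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(p * v for p, v in outcomes)


# Hypothetical numbers, purely for illustration (not from the article):
# a safe option that always does 10 units of good, versus a risky option
# that usually achieves nothing but occasionally does 10,000 units of good.
safe_option = [(Fraction(1), 10)]
risky_option = [(Fraction(99, 100), 0), (Fraction(1, 100), 10_000)]

safe_ev = expected_value(safe_option)
risky_ev = expected_value(risky_option)
```

Under these invented numbers, the risky option has the higher expected value even though it fails 99% of the time, which is the shape of the argument being made here.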
We’re sympathetic to the concern that if you work on AI safety, you might end up doing not much at all when you might have done a tremendous amount of good working on something else — simply because the problem and our current ideas about what to do about it are so uncertain.
But we think the world will be better off if we decide that some of us should work on solving this problem, so that together we have the best chance of successfully navigating the transition to a world with advanced AI rather than risking an existential crisis.
And it seems like an immensely valuable thing to try.
Pascal’s mugging is a thought experiment — a riff on the famous Pascal’s wager — where someone making decisions using expected value calculations can be exploited by claims that they can get something extraordinarily good (or avoid something extraordinarily bad), with an extremely low probability of succeeding.
The story goes like this: a random mugger stops you on the street and says, “Give me your wallet or I’ll cast a spell of torture on you and everyone who has ever lived.” You can’t rule out with 100% certainty that he will — after all, nothing’s 100% for sure. And torturing everyone who’s ever lived is so bad that surely even avoiding a tiny, tiny probability of that is worth the $40 in your wallet? But intuitively, it seems like you shouldn’t give your wallet to someone just because they threaten you with something completely implausible.
Analogously, you could worry that working on AI safety means giving your valuable time to avoid a tiny, tiny chance of catastrophe. Working on reducing risks from AI isn’t free — the opportunity cost is quite substantial, as it means you forgo working on other extremely important things, like reducing risks from pandemics or ending factory farming.
Here’s the thing though: while there’s lots of value at stake — perhaps the lives of everybody alive today, and the entirety of the future of humanity — it’s not the case that the probability that you can make a difference by working on reducing risks from AI is small enough for this argument to apply.
We wish the chance of an AI catastrophe were that vanishingly small.
Instead, we think the probability of such a catastrophe (I think, around 1% this century) is much, much larger than the probability of things that people try to prevent all the time — such as fatal plane crashes, which happen in 0.00002% of flights.
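To get a sense of the gap between those two figures, here is a quick calculation using only the 1% and 0.00002% numbers quoted above. The two figures measure different things (a risk over a century versus a risk per flight), so treat the ratio as a rough sense of scale rather than a like-for-like comparison. The variable names are just illustrative.

```python
from fractions import Fraction

# Figures quoted in the text, written as exact fractions of 1.
ai_catastrophe_risk = Fraction(1, 100)            # 1% this century
fatal_crash_per_flight = Fraction(2, 10_000_000)  # 0.00002% of flights

# How many times larger the first probability is than the second.
ratio = ai_catastrophe_risk / fatal_crash_per_flight
print(f"The AI risk estimate is {int(ratio):,} times larger")
# prints "The AI risk estimate is 50,000 times larger"
```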
What really matters, though, is the extent to which your work can reduce the chance of a catastrophe.
Let’s look at working on reducing risks from AI. For example, if:
Then each person involved has a 0.00006 percentage point share in preventing this catastrophe.
Other ways of acting altruistically involve similarly sized probabilities.
The chance of a volunteer campaigner swinging a US presidential election is somewhere between 0.001% and 0.00001%. But you can still justify working on a campaign because of the large impact you expect you’d have on the world if your preferred candidate won.
You have even lower chances of wild success from things like trying to reform political institutions, or working on some very fundamental science research to build knowledge that might one day help cure cancer.
Overall, as a society, we may be able to reduce the chance of an AI-related catastrophe all the way down from 10% (or higher) to close to zero — that’d be clearly worth it for a group of people, so it has to be worth it for the individuals, too.
We wouldn’t want to just not do fundamental science because each researcher has a low chance of making the next big discovery, or not do any peacekeeping because any one person has a low chance of preventing World War III. As a society, we need some people working on these big issues — and maybe you can be one of them.
As we mentioned above, we know of two main ways to help reduce existential risks from AI:
The biggest way you could help would be to pursue a career in either one of these areas, or in a supporting area.
The first step is learning a lot more about the technologies, problems, and possible solutions. We’ve collated some lists of our favourite resources here, and our top recommendation is to take a look at the technical alignment curriculum from AGI Safety Fundamentals.
If you decide to pursue a career in this area, we’d generally recommend working at an organisation focused on specifically addressing this problem (though there are other ways to help besides working at existing organisations, as we discuss briefly below).
There are lots of approaches to technical AI safety, including:
See Neel Nanda’s overview of the AI alignment landscape for more details.
AI labs in industry that have empirical technical safety teams, or are focused entirely on safety:
Theoretical / conceptual AI safety labs:
AI safety in academia (a very non-comprehensive list; while the number of academics explicitly and publicly focused on AI safety is small, it’s possible to do relevant work at a much wider set of places):
If you’re interested in learning more about technical AI safety as an area — e.g. the different techniques, schools of thought, and threat models — our top recommendation is to take a look at the technical alignment curriculum from AGI Safety Fundamentals.
We discuss this path in more detail here:
Career review of technical AI safety research
Alternatively, if you’re looking for something more concrete and step-by-step (with very little in the way of introduction), check out this detailed guide to pursuing a career in AI alignment.
It’s important to note that you don’t have to be an academic or an expert in AI or AI safety to contribute to AI safety research. For example, software engineers are needed at many places conducting technical safety research, and we also highlight more roles below.
Quite apart from the technical problems, we face a host of governance issues, which include:
To tackle these, we need a combination of research and policy.45
We are in the early stages of figuring out the shape of this problem and the most effective ways to tackle it. So it’s crucial that we do more research. This includes forecasting research into what we should expect to happen, and strategy and policy research into the best ways of acting to reduce the risks.
But also, as AI begins to impact our society more and more, it’ll be crucial that governments and corporations have the best policies in place to shape its development. For example, governments might be able to enforce agreements not to cut corners on safety, further the work of researchers who are less likely to cause harm, or cause the benefits of AI to be distributed more evenly. So there eventually might be a key role to be played in advocacy and lobbying for appropriate AI policy — though we’re not yet at the point of knowing what policies would be useful to implement.
AI strategy and policy organisations:
If you’re interested in learning more about AI governance, our top recommendation is to take a look at the governance curriculum from AGI Safety Fundamentals.
We discuss this path in more detail here:
Career review of AI strategy and policy careers
Also note: it could be particularly important for people with the right personal fit to work on AI strategy and governance in China.
Even in a research organisation, around half of the staff will be doing other tasks essential for the organisation to perform at its best and have an impact. Having high-performing people in these roles is crucial.
We think the importance of these roles is often underrated because the work is less visible. So we’ve written several career reviews on these areas to help more people enter these careers and succeed, including:
AI safety is a big problem and it needs help from people doing a lot of different kinds of work.
One major way to help is to work in a role that directs funding or people towards AI risk, rather than working on the problem directly. We’ve reviewed a few career paths along these lines, including:
There are ways all of these could go wrong, so the first step is to become well-informed about the issue.
There are also other technical roles besides safety research that could help contribute, like:
You can read about all these careers — why we think they’re helpful, how to enter them, and how you can predict whether they’re a good fit for you — on our career reviews page.
We think that the risks posed by the development of AI may be the most pressing problem the world currently faces. If you think you might be a good fit for any of the above career paths that contribute to solving this problem, we’d be especially excited to advise you on next steps, one-on-one.
We can help you consider your options, make connections with others working on reducing risks from AI, and possibly even help you find jobs or funding opportunities — all for free.
Our job board features opportunities in AI technical safety and governance:
We’ve hit you with a lot of further reading throughout this article — here are a few of our favourites:
On The 80,000 Hours Podcast, we have a number of in-depth interviews with people actively working to positively shape the development of artificial intelligence:
If you want to go into much more depth, the AGI Safety Fundamentals course is a good starting point. There are two tracks to choose from: technical alignment or AI governance. If you have a more technical background, you could try Intro to ML Safety, a course from the Center for AI Safety.
And finally, here are a few general sources (rather than specific articles) that you might want to explore:
Want to learn more about global issues we think are especially pressing? See our list of issues that are large in scale, solvable, and neglected, according to our research.
Huge thanks to Joel Becker, Tamay Besiroglu, Jungwon Byun, Joseph Carlsmith, Jesse Clifton, Emery Cooper, Ajeya Cotra, Andrew Critch, Anthony DiGiovanni, Noemi Dreksler, Ben Edelman, Lukas Finnveden, Emily Frizell, Ben Garfinkel, Katja Grace, Lewis Hammond, Jacob Hilton, Samuel Hilton, Michelle Hutchinson, Caroline Jeanmaire, Kuhan Jeyapragasan, Arden Koehler, Daniel Kokotajlo, Victoria Krakovna, Alex Lawsen, Howie Lempel, Eli Lifland, Katy Moore, Luke Muehlhauser, Neel Nanda, Linh Chi Nguyen, Luisa Rodriguez, Caspar Oesterheld, Ethan Perez, Charlie Rogers-Smith, Jack Ryan, Rohin Shah, Buck Shlegeris, Marlene Staib, Andreas Stuhlmüller, Luke Stebbing, Nate Thomas, Benjamin Todd, Stefan Torges, Michael Townsend, Chris van Merwijk, Hjalmar Wijk, and Mark Xu for either reviewing this article or their extremely thoughtful and helpful comments and conversations. (This isn’t to say that they would all agree with everything we’ve said here — in fact, we’ve had many spirited disagreements in the comments on this article!)
The post Preventing an AI-related catastrophe appeared first on 80,000 Hours.
We usually focus on how people can help tackle what we think are the biggest global catastrophic risks. But there are lots of other pressing problems we think also deserve more attention — some of which are especially highly neglected.
Compared to our top-ranked issues, these problems generally don’t have well-developed fields dedicated to them. So we don’t have as much concrete advice about how to tackle them, and they might be full of dead ends.
But if you can find ways to meaningfully contribute (and have the kind of self-directed mindset necessary), doing so could well be your top option.
Here they are, in no particular order:
1. Risks of stable totalitarianism
If we put aside risks of extinction, one of the biggest dangers to the long-term future of humanity might be the potential for an ultra-long-lasting and terrible political regime. As technology advances and globalisation and homogenisation increase, a stable form of totalitarianism potentially could take hold, enabled by improved surveillance, advanced lie detection, or an obedient AI workforce. We’re not sure how big or tractable these risks are, but more research into the area could be highly valuable. Read more.
2. Long-term focused space governance
Humanity’s future, and the future of sentient life, may extend far beyond Earth and even the solar system. But whether this potential expansion goes well or badly is far from certain. In the meantime, because space governance is not yet settled, issues such as weapons in space might presently increase the risk of great power war. We think some people could have a significant impact and improve the prospects for our descendants by working in the field of space governance to lay a positive groundwork for the future — and that now might be a particularly good time to do so. Read more.
3. Civilisational resilience
We generally focus on measures to reduce global catastrophic risks, but it’s also worth asking: what can we do to help humanity survive a global catastrophe if one does happen, like an enormous nuclear war or biological disaster? How likely is civilisation to recover from most disasters by default? Despite the importance of these questions, we know of only a few people and organisations trying to answer them. Read more.
4. Wild animal suffering
Nature is not inherently kind. Many wild animals are forced to endure significant pain, suffering, and disease throughout their lives with little comfort. While there’s historically been little interest in assessing or mitigating these harms, there’s now a nascent field of research and advocacy around potential opportunities to significantly reduce the unnecessary suffering of wild animals. Read more.
5. Artificial sentience
We’ve written a lot about the risks AI poses to humanity. But we also think there’s a plausible case that AI systems themselves could one day become sentient — that is, able to suffer or flourish. That might well make them subjects of moral concern themselves, and it could become really important that we ensure the future goes well for them too. We hope to dive more into this issue soon, but for now, you can read more here.
6. S-risks
S-risks, or suffering risks, refer to the risk of seeing vastly more suffering in the future than has existed on Earth so far. Unfortunately, this doesn’t seem like such a remote possibility to us that it can be ignored, but very few people are thinking about it. These risks might arise as part of other problem areas, and could affect humanity, non-human animals, and perhaps even future sentient, non-biological beings. Read more.
These issues are all highly neglected, and though we don’t know as much about them as we’d like, we think they might be very important. This means that if you can find a way to help — for example by doing research to disentangle the issue or help start up a research field — you might be able to make quite an outsized difference.
This blog post was first released to our newsletter subscribers.
Join over 400,000 newsletter subscribers who get content like this in their inboxes weekly — and we’ll also mail you a free book!
Learn more:
The post Particularly neglected causes you could work on appeared first on 80,000 Hours.
80,000 Hours is considering hiring a full-time, senior product manager to lead the work of iterating on and improving the 80,000 Hours website.
They would research, propose, and implement product changes to make the 80,000 Hours website more useful and delightful for talented people interested in having a high-impact career.
To express interest in this role, please complete this form.
Note: This announcement is for an expression of interest rather than a job opening. It’s possible that we will launch a formal hiring round within the next month or two.
If we do run a formal hiring round, we’ll email everyone who filled out this expression of interest to invite them to fill out an application for the role.
80,000 Hours’ mission is to get talented people working on the world’s most pressing problems.
Since being founded in 2011, we have helped popularise using your career to ambitiously pursue impact while thinking seriously about cause and intervention prioritisation, as well as grow the fields of AI safety, AI governance, and global catastrophic biological risk reduction, among others.
Over a million people visit our website each year, and thousands of people have told us that they’ve significantly changed their career plans due to our work. Surveys conducted by our primary funder, Open Philanthropy, show that 80,000 Hours is one of the single biggest drivers of talent moving into work related to reducing global catastrophic risks.
As a senior product manager, you would:
This is a senior role. Our ideal is to find someone who can fairly quickly take on managing developers, designers, and others as needed, and who can make autonomous decisions about how to best change the website.
However, for the right candidate, we are open to hiring someone on the more junior end and supporting them to grow into the more senior role with time. So if you don’t have a lot of experience but think you might be able to do an exceptional job, please still consider expressing interest in the role.
Note that this role is different from product management roles typical in tech companies. We are looking for someone who can help shape the product from multiple relevant perspectives. This includes thinking about content formats, visual design, and how to present our most important research findings in ways that will make sense to new users.
We’re looking for someone who has:
We’re aware that factors like gender, race, and socioeconomic background can affect people’s willingness to apply for roles. We’d like to especially encourage people from underrepresented backgrounds to apply!
The salary will vary based on experience and the responsibilities you take on, but to give a rough sense, the starting salary for someone with 5 years of relevant experience might be between £76,000 and £85,000 per year, plus benefits. Our salaries are calculated according to a formula that is transparent to all primary staff at 80,000 Hours.
Staff can work flexible hours. We encourage staff to work whatever schedule (consistent with full-time status) will allow them to be most personally effective.
We prefer people to work in person in our London office if possible. We are open to remote work in some cases. We can sponsor visas.
The start date of the role is flexible, but we would expect you to start during 2024 and prefer you to start approximately as soon as you’re available.
Our benefits include:
To express interest in the role, please fill in this form.
The post Expression of interest: Senior product manager appeared first on 80,000 Hours.
80,000 Hours is considering hiring full-time writers and writer-researchers. They would outline and write new articles for the 80,000 Hours website that will help people shift their careers towards high-impact options.
To express interest in either or both of these roles, please complete this form.
Note: This announcement is for an expression of interest rather than a job opening. It’s possible we will launch a formal hiring round within the next month or two.
If we do run a formal hiring round, we’ll email everyone who filled out this expression of interest to invite them to fill out an application for the role.
80,000 Hours’ mission is to get talented people working on the world’s most pressing problems. Since being founded in 2011, we have helped:
Over a million people visit our website each year, and thousands of people have told us that they’ve significantly changed their career plans due to our work. Surveys conducted by our primary funder, Open Philanthropy, show that 80,000 Hours is one of the single biggest drivers of talent moving into work related to reducing global catastrophic risks.
Our most popular pieces are read by over 1,000 people each month, and they are among the most important ways we help people shift their careers towards higher-impact options.
We’re listing these roles together because there’s a lot of overlap in what they’ll focus on, and we suspect some of the same candidates could be strong fits for both.
The main difference is that the writer role focuses more on the craft of writing compelling and informative pieces for the audience, and the writer-researcher role focuses more on supporting the knowledge base that informs the pieces. The ability to write clearly is key to both roles.
As a writer, you would:
As a writer-researcher, you would:
Types of pieces you might work on in either role include:
For both roles, what you’d focus on will depend on your strengths and interests, as well as the needs of our audience. E.g. if you have a particular interest in writing about the skills most needed for pursuing a high-impact career, or in writing about a particular problem area (such as AI safety), you may be able to specialise in that!
For both roles, we’re looking for:
For the writer role we’re also looking for:
For the writer-researcher role we’re also looking for:
We’re aware that factors like gender, race, and socioeconomic background can affect people’s willingness to apply for roles. We’d like to especially encourage people from underrepresented backgrounds to apply!
The salary will vary based on experience and the responsibilities you take on, but to give a rough sense, we’d expect the starting salary for someone in this position to be £60,000–70,000 per year plus benefits — and potentially more for someone exceptionally experienced. Our salaries are calculated according to a formula that is transparent to all primary staff at 80,000 Hours.
Staff can work flexible hours. We encourage staff to work whatever schedule (consistent with full-time status) will allow them to be most personally effective.
We prefer people to work in person in our London office if possible. We are open to remote work in some cases. We can sponsor visas.
The start date of the role is flexible, but we would expect you to start during 2024 and prefer you to start approximately as soon as you’re available.
Our benefits include:
To express interest in the role, please fill in this form.
The post Expression of interest: Writer and writer-researcher appeared first on 80,000 Hours.
How do you prevent a new and rapidly evolving technology from spiralling out of control? How can governments, policymakers, and civil society ensure that we’re making the best decisions about how to integrate artificial intelligence into our society?
To answer these kinds of questions, we need people with technical expertise — in machine learning, information security, computing hardware, or other relevant technical domains — to work in AI governance and policy making.
Of course, there are roles for people with many different backgrounds to play in AI governance and policy. Experience in law, international coordination, communications, operations management, and more is potentially valuable in this space.
But we think people with technical backgrounds may underrate their ability to contribute to AI policy. We’ve long regarded AI technical safety research as an extremely high-impact career option, and we still do. But this sometimes gives readers the impression that if they’ve got a technical background or aptitude, it’s the main path for them to consider if they want to help prevent an AI-related catastrophe.
But this isn’t necessarily true.
Technical knowledge is crucial in AI governance for understanding the current landscape and likely trajectories of the technology, as well as for designing and implementing policies that can reduce the biggest risks. Lennart Heim, an AI governance researcher, provides more details about why these skills are useful in a recent blog post.
We’ve spoken to experts who work in Washington, D.C. who say that technical credentials that some may regard as fairly modest — such as computer science bachelor’s degrees or a master’s in machine learning — can be highly sought after in policy roles.
People with greater experience with AI or related fields could be especially impactful in governance work, particularly if they have backgrounds in the following areas:
Other specific technical backgrounds can also be highly valuable. People with knowledge of virology, for example, could work on reducing the risk from AI-related biological threats.
If you have a technical background and want to work on reducing catastrophic risks from artificial intelligence, we’d encourage you to apply for 1-1 advising with our team to learn about how you might use your skills and what opportunities might be available for you.
We also recommend checking out Emerging Tech Policy Careers, which has extensive resources to learn about opportunities in US policy.
And we’ve been excited to see many promising opportunities coming out of the UK government’s new AI Safety Institute. The US Department of Commerce is also setting up its own AI Safety Institute, and we feature many AI-related roles in the US federal government on our job board. What’s more, the European Union is poised to pass a new AI Act, which will likely require new personnel to implement its provisions — offering even more opportunities.
But important work in AI governance can be done without working directly for a particular government. For example:
These kinds of roles might help people with technical experience pick up some of the policy skills and knowledge they might lack.
For more information on these and related paths, read our career review of AI governance.
This blog post was first released to our newsletter subscribers.
The post The case for taking your technical expertise to the field of AI policy appeared first on 80,000 Hours.
There’s still a lot of uncertainty about which AI governance strategies would be best. But some ideas for policies and strategies that would reduce risk seem promising to us. See, for example, a list of potential policy ideas from Luke Muehlhauser of Open Philanthropy1 and a survey of expert opinion on best practices in AI safety and governance.
But there’s no roadmap here. There’s plenty of room for debate about which policies and proposals are needed.
We may not have found the best ideas yet in this space, and there’s still a lot of work to figure out how promising policies and strategies would work in practice. We hope to see more people enter this field to develop expertise and skills that will contribute to risk-reducing AI governance and coordination.
In a nutshell: Advanced AI systems could have massive impacts on humanity and potentially pose global catastrophic risks. There are opportunities in the broad field of AI governance to positively shape how society responds to and prepares for the challenges posed by the technology.
Given the high stakes, pursuing this career path could be many people’s highest-impact option. But they should be very careful not to accidentally exacerbate the threats rather than mitigate them.
If you are well suited to this career, it may be the best way for you to have a social impact.
Based on an in-depth investigation
“What you’re doing has enormous potential and enormous danger.” — US President Joe Biden, to the leaders of the top AI companies
Artificial intelligence has advanced rapidly. In 2022 and 2023, new language and image generation models gained widespread attention for their abilities, blowing past previous benchmarks.
And the applications of these models are still new; with more tweaking and integration into society, the existing AI systems may become easier to use and more ubiquitous.
We don’t know where all these developments will lead us. There’s reason to be optimistic that AI will eventually help us solve many of the world’s problems, raising living standards and helping us build a more flourishing society.
But there are also substantial risks. Advanced AI could be used to do a lot of harm. And we worry it could accidentally lead to a major catastrophe — and perhaps even cause human disempowerment or extinction. We discuss the arguments that these risks exist in our in-depth problem profile.
Because of these risks, we encourage people to work on finding ways to reduce the danger through technical research and engineering.
But we need a range of strategies for risk reduction. Public policy and corporate governance in particular may be necessary to ensure that advanced AI is broadly beneficial and low risk.
Governance generally refers to the processes, structures, and systems that carry out decision making for organisations and societies at a high level. In the case of AI, we expect the governance structures that matter most to be national governments and organisations developing AI — as well as some international organisations and perhaps subnational governments.
Some aims of AI governance work could include:
We need a community of experts who understand modern AI systems and policy, as well as the severe threats and potential solutions. This field is still young, and many of the paths within it aren’t clear and are not sure to pan out. But there are relevant professional paths that will provide you with valuable career capital for a variety of positions and types of roles.
The rest of this article explains what work in this area might involve, how you can develop career capital and test your fit, and some promising places to work.
There are a variety of ways to pursue AI governance strategies, and as the field becomes more mature, the paths are likely to become clearer and more established.
We generally don’t think people early in their careers should aim for a specific high-impact job. They should instead aim to develop skills, experience, knowledge, judgement, networks, and credentials — what we call career capital — that they can use later to have an impact.
This may involve following a standard career trajectory or moving around in different kinds of roles. Sometimes, you just have to apply to many different roles and test your fit for various types of work before you know what you’ll be good at. Most importantly, you should try to get excellent at something for which you have strong personal fit and that will let you contribute to solving pressing problems.
In the AI governance space, we see at least six broad categories of work that we think are important:
Thinking about the different kinds of career capital that are useful for the categories of work that appeal to you may suggest some next steps in your path. (We discuss how to assess your fit and enter this field below.)
You may want to move between these different categories of work at different points in your career. You can also test out your fit for various roles by taking internships, fellowships, entry-level jobs, temporary placements, or even doing independent research, all of which can serve as career capital for a range of paths.
We have also reviewed career paths in AI technical safety research and engineering, information security, and AI hardware expertise, which may be crucial to reducing risks from AI. These fields may also play a significant role in an effective governance agenda. People serious about pursuing a career in AI governance should familiarise themselves with these subjects as well.
Taking a role within an influential government could help you play an important role in the development, enactment, and enforcement of AI policy.
We generally expect that the US federal government will be the most significant player in AI governance for the foreseeable future. This is because of its global influence and its jurisdiction over much of the AI industry, including the most prominent AI companies such as Anthropic, OpenAI, and Google DeepMind. It also has jurisdiction over key parts of the AI chip supply chain. Much of this article focuses on US policy and government.2
But other governments and international institutions matter too. For example, the UK government, the European Union, China, and others may present opportunities for impactful AI governance work. Some US state-level governments, such as California, may have opportunities for impact and gaining career capital.
What would this work involve? Sections below discuss how to enter US policy work and which areas of the government you might aim for.
In 2023, the US and UK governments both announced new institutes for AI safety — both of which should provide valuable opportunities for career capital and potential impact.
But at the broadest level, people interested in positively shaping AI policy should gain skills and experience to work in areas of government with some connection to AI or emerging technology policy.
This can include roles in: legislative branches, domestic regulation, national security, diplomacy, appropriations and budgeting, and other policy areas.
If you can get a role already working directly on this issue, such as in one of the AI safety institutes or working for a lawmaker focused on AI, that could be a great opportunity.
Otherwise, you should seek to learn as much as you can about how policy works and which government roles might allow you to have the most impact. Try to establish yourself as someone who’s knowledgeable about the AI policy landscape. Having almost any significant government role that touches on some aspect of AI, or having some impressive AI-related credential, may be enough to go quite far.
One way to advance your career in government on a specific topic is what some call “getting visibility.” This involves using your position to learn about the landscape and connect with the actors and institutions in the policy area. You’ll want to engage socially with others in the policy field, get invited to meetings with other officials and agencies, and be asked for input on decisions. If you can establish yourself as a well-regarded expert on an important but neglected aspect of the issue, you’ll have a better shot at being included in key discussions and events.
Career trajectories within government can be broken down roughly as follows:
Read more about how to evaluate your fit and get started building relevant career capital in our article on policy and political skills.
There’s still a lot of research to be done on AI governance strategy and implementation. The world needs more concrete policies that would really start to tackle the biggest threats; developing such policies and deepening our understanding of the strategic needs of the AI governance space are high priorities.
Other relevant research could involve surveys of public and expert opinion, legal research about the feasibility of proposed policies, technical research on issues like compute governance, and even higher-level theoretical research into questions about the societal implications of advanced AI.
Some research, such as that done by Epoch AI, focuses on forecasting the future course of AI developments, which can influence AI governance decisions.
However, several experts we’ve talked to warn that a lot of research on AI governance may prove to be useless. So it’s important to be reflective and seek input from others in the field about what kind of contribution you can make. We list several research organisations below that we think pursue promising research on this topic and could provide useful mentorship.
One approach for testing your fit for this work — especially when starting out — is to write up analyses and responses to existing work on AI policy or investigate some questions in this area that haven’t received much attention. You can then share your work widely, send it out for feedback from people in the field, and evaluate how you enjoy the work and how you might contribute to this field.
But don’t spend too long testing your fit without making much progress, and note that some are best able to contribute when they’re working on a team. So don’t over-invest in independent work, especially if there are few signs it’s working out especially well for you. This kind of project can make sense for maybe a month or a bit longer — but it’s unlikely to be a good idea to spend much more than that without funding or some really encouraging feedback from people working in the field.
If you have the experience to be hired as a researcher, work on AI governance can be done in academia, nonprofit organisations, and think tanks. Some government agencies and committees, too, perform valuable research.
Note that universities and academia have their own priorities and incentives that often aren’t aligned with producing the most impactful work. If you’re already an established researcher with tenure, it may be highly valuable to pivot into work on AI governance — your position may even give you a credible platform from which to advocate for important ideas.
But if you’re just starting out a research career and want to focus on this issue, you should carefully consider whether your work will be best supported inside academia. For example, if you know of a specific programme with particular mentors who will help you pursue answers to critical questions in this field, it might be worth doing. We’re less inclined to encourage people on this path to pursue generic academic-track roles without a clear idea of how they can do important research on AI governance.
Advanced degrees in policy or relevant technical fields may well be valuable, though — see more discussion of this in the section on how to assess your fit and get started.
You can also learn more in our article about how to become a researcher.
Internal policy and corporate governance at the largest AI labs themselves are also important for reducing risks from AI.
At the highest level, deciding who sits on corporate boards, what kind of influence those boards have, and the incentives the organisation faces can have a major impact on a company’s choices. Many of these roles are filled by people with extensive management and organisational leadership experience, such as founding and running companies.
If you’re able to join a policy team at a major company, you can model threats and help develop, implement, and evaluate proposals to reduce risks. And you can build consensus around best practices, such as strong information security, using outside evaluators to find vulnerabilities and dangerous behaviours in AI systems (red teaming), and testing out the latest techniques from the field of AI safety.
And if, as we expect, AI companies face increasing government oversight, ensuring compliance with relevant laws and regulations will be a high priority. Communicating with government actors and facilitating coordination from inside the companies could be impactful work.
In general, it seems better for AI companies to be highly cooperative with each other3 and with outside groups seeking to minimise risks. And this doesn’t seem to be an outlandish hope — many industry leaders have expressed concern about catastrophic risks and have even called for regulation of the frontier technology they’re creating.
That said, cooperation will likely take a lot of effort. Companies creating powerful AI systems may resist some risk-reducing policies, because they’ll have strong incentives to commercialise their products. So getting buy-in from the key players, increasing trust and information-sharing, and building a consensus around high-level safety strategies will be valuable.
People outside of government or AI companies can influence the shape of public policy and corporate governance with advocacy and lobbying.
Advocacy is the general term for efforts to promote certain ideas and shape the public discourse, often around policy-related topics. Lobbying is a more targeted effort aimed at influencing legislation and policy, often by engaging with lawmakers and other officials.
If you believe AI companies may be disposed to advocate for generally beneficial regulation, you might work with them to push the government to adopt specific policies. It’s plausible that AI companies have the best understanding of the technology, as well as the risks, failure modes, and safest paths — and so are best positioned to inform policymakers.
On the other hand, AI companies might have too much of a vested interest in the shape of regulations to reliably advocate for broadly beneficial policies. If that’s right, it may be better to join or create advocacy organisations unconnected from the industry — perhaps supported by donations — that can take stances opposed to commercial interests.
For example, some believe it might be best to deliberately slow down or halt the development of increasingly powerful AI models. Advocates could make this demand of the companies themselves or of the government. But pushing for this step may be difficult for those involved with the companies creating advanced AI systems.
It’s also possible that the best outcomes will result from a balance of perspectives from inside and outside industry.
Advocacy can also:
However, note that advocacy can sometimes backfire because predicting how information will be received isn’t straightforward. Be aware that:
It’s important to keep these risks in mind and consult with others (particularly those who you respect but might disagree with tactically). And you should educate yourself deeply about the topic before explaining it to the public.
You can read more in the section about doing harm below. We also recommend reading our article on ways people trying to do good accidentally make things worse and how to avoid them. And you may find it useful to read our article on the skills needed for communicating important ideas.
Case study: the Center for AI Safety statement
In May 2023, the Center for AI Safety released a single-sentence statement saying: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
Most notably, the statement was supported by more than 100 signatories, including the leaders of major AI companies such as OpenAI, Google DeepMind, and Anthropic, as well as top researchers in the field such as Geoffrey Hinton and Yoshua Bengio. The signatories also include a member of the US Congress, other public officials, economists, philosophers, business leaders, and more.
This statement drew media attention at the time, and both UK Prime Minister Rishi Sunak and the White House press secretary reacted to it with expressions of concern. Both the UK government and the US government have subsequently moved forward with efforts to start to address these risks.
The statement has also helped clarify and inform the discourse about AI risk, as evidence that being concerned about catastrophes on the scale of human extinction is not a fringe position.
If regulatory measures are put in place to reduce the risks of advanced AI, some agencies and outside organisations will need to audit companies and systems to make sure that regulations are being followed.
Governments often rely on third-party auditors when regulating because the government lacks much of the expertise that the private sector has. We don’t know of many opportunities currently available in AI-related auditing, but such roles play a critical part in an effective AI governance framework.
AI companies and the AI systems they create may be subject to audits and evaluations out of safety concerns.
One nonprofit, Model Evaluation and Threat Research (METR, formerly known as ARC Evals), has been at the forefront of work to evaluate the capabilities of advanced AI models.4 In early 2023, the organisation partnered with two leading AI companies, OpenAI and Anthropic, to evaluate the capabilities of the latest versions of their chatbot models prior to their release. They sought to determine if the models had any potentially dangerous capabilities in a controlled environment.
The companies voluntarily cooperated with METR for this project, but at some point in the future, these evaluations may be legally required.
Other types of auditing and evaluation may be required as well. METR has said it intends to develop methods to determine which models are appropriately aligned — that is, that they will behave as their users intend them to behave — prior to release.
Governments may also want to employ auditors to evaluate the amount of compute that AI developers have access to, their information security practices, the uses of models, the data used to train models, and more.
Acquiring the technical skills and knowledge to perform these types of evaluations, and joining organisations that will be tasked to perform them, could be the foundation of a highly impactful career. This kind of work will also likely have to be facilitated by people who can manage complex relationships across industry and government. Someone with experience in both sectors could have a lot to contribute.
Some of these types of roles may have some overlap with work in AI technical safety research.
For someone with the right fit, working to improve coordination with China on the safe development of AI could be a particularly impactful career path.
The Chinese government has been a major funder in the field of AI, and the country has giant tech companies that could potentially drive forward advances.
Given tensions between the US and China, and the risks posed by advanced AI, there’s a lot to be gained from increasing trust, understanding, and coordination between the two countries. The world will likely be much better off if we can avoid a major conflict between great powers and if the most significant players in emerging technology can avoid exacerbating any global risks.
We have a separate career review that goes into more depth on China-related AI safety and governance paths.
As we’ve said, we focus most on US policy and government roles. This is largely because we anticipate that the US is now and will likely continue to be the most pivotal actor when it comes to regulating AI, with a major caveat being China, as discussed in the previous section.
But many people interested in working on this issue can’t or don’t want to work in US policy — perhaps because they live in another country and don’t intend on moving.
Much of the advice above still applies to these people, because roles in AI governance research and advocacy can be done outside of the US.5 And while we don’t think it’s generally as impactful in expectation as US government work, opportunities in other governments and international organisations can be complementary to the work to be done in the US.
The United Kingdom, for instance, may present another strong opportunity for AI policy work that would complement US work. Top UK officials have expressed interest in developing policy around AI, establishing a new international agency, and reducing extreme risks. And the UK government announced the creation of a new AI Foundation Model Taskforce in 2023 to drive forward safety research.
The European Union has shown that its data protection standards — the General Data Protection Regulation (GDPR) — affect corporate behaviour well beyond its geographical boundaries. EU officials have also pushed forward on regulating AI, and some research has explored the hypothesis that the impact of the EU’s AI regulations will extend far beyond the continent — the so-called “Brussels effect.”
And any relatively wealthy country could fund some AI safety research, though much of it requires access to top talent and state-of-the-art tech. Any significant advances in AI safety research could inform researchers working on the most powerful models.
Other countries might also develop liability standards for the creators of AI systems that could incentivise corporations to proceed cautiously before releasing models.
And at some point, there may be AI treaties and international regulations, just as the international community has created the International Atomic Energy Agency, the Biological Weapons Convention, and the Intergovernmental Panel on Climate Change to coordinate around and mitigate other global catastrophic threats.
Efforts to coordinate governments around the world to understand and share information about threats posed by AI may end up being extremely important in some future scenarios. The Organisation for Economic Cooperation and Development, for instance, has already created the AI Policy Observatory.
Third-party countries may also be able to facilitate cooperation and reduce tensions between the United States and China, whether around AI or other potential flashpoints.
If you’re early on in your career, you should focus first on getting skills and other career capital to successfully contribute to the beneficial governance and regulation of AI.
You can gain career capital for roles in many ways. Broadly speaking, working in or studying fields such as politics, law, international relations, communications, and economics can all be beneficial for going into policy work.
And expertise in AI itself, gained by studying and working in machine learning and technical AI safety, or potentially related fields such as computer hardware and information security, should also give you a big advantage.
Try to find relatively “cheap” tests to assess your fit for different paths. This could mean, for example, taking a policy internship, applying for a fellowship, doing a short bout of independent research, or taking classes or courses on technical machine learning or computer engineering.
It can also involve talking to people doing a job and finding out what the day-to-day experience of the work is and what skills are needed.
All of these factors can be difficult to predict in advance. While we grouped “government work” into a single category above, that label covers a wide range of roles. Finding the right fit can take years, and it can depend on factors out of your control, such as the colleagues you work closely with. That’s one reason it’s useful to build broadly valuable career capital that gives you more options.
Don’t underestimate the value of applying to many relevant openings in the field and sector you’re aiming for to see what happens. You’ll likely face a lot of rejection with this strategy, but you’ll be able to better assess your fit for roles after you see how far you get in the process. This can give you more information than guessing about whether you have the right experience.
Try to rule out certain types of work if you gather evidence that you’re not a strong fit. For example, if you invest a lot of effort trying to get into reputable universities or nonprofit institutions to do AI governance research, but you get no promising offers and receive little encouragement, this might be a significant signal that you’re unlikely to thrive in that path.
That wouldn’t mean you have nothing to contribute, but your comparative advantage may lie elsewhere.
Read the section of our career guide on finding a job that fits you.
A mix of people with technical and policy expertise — and some people with both — is needed in AI governance.
While anyone involved in this field should work to maintain an understanding of both the technical and policy details, you’ll probably start out focusing on either policy or technical skills to gain career capital.
This section covers:
Much of this advice is geared toward roles in the US, though it may be relevant in other contexts.
The chapter of the 80,000 Hours career guide on career capital lists five key components that will be useful in any path: skills and knowledge, connections, credentials, character, and runway.
For most jobs touching on policy, social skills, networking, and — for lack of a better word — political skills will be a huge asset. This can probably be learned to some extent, but some people may find they don’t have these kinds of skills and can’t or don’t want to acquire them.
That’s OK — there are many other routes to having a fulfilling and impactful career, and there may be some roles within this path that demand these skills to a much lesser extent. That’s why testing your fit is important.
Read the full section of the career guide on career capital.
To gain skills in policy, you can pursue education in many relevant fields, such as political science, economics, and law.
Many master’s programmes offer specific coursework on public policy, science and society, security studies, international relations, and other topics; having a graduate degree or law degree will give you a leg up for many positions.
In the US, a master’s, a law degree, or a PhD is particularly useful if you want to climb the federal bureaucracy. Our article on US policy master’s degrees provides detailed information about how to assess the many options.
Internships in DC are a promising route to test your fit for policy and get career capital. Many academic institutions now offer a strategic “Semester in DC” programme, which can let you explore placements in Congress, federal agencies, or think tanks.
The Virtual Student Federal Service (VSFS) also offers part-time, remote government internships. Students in this programme work alongside their studies.
Once you have a suitable background, you can take entry-level positions within parts of the government and build a professional network while developing key skills. In the US, you can become a congressional staffer, or take a position at a relevant federal department, such as the Department of Commerce, Department of Energy, or the Department of State. Alternatively, you can gain experience in think tanks (a particularly promising option if you have an aptitude for research). Some government contractors can also be a strong option.
Many people say Washington, DC has a unique culture, particularly for those working in and around the federal government. There’s a big focus on networking, bureaucratic politics, status-seeking, and influence-peddling. We’ve also been told that while merit matters to a degree in US government work, it is not the primary determinant of who is most successful. People who think they wouldn’t be able or comfortable to work in this kind of environment for the long term should consider whether other paths would be best.
If you find you can enjoy government and political work, impress your colleagues, and advance in your career, though, you may be a good fit. Just being able to thrive in government work can be a valuable comparative advantage.
US citizenship
Your citizenship may affect which opportunities are available to you. Many of the most important AI governance roles within the US — particularly in the executive branch and Congress — are only open to, or will at least heavily favour, American citizens. All key national security roles that might be especially important will be restricted to those with US citizenship, which is required to obtain a security clearance.
This may mean that those who lack US citizenship will want to consider not pursuing roles that require it. Alternatively, they could plan to move to the US and pursue the long process of becoming a citizen. For more details on immigration pathways and types of policy work available to non-citizens, see this post on working in US policy as a foreign national. Consider also participating in the annual diversity visa lottery if you’re from an eligible country, as this is low effort and allows you to win a US green card if you’re lucky.
Technical experience in machine learning, AI hardware, and related fields can be a valuable asset for an AI governance career. So it will be very helpful if you’ve studied a relevant subject area for an undergraduate or graduate degree, or did a particularly productive course of independent study.
We have a guide to technical AI safety careers, which explains how to learn the basics of machine learning.
Working at an AI company or lab in technical roles, or other companies that use advanced AI systems and hardware, may also provide significant career capital in AI policy paths. Read our career review discussing the pros and cons of working at a top AI company.
We also have a separate career review on how becoming an expert in AI hardware could be very valuable in governance work.
Many politicians and policymakers are generalists, as their roles require them to work in many different subject areas and on different types of problems. This means they’ll need to rely on expert knowledge when crafting and implementing policy on AI technology that they don’t fully understand. So if you can provide them this information, especially if you’re skilled at communicating it clearly, you can potentially fill influential roles.
Some people who may have initially been interested in pursuing a technical AI safety career, but who have found that they either are no longer interested in that path or find more promising policy opportunities, might also decide that they can effectively pivot into a policy-oriented career.
It is common for people with STEM backgrounds to enter and succeed in US policy careers. People with technical credentials that they may regard as fairly modest — such as a computer science bachelor’s degree or a master’s in machine learning — often find their knowledge is highly valued in Washington, DC.
Most DC jobs don’t have specific degree requirements, so you don’t need to have a policy degree to work in DC. Roles specifically addressing science and technology policy are particularly well-suited for people with technical backgrounds, and people hiring for these roles will value higher credentials like a master’s or, even better, a terminal degree like a PhD or MD.
There are many fellowship programmes specifically aiming to support people with STEM backgrounds to enter policy careers; some are listed below.
Policy work won’t be right for everybody — many technical experts may not have the right disposition or skills. People in policy paths often benefit from strong writing and social skills, as well as being comfortable navigating bureaucracies and working with people holding very different motivations and worldviews.
There are other ways to gain useful career capital that could be applied in this career path.
Because this is one of our priority paths, if you think this path might be a great option for you, we’d be especially excited to advise you on next steps, one-on-one. We can help you consider your options, make connections with others working in the same field, and possibly even help you find jobs or funding opportunities.
Since successful AI governance will require work from governments, industry, and other parties, there will be many potential jobs and places to work for people in this path. The landscape will likely shift over time, so if you’re just starting out on this path, the places that seem most important might be different by the time you’re pivoting to using your career capital to make progress on the issue.
Within the US government, for instance, it’s not clear which bodies will be most impactful when it comes to AI policy in five years. It will likely depend on choices that are made in the meantime.
That said, it seems useful to give our understanding of which parts of the government are generally influential in technology governance and most involved right now to help you orient. Gaining AI-related experience in government right now should still serve you well if you end up wanting to move into a more impactful AI-related role down the line when the highest-impact areas to work in are clearer.
We’ll also give our current sense of important actors outside government where you might be able to build career capital and potentially have a big impact.
Note that this list has by far the most detail about places to work within the US government. We would like to expand it to include more options over time. (Note: the fact that an option isn’t on this list shouldn’t be taken to mean we recommend against it or even that it would necessarily be less impactful than the places listed.)
We have more detail on other options in separate (and older) career reviews, including the following:
Here are some of the places where someone could do promising work or gain valuable career capital:
In Congress, you can either work directly for lawmakers themselves or as staff on legislative committees. Staff roles on the committees are generally more influential on legislation and more prestigious, but for that reason, they’re more competitive. If you don’t have that much experience, you could start out in an entry-level job staffing a lawmaker and then later try to transition to staffing a committee.
Some people we’ve spoken to expect the following committees — and some of their subcommittees — in the House and Senate to be most impactful in the field of AI. You might aim to work on these committees or for lawmakers who have significant influence on these committees.
House of Representatives
Senate
The Congressional Research Service, a nonpartisan legislative agency, also offers opportunities to conduct research that can impact policy design across all subjects.
In general, we don’t recommend taking low-ranking jobs within the executive branch for this path because it’s very difficult to progress your career through the bureaucracy at this level. It’s better to get a law degree or a relevant graduate degree, which can give you the opportunity to start with more seniority.
The influence of different agencies over AI regulation may shift over time. For example, in late 2023, the federal government announced the creation of the US Artificial Intelligence Safety Institute, which may be a particularly promising place to work.
Whichever agency turns out to be most influential, it will be useful to accrue career capital by working effectively in government, building a professional network, learning about day-to-day policy work, and deepening your knowledge of all things AI.
We have a lot of uncertainty about this topic, but here are some of the agencies that may have significant influence on at least one key dimension of AI policy as of this writing:
Readers can find listings for roles in these departments and agencies at the federal government’s job board, USAJOBS; a more curated list of openings for potentially high-impact roles and career capital is on the 80,000 Hours job board.
We do not currently recommend attempting to join the US government via the military if you are aiming for a career in AI policy. There are many levels of seniority to rise through, many people competing for places, and initially you have to spend all of your time doing work unrelated to AI.
However, having military experience already can be valuable career capital for other important roles in government, particularly national security positions. We would consider this route more competitive for military personnel who have been to an elite military academy, such as West Point, or for commissioned officers at rank O-3 or above.
Policy fellowships are among the best entryways into policy work. They offer many benefits like first-hand policy experience, funding, training, mentoring, and networking. While many require an advanced degree, some are open to college graduates.
(Read our career review discussing the pros and cons of working at a top AI company.)
Our job board features opportunities in AI safety and policy:
As we discuss in an article on accidental harm, there are many ways to set back a new field that you’re working in when you’re trying to do good, and this could mean your impact is negative rather than positive. (You may also want to read our article on harmful careers.)
There’s a lot of potential to inadvertently cause harm in the emerging field of AI governance. We discussed some possibilities in the section on advocacy and lobbying. Some other possibilities include:
We have to act with incomplete information, so it may never be very clear when or if people in AI governance are falling into these traps. Still, knowing that these are potential ways of causing harm will help you stay alert to them, and you should remain open to changing course if you find evidence that your actions may be damaging.
And we recommend keeping in mind the following pieces of general guidance from our article on accidental harm:
We think this work is exceptionally pressing and valuable, so we encourage our readers who might be interested to test their fit for governance work. But going into government, in particular, can be difficult. Some people we’ve advised have gone into policy roles with the hope of having an impact, only to burn out and move on.
At the same time, many policy practitioners find their work very meaningful, interesting, and varied.
Some roles in government may be especially challenging for the following reasons:
So we recommend speaking to people in the kinds of positions you might aim to have in order to get a sense of whether the career path would be right for you. And if you do choose to pursue it, look out for signs that the work may be having a negative effect on you and seek support from people who understand what you care about.
If you end up wanting or needing to leave and transition into a new path, that’s not necessarily a loss or a reason for regret. You will likely make important connections and learn a lot of useful information and skills. This career capital can be useful as you transition into another role, perhaps pursuing a complementary approach to AI governance.
We’ve been concerned about risks posed by AI for years. Based on the arguments that this technology could potentially cause a global catastrophe, and otherwise have a dramatic impact on future generations, we’ve advised many people to work to mitigate the risks.
The arguments for the risk aren’t completely conclusive, in our view. But they are worth taking seriously. Few others in the world seemed to be devoting much time to even figuring out how big the threat was or how to mitigate it, while progress in making AI systems more powerful was accelerating. So we concluded it was worth ranking among our top priorities.
Now that there’s increased attention on AI, some might conclude that it’s less neglected and thus less pressing to work on. However, the increased attention on AI also makes many interventions potentially more tractable than they had been previously, as policymakers and others are more open to the idea of crafting AI regulations.
And while more attention is now being paid to AI, it’s not clear it will be focused on the most important risks. So there’s likely still a lot of room for important and pressing work positively shaping the development of AI policy.
If you’re interested in this career path, we recommend checking out some of the following articles next.
These degrees are highly valuable for those hoping to take on important roles in the US federal government.
The US government is likely to be a key actor in how advanced AI is developed and used in society, whether directly or indirectly.
Working at a leading AI company is an important career option to consider, but the impact of any given role is complex to assess.
AI might bring huge benefits — if we avoid the risks.
Want to consider more paths? See our list of the highest-impact career paths according to our research.
The post AI governance and coordination appeared first on 80,000 Hours.