#159 – Jan Leike on OpenAI's massive push to make superintelligence safe in 4 years or less

In July, OpenAI announced a new team and project: Superalignment. The goal is to figure out how to make superintelligent AI systems aligned and safe to use within four years, and the lab is putting a massive 20% of its computational resources behind the effort.

Today’s guest, Jan Leike, is Head of Alignment at OpenAI and will be co-leading the project. As OpenAI puts it, “…the vast power of superintelligence could be very dangerous, and lead to the disempowerment of humanity or even human extinction. … Currently, we don’t have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue.”

Given that OpenAI is in the business of developing superintelligent AI, it sees that as a scary problem that urgently has to be fixed. So it’s not just throwing compute at the problem — it’s also hiring dozens of scientists and engineers to build out the Superalignment team.

Plenty of people are pessimistic that this can be done at all, let alone in four years. But Jan is guardedly optimistic. As he explains:

Honestly, it really feels like we have a real angle of attack on the problem that we can actually iterate on… and I think it’s pretty likely going to work, actually. And that’s really, really wild, and it’s really exciting. It’s like we have this hard problem that we’ve been talking about for years and years and years, and now we have a real shot at actually solving it. And that’d be so good if we did.

Jan thinks that this work is actually the most scientifically interesting part of machine learning. Rather than just throwing more chips and more data at a training run, this work requires actually understanding how these models work and how they think. The answers are likely to be breakthroughs on the level of solving the mysteries of the human brain.

The plan, in a nutshell, is to get AI to help us solve alignment. That might sound a bit crazy — as one person described it, “like using one fire to put out another fire.”

But Jan’s thinking is this: the core problem is that AI capabilities will keep getting better and the challenge of monitoring cutting-edge models will keep getting harder, while human intelligence stays more or less the same. To have any hope of ensuring safety, we need our ability to monitor, understand, and design ML models to advance at the same pace as the complexity of the models themselves.

And there’s an obvious way to do that: get AI to do most of the work, such that the sophistication of the AIs that need aligning, and the sophistication of the AIs doing the aligning, advance in lockstep.

Jan doesn’t want to produce machine learning models capable of doing ML research. But such models are coming, whether we like it or not. And at that point Jan wants to make sure we turn them towards useful alignment and safety work, as much or more than we use them to advance AI capabilities.

Jan thinks it’s so crazy it just might work. But some critics think it’s simply crazy. They ask a wide range of difficult questions, including:

  • If you don’t know how to solve alignment, how can you tell that your alignment assistant AIs are actually acting in your interest rather than working against you? Especially as they could just be pretending to care about what you care about.
  • How do you know that these technical problems can be solved at all, even in principle?
  • At the point that models are able to help with alignment, won’t they also be so good at improving capabilities that we’re in the middle of an explosion in what AI can do?

In today’s interview, host Rob Wiblin puts these doubts to Jan to hear how he responds to each, and they also cover:

  • OpenAI’s current plans to achieve ‘superalignment’ and the reasoning behind them
  • Why alignment work is the most fundamental and scientifically interesting research in ML
  • The kinds of people he’s excited to hire to join his team and maybe save the world
  • What most readers misunderstood about the OpenAI announcement
  • The three ways Jan expects AI to help solve alignment: mechanistic interpretability, generalization, and scalable oversight
  • What the standard should be for confirming whether Jan’s team has succeeded
  • Whether OpenAI should (or will) commit to stop training more powerful general models if they don’t think the alignment problem has been solved
  • Whether Jan thinks OpenAI has deployed models too quickly or too slowly
  • The many other actors who also have to do their jobs really well if we’re going to have a good AI future
  • Plenty more

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

Producer and editor: Keiran Harris
Audio Engineering Lead: Ben Cordell
Technical editing: Simon Monsour and Milo McGuire
Additional content editing: Katy Moore and Luisa Rodriguez
Transcriptions: Katy Moore

Continue reading →

What recent events mean for AI governance career paths

The idea this week: AI governance careers present some of the best opportunities to change the world for the better that we’ve found.

Last week, US Senator Richard Blumenthal gave a stark warning during a subcommittee hearing on artificial intelligence.

He’s become deeply concerned about the potential for an “intelligence device out of control, autonomous, self-replicating, potentially creating diseases, pandemic-grade viruses, or other kinds of evils — purposely engineered by people, or simply the result of mistakes, no malign intention.”

We’ve written about these kinds of dangers — potentially rising to the extreme of an extinction-level event — in our problem profile on preventing an AI-related catastrophe.

“These fears need to be addressed, and I think can be addressed,” the senator continued. “I’ve come to the conclusion that we need some kind of regulatory agency.”

And the senator from Connecticut isn’t the only one:

  • The White House has led a coalition of the top AI companies to coordinate on risk-reducing measures, and they recently announced a joint voluntary commitment to some key safety principles. President Joe Biden and Vice President Kamala Harris have been directly involved in these efforts, with the president himself saying the technology will require “new laws, regulation, and oversight.”
  • Four top companies developing advanced AI systems — Anthropic,

Continue reading →

    #158 – Holden Karnofsky on how AIs might take over even if they're no smarter than humans, and his 4-part playbook for AI risk

    Back in 2007, Holden Karnofsky cofounded GiveWell, where he sought out the charities that most cost-effectively helped save lives. He then cofounded Open Philanthropy, where he oversaw a team making billions of dollars’ worth of grants across a range of areas: pandemic control, criminal justice reform, farmed animal welfare, and making AI safe, among others. This year, having learned about AI for years and observed recent events, he’s narrowing his focus once again, this time on making the transition to advanced AI go well.

    In today’s conversation, Holden returns to the show to share his overall understanding of the promise and the risks posed by machine intelligence, and what to do about it. That understanding has accumulated over around 14 years, during which he went from being sceptical that AI was important or risky, to making AI risks the focus of his work.

    (As Holden reminds us, his wife is also the president of one of the world’s top AI labs, Anthropic, giving him both conflicts of interest and a front-row seat to recent events. For our part, Open Philanthropy is 80,000 Hours’ largest financial supporter.)

    One point he makes is that people are too narrowly focused on AI becoming ‘superintelligent.’ While that could happen and would be important, it’s not necessary for AI to be transformative or perilous. Rather, machines with human levels of intelligence could end up being enormously influential simply if the world’s computer hardware were able to run tens or hundreds of billions of them, in a sense making machine intelligences a majority of the global population, or at least a majority of global thought.

    As Holden explains, he sees four key parts to the playbook humanity should use to guide the transition to very advanced AI in a positive direction: alignment research, standards and monitoring, creating a successful and careful AI lab, and finally, information security.

    In today’s episode, host Rob Wiblin interviews return guest Holden Karnofsky about that playbook, as well as:

    • Why we can’t rely on just gradually solving those problems as they come up, the way we usually do with new technologies.
    • What multiple different groups can do to improve our chances of a good outcome — including listeners to this show, governments, computer security experts, and journalists.
    • Holden’s case against ‘hardcore utilitarianism’ and what actually motivates him to work hard for a better world.
    • What the ML and AI safety communities get wrong in Holden’s view.
    • Ways we might succeed with AI just by dumb luck.
    • The value of laying out imaginable success stories.
    • Why information security is so important and underrated.
    • Whether it’s good to work at an AI lab that you think is particularly careful.
    • The track record of futurists’ predictions.
    • And much more.

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

    Producer: Keiran Harris
    Audio Engineering Lead: Ben Cordell
    Technical editing: Simon Monsour and Milo McGuire
    Transcriptions: Katy Moore

    Continue reading →

    Why many people underrate investigating the problem they work on

    The idea this week: thinking about which world problem is most pressing may matter more than you realise.

    I’m an advisor for 80,000 Hours, which means I talk to a lot of thoughtful people who genuinely want to have a positive impact with their careers. One piece of advice I consistently find myself giving is to consider working on pressing world problems you might not have explored yet.

    Should you work on climate change or AI risk? Mitigating antibiotic resistance or preventing bioterrorism? Preventing disease in low-income countries or reducing the harms of factory farming?

    Your choice of problem area can matter a lot. But I think a lot of people under-invest in building a view of which problems they think are most pressing.

    I think there are three main reasons for this:

    1. They think they can’t get a job working on a certain problem, so the argument that it’s important doesn’t seem relevant.

    I see this most frequently with AI. People think that they don’t have aptitude or interest in machine learning, so they wouldn’t be able to contribute to mitigating catastrophic risks from AI.

    But I don’t think this is true.

    Continue reading →

    #157 – Ezra Klein on existential risk from AI and what DC could do about it

    In Oppenheimer, scientists detonate a nuclear weapon despite thinking there’s some ‘near zero’ chance it would ignite the atmosphere, putting an end to life on Earth. Today, scientists working on AI think the chance their work puts an end to humanity is vastly higher than that.

    In response, some have suggested we launch a Manhattan Project to make AI safe via enormous investment in relevant R&D. Others have suggested that we need international organisations modelled on those that slowed the proliferation of nuclear weapons. Still others seek a research slowdown by labs while an auditing and licensing scheme is created.

    Today’s guest — journalist Ezra Klein of The New York Times — has watched policy discussions and legislative battles play out in DC for 20 years. Like many people he has also taken a big interest in AI this year, writing articles such as “This changes everything.” In his first interview on the show in 2021, he flagged AI as one topic that DC would regret not having paid more attention to.

    So we invited him on to get his take on which regulatory proposals have promise, and which seem either unhelpful or politically unviable.

    Out of the ideas on the table right now, Ezra favours a focus on direct government funding — both for AI safety research and to develop AI models designed to solve problems other than making money for their operators. He is sympathetic to legislation that would require AI models to be legible in a way that none currently are — and embraces the fact that that will slow down the release of models while businesses figure out how their products actually work.

    By contrast, he’s pessimistic that it’s possible to coordinate countries around the world to agree to prevent or delay the deployment of dangerous AI models — at least not unless there’s some spectacular AI-related disaster to create such a consensus. And he fears attempts to require licences to train the most powerful ML models will struggle unless they can find a way to exclude and thereby appease people working on relatively safe consumer technologies rather than cutting-edge research.

    From observing how DC works, Ezra expects that even a small community of experts in AI governance can have a large influence on how the US government responds to AI advances. But in Ezra’s view, that requires those experts to move to DC and spend years building relationships with people in government, rather than clustering elsewhere in academia and AI labs.

    In today’s brisk conversation, Ezra and host Rob Wiblin cover the above as well as:

    • Whether it’s desirable to slow down AI research
    • The value of engaging with current policy debates even if they don’t seem directly important
    • Which AI business models seem more or less dangerous
    • Tensions between people focused on existing vs emergent risks from AI
    • Two major challenges of being a new parent

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

    Producer: Keiran Harris
    Audio Engineering Lead: Ben Cordell
    Technical editing: Milo McGuire
    Transcriptions: Katy Moore

    Continue reading →

    How many lives does a doctor save? (Part 3)

    This is Part 3 of an updated version of a classic three-part series of 80,000 Hours blog posts. You can also read updated versions of Part 1 and Part 2. You can still read the original version of the series published in 2012.

    It’s fair to say working as a doctor does not look that great so far. In general, the day-to-day work of medicine has had a relatively minor role in why people are living longer and healthier now than they did historically. When we try and quantify the benefit of someone becoming a doctor, the figure gets lower the better the method of estimation, and it’s already low enough that a 40-year medical career somewhere like the UK would be roughly on a par with giving $20,000 to a GiveWell top charity in terms of saving lives.

    Yet there is more to say. The tools we have used to arrive at estimates are general, so they are estimating something like the impact of the modal, median, or typical medical career. There are doctors who have plainly done much more good than my estimates of the impact of a typical doctor.

    So, what could a doctor do to really save a lot of lives?

    Doing doctoring better

    What about just being really, really good? Even if the typical doctor’s work makes a worthwhile — but modest and fairly replaceable — contribution,

    Continue reading →

    How many lives does a doctor save? (Part 2)

    This is Part 2 of an updated version of a classic three-part series of 80,000 Hours blog posts. You can also read updated versions of Part 1 and Part 3. You can still read the original version of the series published in 2012.

    In the last post, we saw that although the reasons people live longer and healthier now have more to do with higher living standards than more medical care, medicine still plays a part. If you try and quantify how much medicine contributes to our increased longevity and health, then divide that amount by the number of doctors providing it, you get an estimate that a UK doctor saves ~70 lives over the course of their career.
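To make the structure of that estimate concrete, here is a minimal sketch in code. The inputs are placeholder values chosen only to reproduce the ~70-lives ballpark (they are not the actual figures behind the article's estimate), but they show how "health gains attributable to medicine, divided across the doctors providing it, scaled by career length" produces a per-career number:

```python
# Minimal sketch of the structure of the estimate above.
# All inputs are illustrative placeholders, not the article's actual figures.

career_years = 40                                   # length of a typical medical career
doctors = 200_000                                   # assumed number of practising doctors
lives_attributable_to_medicine_per_year = 350_000   # assumed annual lives saved by medical care

# Attribute medicine's total effect evenly across all doctors,
# then scale by the length of one career.
lives_per_doctor_per_year = lives_attributable_to_medicine_per_year / doctors
lives_per_career = lives_per_doctor_per_year * career_years

print(f"~{lives_per_career:.0f} lives per career")  # ~70 with these placeholder inputs
```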

    Yet this won’t be a good model of how much good you would actually do if you became a doctor in the UK.

    For one thing, the relationship between more doctors and better health is non-linear. Here’s a scatterplot for each country with doctors per capita on the x-axis and DALYs per capita on the y-axis (since you ‘gain’ DALYs for dying young or being sick, less is better):

    The association shows an initial steep decline between 0–50 doctors per 100,000 people, then levels off abruptly and is basically flat when you get to physician densities in richer countries (e.g. the UK has 300 doctors per 100,000 people). Assuming this is causation rather than correlation (more on that later),

    Continue reading →

    How many lives does a doctor save? (Part 1)

    This is Part 1 of an updated version of a classic three-part series of 80,000 Hours blog posts. You can also read updated versions of Part 2 and Part 3. You can still read the original version of the series published in 2012.

    Doctors have a reputation as do-gooders. So when I was a 17-year-old kid wanting to make a difference, it seemed like a natural career path. I wrote this on my medical school application:

    I want to study medicine because of a desire I have to help others, and so the chance of spending a career doing something worthwhile I can’t resist. Of course, Doctors [sic] don’t have a monopoly on altruism, but I believe the attributes I have lend themselves best to medicine, as opposed to all the other work I could do instead.

    They still let me in.

    When I show this to others in medicine, I get a mix of laughs and groans of recognition. Most of them wrote something similar. The impression I get from senior doctors who have to read this stuff is they see it a bit like a toddler zooming around on their new tricycle: a mostly endearing (if occasionally annoying) work in progress. Season them enough with the blood, sweat, and tears of clinical practice, and they’ll generally turn out as wiser, perhaps more cantankerous, but ultimately humane doctors.

    Yet more important than me being earnest — and even me being trite — was that I was wrong.

    Continue reading →

    Hannah Boettcher on the mental health challenges that come with trying to have a big impact

    In this episode of 80k After Hours, Luisa Rodriguez and Hannah Boettcher discuss various approaches to therapy, and how to use them in practice — focusing specifically on people trying to have a big impact.

    They cover:

    • The effectiveness of therapy, and tips for finding a therapist
    • Moral demandingness
    • Internal family systems-style therapy
    • Motivation and burnout
    • Exposure therapy
    • Grappling with world problems and x-risk
    • Perfectionism and imposter syndrome
    • And the risk of over-intellectualising

    Who this episode is for:

    • High-impact focused people who struggle with moral demandingness, perfectionism, or imposter syndrome
    • People who feel anxious thinking about the end of the world
    • 80,000 Hours Podcast hosts with the initials LR

    Who this episode isn’t for:

    • People who aren’t focused on having a big impact
    • People who don’t struggle with any mental health issues
    • Founders of Scientology with the initials LRH

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

    Producer: Keiran Harris
    Audio Engineering Lead: Ben Cordell
    Technical editing: Dominic Armstrong
    Content editing: Katy Moore, Luisa Rodriguez, and Keiran Harris
    Transcriptions: Katy Moore

    “Gershwin – Rhapsody in Blue, original 1924 version” by Jason Weinberger is licensed under Creative Commons.

    Continue reading →

    #156 – Markus Anderljung on how to regulate cutting-edge AI models

    In today’s episode, host Luisa Rodriguez interviews the Head of Policy at the Centre for the Governance of AI — Markus Anderljung — about all aspects of policy and governance of superhuman AI systems.

    They cover:

    • The need for AI governance, including self-replicating models and ChaosGPT
    • Whether or not AI companies will willingly accept regulation
    • The key regulatory strategies, including licensing, risk assessment, auditing, and post-deployment monitoring
    • Whether we can be confident that people won’t train models covertly and ignore the licensing system
    • The progress we’ve made so far in AI governance
    • The key weaknesses of these approaches
    • The need for external scrutiny of powerful models
    • The emergent capabilities problem
    • Why it really matters where regulation happens
    • Advice for people wanting to pursue a career in this field
    • And much more.

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

    Producer: Keiran Harris
    Audio Engineering Lead: Ben Cordell
    Technical editing: Simon Monsour and Milo McGuire
    Transcriptions: Katy Moore

    Continue reading →

    What the war in Ukraine shows us about catastrophic risks

    A new great power war could be catastrophic for humanity — but there are meaningful ways to reduce the risk.

    We’re now in the 17th month of the war in Ukraine. But at the start, it was hard to foresee it would last this long. Many expected Russian troops to take Ukraine’s capital, Kyiv, in weeks. Already, more than 100,000 people, including civilians, have been killed and over 300,000 more injured. Many more will die before the war ends.

    The sad and surprising escalation of the war shows why international conflict remains a major global risk. I explain why working to lower the danger is a potentially high-impact career choice in a new problem profile on great power war.

    As Russia’s disastrous invasion demonstrates, it’s hard to predict how much a conflict will escalate. Most wars remain relatively small, but a few will become terrifyingly large. US officials estimate about 70,000 Russian and Ukrainian soldiers have died in battle so far. That means this war is already worse than 80% of all the wars humanity has experienced in the last 200 years.

    But the worst wars humanity has fought are hundreds of times larger than the war in Ukraine currently is. World War II killed 66 million people, for example — perhaps the single deadliest event in human history.


    Author’s figure. See the data here. Data source: Sarkees,

    Continue reading →

    #155 – Lennart Heim on the compute governance era and what has to come after

    As AI advances ever more quickly, concerns about potential misuse of highly capable models are growing. From hostile foreign governments and terrorists to reckless entrepreneurs, the threat of AI falling into the wrong hands is top of mind for the national security community.

    With growing concerns about the use of AI in military applications, the US has banned the export of certain types of chips to China.

    But unlike the uranium required to make nuclear weapons, or the material inputs to a bioweapons programme, computer chips and machine learning models are absolutely everywhere. So is it actually possible to keep dangerous capabilities out of the wrong hands?

    In today’s interview, Lennart Heim — who researches compute governance at the Centre for the Governance of AI — explains why limiting access to supercomputers may represent our best shot.

    As Lennart explains, an AI research project requires many inputs, including the classic triad of compute, algorithms, and data.

    If we want to limit access to the most advanced AI models, focusing on access to supercomputing resources — usually called ‘compute’ — might be the way to go. Both algorithms and data are hard to control because they live on hard drives and can be easily copied. By contrast, advanced chips are physical items that can’t be used by multiple people at once and come from a small number of sources.

    According to Lennart, the hope would be to enforce AI safety regulations by controlling access to the most advanced chips specialised for AI applications. For instance, projects training ‘frontier’ AI models — the newest and most capable models — might only gain access to the supercomputers they need if they obtain a licence and follow industry best practices.

    We have similar safety rules for companies that fly planes or manufacture volatile chemicals — so why not for people producing the most powerful and perhaps the most dangerous technology humanity has ever played with?
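To make the licensing idea concrete, here's a minimal sketch of how a compute-based trigger might look. It uses the common rule of thumb that training compute is roughly 6 × parameters × training tokens; the threshold and the example model sizes are purely illustrative assumptions, not figures from the interview or from any actual regulation:

```python
# Rough sketch of a compute-based licensing trigger.
# The 6 * N * D rule of thumb approximates training FLOP for dense models;
# the threshold and example runs below are purely illustrative.

LICENCE_THRESHOLD_FLOP = 1e25  # hypothetical cut-off for "frontier" training runs

def training_flop(n_parameters: float, n_training_tokens: float) -> float:
    """Approximate total training compute in FLOP."""
    return 6 * n_parameters * n_training_tokens

def needs_licence(n_parameters: float, n_training_tokens: float) -> bool:
    """Would this (hypothetical) training run fall under a licensing regime?"""
    return training_flop(n_parameters, n_training_tokens) >= LICENCE_THRESHOLD_FLOP

# A small research model vs. a frontier-scale run (illustrative numbers):
print(needs_licence(n_parameters=1e9,  n_training_tokens=2e10))   # False (~1.2e20 FLOP)
print(needs_licence(n_parameters=1e12, n_training_tokens=1e13))   # True  (~6e25 FLOP)
```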

    But Lennart is quick to note that the approach faces many practical challenges. Currently, AI chips are readily available and untracked. Changing that will require the collaboration of many actors, which might be difficult, especially given that some of them aren’t convinced of the seriousness of the problem.

    Host Rob Wiblin is particularly concerned about a different challenge: the increasing efficiency of AI training algorithms. As these algorithms become more efficient, what once required a specialised AI supercomputer to train might soon be achievable with a home computer.

    By that point, tracking every aggregation of compute that could prove to be very dangerous would be both impractical and invasive.

    With only a decade or two left before that becomes a reality, the window during which compute governance is a viable solution may be a brief one. Top AI labs have already stopped publishing their latest algorithms, which might extend this ‘compute governance era’, but not for very long.
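Here's a back-of-the-envelope sketch of why that window might be brief. The doubling times and compute figures are illustrative assumptions only, not estimates from the episode, but under assumptions like these the compute needed to match today's frontier capability and the compute available to a small actor cross over within roughly two decades:

```python
# Back-of-the-envelope: how quickly a fixed capability level becomes cheap to train.
# All figures and doubling times below are assumptions for illustration only.

frontier_flop_today = 1e25      # hypothetical compute for a frontier-level model today
accessible_flop_today = 1e19    # hypothetical compute a small actor can muster today

algo_doubling_years = 1.5       # assumed: algorithmic efficiency doubles every ~18 months
hw_doubling_years = 2.5         # assumed: FLOP per dollar doubles every ~2.5 years

for years in range(0, 21, 5):
    # Compute needed for today's frontier capability shrinks as algorithms improve...
    needed = frontier_flop_today / 2 ** (years / algo_doubling_years)
    # ...while the compute a small actor can afford grows with hardware progress.
    accessible = accessible_flop_today * 2 ** (years / hw_doubling_years)
    print(f"year {years:2d}: needed {needed:.1e} FLOP, accessible {accessible:.1e} FLOP")
```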

    If compute governance is only a temporary phase between the era of difficult-to-train superhuman AI models and the time when such models are widely accessible, what can we do to prevent misuse of AI systems after that point?

    Lennart and Rob both think the only enduring approach requires taking advantage of the AI capabilities that should be in the hands of police and governments — which will hopefully remain superior to those held by criminals, terrorists, or fools. But as they describe, this means maintaining a peaceful standoff between AI models with conflicting goals that can act and fight with one another on the microsecond timescale. Being far too slow to follow what’s happening — let alone participate — humans would have to be cut out of any defensive decision-making.

    Both agree that while this may be our best option, such a vision of the future is more terrifying than reassuring.

    Lennart and Rob discuss the above as well as:

    • How can we best categorise all the ways AI could go wrong?
    • Why did the US restrict the export of some chips to China and what impact has that had?
    • Is the US in an ‘arms race’ with China or is that more of an illusion?
    • What is the deal with chips specialised for AI applications?
    • How is the ‘compute’ industry organised?
    • Downsides of using compute as a target for regulations
    • Could safety mechanisms be built into computer chips themselves?
    • Who would have the legal authority to govern compute if some disaster made it seem necessary?
    • The reasons Rob doubts that any of this stuff will work
    • Could AI be trained to operate as a far more severe computer worm than any we’ve seen before?
    • What does the world look like when sluggish human reaction times leave us completely outclassed?
    • And plenty more

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

    Producer: Keiran Harris
    Audio mastering: Milo McGuire, Dominic Armstrong, and Ben Cordell
    Transcriptions: Katy Moore

    Continue reading →

    Great power war

    Economic growth and technological progress have bolstered the arsenals of the world’s most powerful countries. That means the next war between them could be far worse than World War II, the deadliest conflict humanity has yet experienced.

    Could such a war actually occur? We can’t rule out the possibility. Technical accidents or diplomatic misunderstandings could spark a conflict that quickly escalates. Or international tension could cause leaders to decide they’re better off fighting than negotiating.

    It seems hard to make progress on this problem. It’s also less neglected than some of the problems that we think are most pressing. There are certain issues, like making nuclear weapons or military artificial intelligence systems safer, which seem promising — although it may be more impactful to work on reducing risks from AI, bioweapons or nuclear weapons directly. You might also be able to reduce the chances of misunderstandings and miscalculations by developing expertise in one of the most important bilateral relationships (such as that between the United States and China).

    Finally, by making conflict less likely, reducing competitive pressures on the development of dangerous technology, and improving international cooperation, you might be helping to reduce other risks, like the chance of future pandemics.

    Continue reading →

    How to cope with rejection in your career

    The idea this week: getting rejected from jobs can be crushing — but learning how to deal with rejection productively is an incredibly valuable skill.

    I’ve been rejected many, many times. In 2015, I applied to ten PhD programs and was rejected from nine. After doing a summer internship with GiveWell in 2016, I wasn’t offered a full-time role. In 2017, I was rejected by J-PAL, IDinsight, and Founders Pledge (among others). Around the same time, I was so afraid of being rejected by Open Philanthropy, I dropped out of their hiring round.

    I now have what I consider a dream job at 80,000 Hours: I get to host a podcast about the world’s most pressing problems and how to solve them. But before getting a job offer from 80,000 Hours in 2020, I got rejected by them for a role in 2018. That rejection hurt the most.

    I still remember compulsively checking my phone after my work trial to see if 80,000 Hours had made me an offer. And I still remember waking up at 5:00 AM, checking my email, and finding the kind and well-written — but devastating — rejection: “Unfortunately we don’t think the role is the right fit right now.”

    And I remember being so sad that I took a five-hour bus ride to stay with a friend so I wouldn’t have to be alone.

    Continue reading →

    #154 – Rohin Shah on DeepMind and trying to fairly hear out both AI doomers and doubters

    Can there be a more exciting and strange place to work today than a leading AI lab? Your CEO has said they’re worried your research could cause human extinction. The government is setting up meetings to discuss how this outcome can be avoided. Some of your colleagues think this is all overblown; others are more anxious still.

    Today’s guest — machine learning researcher Rohin Shah — goes into the Google DeepMind offices each day with that peculiar backdrop to his work.

    He’s on the team dedicated to maintaining ‘technical AI safety’ as these models approach and exceed human capabilities: basically that the models help humanity accomplish its goals without flipping out in some dangerous way. This work has never seemed more important.

    In the short-term it could be the key bottleneck to deploying ML models in high-stakes real-life situations. In the long-term, it could be the difference between humanity thriving and disappearing entirely.

    For years Rohin has been on a mission to fairly hear out people across the full spectrum of opinion about risks from artificial intelligence — from doomers to doubters — and properly understand their point of view. That makes him unusually well placed to give an overview of what we do and don’t understand. He has landed somewhere in the middle — troubled by ways things could go wrong, but not convinced there are very strong reasons to expect a terrible outcome.

    Today’s conversation is wide-ranging and Rohin lays out many of his personal opinions to host Rob Wiblin, including:

    • What he sees as the strongest case both for and against slowing down the rate of progress in AI research.
    • Why he disagrees with most other ML researchers that training a model on a sensible ‘reward function’ is enough to get a good outcome.
    • Why he disagrees with many on LessWrong that the bar for whether a safety technique is helpful is “could this contain a superintelligence.”
    • That he thinks nobody has very compelling arguments that AI created via machine learning will be dangerous by default, or that it will be safe by default. He believes we just don’t know.
    • That he understands that analogies and visualisations are necessary for public communication, but is sceptical that they really help us understand what’s going on with ML models, because they’re different in important ways from every other case we might compare them to.
    • Why he’s optimistic about DeepMind’s work on scalable oversight, mechanistic interpretability, and dangerous capabilities evaluations, and what each of those projects involves.
    • Why he isn’t inherently worried about a future where we’re surrounded by beings far more capable than us, so long as they share our goals to a reasonable degree.
    • Why it’s not enough for humanity to know how to align AI models — it’s essential that management at AI labs correctly pick which methods they’re going to use and have the practical know-how to apply them properly.
    • Three observations that make him a little more optimistic: humans are a bit muddle-headed and not super goal-orientated; planes don’t crash; and universities have specific majors in particular subjects.
    • Plenty more besides.

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

    Producer: Keiran Harris
    Audio mastering: Milo McGuire, Dominic Armstrong, and Ben Cordell
    Transcriptions: Katy Moore

    Continue reading →

    Information security in high-impact areas

    As the 2016 US presidential campaign was entering a fractious round of primaries, Hillary Clinton’s campaign chair, John Podesta, opened a disturbing email. The March 19 message warned that his Gmail password had been compromised and that he urgently needed to change it.

    The email was a lie. It wasn’t trying to help him protect his account — it was a phishing attack trying to gain illicit access.

    Podesta was suspicious, but the campaign’s IT team erroneously wrote that the email was “legitimate” and told him to change his password. The IT team provided a safe link for Podesta to use, but it seems he or one of his staffers instead clicked the link in the forged email. That link was used by Russian intelligence hackers known as “Fancy Bear,” and they used their access to leak private campaign emails for public consumption in the final weeks of the 2016 race, embarrassing the Clinton team.

    While there are plausibly many critical factors in any close election, it’s possible that the controversy around the leaked emails played a non-trivial role in Clinton’s subsequent loss to Donald Trump. This would mean the failure of the campaign’s security team to prevent the hack — which might have come down to a mere typo — was extraordinarily consequential.

    These events vividly illustrate how careers in infosecurity at key organisations have the potential for outsized impact. Ideally, security professionals can develop robust practices that reduce the likelihood that a single slip-up will result in a significant breach.

    Continue reading →

    Practical steps to take now that AI risk is mainstream

    AI risk has gone mainstream. So what’s next?

    Last Tuesday’s statement on AI risk has hit headlines across the world. Hundreds of leading AI scientists and other prominent figures — including the CEOs of OpenAI, Anthropic and Google DeepMind — signed the one-sentence statement:

    Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.

    This mainstreaming of concerns about the risk of extinction from AI represents a substantial shift to the strategic landscape — and should, as a result, have implications on how best to reduce the risk.

    How has the landscape shifted?

    Pictures from the White House Press Briefing. Meme from @kristjanmoore. The relevant video is here.

    So far, I think the most significant effect of the changes in the way these risks are viewed can be seen in changes in political activity.

    World leaders — including Joe Biden, Rishi Sunak, and Emmanuel Macron — have all met leaders in AI in the last few months. AI regulation was a key topic of discussion at the G7. And now it’s been announced that Biden and Sunak will discuss extinction risks from AI as part of talks in DC next week.

    At the moment, it’s extremely unclear where this discussion will go.

    Continue reading →

      #153 – Elie Hassenfeld on two big picture critiques of GiveWell's approach, and six lessons from their recent work

      GiveWell is one of the world’s best-known charity evaluators, with the goal of “searching for the charities that save or improve lives the most per dollar.” It mostly recommends projects that help the world’s poorest people avoid easily prevented diseases, like intestinal worms or vitamin A deficiency.

      But should GiveWell, as some critics argue, take a totally different approach to its search, focusing instead on directly increasing subjective wellbeing, or alternatively, raising economic growth?

      Today’s guest — cofounder and CEO of GiveWell, Elie Hassenfeld — is proud of how much GiveWell has grown in the last five years. Its ‘money moved’ has quadrupled to around $600 million a year.

      Its research team has also more than doubled, enabling them to investigate a far broader range of interventions that could plausibly help people an enormous amount for each dollar spent. That work has led GiveWell to support dozens of new organisations, such as Kangaroo Mother Care, MiracleFeet, and Dispensers for Safe Water.

      But some other researchers focused on figuring out the best ways to help the world’s poorest people say GiveWell shouldn’t just do more of the same thing, but rather ought to look at the problem differently.

      Currently, GiveWell uses a range of metrics to track the impact of the organisations it considers recommending — such as ‘lives saved,’ ‘household incomes doubled,’ and for health improvements, the ‘quality-adjusted life year.’ To compare across opportunities, it then needs some way of weighing these different types of benefits up against one another. This requires estimating so-called “moral weights,” which Elie agrees is far from the most mature part of the project.
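As a toy illustration of what such a common-scale comparison involves, here is a short sketch. The outcome categories mirror the metrics mentioned above, but every weight and programme figure is invented for illustration; these are not GiveWell's actual moral weights or cost-effectiveness numbers:

```python
# Toy illustration of comparing programmes with different outcome types
# by converting everything into a single unit via "moral weights".
# All weights and programme figures below are invented for illustration.

moral_weights = {                # value of one unit of each outcome, in arbitrary common units
    "life_saved": 100.0,
    "household_income_doubled": 3.0,
    "qaly_gained": 2.5,
}

programmes = {
    # outcome counts per $100,000 spent (illustrative)
    "bednets":        {"life_saved": 20, "qaly_gained": 500},
    "cash_transfers": {"household_income_doubled": 400},
}

def value_per_dollar(outcomes: dict, budget: float = 100_000) -> float:
    """Total value of a programme's outcomes, in common units per dollar spent."""
    total = sum(moral_weights[outcome] * count for outcome, count in outcomes.items())
    return total / budget

for name, outcomes in programmes.items():
    print(name, round(value_per_dollar(outcomes), 4))
```

Under this kind of scheme, the ranking of programmes can hinge on the chosen weights, which is why Elie describes the moral weights as far from the most mature part of the project.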

      The Happier Lives Institute (HLI) has argued that instead, GiveWell should try to cash out the impact of all interventions in terms of improvements in subjective wellbeing. According to HLI, it’s improvements in wellbeing and reductions in suffering that are the true ultimate goal of all projects, and if you quantify everyone on this same scale, using some measure like the wellbeing-adjusted life year (WELLBY), you have an easier time comparing them.

      This philosophy has led HLI to be more sceptical of interventions that have been demonstrated to improve health, but whose impact on wellbeing has not been measured, and to give a high priority to improving lives relative to extending them.

      An alternative high-level critique is that really all that matters in the long run is getting the economies of poor countries to grow. According to this line of argument, hundreds of millions fewer people live in poverty in China today than 50 years ago, but is that because of the delivery of basic health treatments? Maybe a little, but mostly not.

      Rather, it’s because changes in economic policy and governance in China allowed it to experience a 10% rate of economic growth for several decades. That led to much higher individual incomes and meant the country could easily afford all the basic health treatments GiveWell might otherwise want to fund, and much more besides.

      On this view, GiveWell should focus on figuring out what causes some countries to experience explosive economic growth while others fail to, or even go backwards. Even modest improvements in the chances of such a ‘growth miracle’ will likely offer a bigger bang-for-buck than funding the incremental delivery of deworming tablets or vitamin A supplements, or anything else.

      Elie sees where both of these critiques are coming from, and notes that they’ve influenced GiveWell’s work in some ways. But as he explains, he thinks they underestimate the practical difficulty of successfully pulling off either approach and finding better opportunities than what GiveWell funds today.

      In today’s in-depth conversation, Elie and host Rob Wiblin cover the above, as well as:

      • The research that caused GiveWell to flip from not recommending chlorine dispensers as an intervention for safe drinking water to spending tens of millions of dollars on them.
      • What transferable lessons GiveWell learned from investigating different kinds of interventions, like providing medical expertise to hospitals in very poor countries to help them improve their practices.
      • Why the best treatment for premature babies in low-resource settings may involve less rather than more medicine.
      • The high prevalence of severe malnourishment among children and what can be done about it.
      • How to deal with hidden and non-obvious costs of a programme, like taking up a hospital room that might otherwise have been used for something else.
      • Some cheap early treatments that can prevent kids from developing lifelong disabilities, which GiveWell funds.
      • The various roles GiveWell is currently hiring for, and what’s distinctive about their organisational culture.

      Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

      Producer: Keiran Harris
      Audio mastering: Simon Monsour and Ben Cordell
      Transcriptions: Katy Moore

      Continue reading →

      The public is more concerned about AI causing extinction than we thought

      What does the public think about risks of human extinction?

      We care a lot about reducing extinction risks and think doing so is one of the best ways you can have a positive impact with your career. But even before considering career impact, it can be natural to worry about these risks — and as it turns out, many people do!

      In April 2023, the US firm YouGov polled 1,000 American adults on how worried they were about nine different potential extinction threats. It found the following percentages of respondents were either “concerned” or “very concerned” about extinction from each threat:

      We’re particularly interested in this poll now because we have recently updated our page on the world’s most pressing problems, which includes several of these extinction risks at the top.

      Knowing how the public feels about these kinds of threats can impact how we communicate about them.

      For example, if we take the results at face value, 46% of the poll’s respondents are concerned about human extinction caused by artificial intelligence. Maybe this surprisingly high figure means we don’t need to worry as much as we have over the last 10 years about sounding like ‘sci fi’ when we talk about existential risks from AI, since it’s quickly becoming a common concern!

      How does our view of the world’s most pressing problems compare?

      Continue reading →

      Give feedback on the new 80,000 Hours career guide

      We’ve spent the last few months updating 80,000 Hours’ career guide (which we previously released in 2017 and which you’ve been able to get as a physical book). This week, we’ve put our new career guide live on our website. Before we formally launch and promote the guide — and republish the book — we’d like to gather feedback from our readers!

      How can you help?

      First, take a look at the new career guide.

      Note that our target audience for this career guide is roughly the 100,000 young adults in the English-speaking world who are most likely to have high-impact careers. Many of them may not yet be familiar with many of the ideas that are widely discussed in the effective altruism community. Also, this guide is primarily aimed at people aged 18–24.

      When you’re ready, there’s a simple form to fill in:

      Give feedback

      Thank you so much!

      Extra context: why are we making this change?

      In 2018, we deprioritised 80,000 Hours’ career guide in favour of our key ideas series.

      Our key ideas series had a more serious tone, and was more focused on impact. It represented our best and most up-to-date advice. We expected that this switch would reduce engagement time on our site, but that the key ideas series would better appeal to people more likely to change their careers to do good.

      Continue reading →