#162 – Mustafa Suleyman on getting Washington and Silicon Valley to tame AI

By Robert Wiblin and Keiran Harris · Published September 1st, 2023

Mustafa Suleyman was part of the trio that founded DeepMind, and his new AI project is building one of the world’s largest supercomputers to train a large language model on 10–100x the compute used to train ChatGPT.

But far from the stereotype of the incorrigibly optimistic tech founder, Mustafa is deeply worried about the future, for reasons he lays out in his new book The Coming Wave: Technology, Power, and the 21st Century’s Greatest Dilemma (coauthored with Michael Bhaskar). The future could be really good, but only if we grab the bull by the horns and solve the new problems technology is throwing at us.

On Mustafa’s telling, AI and biotechnology will soon be a huge aid to criminals and terrorists, empowering small groups to cause harm on previously unimaginable scales. Democratic countries have learned to walk a ‘narrow path’ between chaos on the one hand and authoritarianism on the other, avoiding the downsides that come from both extreme openness and extreme closure. AI could easily destabilise that present equilibrium, throwing us off dangerously in either direction. And ultimately, within our lifetimes humans may not need to work to live any more — or indeed, even have the option to do so.

And those are just three of the challenges confronting us. In Mustafa’s view, ‘misaligned’ AI that goes rogue and pursues its own agenda won’t be an issue for the next few years, and it isn’t a problem for the current style of large language models. But he thinks that at some point — in eight, ten, or twelve years — it will become an entirely legitimate concern, and says that we need to be planning ahead.

In The Coming Wave, Mustafa lays out a 10-part agenda for ‘containment’ — that is to say, for limiting the negative and unforeseen consequences of emerging technologies:

Developing an Apollo programme for technical AI safety
Instituting capability audits for AI models
Buying time by exploiting hardware choke points
Getting critics involved in directly engineering AI models
Getting AI labs to be guided by motives other than profit
Radically increasing governments’ understanding of AI and their capabilities to sensibly regulate it
Creating international treaties to prevent proliferation of the most dangerous AI capabilities
Building a self-critical culture in AI labs of openly accepting when the status quo isn’t working
Creating a mass public movement that understands AI and can demand the necessary controls
Not relying too much on delay, but instead seeking to move into a new somewhat-stable equilibrium

As Mustafa put it, “AI is a technology with almost every use case imaginable” and that will demand that, in time, we rethink everything.

Rob and Mustafa discuss the above, as well as:

Whether we should be open sourcing AI models
Whether Mustafa’s policy views are consistent with his timelines for transformative AI
How people with very different views on these issues get along at AI labs
The failed efforts (so far) to get a wider range of people involved in these decisions
Whether it’s dangerous for Mustafa’s new company to be training far larger models than GPT-4
Whether we’ll be blown away by AI progress over the next year
What mandatory regulations government should be imposing on AI labs right now
Appropriate priorities for the UK’s upcoming AI safety summit

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript.

Producer and editor: Keiran Harris
Audio Engineering Lead: Ben Cordell
Technical editing: Milo McGuire
Transcriptions: Katy Moore

Highlights

How to get sceptics to take safety seriously

Mustafa Suleyman: The first part of the book mentions this idea of “pessimism aversion,” which is something that I’ve experienced my whole career; I’ve always felt like the weirdo in the corner who’s raising the alarm and saying, “Hold on a second, we have to be cautious.” Obviously lots of people listening to this podcast will probably be familiar with that, because we’re all a little bit more fringe. But certainly in Silicon Valley, that kind of thing… I get called a “decel” sometimes, which I actually had to look up. I guess it’s a play on me being an incel, which obviously I’m not, and some kind of decelerationist or Luddite or something — which is obviously also bananas, given what I’m actually doing with my company.
Rob Wiblin: It’s an extraordinary accusation.
Mustafa Suleyman: It’s funny, isn’t it? So people have this fear, particularly in the US, of pessimistic outlooks. I mean, the number of times people come to me like, “You seem to be quite pessimistic.” No, I just don’t think about things in this simplistic “Are you an optimist or are you a pessimist?” terrible framing. It’s BS. I’m neither. I’m just observing the facts as I see them, and I’m doing my best to share for critical public scrutiny what I see. If I’m wrong, rip it apart and let’s debate it — but let’s not lean into these biases either way.
So in terms of things that I found productive in these conversations: frankly, the national security people are much more sober, and the way to get their head around things is to talk about misuse. They see things in terms of bad actors, non-state actors, threats to the nation-state. In the book, I’ve really tried to frame this as implications for the nation-state and stability — because at one level, whether you’re progressive or otherwise, we care about the ongoing stability of our current order. We really don’t want to live in this Mad Maxian, hyper-libertarian, chaos post-nation-state world.
The nation-state, I think we can all agree that a shackled Leviathan does a good job of putting constraints on the chaotic emergence of bad power, and uses that to do redistribution in a way that keeps peace and prosperity going. So I think that there’s general alignment around that. And if you make clear that this has the potential to be misused, I think that’s effective.
What wasn’t effective, I can tell you, was the obsession with superintelligence. I honestly think that did a seismic distraction — if not disservice — to the actual debate. There were many more practical things. Because I think a lot of people who heard that in policy circles just thought, “Well, this is not for me. This is completely speculative. What do you mean, ‘recursive self-improvement’? What do you mean, ‘AGI superintelligence taking over’?” The number of people who barely have heard the phrase “AGI” but know about paperclips is just unbelievable. Completely nontechnical people would be like, “Yeah, I’ve heard about the paperclip thing. What, you think that’s likely?” Like, “Oh, geez, that is… Stop talking about paperclips!” So I think avoid that side of things: focus on misuse.

Is there a risk that Mustafa's company could speed up the race towards dangerous capabilities?

Rob Wiblin: On that general theme, a recurring question submitted by listeners was along these lines, basically: that you’re clearly alarmed about advances in AI capabilities in the book, and you’re worried that policy is lagging behind. And in the book you propose all kinds of different policies for containment, like auditing and using choke points to slow things down. And you say we need to find ways of, a literal quote: “Finding ways of buying time, slowing down, giving space for more work on the answers.”
But at the same time, your company is building one of the largest supercomputers in the world, and you think over the next 18 months you might do a language model training run that’s 10x or 100x larger than the one that produced GPT-4. Isn’t it possible that your own actions are helping to speed up the race towards dangerous capabilities that you wish were not going on?
Mustafa Suleyman: I don’t think that’s correct for a number of reasons. First, I think the primary threat to the stability of the nation-state is not the existence of these models themselves, or indeed the existence of these models with the capabilities that I mentioned. The primary threat to the nation-state is the proliferation of power. It’s the proliferation of power which is likely to cause catastrophe and chaos. Centralised power has a different threat — which is also equally bad and needs to be taken care of — which is authoritarianism and the misuse of that centralised power, which I care very deeply about. So that’s for sure.
But as we said earlier, I’m not in the AGI intelligence explosion camp that thinks that just by developing models with these capabilities, suddenly it gets out of the box, deceives us, persuades us to go and get access to more resources, gets to inadvertently update its own goals. I think this kind of anthropomorphism is the wrong metaphor. I think it is a distraction. So the training run in itself, I don’t think is dangerous at that scale. I really don’t.
And the second thing to think about is there are these overwhelming incentives which drive the creation of these models: these huge geopolitical incentives, the huge desire to research these things in open source, as we’ve just discussed. So the entire ecosystem of creation defaults to production. Me not participating certainly doesn’t reduce the likelihood that these models get developed. So I think the best thing that we can do is try to develop them and do so safely. And at the moment, when we do need to step back from specific capabilities like the ones I mentioned — recursive self-improvement and autonomy — then I will. And we should.
And the fact that we’re at the table — for example, at the White House recently, signing up to the voluntary commitments, one of seven companies in the US signing up to those commitments — means that we’re able to shape the distribution of outcomes, to put the question of ethics and safety at the forefront in those kinds of discussions. So I think you get to shape the Overton window when it’s available to you, because you’re a participant and a player. And I think that’s true for everybody. I think everybody who is thinking about AI safety and is motivated by these concerns should be trying to operationalise their alignment intentions, their alignment goals. You have to actually make it in practice to prove that it’s possible, I think.

Open sourcing frontier ML models

Mustafa Suleyman: I think I’ve come out quite clearly pointing out the risks of large-scale access. I think I called it “naive open source – in 20 years’ time.” So what that means is if we just continue to open source absolutely everything for every new generation of frontier models, then it’s quite likely that we’re going to see a rapid proliferation of power. These are state-like powers which enable small groups of actors, or maybe even individuals, to have an unprecedented one-to-many impact in the world.
Just as the last wave of social media enabled anybody to have broadcast powers, anybody to essentially function as an entire newspaper from the ’90s: by the 2000s, you could have millions of followers on Twitter or Instagram or whatever, and you’re really influencing the world — in a way that was previously the preserve of a publisher, that in most cases was licenced and regulated, that was an authority that could be held accountable if it really did something egregious. And all of that has now kind of fallen away — for good reasons, by the way, and in some cases with bad consequences.
We’re going to see the same trajectory with respect to access to the ability to influence the world. You can think of it as related to my Modern Turing Test that I proposed around artificial capable AI: like machines that go from being evaluated on the basis of what they say — you know, the imitation test of the original Turing test — to evaluating machines on the basis of what they can do. Can they use APIs? How persuasive are they of other humans? Can they interact with other AIs to get them to do things?
So if everybody gets that power, that starts to look like individuals having the power of organisations or even states. I’m talking about models that are two or three or maybe four orders of magnitude on from where we are. And we’re not far away from that. We’re going to be training models that are 1,000x larger than they currently are in the next three years. Even at Inflection, with the compute that we have, will be 100x larger than the current frontier models in the next 18 months.
Although I took a lot of heat on the open source thing, I clearly wasn’t talking about today’s models: I was talking about future generations. And I still think it’s right, and I stand by that — because I think that if we don’t have that conversation, then we end up basically putting massively chaotic destabilising tools in the hands of absolutely everybody. How you do that in practice, somebody referred to it as like trying to catch rainwater or trying to stop rain by catching it in your hands. Which I think is a very good rebuttal; it’s absolutely spot on: of course this is insanely hard. I’m not saying that it’s not difficult. I’m saying that it’s the conversation that we have to be having.

Voluntary vs mandatory commitments for AI labs

Rob Wiblin: In July, Inflection signed on to eight voluntary commitments with the White House, including things like committing to internal and external security testing, investing in cybersecurity and insider threat safeguards, and facilitating third-party discovery and reporting of vulnerabilities. Those are all voluntary, though. What commitments would you like to become legally mandatory for all major AI labs in the US and UK?
Mustafa Suleyman: That is a good question. I think some of those voluntary commitments should become legally mandated.
Number one would be scale audits: What size is your latest model?
Number two: There needs to be a framework for harmful model capabilities, like bioweapons coaching, nuclear weapons, chemical weapons, general bomb-making capabilities. Those things are pretty easy to document, and it just should not be possible to reduce the barriers to entry for people who don’t have specialist knowledge to go off and manufacture those things more easily.
The third one — that I have said publicly and that I care a lot about — is that we should just declare that these models shouldn’t be used for electioneering. They just shouldn’t be part of the political process. You shouldn’t be able to ask Pi who Pi would vote for, or what the difference is between these two candidates. Now, the counterargument is that many people will say that this might be able to provide useful and accurate and valuable information to educate people about elections, et cetera. Look, there is never going to be a perfect solution here: you have to take benefits away in order to avoid harms, and that’s always a tradeoff. You can’t have perfect benefits without any harms. That’s just a tradeoff. I would rather just take it all off the table and say that we —
Rob Wiblin: We can put some of it back later on, once we understand how to do it safely.
Mustafa Suleyman: That’s the best way. That is totally the best way. Now, obviously, a lot of people say that I’m super naive in claiming that this is possible because models like Stable Diffusion and Llama 2 are already out in open source, and people will certainly use that for electioneering. Again, this isn’t trying to resolve every single threat vector to our democracy, it’s just trying to say, at least the large-scale hyperscaler model providers — like Amazon, Microsoft, Google, and others — should just say, “This is against our terms of service.” So you’re just making it a little bit more difficult, and maybe even a little bit more taboo, if you don’t declare that your election materials are human-generated only.

Articles, books, and other media discussed in the show

Mustafa’s work:

The coming wave: Technology, power, and the 21st century’s greatest dilemma — Mustafa’s new book with Michael Bhaskar, published September 2023
The world isn’t ready for the next decade of AI — podcast interview with WIRED
The AI power paradox: Can states learn to govern artificial intelligence—before it’s too late? — in Foreign Affairs (with Ian Bremmer)
My new Turing test would see if AI can make $1 million — in MIT Technology Review

Work at Inflection:

Pi — Inflection’s personal AI
The precautionary principle: partnering with the White House on AI safety — an Inflection blog post, also discussed in this White House fact sheet
Safety sits at the heart of our mission and culture.
Inflection-1 tech report

Everything else:

Llama 2: Why is Meta releasing open-source AI model and are there any risks? by Dan Milmo
Google cancels AI ethics board in response to outcry by Kelsey Piper
Nvidia tweaks flagship H100 chip for export to China as H800 by Stephen Nellis and Jane Lee
Gallium and germanium: What China’s new move in microchip war means for world by Annabelle Liang and Nick Marsh

Transcript

Table of Contents

1 Cold open [00:00:00]
2 Rob’s intro [00:00:58]
3 The interview begins [00:02:05]
4 Mustafa’s thoughts on timelines for AI capabilities [00:03:47]
5 Open sourcing frontier ML models [00:11:51]
6 The challenge of getting a broader range of people involved in decision making [00:21:43]
7 How to get sceptics to take safety seriously [00:30:18]
8 Internal politics at AI labs [00:36:01]
9 Is there a risk that Mustafa’s company could speed up the race towards dangerous capabilities? [00:39:41]
10 Voluntary vs mandatory commitments for AI labs [00:52:54]
11 The importance of taking misalignment seriously [00:56:47]
12 Rob’s outro [00:58:35]

Cold open [00:00:00]

Mustafa Suleyman: I think I’ve come out quite clearly pointing out the risks of large-scale access. I think I called it “naive open source – in 20 years’ time.” So what that means is if we just continue to open source absolutely everything for every new generation of frontier models, then it’s quite likely that we’re going to see a rapid proliferation of power. These are state-like powers which enable small groups of actors, or maybe even individuals, to have an unprecedented one-to-many impact in the world.

Rob’s intro [00:00:58]

Rob Wiblin: Hey listeners, Rob here, head of research at 80,000 Hours.

I expect you’re going to be hearing a lot about the book we’re discussing today, The Coming Wave: Technology, Power, and the 21st Century’s Greatest Dilemma, which is just coming out this week.

Mustafa Suleyman was a cofounder of DeepMind, and on top of promoting the book he also has a new enormous AI company to run, so we only got an hour with him.

Given the time constraint, I tried to focus on topics where Mustafa’s 10 years at DeepMind might give him a unique perspective — as well as questions related to his new AI company, such as whether it’s making risks from AI worse, and how he hopes it might help.

We also cover:

Whether we’re about to be blown away again by models coming out over the next year
What mandatory regulations of AI companies he wants to see imposed ASAP
And whether it should be legal to open source frontier AI models

Without further ado, I bring you Mustafa Suleyman.

The interview begins [00:02:05]

Rob Wiblin: Today I’m speaking with Mustafa Suleyman.

When young, Mustafa went to Oxford University but dropped out to help start the Muslim Youth Helpline and worked on human rights policy with the Mayor of London.

But in 2010 he helped found one of the world’s top AI labs, DeepMind, with his childhood friend Demis Hassabis. In 2014, DeepMind was acquired by Google and he became head of applied AI at DeepMind, while in 2019 he left DeepMind to take up a policy role in the Google parent company.

In 2022, he left Google to found Inflection AI along with LinkedIn founder Reid Hoffman. Inflection has received over a billion dollars in investment and is working to build one of the largest supercomputers in the world.

It’s focused on building a helpful chatbot that is more personal, offering emotional support and humour more than other alternatives. It should also remember past conversations and gradually gain more context about you and your activities. The hope is that it will grow into being a mix of a therapist, a supportive friend, a business consultant, and an executive assistant. They call this project and service Pi for “personal AI,” and you could try it for yourself at personal.ai if you like.

This month, Mustafa and coauthor Michael Bhaskar are publishing their new book, The Coming Wave: Technology, Power, and the 21st Century’s Greatest Dilemma, which I have to say is an absolutely bracing read. It makes the case that we’re about to enter a period of dizzying technological change — which, on the one hand, could greatly improve the human condition, but on the other, could also lead humanity towards a disaster in any one of a dozen different ways: themes that will be very familiar to regular listeners to the show. Thanks for coming on the podcast, Mustafa.

Mustafa Suleyman: Thanks very much for having me. It’s great to be here. I’ve long been a fan of the podcast and the movement.

Rob Wiblin: Oh, wonderful.

Mustafa’s thoughts on timelines for AI capabilities [00:03:47]

Rob Wiblin: I hope to talk about how you intend for Inflection AI to help solve the problems described in The Coming Wave, and what you would like governments to require of AI labs.

But first, I asked our audience to send in questions for you, and a recurring one was wanting to clarify when you think we’ll have different AI capabilities. It’s a very difficult question, but you’ve been quoted in some articles as saying that it’s plausible that within two years, ML models will be able to autonomously operate an online business and turn $100,000 into $1 million over the period of a couple of months, which is a potentially super-significant threshold that you highlight in the book, and which you call the Modern Turing Test, now that I guess we’ve just completely blasted past the normal old Turing test.

To me, being able to run a business that turns $100,000 into $1 million sounds super impressive. And it’s a very general task that involves engaging in a wide range of different activities and figuring out how to do them in some sensible order and I guess having superhuman performance, at least when it comes to the bottom line. At the same time, there’s another interview where you said you agreed that we might well want to slow down advances in AI capabilities once they were getting close to getting dangerous, but you didn’t foresee that being necessary for maybe another 10 years or so.

Those two views feel in some tension to me and some listeners. Can you clarify what you think about all of that?

Mustafa Suleyman: So I think there’s an important clarification — which is a sort of cheeky addition, but is very significant — which I put in the Modern Turing Test, both in my articles, my tweets, and also in the book, which is that I said it’s plausible that could happen in two years with minor human oversight. So there would be significant steps along that path where the human would have to register a company, manage the bank account; ultimately, there would be a bunch of things that wouldn’t be done completely autonomously.

The individual components could be done autonomously. You could imagine the AI being given a general instruction to make up a product that was likely to be valuable and useful to people, to generate that, to write all the communications required to go off and have that manufactured, to negotiate over the price, to identify a drop shipper, et cetera. All of that could potentially happen.

Rob Wiblin: Yeah. What’s the thing that would potentially make a model dangerous but you think is likely to be lacking in two years’ time, or whenever we have the kind of model that you think has the capability to run a business in that way?

Mustafa Suleyman: Well, I think it’s really important, especially for this audience, to distinguish between the model itself being dangerous and the potential uses of these technologies enabling people who have bad intentions to do serious harm at scale. And they’re really fundamentally different. Because going back to your first question, the reason I said that I don’t see any evidence that we’re on a trajectory where we have to slow down capabilities development because there’s a chance of runaway intelligence explosion, or runaway recursive self-improvement, or some inherent property of the model on a standalone basis having the potential in and of itself to cause mass harm: I still don’t see that, and I stand by a decade timeframe.

I mean, I know that we in the AGI safety community are obsessed with timelines, like the number one discussion whenever I go to any of these is, “So what is your timeline? Has it updated?” Et cetera, et cetera. I remember going to the Winter Intelligence Conference in Oxford in 2011. I think it might have been the first proper convening of an AGI safety kind of conference. At the end, it was a day-long event, and people handed around this scrappy sheet of paper. I think it was one paper that got passed around and everybody hand-wrote their timelines for AGI. And obviously the spectrum was huge. It was kind of a funny test.

So the timelines question obviously is super important. I don’t mean to trivialise it. It is just funny how obsessed we get about it. And I think that we’re actually really bad at making these estimates. So when I say 10 years, what I’m actually saying is I’m not saying 10 years instead of eight or instead of 12: I’m saying it’s a long enough time horizon that I would consider it medium term, and hard for me to predict in anything other than blocks of time, like short, medium or very far out — “very far out” being 20 years plus. But I think it’s medium, and I think that’s in itself a serious risk that I’m giving a nontrivial percentage to some kind of existential threat over a decade.

So I just take it very seriously. I’m not trying to trivialise it or anything.

Rob Wiblin: OK, so maybe the idea is in the short term, over the next couple of years, we need to worry about misuse: a model with human assistance directed to do bad things, that’s an imminent issue. Whereas a model running somewhat out of control and acting more autonomously without human support and against human efforts to control it, that is more something that we might think about in 10 years’ time and beyond. That’s your guess?

Mustafa Suleyman: That’s definitely my take. That is the key distinction between misuse and autonomy. And I think that there are some capabilities which we need to track, because those capabilities increase the likelihood that that 10-year event might be sooner.

For example, if models are designed to have the ability to operate autonomously by default: so as an inherent design requirement, we’re engineering the ability to go off and design its own goals, to learn to use arbitrary tools to make decisions completely independently of human oversight. And then the second capability related to that is obviously recursive self-improvement: if models are designed to update their own code, to retrain themselves, and produce fresh weights as a result of new fine-tuning data or new interaction data of any kind from their environment, be it simulated or real world. These are the kinds of capabilities that should give us pause for thought.

Rob Wiblin: I guess you would know better than me, but my feeling is that quite a lot of people are working on trying to figure out how they can turn these models into autonomous agents that can act with progressively less human oversight. What do you think is going to hold it back that means that we won’t have really useful examples of that for another 10 years?

Mustafa Suleyman: Well, I don’t think we’ll have useful examples of that. I think that we may be working on those capabilities, but they won’t necessarily represent an existential threat. I think what I’m saying is they indicate the beginning of a trajectory towards a greater threat.

And at Inflection, we’re actually not working on either of those capabilities, recursive self-improvement and autonomy. I’ve chosen a product direction which I think can enable us to be extremely successful without needing to work on that. I mean, we’re not an AGI company; we’re not trying to build a superintelligence. We’re trying to build a personal AI. Now, that is going to have very capable AI-like qualities; it is going to learn from human feedback; it is going to synthesise information for you in ways that seem magical and surprising; it’s going to have a lot of access to your personal information.

But I think the quest to build general-purpose learning agents which have the ability to perform well in a wide range of environments, that can operate autonomously, that can formulate their own goals, that can identify new information in environments, new reward signals, and learn to use that as self supervision to update their own weights over time: this is a completely different quality of agent, that is quite different, I think, to a personal AI product.

Open sourcing frontier ML models [00:11:51]

Rob Wiblin: Recently there’s been this big debate over the open sourcing of frontier ML models. Facebook has persisted in publishing the weights for progressively more advanced large language models, despite worries from US lawmakers about how they might be misused and whether it’s so smart to make a habit of handing over strategic technology to American adversaries like China. And while Facebook tuned their models to try to make them less likely to be willing to help with criminal behaviour, once you have the raw weights, it’s really trivial to get rid of any tuning like that so you can kind of get the model to do whatever you want. And once you’ve given out the weights, you’ve kind of ceded any control over that.

What’s your personal take on open sourcing models?

Mustafa Suleyman: Well, you’ve raised three different things there. I’m not sure I would agree that it’s trivial to remove the fine-tuning and alignment. So that’s one question. Second thing is, what is the logic of denying China access to frontier technologies? What are the consequences of that? What does that mean for global stability and the potential of real conflict? And then third is your question around open source.

On the open source thing: I think I’ve come out quite clearly pointing out the risks of large-scale access. I think I called it “naive open source – in 20 years’ time.” So what that means is if we just continue to open source absolutely everything for every new generation of frontier models, then it’s quite likely that we’re going to see a rapid proliferation of power. These are state-like powers which enable small groups of actors, or maybe even individuals, to have an unprecedented one-to-many impact in the world.

Just as the last wave of social media enabled anybody to have broadcast powers, anybody to essentially function as an entire newspaper from the ’90s: by the 2000s, you could have millions of followers on Twitter or Instagram or whatever, and you’re really influencing the world — in a way that was previously the preserve of a publisher, that in most cases was licenced and regulated, that was an authority that could be held accountable if it really did something egregious. And all of that has now kind of fallen away — for good reasons, by the way, and in some cases with bad consequences.

We’re going to see the same trajectory with respect to access to the ability to influence the world. You can think of it as related to my Modern Turing Test that I proposed around artificial capable AI: like machines that go from being evaluated on the basis of what they say — you know, the imitation test of the original Turing test — to evaluating machines on the basis of what they can do. Can they use APIs? How persuasive are they of other humans? Can they interact with other AIs to get them to do things?

So if everybody gets that power, that starts to look like individuals having the power of organisations or even states. I’m talking about models that are two or three or maybe four orders of magnitude on from where we are. And we’re not far away from that. We’re going to be training models that are 1,000x larger than they currently are in the next three years. Even at Inflection, with the compute that we have, [our models] will be 100x larger than the current frontier models in the next 18 months.

Although I took a lot of heat on the open source thing, I clearly wasn’t talking about today’s models: I was talking about future generations. And I still think it’s right, and I stand by that — because I think that if we don’t have that conversation, then we end up basically putting massively chaotic destabilising tools in the hands of absolutely everybody. How you do that in practice, somebody referred to it as like trying to catch rainwater or trying to stop rain by catching it in your hands. Which I think is a very good rebuttal; it’s absolutely spot on: of course this is insanely hard. I’m not saying that it’s not difficult. I’m saying that it’s the conversation that we have to be having.

Rob Wiblin: Yeah. How difficult is it in practice to remove the fine-tuning? I guess I was overstating it when I said it was trivial. I suppose it requires a bunch of technical chops, and you have to do a bunch of reinforcement learning from human feedback to kind of undo the constraints that have been put on the models. Is that the picture?

Mustafa Suleyman: Correct. Yeah, I think it’s pretty hard. It’s certainly not trivial, and I think it requires significant expertise.

And I think the other thing to think about is, one of the examples that we all have surfaced to the authorities is to do with bioweapons and chemical weapons development. This clearly lowers the barrier to entry to being able to develop a potentially dangerous synthetic compound of some type, maybe a weapon or maybe a pathogen or something like this. That’s for sure true. It can act as a coach trying to nudge you along your path as you actually put this together: where to get the tools from when you run into technical challenges in the lab, and so on.

I think it’s for sure possible to remove that content both from pretraining, to align it out, et cetera, and really lower the risk of people being able to do that. And I think it will be hard to re-expose those capabilities in models after the fact, even from open source. I’m not 100% sure on that, but I think it’s going to be pretty hard. It definitely makes it much harder than just leaving it in there.

I think the second thing to think about is this knowledge and expertise is already available all over the web. So with bad actors, all we’re trying to do is just make it as hard as we possibly can. You can’t completely eliminate the risk, so at some point you have to ask yourself, “What is the new risk that we have exposed by making a model like this available, that isn’t already a risk that we are exposed to?” — given the accessibility of this information on the open web, which is clearly there. And I think that for open sourcing Llama 2, I personally don’t see that we’ve increased the existential risk to the world or any catastrophic harm to the world in a material way whatsoever. I think it’s actually good that they’re out there.

Rob Wiblin: Interesting. Yeah, I agree with that. I think that the current models, the bad outcomes would mostly be that they could be a nuisance in some way. They could help with scamming people or something. But I’m concerned about having this precedent where people just say we have to open source everything. Then where does that leave us in five years or 10 years or 15 years or 20 years? They’re going to just keep getting more powerful. And currently it’s not really any help with designing a bioweapon, but in 10 years’ time or 15 years’ time, it might be able to make a really substantial difference. And yeah, I’m just not sure. I feel like we have to start putting some restrictions on open sourcing now, basically in anticipation of that.

Mustafa Suleyman: I think that’s totally correct. And I think that, unfortunately, there’s a lot of extreme anger in the open source community to that — which I can completely understand, because it’s affordable and anyone can experiment with it, it has been the engine of progress in the past. And it’s also not great coming from someone like me that has raised lots and lots of money and that has the opportunity to do this. So I totally appreciate the seemingly basically contradictory position, where I’m just reinforcing my own success kind of thing. So I totally accept that. Unfortunately, I still believe I’m right. So I’m genuinely, sincerely committed to the right thing, even though it is a bit of a total conflict of interest. So hopefully other people can make this argument too.

Rob Wiblin: Yeah. Well, I don’t run an AI company, and I think you’re right.

Mustafa Suleyman: Thank you.

Rob Wiblin: Maybe that’ll make it more credible. What about the geopolitical angle? Do you think that it’s possible that national security folks who think that AI is very strategic are going to take an interest in reducing open sourcing because they see it as handing over a strategically important technology to other countries?

Mustafa Suleyman: I think that’s going to be quite likely. From all of my conversations over the last couple of years in the US, and to some extent also in the UK, I can see that we’ve shifted from seeing China as a strategic adversary — which is a phrase that implies that we can get along with one another, but it will be a little bit of jostling and we’ll be competitors — to seeing it as a fundamental threat.

So the export controls from last year were really a declaration of economic war. We can haggle over whether the H800 enables them to do just about as much training compute as we can with an H100. I think in practice it probably slows them down by 30% to 50% because the H800 can still be daisy-chained together like the rest of the chips, so you can really just buy 50% more of them — which I think is what a number of these companies have done. And I don’t see them being held back a great deal by this.

However, they are going to be flatly denied access to the next generation. Hopper-Next is going to be completely nuts. I mean, it is a really, really powerful chip, so I think that’s where people should focus their attention. It is a really significant block on their progress and it’s very difficult for them to catch up by building other chips from scratch and so on. So yeah, it’s really going to slow them down, and it’s not going to go unpunished. I mean, they’ve already hit back with sanctions of their own on some of the raw materials [like Micron Technology], so I expect to see more of that.

The challenge of getting a broader range of people involved in decision making [00:21:43]

Rob Wiblin: Yeah. While you were involved with DeepMind and Google, you tried to get a broader range of people involved in decision making on AI, at least inasmuch as it affected broader society. But in the book you describe how those efforts more or less came to naught. How high a priority is solving that problem relative to the other challenges that you talk about in the book?

Mustafa Suleyman: It’s a good question. I honestly spent a huge amount of my time over the 10 years that I was at DeepMind trying to put more external oversight as a core function of governance in the way that we build these technologies. And it was a pretty painful exercise. Naturally, power doesn’t want that. And although I think Google is sort of well-intentioned, it still functions as a kind of traditional bureaucracy.

Unfortunately, when we set up the Google ethics board, it was really in a climate when cancel culture was at its absolute peak. And our view was that we would basically have these nine independent members that, although they didn’t have legal powers to block a technology or to investigate beyond their scope, and they were dependent on what we, as Google DeepMind, showed them, it still was a significant step to providing external oversight on sensitive technologies that we were developing.

But I think some people on Twitter and elsewhere felt that because we had appointed a conservative, the president of the Heritage Foundation, and she had made some transphobic and homophobic remarks in the past, quite serious ones, that meant that she should be cancelled, and she should be withdrawn from the board. And so within a few days of announcing it, people started campaigning on university campuses to force other people to step down from the board, because their presence on the board was complicit and implied that they condoned her views and stuff like this.

And I just think that was a complete travesty, and really upsetting because we’d spent two years trying to get this board going, and it was a first step towards real outside scrutiny over very sensitive technologies that were being developed. And unfortunately, it all ended within a week, as three members of the nine stood down, and then eventually she stood down, and then we lost half the board in a week and it was just completely untenable. And then the company turned around and were like, “Why are we messing around with this? This is a waste of time.”

Rob Wiblin: “What a pain in the butt.”

Mustafa Suleyman: “Why would we bother? What a pain in the ass.”

Rob Wiblin: It was a very striking story to me, reading that in the book. People complain, I think correctly, that decisions of enormous global importance, historical importance, are potentially going to be made inside these AI labs — and the kinds of people who work at these labs are a very small fraction of the people in the world in terms of their political views, the values that they have, the things that they’ve studied, the kind of information that they happen to know. So it would be good if we could get a wider cross-section of the human population involved in scrutinising or having some input on these.

But to share power with the general public in the Global South, or even just outside big cities in the US or UK, will inevitably involve giving influence to people with views that I imagine are very offensive to Google staff — probably more offensive than Kay Coles James, who had to resign from the board for having more conservative, traditional views on gender. So based on that experience, it just seems like it’s very unlikely to happen, when we might just be flat out trying to get acceptance for having more input from a broader cross-section of educated people in the UK into Google. Like, that’s going to be the most that people will tolerate.

Mustafa Suleyman: Yeah, totally. This is part of the problem, right? I mean, 40% of people in the US believe that trans rights are moving too quickly; 30% believe that abortion should be made illegal; 30-odd% are against gay marriage.

Rob Wiblin: And then think globally, right?

Mustafa Suleyman: Right, exactly, then think globally. So I think we have to just learn to sit down with people who we fundamentally disagree with. That goes for China and the Taliban and all these people who hold these views — because if we can’t do that, then we’ve really got no chance of actually hearing one another out and changing one another’s views and making progress. I think in the last two or three years, I feel like we’ve really taken a few backward steps in that direction, and it’s super problematic because it just demonises the other and then we just end up hating on one another.

And it’s very frustrating for us, because we put in a huge amount of effort to make that happen. Before that, when we were acquired, we made it a condition of the acquisition that we have an ethics and safety board. That in itself was a first step towards this kind of broader public effort. Then after the ethics and safety board, we actually tried to spin DeepMind out as a global interest company: one that was legally governed by the requirement to consider all of the stakeholders when making decisions. So it was a company limited by guarantee. And then the charter definition had an ethics and safety mission for AGI development; we actually had the ability to spend vast amounts of our income on scientific and social mission.

So it was a really creative and experimental structure. But when Alphabet saw what happened with that board, they basically just got cold feet. That was the bottom line. They saw what happened there and they were just like, “This is totally nuts. The same thing’s going to happen for your global interest company. Why do that?” And then eventually we pulled DeepMind into Google, and in a way DeepMind was never independent — and isn’t independent now; obviously now it’s completely part of Google.

Rob Wiblin: Yeah. The core of trying to address representativeness is that you will be ceding power to people who don’t share your values, and if people are not willing to make that compromise, then it’s not going to happen.

What do you think might be an incremental step that is realistic for some of these labs to get more input from broader society?

Mustafa Suleyman: Well, I’m really stuck. I think it’s really hard. There is another direction which involves the academic groups getting more access, and actually doing red teaming or audits of scale or audits of model capabilities: they’re the three proposals that I’ve heard made, and I’ve been very supportive of, and have certainly explored with people at Stanford and elsewhere.

But I think there’s a real problem there, which is: If you take the average PhD student or postdoctoral researcher that might work on this in a couple of years, they may well go to a commercial lab, right? And so if we’re to give them access, then they’ll probably take that knowledge and expertise elsewhere, potentially to a competitor — it’s an open labour market, after all. So that isn’t really a sustainable way of doing it.

I met with Jen Easterly a few weeks ago, who runs the US cybersecurity agencies, and she and I were talking about maybe using more traditional pentesting consultants to do red teaming, because they have a commercial incentive in keeping the information top secret, they’ve been cleared for a long time, they’re trusted — but at the same time, they can make independent public statements about compliance with various standards or not. And I kind of prefer that direction in a way, because they’re clearly incentivised not to leak that information, and it’s a more commercial outfit.

We did have a pretty cool hybrid thing going at DeepMind for a while with Toby Ord. But I think Toby is an exceptional individual. I mean, he’s clearly been an EA since before EAs were a thing, and we’ve let him come and visit DeepMind when I was there, at least pretty much every week for years. But he’s not an engineer. He’s committed his life to being an EA monk. And also, I’m not even sure how much impact that actually has in practice. I mean, I think he’s a good person to have around, but I don’t think that’s a practical oversight mechanism.

Rob Wiblin: We need more than that. More than one guy.

Mustafa Suleyman: That’s what I’m saying. Totally. Yeah, exactly.

How to get sceptics to take safety seriously [00:30:18]

Rob Wiblin: Another experience that you share in The Coming Wave is trying to sound the alarm about the likely social effects of AI to your colleagues many years ago and being met with disinterest and blank stares. And some people in the tech industry, probably a shrinking number now, still seem to have the attitude that everything is just very likely to be fine and all we have to do is advance everything as quickly as possible. And the central theme of your book is just that it might not be as simple as that.

Over the years, have you found any arguments that are persuasive and get sceptics in your industry to kind of sit up straight and take seriously the idea that we’re closer to doing a tightrope walking act rather than just racing down a straight runway?

Mustafa Suleyman: That is a great question: strategies for persuading people to care more about this issue. The first part of the book mentions this idea of “pessimism aversion,” which is something that I’ve experienced my whole career; I’ve always felt like the weirdo in the corner who’s raising the alarm and saying, “Hold on a second, we have to be cautious.” Obviously lots of people listening to this podcast will probably be familiar with that, because we’re all a little bit more fringe. But certainly in Silicon Valley, that kind of thing… I get called a “decel” sometimes, which I actually had to look up. I guess it’s a play on me being an incel, which obviously I’m not, and some kind of decelerationist or Luddite or something — which is obviously also bananas, given what I’m actually doing with my company.

Rob Wiblin: It’s an extraordinary accusation.

Mustafa Suleyman: It’s funny, isn’t it? So people have this fear, particularly in the US, of pessimistic outlooks. I mean, the number of times people come to me like, “You seem to be quite pessimistic.” No, I just don’t think about things in this simplistic “Are you an optimist or are you a pessimist?” terrible framing. It’s BS. I’m neither. I’m just observing the facts as I see them, and I’m doing my best to share for critical public scrutiny what I see. If I’m wrong, rip it apart and let’s debate it — but let’s not lean into these biases either way.

So in terms of things that I found productive in these conversations: frankly, the national security people are much more sober, and the way to get their head around things is to talk about misuse. They see things in terms of bad actors, non-state actors, threats to the nation-state. In the book, I’ve really tried to frame this as implications for the nation-state and stability — because at one level, whether you’re progressive or otherwise, we care about the ongoing stability of our current order. We really don’t want to live in this Mad Maxian, hyper-libertarian, chaos post-nation-state world.

The nation-state, I think we can all agree that a shackled Leviathan does a good job of putting constraints on the chaotic emergence of bad power, and uses that to do redistribution in a way that keeps peace and prosperity going. So I think that there’s general alignment around that. And if you make clear that this has the potential to be misused, I think that’s effective.

What wasn’t effective, I can tell you, was the obsession with superintelligence. I honestly think that did a seismic distraction — if not disservice — to the actual debate. There were many more practical things, because I think a lot of people who heard that in policy circles just thought, “Well, this is not for me. This is completely speculative. What do you mean, ‘recursive self-improvement’? What do you mean, ‘AGI superintelligence taking over’?” The number of people who barely have heard the phrase “AGI” but know about paperclips is just unbelievable. Completely nontechnical people would be like, “Yeah, I’ve heard about the paperclip thing. What, you think that’s likely?” Like, “Oh, geez, that is… Stop talking about paperclips!” So I think avoid that side of things: focus on misuse.

Rob Wiblin: Yeah, I suppose it seems like the paperclip thing is now more in the Overton window, or the superintelligence is. You get fewer people thinking that’s crazy today, having seen the advances in the last year. Just say, “Imagine that we saw the progress of the last year, but it happened another 10 times for the next 10 years.”

Mustafa Suleyman: 100%. Yeah, that’s the crazy thing. I certainly agree with that. The last year has been pretty crazy.

Rob Wiblin: In 2010, it was an idea ahead of its time, potentially. Maybe that was the thing to worry about happening in 2040 or 2050, but it was just causing too many people to bounce because it wasn’t clear what to do.

Mustafa Suleyman: Yeah. Frankly, I would say even in 2015, 2018, 2020, I think it was premature for those kinds of things, and isolated people that we wanted to get on our side.

Obviously now it’s kind of easy. I mean, that’s the easiest way to demonstrate and persuade people that this is important: make things available in open source, have a bunch of people play with it, identify the actual limitations and potential capabilities of these models in practice. And then we can all have a rational, sane debate about real things, rather than theoretical frameworks. I think we’re actually in a really good place on that front. In terms of AGI safety, I think it’s never been more well understood. I feel great relief at this point. I’m like, “Amazing. The cat’s out of the bag. Everybody can make their own mind up.” This tiny group of us don’t actually have to make the theoretical case or speculate, because everyone can just, as you say, 10x or 100x what they see in their favourite chatbot and take it from there.

Internal politics at AI labs [00:36:01]

Rob Wiblin: Yeah. From your many years in the industry, do you understand the internal politics of AI labs that have staff who range all the way from being incredibly worried about AI advances to people who just think that there’s no problem at all, and just want everything to go as quickly as possible? I would have, as an outsider, expected that these groups would end up in conflict over strategy pretty often. But at least from my vantage point, I haven’t heard about that happening very much. Things seem to run remarkably smoothly.

Mustafa Suleyman: Yeah. I don’t know. I think the general view of people who really care about AI safety inside labs — like myself, and others at OpenAI, and to a large extent DeepMind too — is that the only way that you can really make progress on safety is that you actually have to be building it. Unless you are at the coalface, really experimenting with the latest capabilities, and you have resources to actually try to mitigate some of the harms that you see arising in those capabilities, then you’re always going to be playing catchup by a couple of years.

I’m pretty confident that open source is going to consistently stay two to three years behind the frontier for quite a while, at least the next five years. I mean, at some point, there really will be mega multibillion-dollar training runs, but I actually think we’re farther away from that than people realise. I think people’s math is often wrong on these things.

Rob Wiblin: Can you explain that?

Mustafa Suleyman: People talk about us getting to a $10 billion training run. That math does not add up. We’re not getting to a single training run that costs $10 billion. I mean, that is many years away, five years away, at least.

Rob Wiblin: Interesting. Is it maybe that they’re thinking that it’ll have the equivalent compute of $10 billion in 2022 chips or something like that? Is maybe that where the confusion is coming in, that they’re thinking about it in terms of the compute increase? Because they may be thinking there’s going to be a training run that involves 100 times as much compute, but by the time that happens, it doesn’t cost anywhere near 100 times as much money.

Mustafa Suleyman: Well, partly it’s that. It could well be that, but then it’s not going to be 10x less: it’ll be 2-3x less, because each new generation of chip roughly gives you 2-3x more FLOPS per dollar. But yeah, I’ve heard that number bandied around, and I can’t figure out how you squeeze $10 billion worth of training into six months, unless you’re going to train for three years or something.

Rob Wiblin: That’s unlikely.

Mustafa Suleyman: Yeah, it’s pretty unlikely. But in any case, I think it is super interesting that open source is so close. And it’s not just open source as a result of open sourcing frontier models like Llama 2 or Falcon or these things. It is more interesting, actually, that these models are going to get smaller and more efficient to train. So if you consider that GPT-3 was 175 billion parameters in the summer of 2020, that was like three years ago, and people are now training GPT-3-like capabilities at 1.5 billion parameters or 2 billion parameters. Which still may cost a fair amount to train, because the total training compute doesn’t go down hugely, but certainly the serving compute goes down a lot and therefore many more people can use those models more cheaply, and therefore experiment with them. And I think that trajectory, to me, feels like it’s going to continue for at least the next three to five years.

Rob Wiblin: Yeah. And the broader point was that even people who are concerned feel like they need to be at the frontier in order to be understanding these models better and figuring out how to make them safer. So that’s the reason why everyone is able to get along, potentially, because they all have an intermediate goal in common.

Mustafa Suleyman: That’s exactly right.
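
[Editor’s sketch: the cost arithmetic in the exchange above can be made concrete with a few lines of Python. This is a rough illustration, not a forecast. It assumes the 100x compute figure Rob mentions and the roughly 2-3x gain in FLOPS per dollar per chip generation that Mustafa cites, applied over a single generation; the function name `cost_multiplier` is hypothetical and chosen only for this example.]

```python
def cost_multiplier(compute_multiplier: float, flops_per_dollar_gain: float) -> float:
    """Relative dollar cost of a training run.

    compute_multiplier: how many times more compute the run uses than a baseline run.
    flops_per_dollar_gain: how many times more FLOPS per dollar the newer chips deliver.
    """
    if compute_multiplier <= 0 or flops_per_dollar_gain <= 0:
        raise ValueError("both multipliers must be positive")
    return compute_multiplier / flops_per_dollar_gain


# Assumed scenario from the conversation: 100x the compute, bought one chip
# generation later, with a 2x or 3x improvement in FLOPS per dollar.
for gain in (2.0, 3.0):
    print(f"{gain:.0f}x FLOPS per dollar: about {cost_multiplier(100, gain):.1f}x the cost")
```

[Under these assumptions, a run with 100x the compute still costs tens of times more than the baseline, which is the sense in which the saving is 2-3x rather than 10x.]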

Is there a risk that Mustafa’s company could speed up the race towards dangerous capabilities? [00:39:41]

Rob Wiblin: On that general theme, a recurring question submitted by listeners was along these lines, basically: that you’re clearly alarmed about advances in AI capabilities in the book, and you’re worried that policy is lagging behind. And in the book you propose all kinds of different policies for containment, like auditing and using choke points to slow things down. And you say we need to find ways of, a literal quote: “Finding ways of buying time, slowing down, giving space for more work on the answers.”

But at the same time, your company is building one of the largest supercomputers in the world, and you think over the next 18 months you might do a language model training run that’s 10x or 100x larger than the one that produced GPT-4. Isn’t it possible that your own actions are helping to speed up the race towards dangerous capabilities that you wish were not going on?

Mustafa Suleyman: I don’t think that’s correct for a number of reasons. First, I think the primary threat to the stability of the nation-state is not the existence of these models themselves, or indeed the existence of these models with the capabilities that I mentioned. The primary threat to the nation-state is the proliferation of power. It’s the proliferation of power which is likely to cause catastrophe and chaos. Centralised power has a different threat — which is also equally bad and needs to be taken care of — which is authoritarianism and the misuse of that centralised power, which I care very deeply about. So that’s for sure.

But as we said earlier, I’m not in the AGI intelligence explosion camp that thinks that just by developing models with these capabilities, suddenly it gets out of the box, deceives us, persuades us to go and get access to more resources, gets to inadvertently update its own goals. I think this kind of anthropomorphism is the wrong metaphor. I think it is a distraction. So the training run in itself, I don’t think is dangerous at that scale. I really don’t.

And the second thing to think about is there are these overwhelming incentives which drive the creation of these models: these huge geopolitical incentives, the huge desire to research these things in open source, as we’ve just discussed. So the entire ecosystem of creation defaults to production. Me not participating certainly doesn’t reduce the likelihood that these models get developed. So I think the best thing that we can do is try to develop them and do so safely. And at the moment, when we do need to step back from specific capabilities like the ones I mentioned — recursive self-improvement and autonomy — then I will. And we should.

Rob Wiblin: So just to clarify on the misalignment risk, or the risk of the model being dangerous even to train: You think that doing a 100x training run that’s just still producing a chatbot, like a better GPT-4, even though that would be a more impressive model and a more capable model, presumably, it’s not dangerous — because it’s lacking essential components like autonomy and the ability to act in the world. Just producing an extremely good and a much better GPT-4 is not dangerous yet; in order for it to be dangerous, we need to add other capabilities, like it acting in the world and having broader goals. And that’s like five, 10, 15, 20 years away. We don’t know exactly. But for that reason, it’s not dangerous right now.

And then in terms of encouraging other people to do stuff that’s dangerous, like advancing capabilities more quickly than you would like, you think that Inflection AI, even if it’s a big deal in this business ecosystem, the incentives are so clear that everyone already wants to race ahead, such that they’re not watching you and changing their behaviour based on that very much. They’re going to do the thing that they’re going to do, just because they think it’s profitable for them. And if you held back on doing that training run, it wouldn’t shift their behaviour.

Mustafa Suleyman: I think that’s absolutely right. And the fact that we’re at the table — for example, at the White House recently, signing up to the voluntary commitments, one of seven companies in the US signing up to those commitments — means that we’re able to shape the distribution of outcomes, to put the question of ethics and safety at the forefront in those kinds of discussions. So I think you get to shape the Overton window when it’s available to you, because you’re a participant and a player. And I think that’s true for everybody. I think everybody who is thinking about AI safety and is motivated by these concerns should be trying to operationalise their alignment intentions, their alignment goals. You have to actually make it in practice to prove that it’s possible, I think.

If people have an opportunity to play with Pi — Pi is our AI, our personal AI; it stands for “personal intelligence”; you can find it at pi.ai on the web and on the App Store — you’ll see that we’ve aligned it in very specific ways. It isn’t susceptible to any of the jailbreaks or prompt hacks, any of them. If anybody gets one, send it to me on Twitter. We don’t suffer any of them. Why is that? Because we’ve made safety and behaviour alignment our number one priority, and we’ve deliberately designed the model not to be general. So it doesn’t generate code; it isn’t designed to be the ultimate general-purpose API that anybody can “prompt”: it’s a personal AI that you can talk to.

Now, that doesn’t mean it doesn’t have any risks. It has other risks, which are, what are the values of this AI? Is it being persuasive? What kinds of conversations are people having with it? So there are other considerations that we have to be attentive to. But starting from a position of building these personal AIs with safety and ethics in mind is actually a core value of our company, and I think shows in the product that we build.

Rob Wiblin: Yeah. So Inflection AI primarily works on developing these hopefully much better chatbots, and it feels a little bit distant from many of the concerns that you lay out in The Coming Wave. Could you elaborate a bit more on what is the vision for how Inflection is going to help tackle those threats that you’re really worried about?

Mustafa Suleyman: Well, most of the threats that I’ve described are actually things that the nation-state has to address. I’ve long been an advocate of regulation. I don’t necessarily think that as a company we can really work on the threat of misinformation. We don’t make our model available as an API for other people to generate new types of content on, so most of the time we’re trying not to contribute to the harms, but we’re also not actively participating in releasing new moderation tools or things like that.

I think OpenAI’s release recently of the GPT-4 for moderation is excellent. Those kinds of things are awesome. I think Anthropic does lots of cool things like that. I think once we’re a little bit bigger and a bit more stable — we’re only 40 people at the moment — we’ll add more people doing those kinds of things. But for us, we’re trying to build a product that consumers absolutely love, that in and of itself is as safe as it can be, and is really useful and helps people and is super supportive and stuff like that. That’s really our goal. And then beyond that, we’re advocating for regulation. I think that’s the way that this is going to really change.

Rob Wiblin: Yeah, I’ll come back to regulation in just a second. But in the book, you talk very positively about the idea of having an Apollo programme for technical AI safety research, that we just need to put a lot more effort into this. Does Inflection intend to develop or apply any particular technical AI alignment methods in order to make its models safer, or to develop those methods more in a way that might scale up to be useful for even more capable models?

Mustafa Suleyman: Yes, definitely. Right now, what you see in Pi today is actually one of our smaller models, and soon it will be much larger. But the methods for alignment are the same ones that we’ve all been using to improve the controllability and performance of these models. So obviously, one of the reasons why we’ve been able to get it to behave so closely to our behaviour policy is that we basically really fine-tune it very aggressively, and we spend a lot of time doing RLHF on it, and using other methods and so on. So we’ve definitely advanced the state of the art there.

We published a tech report not detailing too many insights, but describing the results that we’ve gotten in our pretrained model. And we’ve achieved state-of-the-art performance for our compute size: so compared to GPT-3.5, to Google’s PaLM, to Claude 1, to Chinchilla, we beat all of those models on most of the public benchmarks, like MMLU and so on, on the pretraining side. And actually, we’ve done similar things, slightly harder to measure and compare, on the fine-tuning and alignment side. So we haven’t published anything on the alignment side, but we’ve achieved similar state-of-the-art progress there. And soon we’ll publish something else on our new pretraining run, which is much larger.

Rob Wiblin: Yeah. Many people, including me, were super blown away by the jump from GPT-3.5 to GPT-4. Do you think people are going to be blown away again in the next year by the leap to these 100x the compute of GPT-4 models?

Mustafa Suleyman: I think that what people forget is that the difference between 3.5 and 4 is 5x. So I guess just because of our human bias, we just assume that this is a tiny increment. It’s not. It’s a huge multiple of total training FLOPS. So the difference between 4 and 4.5 will itself be enormous. I mean, we’re going to be significantly larger than 4 in time as well, once we’re finished with our training run — and it really is much, much better.

The exciting thing is that one of the key emergent capabilities at each new order of magnitude in compute is alignment. You look back at GPT-3, and everyone said, “These models are always going to be racist and toxic and biased and…” Well, it turns out that the larger they get, the better job we can do at aligning them and constraining them and getting them to produce extremely nuanced and precise behaviours. That’s actually a great story, because that’s exactly what we want: we want them to behave as intended, and I think that’s one of the capabilities that emerge as they get bigger.

Rob Wiblin: What do you think of Anthropic’s approach to the arms race issue? I guess they’re doing a kind of middle-ground thing, where they try to kind of lead from second place. So if another lab trains something, then they’ll train it themselves in order to study it. And if another lab releases something publicly, then they’ll do so as well, because they figure the cat’s out of the bag. But they’re reluctant to be the first to train or release anything because they’re worried that could make the competitive race somewhat more fierce. Do you think that’s just kind of a mistake, or do you see where they’re coming from?

Mustafa Suleyman: I don’t think it’s true that they’re not attempting to be the first to train at scale. That’s not true.

Rob Wiblin: Interesting. OK, you don’t buy that. Have I misunderstood what they think that they’re doing?

Mustafa Suleyman: I don’t know. You have to ask them. I like them very much and I have huge respect for them, so I don’t want to say anything bad, if that’s what they’ve said. But also, I think Sam [Altman] said recently they’re not training GPT-5. Come on. I don’t know. I think it’s better that we’re all just straight about it. That’s why we disclose the total amount of compute that we’ve got. And obviously you can work out from that, roughly speaking, what order of magnitude of FLOPS we’re using.

It’s much better that we’re just transparent about it. We’re training models that are bigger than GPT-4, right? We have 6,000 H100s in operation today, training models. By December, we will have 22,000 H100s fully operational. And every month between now and then, we’re adding 1,000 to 2,000 H100s. So people can work out what that enables us to train by spring, by summer of next year, and we’ll continue training larger models. And I think that’s the right way to go about it. Just be super open and transparent. I think Google DeepMind should do the same thing. They should declare how many FLOPS Gemini is trained on.
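
Editor's note: the back-of-envelope calculation Mustafa alludes to can be sketched roughly as follows. Only the GPU counts (6,000 and 22,000 H100s) come from the interview. The per-GPU peak throughput, the utilisation fraction, and the 90-day run length are illustrative assumptions, not figures from Inflection, so the result is an order-of-magnitude guess at best.

```python
def training_flops(num_gpus, peak_flops_per_gpu, utilization, days):
    """Estimate total training compute (in FLOP) for one run on a GPU cluster."""
    seconds = days * 24 * 60 * 60
    return num_gpus * peak_flops_per_gpu * utilization * seconds


# Assumption: roughly 1e15 FLOP/s peak dense BF16 throughput per H100.
H100_PEAK_FLOPS = 1e15
# Assumption: 40% of peak is actually achieved during training.
UTILIZATION = 0.4
# Assumption: a single run lasts about three months.
RUN_DAYS = 90

# GPU counts quoted in the interview: today, and by December.
for gpus in (6_000, 22_000):
    total = training_flops(gpus, H100_PEAK_FLOPS, UTILIZATION, RUN_DAYS)
    print(f"{gpus:>6} H100s for {RUN_DAYS} days: ~{total:.1e} FLOP")
```

Under these assumptions, both cluster sizes land on the order of 10^25 FLOP per run. Changing the utilisation or run length shifts the estimate proportionally, which is why a disclosed GPU count only pins down the order of magnitude.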

Rob Wiblin: Yeah, cool. I’ll go back and check what Anthropic says that it’s doing in this respect. And I could add a cut in to clarify in case I’ve gotten the wrong end of the stick.

Rob’s source here was the Vox article ‘The $1 billion gamble to ensure AI doesn’t destroy humanity’, by Dylan Matthews.

Voluntary vs mandatory commitments for AI labs [00:52:54]

Rob Wiblin: So as you mentioned earlier, in July, Inflection signed on to eight voluntary commitments with the White House, including things like committing to internal and external security testing, investing in cybersecurity and insider threat safeguards, and facilitating third-party discovery and reporting of vulnerabilities. Those are all voluntary, though. What commitments would you like to become legally mandatory for all major AI labs in the US and UK?

Mustafa Suleyman: That is a good question. I think some of those voluntary commitments should become legally mandated.

Number one would be scale audits: What size is your latest model?

Number two: There needs to be a framework for harmful model capabilities, like bioweapons coaching, nuclear weapons, chemical weapons, general bomb-making capabilities. Those things are pretty easy to document, and it just should not be possible to reduce the barriers to entry for people who don’t have specialist knowledge to go off and manufacture those things more easily.

The third one — that I have said publicly and that I care a lot about — is that we should just declare that these models shouldn’t be used for electioneering. They just shouldn’t be part of the political process. You shouldn’t be able to ask Pi who Pi would vote for, or what the difference is between these two candidates. Now, the counterargument is that many people will say that this might be able to provide useful and accurate and valuable information to educate people about elections, et cetera. Look, there is never going to be a perfect solution here: you have to take benefits away in order to avoid harms, and that’s always a tradeoff. You can’t have perfect benefits without any harms. That’s just a tradeoff. I would rather just take it all off the table and say that we —

Rob Wiblin: We can put some of it back later on, once we understand how to do it safely.

Mustafa Suleyman: That’s the best way. That is totally the best way. Now, obviously, a lot of people say that I’m super naive in claiming that this is possible because models like Stable Diffusion and Llama 2 are already out in open source, and people will certainly use that for electioneering. Again, this isn’t trying to resolve every single threat vector to our democracy, it’s just trying to say, at least the large-scale hyperscaler model providers — like Amazon, Microsoft, Google, and others — should just say, “This is against our terms of service.” So you’re just making it a little bit more difficult, and maybe even a little bit more taboo, if you don’t declare that your election materials are human-generated only.

Rob Wiblin: Yeah. If I remember correctly, Twitter just banned election advertising on their platform at some point, and I think that made their life a whole lot easier than trying to filter everything extremely carefully.

Speaking of Twitter, you’ve celebrated the UK government’s upcoming AI safety summit, which is in just a few months. What outcome do you think should be the top priority for the people organising it? Who I think listen to this show, by the way.

Mustafa Suleyman: They’re all good guys. They’re all doing a great job. I’m really glad to see the summit. I think it’s an opportunity to put in place some of the proposals that came out of the voluntary commitments, and maybe put that on a legislative path: audits, collaboration between the companies to share best practices.

That was one thing I didn’t list in the legal requirements thing, because I think that’s a bit complicated, but I think it would be good if there was a culture of sharing vulnerabilities and weaknesses, just as zero-day exploits or other cybersecurity bugs get disclosed confidentially to the companies for 60 days or so, until there’s a public exposure of it. So those are the sorts of things that I think would be really helpful.

The importance of taking misalignment seriously [00:56:47]

Rob Wiblin: Yeah, I know you’re on the book tour and have got to go in just a minute, but maybe one final question or theme is: It seems like you don’t want to talk so much about misalignment, or deceptive alignment, or models having their own goals and running out of control. But at the same time, it seems like you agree with me and many other people that that could be an issue in 10 or 15 or 20 years’ time — we don’t know exactly, but that will be an issue at some point if the capabilities keep going up. In a sense, even if you agree that it’s not going to be an issue for 10 years, 10 years is not that far away, and it seems like it could be quite a difficult problem to solve. Isn’t it still quite urgent that people be taking misalignment seriously, and trying to figure out how we will address it when it becomes a problem?

Mustafa Suleyman: Yeah, forgive me, I didn’t mean to trivialise it. I was more talking about the tactics of talking about these things publicly.

Rob Wiblin: I see. Yeah.

Mustafa Suleyman: So yeah, 100%: this is a super critical issue. We need 10x more people focused on misalignment. In general, I’m a bit sensitive to the idea of deception because I think it’s in itself a kind of anthropomorphism, but that’s a technicality. In general, I think that absolutely the fundamental questions of misalignment, and in general AGI safety, and the 10-year risks, and the 20-year risks couldn’t be more important. I think more people should be researching it, and I’m always a big believer in supporting it.

Rob Wiblin: All right. My guest today has been Mustafa Suleyman, and the book is The Coming Wave. I think it’s a book that’s going to make a significant wave in the media over the next month or two. So best of luck with the book tour.

Mustafa Suleyman: Thanks a lot, Rob. I really appreciate it. And thanks for all your work on the podcast. It’s super awesome to see all the attention that these issues get, and I think that is in no small part because of the work that you and the rest of the community do to popularise it. So thank you. It’s a huge service.

Rob Wiblin: Thanks so much.

Rob’s outro [00:58:35]

Rob Wiblin: What a brisk one hour, was really talking fast to get in some extra questions there.

If such a short interview leaves you wanting more from 80,000 Hours, you can find all the new stuff from our research team on 80000hours.org/latest.

Recently they’ve written about:

All right, The 80,000 Hours Podcast is produced and edited by Keiran Harris.

The audio engineering team is led by Ben Cordell, with mastering and technical editing for this episode by Milo McGuire.

Katy Moore puts together full transcripts and an extensive collection of links to learn more — those are available on our site.

Thanks for joining, talk to you again soon.

Learn more

Risks from power-seeking AI systems

What could an AI-caused existential catastrophe actually look like?

Working in US AI policy

The 80,000 Hours Podcast on Artificial Intelligence

Related episodes

June 9, 2023

#154 – Rohin Shah on DeepMind and trying to fairly hear out both AI doomers and doubters

August 23, 2023

#161 – Michael Webb on whether AI will soon cause job loss, lower incomes, and higher inequality — or the opposite

August 7, 2023

#159 – Jan Leike on OpenAI’s massive push to make superintelligence safe in 4 years or less

July 24, 2023

#157 – Ezra Klein on existential risk from AI and what DC could do about it

July 31, 2023

#158 – Holden Karnofsky on how AIs might take over even if they’re no smarter than humans, and his 4-part playbook for AI risk

July 10, 2023

#156 – Markus Anderljung on how to regulate cutting-edge AI models

June 22, 2023

#155 – Lennart Heim on the compute governance era and what has to come after

June 3, 2019

#58 – Pushmeet Kohli on DeepMind’s plan to make AI systems robust & reliable, why it’s a core issue in AI design, and how to succeed at AI research

About the show

The 80,000 Hours Podcast features unusually in-depth conversations about the world's most pressing problems and how you can use your career to solve them. We invite guests pursuing a wide range of career paths — from academics and activists to entrepreneurs and policymakers — to analyse the case for and against working on different issues and which approaches are best for solving them.

Get in touch with feedback or guest suggestions by emailing [email protected].
