#219 – Toby Ord on graphs AI companies would prefer you didn’t (fully) understand

The era of making AI smarter by just making it bigger is ending. But that doesn’t mean progress is slowing down — far from it. AI models continue to get much more powerful, just using very different methods. And those underlying technical changes force a big rethink of what coming years will look like.

Toby Ord — Oxford philosopher and bestselling author of The Precipice — has been tracking these shifts and mapping out the implications both for governments and our lives.

As he explains, until recently anyone could access the best AI in the world “for less than the price of a can of Coke.” But unfortunately, that’s over.

What changed? AI companies first made models smarter by throwing a million times as much computing power at them during training, to make them better at predicting the next word. But with high-quality data drying up, that approach petered out in 2024.

So they pivoted to something radically different: instead of training smarter models, they’re giving existing models dramatically more time to think — leading to the rise in “reasoning models” that are at the frontier today.

The results are impressive, but this extra computing time comes at a cost: OpenAI’s o3 reasoning model achieved stunning results on a famous AI test by writing an Encyclopaedia Britannica’s worth of reasoning to solve individual problems — at a cost of over $1,000 per question.

This isn’t just technical trivia: if this improvement method sticks, it will change much about how the AI revolution plays out — starting with the fact that we can expect the rich and powerful to get access to the best AI models well before the rest of us.

Companies have also begun applying “reinforcement learning,” in which models are asked to solve practical problems and are then told to “do more of that” whenever it looks like they’ve gotten the right answer.

This has led to amazing advances in problem-solving ability — but it also explains why AI models have suddenly gotten much more deceptive. Reinforcement learning has always had the weakness that it encourages creative cheating, or tricking people into thinking you got the right answer even when you didn’t.

Toby shares typical recent examples of this “reward hacking” — from models Googling answers while pretending to reason through the problem (a deception hidden in OpenAI’s own release data), to achieving “100x improvements” by hacking their own evaluation systems.

To cap it all off, it’s getting harder and harder to trust publications from AI companies, as marketing and fundraising have become such dominant concerns.

While companies trumpet the impressive results of the latest models, Toby points out that they’ve actually had to spend a million times as much just to cut model errors by half. And his careful inspection of an OpenAI graph supposedly demonstrating that o3 was the new best model in the world revealed that it was actually no more efficient than its predecessor.

But Toby still thinks it’s critical to pay attention, given the stakes:

…there is some snake oil, there is some fad-type behaviour, and there is some possibility that it is nonetheless a really transformative moment in human history. It’s not an either/or. I’m trying to help people see clearly the actual kinds of things that are going on, the structure of this landscape, and to not be confused by some of these charts.

Recorded on May 23, 2025.

Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: Ben Cordell
Camera operator: Jeremy Chevillotte
Transcriptions and web: Katy Moore

Highlights

The new scaling era: from pre-training to inference

Toby Ord: So here’s how I think about this. I think it’s a useful analogy. Suppose you’ve got a company, and you’re trying to get some excellent work done, and you could employ someone.

Pre-training is like sending them through high school and then to undergraduate and then maybe to grad school. You’re putting more and more expense into having this person learn more and more about different things, so they’ll have a lot of extra knowledge at their fingertips. And that’s what most of the scaling had been doing.

But then inference is like letting that person spend more time actually doing the job. So suppose that you give them some brief, that they’ve got to prepare a report for a client. By default, if you just ask one of these language models to do that, it just extemporises stuff. So it’s just saying the words as they pop into its head. And it doesn’t have a chance to do a second draft; it just has to compose this document in one go.

So you could think of it as the pre-training having given it this really powerful kind of “System 1” ability, in these terms from human psychology: the intuitive ability to just answer things straight off the bat. Then you’re just asking it to keep doing that as it goes through all the sentences of this report that it’s composing.

Whereas what you could also do is let it spend a long time in that process, maybe spend 10 times as much on writing — where it could write an answer and then it could critique the answer, it could modify things, move things around — and then ultimately hide all of that working and just show you the final answer.

So that’s like saying you don’t just have to write this report for the client in 10 minutes, but rather we’re going to scale that up to 100 minutes or 1,000 minutes. It turns out that, as with an employee, you get much better work out of someone if they’re doing that. And that also gives room for this different kind of intelligence that we call “System 2” in human psychology. These are often called reasoning models, where it’s able to do a certain kind of structured thinking and apply that.

So pre-training scales up the System 1, and then this inference scaling lets us spend more time on a task, hide all of that work, and just show the final thing — and we could think of that as kind of scaling up its System 2 abilities.
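To make the “System 2” idea concrete, here is a minimal sketch of a draft, critique, and revise loop of the kind used to spend extra inference compute. The `generate()` function is a hypothetical stand-in for whatever model call you have access to, not a real API, and the loop structure is purely illustrative rather than how any particular lab implements it.

```python
# A minimal sketch of the "System 2" loop described above: draft, critique,
# revise, and only surface the final answer. `generate()` is a hypothetical
# stand-in for any chat-model call -- it is not a real library API.

def generate(prompt: str) -> str:
    """Hypothetical single-pass ("System 1") model call."""
    raise NotImplementedError("wire this up to a model of your choice")

def answer_with_more_thinking(task: str, rounds: int = 3) -> str:
    draft = generate(f"Write a first draft of the following:\n{task}")
    for _ in range(rounds):  # each extra round spends more inference compute
        critique = generate(
            f"Task:\n{task}\n\nDraft:\n{draft}\n\nList the weaknesses of this draft."
        )
        draft = generate(
            f"Task:\n{task}\n\nDraft:\n{draft}\n\nCritique:\n{critique}\n\n"
            "Write an improved draft."
        )
    return draft  # intermediate working is hidden; only the final answer is shown
```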

So late last year there was a series of articles in different publications in the media reporting that behind the scenes OpenAI had been disappointed by their next bigger model that used 10 times as much compute as GPT-4 — this is now what we call GPT-4.5. And they’d been really disappointed with the results. If you put in 10 times as many inputs into something, you hope to get some noticeable improvement. And they found that it wasn’t actually that clear, and it was worse at quite a few things.

Then there were similar reports coming from the other leading AI companies, so this was a little bit concerning for this narrative of continuing to scale these things up.

So that really is how everyone had been thinking about it. For example, the paper “Situational awareness” by Leopold Aschenbrenner paints a picture based on scaling up pre-training I think a million times further than where we’re currently at — just continuing to do that, and then painting a picture of what would happen if that curve continues to go. Whereas what seems to have happened is that it’s already kinked right at the point where GPT-4 was out and before GPT-4.5. So at the time he published the essay, it seems like maybe that’s actually not what’s happening.

So the AI companies have often said what’s really important isn’t that this pre-training scaling continues, but that some kind of scaling continues: that there’s still some way that we could pour in more and more compute into this process, and we’ll get more and more kind of cognitive capabilities coming out the other end.

I think that makes sense, but it’s not at all clear that this will continue to provide the same types of benefits we’ve seen.

Will rich people get access to AGI first? Will the rest of us even know?

Toby Ord: I think we’ll look back on the period that’s just ended, where OpenAI started a subscription model for their AI system — where it was $20 a month, less than a dollar a day, to have access to the best AI system in the world, and then a number of companies are offering very similar deals…

We’ve got the situation where, for less than the price of a can of Coke, you can have access to the leading system in the world. And it reminds me of this Andy Warhol quote about what makes America great is that the president drinks a Coke, Liz Taylor drinks a Coke, the bum on the corner of a street drinks a Coke — and you too could have a Coke! The best kind of sugary beverage that you can get, everyone’s got access to it.

But I think that era is over. We had OpenAI introducing a higher tier that cost 10 times as much money, because these inference costs are going up and actually they can’t afford to give you this level for the previous cost. And this is what you’re going to keep seeing: the more that we do inference scaling, it’s going to have to cost the users substantially more. And then there’s a question of how many are prepared to pay that.

So it’s certainly going to create inequality in terms of access to these things, but it also might mean that it is not actually scaling well for the companies. If it turns out that you offer a thing that costs 10 times as much and less than a tenth of the people take it, and then you offer a thing that costs 100 times as much and less than a tenth of the previous group that took the first one take this one, then maybe each of these tiers is earning you less and less money than the one before, and it’s just not actually going to drive your ability to buy more chips and train more systems.

Or it could go the other way around: it could be that a fifth of people are prepared to pay 10 times as much and then a fifth of them are prepared to pay 10 times as much again, and that you’re getting more and more money from each of these higher levels.

But which of those it is could really determine what happens in the industry and whether these inference-scaled models are actually profit centres for them or not.
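To see how much hangs on that difference, here is a back-of-the-envelope calculation of the two scenarios just described, using made-up numbers: each tier costs 10 times as much as the one below, and some fraction of each tier’s users upgrade to the next.

```python
# Illustrative arithmetic for the two pricing scenarios described above.
# Each tier costs 10x the one below; `upgrade_fraction` of the tier below
# moves up. The specific user and price numbers are made up.

def tier_revenues(base_users: float, base_price: float, upgrade_fraction: float, tiers: int = 4):
    users, price, out = base_users, base_price, []
    for _ in range(tiers):
        out.append(users * price)
        users *= upgrade_fraction   # how many move up to the next tier
        price *= 10                 # each tier costs 10x as much
    return out

print(tier_revenues(1_000_000, 20, 1/20))  # <1/10 upgrade: each tier earns less than the one below
print(tier_revenues(1_000_000, 20, 1/5))   # 1/5 upgrade: each tier earns double the one below
```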

Rob Wiblin: Yeah. … This particular issue — that a privileged group of people can gain access to superhuman advice and superhuman assistance potentially substantially before anyone else in the world has access to it — heightens the concern that people at the companies or people in the government who might take control or get privileged access to these models, that they could potentially outfox everyone else if they’re able to basically just have access to tools that no one else is really able to compete with.

Toby Ord: So while in theory this could happen just on the open marketplace with money, my concern would be greatest about the company itself deciding, for example, “We’ve got this model, what can we use it for? Maybe we can be willing to spend a million times inference scaling on it to do some really important work for us.” And the company might want to do that internally or the government of the place where the company is located might want these types of abilities. So I imagine that happening outside the open market is perhaps the most concerning place.

The scaling paradox

Toby Ord: So the scaling laws are these empirical regularities. They’re not necessarily laws of nature or anything like that. But it turns out that if you do a graph and try to measure error or inaccuracy — so this is a bad thing; “log loss” is the technical term — that is, how much the model is still failing to understand about English text, how much residual mistake it’s making in prediction, as you increase the amount of compute that went into training it, you find these laws or empirical regularities. They draw these straight lines on the special log-log paper. You don’t need to worry too much about that, though; it’s a bit hard to interpret.

So I really spent some time thinking about it, and basically what’s going on is that every time you want to halve the amount of this error that’s remaining, you have to put in a million times as much compute. That’s what it fundamentally comes down to. And that’s pretty extreme, right? So they have halved it and they did put in a million times as much compute. But if you want to halve it again, you need a million times more compute. And then if you want to halve it another time, probably it’s game over. And at least in terms of that particular metric, I would say that is quite bad scaling.

And these are the scaling laws: they show that there’s a particular measure of how good it’s doing and how much error remains. And it does hold over many different orders of magnitude. But the actual thing that’s holding is what I would have thought of as a pretty bad scaling relationship.
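For readers who want the arithmetic behind that claim, here is a small illustrative calculation. It assumes the simple power-law form that scaling-law papers typically use (loss proportional to compute raised to a negative exponent); the numbers are just Toby’s rule of thumb, not a fit to any published dataset.

```python
import math

# Toby's rule of thumb: halving the remaining log loss takes roughly 1,000,000x
# more training compute. Under a power law L = a * C**(-alpha), that means
# (10**6)**(-alpha) = 1/2, which pins down the exponent. Illustrative only.

alpha = math.log(2) / math.log(10**6)
print(f"implied exponent alpha ≈ {alpha:.3f}")                          # ~0.050
print(f"remaining loss after 10x more compute ≈ {10**(-alpha):.2f}x")   # ~0.89x, i.e. only ~11% lower
```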

Rob Wiblin: So in order to halve the error, you have to increase the compute input a millionfold. That’s a general regularity? Because surely it differs by task and differs depending on where we are?

Toby Ord: Yeah. So what they do with these cases is that you grab a whole lot of text, often from the internet. They started with the good bits, like Wikipedia and things like that, and then as they ran out of that, they had to look at more and more things. But you train it on that, and you train it on most of it, but you leave some unseen. Then you try to give it a few words of the unseen bit and ask for the next word, and you see how well it does at predicting that. And basically the amount of errors that it has in doing that leads to this error score.

And it’s not clear that the error score is something that fundamentally matters. Maybe it’s a bad measure. But I found it really interesting that what convinced people like Ilya Sutskever and Dario Amodei that scaling was the way forward were these scaling laws — when actually, if you look at what they say, they’re distinctly unimpressive.

Rob Wiblin: How then have we been managing to make so much progress? Is it just that there was so much room to increase the amount of compute that we were throwing at these models, so that has been able to more than offset the incredibly low value that we get from throwing more compute at them?

Toby Ord: Yeah, I think that’s basically right. So when most people saw this type of thing, most people who were academics doing computer science, they would have thought, “So in order to get good performance on this task, you would need to run an experiment larger than any experiment that’s ever been run in any computer science department ever.” And they would then rule it out, and assume, “Obviously we’re not doing that. We’ll look for a different approach.” Whereas the pioneers of scaling thought, “But that wouldn’t be that much money for a company.”

Rob Wiblin: $100 million. They could raise that.

Toby Ord: In fact, then they could even go 10 times bigger again, maybe. So they realised that there was a lot more room to scale things up — to scale up the inputs, all the costs — in companies than there was in academia. And that in some sense all you had to do then was this kind of schlep, or this work of just making this existing thing bigger. You didn’t have to come up with any new ideas.

And then the other thing that’s made it have big impacts in the world is that each time this error rate halved, that corresponded to tremendous improvements. Certainly for every millionfold increase in the compute used to train these models, we’ve seen spectacular improvements in the capabilities as felt by an individual.

So a way to look at this is that the shift from GPT-2 to GPT-3 used 70 times as much compute, and going from GPT-3 to GPT-4 used about 70 times as much again. And GPT-3 felt worlds away from GPT-2, and GPT-4 felt like a real improvement as well. You really felt it in both cases. A visceral feeling of, “Wow.”

Rob Wiblin: “This is suddenly useful.”

Toby Ord: Yeah. “This is qualitatively better.” That said, you’d probably hope that was true if someone said something costs 70 times as much. How’s the wine that costs £1 or the wine that costs £70? You’d hope that the wine that costs £70 is noticeably better, otherwise what on Earth’s going on?

But we did feel those improvements. Whereas if you look at what happens to the log loss number, it didn’t change that much for a mere 70-fold increase in the compute. So effectively there was this unknown scaling relationship between the amount of compute and what it actually feels like intuitively in terms of capabilities. And that turned out to actually scale really quite well, I think.
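Continuing the illustrative arithmetic from above, under the same rough power law a 70-fold compute increase barely moves the log loss, which is the gap Toby is pointing at between the headline metric and the felt improvement.

```python
import math

# Same rough power law as before (halving loss costs ~10^6x compute).
# A 70x compute jump -- roughly the GPT-2 -> GPT-3 or GPT-3 -> GPT-4 step
# mentioned above -- only trims the remaining loss modestly. Illustrative only.

alpha = math.log(2) / math.log(10**6)   # ~0.050
loss_ratio = 70 ** (-alpha)
print(f"remaining loss after a 70x compute increase ≈ {loss_ratio:.2f}x")  # ~0.81x
```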

Misleading charts from AI companies

Rob Wiblin: There was this very famous chart that OpenAI put out where they were comparing two different reasoning models that they had: o1, and this more impressive one that was an evolution of o1, called o3. And o3 really wowed people, because it was able to solve some of these brain teaser puzzles that I guess are very easy for humans, but have proven very difficult for AIs up until that point. And I think they were able to get something like an 80% success rate on some of these puzzles that had seemed very intractable for AI in the past.

But you point out that if you looked really closely at the graph and you properly understood it, it was actually consistent with o3 being no better, being no more efficient in terms of being able to solve the puzzles than o1 — despite the fact that the dots on the graph for o3 were an awful lot higher than they were for o1.

And what was going on was that OpenAI had managed to increase the amount of compute that it was using at the point of trying to solve these brain teasers by about a thousandfold. So an enormous scaleup in the amount of thinking time that the model had to answer it. Unsurprisingly, given 1,000 times as much time to think about the puzzle, it was able to answer more like 80% of them rather than 20% of them.

Now, in some sense, this is very impressive. But it is interesting that I think the companies are aware that people do not entirely understand these graphs perhaps, and that most consumers are not paying a deep level of attention to them, and they are sometimes trying to slip past messages that perhaps would not stand up entirely to scrutiny. And the fact that they put out a graph touting how impressive o3 is — when in fact the graph doesn’t really demonstrate that at all, and it might just be on exactly the same trend you would have expected before if you’d given the model more time to think about problems — is quite interesting.

And I don’t want to single out OpenAI here, because I don’t think they’re in any way unique in this.

Toby Ord: Yeah, that’s right. You see these graphs of what looks like steadily increasing progress, right? This kind of straight line of, as you put in more and more resources, the outputs go up and up. But if you look more carefully at the horizontal axis there, you see that each one of these tick marks is 10 times as much inputs as the one before. So in order to maintain this apparently steady progress, you’re having to put way, way more resources in.

And in the case of that famous data point with the preview version of o3, I actually looked into how much compute it was and how many tokens it had to generate and so on: in order to solve this task — which I think costs less than $5 to get someone to solve on Mechanical Turk, and which my 10-year-old child can solve in a couple of minutes — it wrote an amount of text equal to the entire Encyclopaedia Britannica.

Rob Wiblin: So it’s using a different approach to what humans are doing, it’s fair to say.

Toby Ord: It took 1,024 separate independent approaches on it, each of which was like a 50-page paper, all of which together was like an Encyclopaedia Britannica. And then it checked the answer for each of them, saw which answer came up the most times, and selected that answer. And it took tens of thousands of dollars, I think, per task.

So it was an example of what we were discussing with the inference scaling: what would happen if you just put in huge amounts of money, just poured in the money, set it on fire. Could you actually peer into the future, and could you see the types of capabilities we’re going to get in the future? And in that way, it’s quite interesting, right?

But it came out just a few months after the preview for o1, so it felt like, oh my god, in just a few months’ time, it’s had this huge improvement in performance. But what people weren’t seeing is that it used so many more resources that it wasn’t in any way an apples-to-apples comparison of what you could do for the same amount of money. Instead it was showing something like, what will we be able to do maybe a year or more into the future?

So that’s kind of useful, seen through that lens. But if you instead just treat it as a direct result of, “We used to have trouble with this benchmark, now we don’t,” then it’s definitely misleading.
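The sampling procedure Toby describes above (many independent attempts, then picking whichever final answer recurs most often, sometimes called self-consistency or majority voting) looks roughly like this. The `attempt_solution()` function is a hypothetical stand-in for one full chain-of-thought sample; the exact details of OpenAI’s setup aren’t public, so treat this purely as a sketch of the general technique.

```python
from collections import Counter

# Sketch of "many independent attempts, then majority vote" over final answers.
# `attempt_solution()` is a hypothetical stand-in for one randomly sampled,
# full chain-of-thought attempt at the task; inference cost scales with n_attempts.

def attempt_solution(task: str) -> str:
    """Hypothetical: one independent, randomly sampled attempt at the task."""
    raise NotImplementedError

def majority_vote_answer(task: str, n_attempts: int = 1024) -> str:
    answers = [attempt_solution(task) for _ in range(n_attempts)]
    answer, count = Counter(answers).most_common(1)[0]  # most frequent final answer wins
    return answer
```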

Rob Wiblin: It is interesting. It feels like we’ve drifted towards sounding like a conversation between people who think that AI is not a big deal, and it’s all kind of overblown and exaggerated. We don’t think that.

But I suppose the thing to take away is: these are research organisations that have very legitimate, almost academic-style people who would love to reveal these fundamental truths about intelligence. And they’re also businesses that do have a communications arm that is trying to figure out how do we get people to invest in this company, and how do we get people excited about using these products. And I’m sure there’s this to and fro inside the organisation about how these results are presented.

And when you read the press release, you need to have your wits about you. You need to be a savvy consumer. And if you can’t understand the technical details at all, then maybe you just need to wait until someone who does is able to explain to you in more plain language whether you should be impressed by X or Y or not.

In this post, which I can again recommend reading — “Inference scaling and the log-x chart” — you explain what people should be looking out for in these charts. Because there are going to be many of these charts with a logarithmic x-axis and performance on the y-axis coming out in coming years, and if you want to be consuming them, then I recommend going and checking out this article so that you can know what to look for and what not to be fooled by.
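As a toy illustration of the effect Rob is describing, the snippet below uses made-up numbers: plotted against a log-scaled compute axis, these points would look like steady linear progress, even though each step costs 10 times the resources of the last.

```python
# Made-up numbers illustrating why log-x charts can flatter progress: the score
# climbs by equal steps, but each step costs 10x as much compute as the last.

compute_multiples = [10**k for k in range(5)]   # 1x, 10x, 100x, 1,000x, 10,000x
scores = [40, 50, 60, 70, 80]                   # a straight line against log10(compute)

for c, s in zip(compute_multiples, scores):
    print(f"{c:>6}x compute -> score {s}")
```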

Toby Ord: Yeah, I really like this point about what’s going on here. Are we sceptics of AI or not? What I would say is that, some people think of this in terms of is it all snake oil or some kind of fad or something, or is there something really transformative happening that could be one of the most profound moments in human history?

I think the answer is there is some snake oil, there is some fad-type behaviour, and there is some possibility that it is nonetheless a really transformative moment in human history. It’s not an either/or. So what I’m trying to do is help people see clearly the actual kinds of things that are going on, the structure of this landscape, and to not be confused by some of these charts and things.

I actually think that companies themselves are somewhat confused by their own charts into thinking that this looks like good progress or efficient progress. I really think that in relatively few cases are they trying to be deceptive about these things.

But it’s a confusing world, and I see my role there as trying to be a bit of a guide, and to have that sense of stepping back and looking at the big picture — which I think is a bit of a luxury. As an academic, I’m able to do it, so it gives a different vantage point which I think then is helpful for people who are trying to get at the coalface and engage with the nitty-gritty of these things. Because sometimes, when you keep engaging with that, you don’t notice that things have moved in quite a different direction to where you’re expecting.

Policy debates should dream much bigger

Toby Ord: There’s an interesting question I’ve been trying to grapple with about how AI is going to end up embedded in the economy or society. So I’ll give you a few examples to show what I’m getting at. I need a pithy name for it.

But one example is that AI systems at the moment are owned by and run by large companies, and effectively they’ve rented out their labour to a lot of different people. If AI systems were like people, this would be like slavery or something like that. I’m not saying that they are like people, but this is one approach: the company owns them, rents them out, they have to do whatever the users want, and then all the profits go to the AI company.

A different model would be to say these AI systems are like legal persons. Maybe they are granted legal personhood in the same way that corporations are, so they can own assets. So they’re more like entrepreneurs or job seekers; they go out into the economy, maybe they set up a website for an architectural kind of firm that can design people’s houses for them, and then the clients have a chat with it or something and it issues out the designs. They can go and seek opportunities to participate in an economy. So that’s a different model.

I think there’s some reason to think that there’s more potential for economic gains if you allow them to actually make their own entrepreneurial decisions; they would have to pay for their own GPU costs and so on. This is the kind of direction you might imagine people going down if they think that the AI systems have got to a point where they might have some moral status. But you can also see that questions about gradual disempowerment really come in there. It might help liberate these systems from mistreatment, but exacerbate questions about whether they could outcompete us.

A third model is to say maybe people shouldn’t be interfacing with AI systems generally. This is how we deal with nuclear power: we have a small number of individuals who go and work in nuclear power stations, and they’re vetted by their governments with security checks and so on, and they go in and they interface with radioactive isotopes of things like refined uranium. But most people don’t. Those factories that they work in, these power plants, they produce electricity which flows down the cables into the consumers’ houses and powers their TVs and things. So that’s a different model.

We could do that with AI. We could have a model where there’s some small number of vetted people, or maybe millions, who interact with AI systems, use them to design new drugs, maybe to help cure certain kinds of cancer and things like this, to do new research and also produce other kinds of new products. Then those products are assembled in factories and the consumers can buy those products. That is an alternate way that you could do it.

If you’re concerned about things like some malcontent individuals or terrorist groups using AI systems to wreak havoc, this would really help avoid that.

Or a fourth alternative could be that if you’re concerned about concentration of power issues, you might say what we should do is give every individual access to the same advanced level of AI assistant. So it’s like a flat distribution of AI ability given to everyone. A bit like a universal basic income, but universal basic AI access.

So there are four really different ways that you could distribute AIs into society and have them interact. And I feel that no one’s talking about stuff like this — like, which of those worlds is most likely, which of those worlds is possible, and which of those worlds is most desirable. Because fundamentally, we get to choose which of those worlds that we live in. As in, maybe it’s the citizens of the United States of America or other countries that are developing these things that do actually get to make some of these choices. And if they think that one of these paths is very bad, they may be able to stop it and go down different paths.

So that’s the kind of thing I’m thinking of, in terms of we could think a lot broader and bigger about where are we going to be in five years or where we want to be — rather than the minutiae about exactly who’s ahead at the moment and exactly what are they prepared to accept in terms of regulation.

Why did you build it if you thought it could kill everyone?

Toby Ord: I feel that there are some risks that we face, such as the risk of asteroid impact — which thankfully does turn out to be very small. But if an asteroid were to be found on a collision course with the Earth, one that’s large enough to destroy us — so 10 kilometres across, like the one that killed the dinosaurs — we actually don’t have any abilities at the moment to deflect asteroids of that size. And if we saw it on a collision course for us in a few years’ time, I’m not sure that we could develop any means of deflecting it. The ones we can deflect are something like a thousandth the mass of that.

So suppose that asteroid slammed into the Earth and we all died, and somehow in this metaphor, we went to the pearly gates of heaven and St Peter was there letting us in. And we said, “I’m sorry, we really tried on this asteroid thing. And maybe we should have been working on it before we saw it, but ultimately we felt that there was nothing we could do” — I think that you’d get somewhat of a sympathetic hearing.

Whereas if instead you turn up and you say, “We built AI that we knew that we didn’t know how to control. Despite the fact that, yes, admittedly, a number of Nobel Prize winners in AI, I think all of the Nobel Prize winners in AI perhaps have warned that it could kill everyone. Something like half of the most senior people in AI have directly warned that this could cause human extinction. But we had to build it. And so we built it. And it turns out it was difficult to align it and so we all died” — I feel that you would get a much less sympathetic hearing.

It’d be like, “Hang on. You lost me at the step where you said, ‘We had to build it.’ Why did you build it if you thought it would kill you all?”

Rob Wiblin: The responses that you would give would feel wanting.

Toby Ord: Yes. You know, maybe they’d be like, “I thought that if I didn’t do it, they would do it.” “And so who did it?” “Well, I did it.” “So you built the thing that killed everyone?” “Yes, but I felt…” I just think that you would have trouble explaining yourself. And I feel like we should hold ourselves to a higher standard. Not just like “technology made me do it” or “the technological landscape made me do it,” or —

Rob Wiblin: “China made me do it.”

Toby Ord: “China made me do it.” Despite the fact that they didn’t start the race, the US started the race — you know, because maybe China would have started a race. It’s like explaining to the teacher about this fight that you started by punching some kid in the face, because you’re claiming that they would have punched you if you didn’t punch them or something. It just doesn’t really cut it.

And I feel that we should hold ourselves to somewhat higher standards on these things, and not just think, “If I changed my action, or some very small group of people’s actions, how could I change the overall trajectory?” But rather to note that there are worlds that do seem to be available to us — where both, say, the US and China decide not to race for this thing.

That would involve having a conversation about that. It would involve verification conditions being sorted out. I think that there may well be such abilities to verify. Even if there weren’t, though, it might still be possible. I think that given the actual evidence we have, I don’t think it’s in the US’s interest to push towards AI or in China’s interest. I think it’s in both their interests to not do it. And if so, that’s not a prisoner’s dilemma. Cooperation is actually quite easy, because it’s not in anyone’s interest to defect. And I think that could well be the game in terms of game theory.

And yet there’s just very little discussion or thinking about these things. I don’t mean to say that we should be naive and assume that all incentives issues and all kinds of adversarial aspects are irrelevant. But we need at least some people, and I think more people than we currently have, thinking on these larger margins. Not just what could I do unilaterally? I know I couldn’t stop the whole of AI happening or happening in a certain direction, but maybe if enough people did something, we could.

And I think that there’s a tendency for fairly technical communities to focus on things that are quite wonkish, as they say in the policy world. So technical or policy proposals that are quite technical and hard to understand, but they might be able to help with the issue at hand if you follow through the details. I love this stuff, right? So this applies to me as much as it does to anyone else.

But there’s a different style of doing things in politics, which is instead getting much larger changes — which happens by setting a vision and crystallising or coordinating the public mood around that vision.

So in the case of AI, if you say, “We’ve got to do this thing,” it’s like, well, does the public want it? No, it seems like the public are really scared by it, and actually think that things are going far too fast. So that’s somewhere where, even if the politicians haven’t quite gotten there yet, it may be possible to speak to the public about their concerns. And if we did, I think the answer is they’re probably not concerned enough about these things.

Things can move very quickly in those cases. If you set a vision and actually lead — and try to have this approach of not just pushing things on the margins, but of noticing that there’s a really quite different direction that perhaps we should be headed in — I think things can really happen.

Scientific moratoriums have worked before

Toby Ord: When it comes to scientific moratoriums, we’ve got some examples, such as the moratorium on human cloning and the moratorium on human germline genetic engineering — that’s genetic engineering that’s inherited down to the children, which could lead to splintering into different species. In both those cases, when the scientific community involved had gotten to the cusp of that technology becoming possible — such as having cloned sheep, a different kind of mammal, and humans wouldn’t be that different — they realised that a lot of them felt uneasy about this privately.

So they opened up more of a conversation around this, both among themselves and also with the public. And they found that actually, yeah, they were really quite uneasy about it. And they wanted to be able to perhaps continue working on things like cloning sheep, but actually that would be easier to work on and think about if the issue of cloning humans was off the table.

So their approach, I think of it not quite as a pause for a certain amount of time. They also didn’t say, “We can never ever do this, and anyone who does it is evil” or something. Instead, what they were saying is, “Not now. It’s not close to happening. Let’s close the box. Put the box back in the attic. And if in the future the scientific community comes together and decides to lift the moratorium, they’d be welcome to do that. But for the foreseeable future, it’s not happening.”

And it seems to me that in the case of AI, that’s kind of where we’re at. We’re at a situation where, as I said, about half of all of the luminaries in AI have said that this is one of the biggest issues facing humanity: that there is, in their single-sentence statement, a risk of human extinction from this technology that they’re developing.

So I think that this would be a challenging thing to do. My guess is that there’s something like a 5% to 10% chance that some kind of moratorium like this — perhaps starting from the scientific community effectively saying you would be persona non grata if you were to work on systems that would take us beyond that human level — would work. But if it did work, it would set aside a whole bunch of these risks, even if that risk landscape is very confusing and has lots of different possibilities.

Some of these types of ideas might be able to act on many of those different types of risk. And I think that that’s a way where the scientific community — a relatively small number of actors, who have already kind of coordinated via producing these open letters and things — could have that conversation. If they crystallised their view, and if, for example, the AAAI, their professional association, came out behind this and so on, a shared position could crystallise out of their opinion.

People could then look at the situation with the scientists saying, “We think that this is a big problem, and that it’s not responsible to do it.” That could then create norm changes which mean that it’s difficult to pursue it.

I think if the scientific community had a moratorium on it, then an organisation like Google DeepMind — which sees itself as a science player, a science company that’s doing respectable science work — is not going to violate a scientific moratorium on something. It could be different for the more engineering-type places, and the more “move fast and break things” cultures. So it doesn’t necessarily do everything on its own. It would probably need to form a normative basis for actual regulation of some sort.

But I do think that things like this are possible. And if we went to St Peter after we all went extinct due to some AI disaster, and we said, “We couldn’t stop it,” and he said, “Did you even have a conversation about a moratorium?” It’s like, “We thought about that, and we decided it probably wouldn’t work, so we didn’t even talk about it.” That would seem crazy.

So I think we need to actually do some of these more obvious things that are just natural and earnest, rather than trying to precalculate out, “Obviously it would seem sensible to have that conversation. That’s what you’d want another planet to do. But for us, we know the conversation will not work out, so we’re not going to have it, and we’ll just carry on building these systems.” I feel like that’s kind of the wrong way of thinking.

Companies made a strategic error shooting down SB 1047

Toby Ord: So industry often does want things to be safe; individual actors do want that. They’ve often got a lot of concerns about how quickly market forces are making them act and how quickly market forces are making them deploy their new models, because everyone else is deploying quickly. And if they could all be bound by the safety requirement, so that their competitors didn’t have the advantage they would get if only one company were bound, then they tend to want that. So it was frankly a bit surprising that there was this hostility.

But yeah, I do feel that there already has been a very good-faith attempt by the safety community to come up with the kind of bill that tries to meet all of the complaints that the other people have. And even that was shot down.

Rob Wiblin: Yeah, I’ve said this on the show before, but I do think that the industry is potentially shooting itself in the foot here. Because the thing that is most likely to bring about the sort of draconian regulation that people who are optimistic about AI technology are most scared of is some sort of disaster. Any sort of disaster that actually leads to loss of life could lead to a very big change in attitudes and lead to maybe more draconian regulation than is necessary from anyone’s point of view.

Toby Ord: And even if you think your company’s never going to make that mistake, you might think these cowboys down the street are exactly the kind of people who could make that kind of mistake, and they need some regulation that will stop them from ruining the party for everyone, right?

I really do think that this is very short-sighted. And on top of that, sometimes we talk about someone having a conflict of interest: these places are very conflicted. And if, as a company, you concluded that the regulation wasn’t in your interest, but you also get big stock bonuses and so on for the more stuff that you put out, you really want to inspect your own views quite carefully. We talk about various forms of biases and prejudices that people might end up having, and it would be very difficult to actually keep straight your actual prediction on this thing, as opposed to these other incentives that you’re facing.

Rob Wiblin: Yeah. Another interesting dynamic that I see going on is that when I’m thinking about SB 1047, or any proposed regulation that we might put in place now, I’m thinking of this as the very first step in a very iterated process — where almost certainly there’s going to be a whole lot of problems that we’re going to identify with it, it wasn’t written quite right, but we’ll just improve those over time. And you’ve got to start somewhere in order to begin learning what might succeed.

I think the people who are very against it, they think that whatever we put in place now is going to be potentially there forever. It’s not really the beginning of a process. Maybe for them it’s like this is just the beginning of a ratchet, where everything is going to become more and more extreme over time rather than be kind of perfected and improved.

Toby Ord: I mean, I think there might be a bit of a ratchet if you started with something like 1047, but that’s because 1047 is obviously too weak. And they will be looking back on the days when 1047 was the issue and thinking, “Oh my god.”

Rob Wiblin: “Should have taken that deal.”

Toby Ord: Yeah, I think so. But I really do think they may have a good point here. If it is the case that whatever the first regulation is sets the entire frame, and it’s not possible to step out to a different frame… For example, suppose the first thing is about compute thresholds for pre-training and then you can never escape that frame or something: that could be a big problem if then the scaling stops. So it really can matter.

But is that a recipe for therefore complete laissez faire, no regulation, do whatever you want? That’s obviously too quick. But if it is the case that in certain regulatory environments the default is that if you introduce things they stay forever, that could be a bad thing, and it could be that there’s some win-wins that one could find there — because the safety community also don’t want to be stuck in silly frames that no longer make sense.


About the show

The 80,000 Hours Podcast features unusually in-depth conversations about the world's most pressing problems and how you can use your career to solve them. We invite guests pursuing a wide range of career paths — from academics and activists to entrepreneurs and policymakers — to analyse the case for and against working on different issues and which approaches are best for solving them.

Get in touch with feedback or guest suggestions by emailing [email protected].
