#219 – Toby Ord on graphs AI companies would prefer you didn’t (fully) understand

By Robert Wiblin · Published June 24th, 2025 ·

The era of making AI smarter by just making it bigger is ending. But that doesn’t mean progress is slowing down — far from it. AI models continue to get much more powerful, just using very different methods. And those underlying technical changes force a big rethink of what coming years will look like.

Toby Ord — Oxford philosopher and bestselling author of The Precipice — has been tracking these shifts and mapping out the implications both for governments and our lives.

As he explains, until recently anyone can access the best AI in the world “for less than the price of a can of Coke.” But unfortunately, that’s over.

What changed? AI companies first made models smarter by throwing a million times as much computing power at them during training, to make them better at predicting the next word. But with high quality data drying up, that approach petered out in 2024.

So they pivoted to something radically different: instead of training smarter models, they’re giving existing models dramatically more time to think — leading to the rise in “reasoning models” that are at the frontier today.

The results are impressive but this extra computing time comes at a cost: OpenAI’s o3 reasoning model achieved stunning results on a famous AI test by writing an Encyclopedia Britannica‘s worth of reasoning to solve individual problems — at a cost of over $1,000 per question.

This isn’t just technical trivia: if this improvement method sticks, it will change much about how the AI revolution plays out — starting with the fact that we can expect the rich and powerful to get access to the best AI models well before the rest of us.

Companies have also begun applying “reinforcement learning” in which models are asked to solve practical problems, and then told to “do more of that” whenever it looks like they’ve gotten the right answer.

This has led to amazing advances in problem-solving ability — but it also explains why AI models have suddenly gotten much more deceptive. Reinforcement learning has always had the weakness that it encourages creative cheating, or tricking people into thinking you got the right answer even when you didn’t.

Toby shares typical recent examples of this “reward hacking” — from models Googling answers while pretending to reason through the problem (a deception hidden in OpenAI’s own release data), to achieving “100x improvements” by hacking their own evaluation systems.

To cap it all off, it’s getting harder and harder to trust publications from AI companies, as marketing and fundraising have become such dominant concerns.

While companies trumpet the impressive results of the latest models, Toby points out that they’ve actually had to spend a million times as much just to cut model errors by half. And his careful inspection of an OpenAI graph supposedly demonstrating that o3 was the new best model in the world revealed that it was actually no more efficient than its predecessor.

But Toby still thinks it’s critical to pay attention, given the stakes:

…there is some snake oil, there is some fad-type behaviour, and there is some possibility that it is nonetheless a really transformative moment in human history. It’s not an either/or. I’m trying to help people see clearly the actual kinds of things that are going on, the structure of this landscape, and to not be confused by some of these charts.

Recorded on May 23, 2025.

Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: Ben Cordell
Camera operator: Jeremy Chevillotte
Transcriptions and web: Katy Moore

Highlights

The new scaling era: from pre-training to inference

Toby Ord: So here’s how I think about this. I think it’s a useful analogy. Suppose you’ve got a company, and you’re trying to get some excellent work done, and you could employ someone.
Pre-training is like sending them through high school and then to undergraduate and then maybe to grad school. You’re putting in more and more expense into having this person learn more and more about different things, so they’ll have a lot of extra knowledge at their fingertips. And that’s what most of the scaling had been doing.
But then inference is like letting that person spend more time actually doing the job. So suppose that you give them some brief, that they’ve got to prepare a report for a client. By default, if you just ask one of these language models to do that, it just extemporises stuff. So it’s just saying the words as they pop into its head. And it doesn’t have a chance to do a second draft; it just has to compose this document in one go.
So you could think of that as that the pre-training has given it this really powerful kind of “System 1” ability, in these terms from human psychology: so the intuitive ability to just answer things straight off the bat. Then you’re just asking it to keep doing that as it goes through all the sentences of this report that it’s composing.
Whereas what you could also do is let it spend a long time in that process, maybe spend 10 times as much on writing — where it could write an answer and then it could critique the answer, it could modify things, move things around — and then ultimately hide all of that working and just show you the final answer.
So that’s like saying you don’t just have to write this report for the client in 10 minutes, but rather we’re going to scale that up to 100 minutes or 1,000 minutes. It turns out, as you get with an employee, you get much better work out of someone if they’re doing that. And that also gives room for this different kind of intelligence that we call “System 2” in human psychology. These are often called reasoning models, where it’s able to do a certain kind of structured thinking and apply that.
So pre-training scales up the System 1, and then this inference scaling lets us spend more time on a task to hide all of that work and just show the final thing — and we could think of that as kind of scaling up its System 2 abilities.
…
So late last year there were a series of articles that came out in different publications in the media reporting that behind the scenes OpenAI had been disappointed by their next bigger model that used 10 times as much compute as GPT-4 — this is now what we call GPT-4.5. And they’d been really disappointed with the results. If you put in 10 times as many inputs into something, you hope to get some noticeable improvement. And they found that it wasn’t actually that clear, and it was worse at quite a few things.
Then there were similar reports coming from the other leading AI companies, so this was a little bit concerning for this narrative of continuing to scale these things up.
So that really is how everyone had been thinking about it. For example, the paper “Situational awareness” by Leopold Aschenbrenner paints a picture based on scaling up pre-training I think a million times further than where we’re currently at — just continuing to do that, and then painting a picture of what would happen if that curve continues to go. Whereas what seems to have happened is that it’s already kinked right at the point where GPT-4 was out and before GPT-4.5. So at the time he published the essay, it seems like maybe that’s actually not what’s happening.
…
So the AI companies have often said what’s really important isn’t that this pre-training scaling continues, but that some kind of scaling continues: that there’s still some way that we could pour in more and more compute into this process, and we’ll get more and more kind of cognitive capabilities coming out the other end.
I think that makes sense, but it’s not at all clear that this will continue to provide the same types of benefits we’ve seen.

Will rich people get access to AGI first? Will the rest of us even know?

Toby Ord: I think we’ll look back on the period that’s just ended, where OpenAI started a subscription model for their AI system — where it was $20 a month, less than a dollar a day, to have access to the best AI system in the world, and then a number of companies are offering very similar deals…
We’ve got the situation where, for less than the price of a can of Coke, you can have access to the leading system in the world. And it reminds me of this Andy Warhol quote about what makes America great is that the president drinks a Coke, Liz Taylor drinks a Coke, the bum on the corner of a street drinks a Coke — and you too could have a Coke! The best kind of sugary beverage that you can get, everyone’s got access to it.
But I think that era is over. We had OpenAI introducing a higher tier that cost 10 times as much money, because these inference costs are going up and actually they can’t afford to give you this level for the previous cost. And this is what you’re going to keep seeing: the more that we do inference scaling, it’s going to have to cost the users substantially more. And then there’s a question of how many are prepared to pay that.
So it’s certainly going to create inequality in terms of access to these things, but it also might mean that it is not actually scaling well for the companies. If it turns out that you offer a thing that costs 10 times as much and less than a tenth of the people take it, and then you offer a thing that costs 100 times as much and less than a tenth of the previous group that took the first one take this one, then maybe each of these tiers is earning you less and less money than the one before, and it’s just not actually going to drive your ability to buy more chips and train more systems.
Or it could go the other way around: it could be that a fifth of people are prepared to pay 10 times as much and then a fifth of them are prepared to pay 10 times as much again, and that you’re getting more and more money from each of these higher levels.
But which of those it is could really determine what happens in the industry and whether these inference-scaled models are actually profit centres for them or not.
Rob Wiblin: Yeah. … This particular issue — that a privileged group of people can gain access to superhuman advice and superhuman assistance potentially substantially before anyone else in the world has access to it — heightens the concern that people at the companies or people in the government who might take control or get privileged access to these models, that they could potentially outfox everyone else if they’re able to basically just have access to tools that no one else is really able to compete with.
Toby Ord: So while in theory this could happen just on the open marketplace with money, my concern would be greatest about the company itself deciding, for example, “We’ve got this model, what can we use it for? Maybe we can be willing to spend a million times inference scaling on it to do some really important work for us.” And the company might want to do that internally or the government of the place where the company is located might want these types of abilities. So I imagine that happening outside the open market is perhaps the most concerning place.

The scaling paradox

Toby Ord: So the scaling laws are these empirical regularities. They’re not necessarily laws of nature or anything like that. But it turns out that if you do a graph and you try to measure error or inaccuracy — so this is a bad thing; “log loss” is the technical term — if you try to measure how much it’s still failing to understand about English text as you increase the amount of compute that went into training it, how much of that residual mistake is it making in prediction? — they have these laws or empirical regularities. They draw these straight lines on the special log-log paper. You don’t need to worry too much about that, though; it’s a bit hard to interpret that.
So I really spent some time thinking about it, and basically what’s going on is that every time you want to halve the amount of this error that’s remaining, you have to put in a million times as much compute. That’s what it fundamentally comes down to. And that’s pretty extreme, right? So they have halved it and they did put in a million times as much compute. But if you want to halve it again, you need a million times more compute. And then if you want to halve it another time, probably it’s game over. And at least in terms of that particular metric, I would say that is quite bad scaling.
And these are the scaling laws: they show that there’s a particular measure of how good it’s doing and how much error remains. And it does hold over many different orders of magnitude. But the actual thing that’s holding is what I would have thought of as a pretty bad scaling relationship.
Rob Wiblin: So in order to halve the error, you have to increase the compute input a millionfold. That’s a general regularity? Because surely it differs by task and differs depending on where we are?
Toby Ord: Yeah. So what they do with these cases is that you grab a whole lot of text, often from the internet. They started with the good bits, like Wikipedia and things like that, and then as they ran out of that, they had to look at more and more things. But you train it on that, and you train it on most of it, but you leave some unseen. Then you try to give it a few words of the unseen bit and ask for the next word, and you see how well it does at predicting that. And basically the amount of errors that it has in doing that leads to this error score.
And it’s not clear that the error score is something that fundamentally matters. Maybe it’s a bad measure. But I found it really interesting that the single measure that convinced people like Ilya Sutskever and Dario Amodei that scaling was the way forward were these scaling laws — that actually, if you look at what they say, it’s distinctly unimpressive.
…
Rob Wiblin: How then have we been managing to make so much progress? Is it just that there was so much room to increase the amount of compute that we were throwing at these models, so that has been able to more than offset the incredibly low value that we get from throwing more compute at them?
Toby Ord: Yeah, I think that’s basically right. So when most people saw this type of thing, most people who were academics doing computer science, they would have thought, “So in order to get good performance on this task, you would need to run an experiment larger than any experiment that’s ever been run in any computer science department ever.” And they would then rule it out, and assume, “Obviously we’re not doing that. We’ll look for a different approach.” Whereas the pioneers of scaling thought, “But that wouldn’t be that much money for a company.”
Rob Wiblin: $100 million. They could raise that.
Toby Ord: In fact, then they could even go 10 times bigger again, maybe. So they realised that there was a lot more room to scale things up — to scale up the inputs, all the costs — in companies than there was in academia. And that in some sense all you had to do then was this kind of schlep, or this work of just making this existing thing bigger. You didn’t have to come up with any new ideas.
…
And then the other thing that’s turned out to make it have big impacts in the world is that it turned out that each time this error rate halved, that corresponded to tremendous improvements. Certainly for every millionfold increase in the compute of setting up these models, we’ve seen spectacular improvements in the capabilities as felt by an individual.
So a way to look at this is that the shift from GPT-2 to GPT-3 used 70 times as much compute, and going from GPT-3 to GPT-4 used about 70 times as much again. And GPT-3 felt worlds away from GPT-2, and GPT-4 felt like a real improvement as well. You really felt it in both cases. A visceral feeling of, “Wow.”
Rob Wiblin: “This is suddenly useful.”
Toby Ord: Yeah. “This is qualitatively better.” That said, you’d probably hope that was true if someone said something costs 70 times as much. How’s the wine that costs £1 or the wine that costs £70? You’d hope that the wine that costs £70 is noticeably better, otherwise what on Earth’s going on?
But we did feel those improvements. Whereas if you look at what happens to the log loss number, it didn’t change that much for a mere 70-fold increase in the compute. So effectively there was this unknown scaling relationship between the amount of compute and what it actually feels like intuitively in terms of capabilities. And that turned out to actually scale really quite well, I think.

Misleading charts from AI companies

Rob Wiblin: There was this very famous chart that OpenAI put out where they were comparing two different reasoning models that they had: o1, and this more impressive one that was an evolution of o1, called o3. And o3 really wowed people, because it was able to solve some of these brain teaser puzzles that I guess are very easy for humans, but have proven very difficult for AIs up until that point. And I think they were able to get something like an 80% success rate on some of these puzzles that had seemed very intractable for AI in the past.
But you point out that if you looked really closely at the graph and you properly understood it, it was actually consistent with o3 being no better, being no more efficient in terms of being able to solve the puzzles than o1 — despite the fact that the dots on the graph for o3 were an awful lot higher than they were for o1.
And what was going on was that OpenAI had managed to increase the amount of compute that it was using at the point of trying to solve these brain teasers by about a thousandfold. So an enormous scaleup in the amount of thinking time that the model had to answer it. Unsurprisingly, given 1,000 times as much time to think about the puzzle, it was able to answer more like 80% of them rather than 20% of them.
Now, in some sense, this is very impressive. But it is interesting that I think the companies are aware that people do not entirely understand these graphs perhaps, and that most consumers are not paying a deep level of attention to them, and they are sometimes trying to slip past messages that perhaps would not stand up entirely to scrutiny. And the fact that they put out a graph touting how impressive o3 is — when in fact the graph doesn’t really demonstrate that at all, and it might just be on exactly the same trend you would have expected before if you’d given the model more time to think about problems — is quite interesting.
And I don’t want to single out OpenAI here, because I don’t think they’re in any way unique in this.
Toby Ord: Yeah, that’s right. You see these graphs of what looks like steadily increasing progress, right? This kind of straight line of, as you put in more and more resources, the outputs go up and up. But if you look more carefully at the horizontal axis there, you see that each one of these tick marks is 10 times as much inputs as the one before. So in order to maintain this apparently steady progress, you’re having to put way, way more resources in.
…
And in the case of that famous data point with the preview version of o3, I actually looked into how much compute it was and how many tokens it had to generate and so on: in order to solve this task — which I think costs less than $5 to get someone to solve on Mechanical Turk, and which my 10-year-old child can solve in a couple of minutes — it wrote an amount of text equal to the entire Encyclopaedia Britannica.
Rob Wiblin: So it’s using a different approach to what humans are doing, it’s fair to say.
Toby Ord: It took 1,024 separate independent approaches on it, each of which was like a 50-page paper, all of which together was like an Encyclopaedia Britannica. And then it checked what was the answer for each of them, and which answer did it come up the most times, and then selected that answer. And it took tens of thousands of dollars, I think, per task.
So it was an example of what we were discussing with the inference scaling: what would happen if you just put in huge amounts of money, just poured in the money, set it on fire. Could you actually peer into the future, and could you see the types of capabilities we’re going to get in the future? And in that way, it’s quite interesting, right?
But it came out just a few months after the preview for o1, so it felt like, oh my god, in just a few months’ time, it’s had this huge improvement in performance. But what people weren’t seeing is that it used so many more resources that it wasn’t in any way an apples-to-apples comparison of what you could do for the same amount of money. Instead it was showing something like, what will we be able to do maybe a year or more into the future?
So that’s kind of useful, seen through that lens. But if you instead just treat it as a direct result of, “We used to have trouble with this benchmark, now we don’t,” then it’s definitely misleading.
…
Rob Wiblin: It is interesting. It feels like we’ve drifted towards sounding like a conversation between people who think that AI is not a big deal, and it’s all kind of overblown and exaggerated. We don’t think that.
But I suppose the thing to take away is: these are research organisations that have very legitimate, almost academic-style people who would love to reveal these fundamental truths about intelligence. And they’re also businesses that do have a communications arm that is trying to figure out how do we get people to invest in this company, and how do we get people excited about using these products. And I’m sure there’s this to and fro inside the organisation about how these results are presented.
And when you read the press release, you need to have your wits about you. You need to be a savvy consumer. And if you can’t understand the technical details at all, then maybe you just need to wait until someone who does is able to explain to you in more plain language whether you should be impressed by X or Y or not.
In this post, which I can recommend again reading — “Inference scaling and the log-x chart” — you explain what people should be looking out for in these charts. Because there are going to be many of these charts with a logarithmic x-axis and performance on the y-axis coming out in coming years, and if you want to be consuming them, then I recommend going and checking out this article so that you can know what to look for and what not to be fooled by.
Toby Ord: Yeah, I really like this point about what’s going on here. Are we sceptics of AI or not? What I would say is that, some people think of this in terms of is it all snake oil or some kind of fad or something, or is there something really transformative happening that could be one of the most profound moments in human history?
I think the answer is there is some snake oil, there is some fad-type behaviour, and there is some possibility that it is nonetheless a really transformative moment in human history. It’s not an either/or. So what I’m trying to do is help people see clearly the actual kinds of things that are going on, the structure of this landscape, and to not be confused by some of these charts and things.
I actually think that companies themselves are somewhat confused by their charts and into thinking that this looks like good progress or efficient progress. I really actually think that in relatively few cases are they trying to be deceptive about these things.
But it’s a confusing world, and I see my role there as trying to be a bit of a guide, and to have that sense of stepping back and looking at the big picture — which I think is a bit of a luxury. As an academic, I’m able to do it, so it gives a different vantage point which I think then is helpful for people who are trying to get at the coalface and engage with the nitty-gritty of these things. Because sometimes, when you keep engaging with that, you don’t notice that things have moved in quite a different direction to where you’re expecting.

Policy debates should dream much bigger

Toby Ord: There’s an interesting question I’ve been trying to grapple with about how is AI going to end up embedded in the economy or society? So I’ll give you a few examples to show what I’m getting at. I need a pithy name for it.
But one example is that AI systems at the moment are owned by and run by large companies, and effectively they’ve rented out their labour to a lot of different people. If AI systems were like people, this would be like slavery or something like that. I’m not saying that they are like people, but this is one approach: that it owns them, it rents them out, they have to do whatever the users want, and then all the profits go to the AI company.
A different model would be to say these AI systems are like legal persons. Maybe they are granted legal personhood in the same way that corporations are, so they can own assets. So they’re more like entrepreneurs or job seekers; they go out into the economy, maybe they set up a website for an architectural kind of firm that can design people’s houses for them, and then the clients have a chat with it or something and it issues out the designs. They can go and seek opportunities to participate in an economy. So that’s a different model.
I think there’s some reason to think that there’s more potential for economic gains if you allow them to actually make their own entrepreneurial decisions; they would have to pay for their own GPU costs and so on. This is the kind of direction you might imagine people going down if they think that the AI systems have got to a point where they might have some moral status. But you can also see that questions about gradual disempowerment really come in there. It might help liberate these systems from mistreatment, but exacerbate questions about whether they could outcompete us.
A third model is to say maybe people shouldn’t be interfacing with AI systems generally. This is how we deal with nuclear power: we have a small number of individuals who go and work in nuclear power stations, and they’re vetted by their governments with security checks and so on, and they go in and they interface with radioactive isotopes of things like refined uranium. But most people don’t. Those factories that they work in, these power plants, they produce electricity which flows down the cables into the consumers’ houses and powers their TVs and things. So that’s a different model.
We could do that with AI. We could have a model where there’s some small number of vetted people, or maybe millions, who interact with AI systems, use them to design new drugs, maybe to help cure certain kinds of cancer and things like this, to do new research and also produce other kinds of new products. Then those products are assembled in factories and the consumers can buy those products. That is an alternate way that you could do it.
If you’re concerned about things like some malcontent individuals or terrorist groups using AI systems to wreak havoc, this would really help avoid that.
Or a fourth alternative could be that if you’re concerned about concentration of power issues, you might say what we should do is give every individual access to the same advanced level of AI assistant. So it’s like a flat distribution of AI ability given to everyone. A bit like a universal basic income, but universal basic AI access.
So there are four really different ways that you could distribute AIs into society and have them interact. And I feel that no one’s talking about stuff like this — like, which of those worlds is most likely, which of those worlds is possible, and which of those worlds is most desirable. Because fundamentally, we get to choose which of those worlds that we live in. As in, maybe it’s the citizens of the United States of America or other countries that are developing these things that do actually get to make some of these choices. And if they think that one of these paths is very bad, they may be able to stop it and go down different paths.
So that’s the kind of thing I’m thinking of, in terms of we could think a lot broader and bigger about where are we going to be in five years or where we want to be — rather than the minutiae about exactly who’s ahead at the moment and exactly what are they prepared to accept in terms of regulation.

Why did you build it if you thought it could kill everyone?

Toby Ord: I feel that there are some risks that we face, such as the risk of asteroid impact — which thankfully does turn out to be very small. But if an asteroid were to be found on a collision course with the Earth, one that’s large enough to destroy us — so 10 kilometres across, like the one that killed the dinosaurs — we actually don’t have any abilities at the moment to deflect asteroids of that size. And if we saw it on a collision course for us in a few years’ time, I’m not sure that we could develop any means of deflecting it. The ones we can deflect are something like a thousandth the mass of that.
So suppose that asteroid slammed into the Earth and we all died, and somehow in this metaphor, we went to the pearly gates of heaven and St Peter was there letting us in. And we said, “I’m sorry, we really tried on this asteroid thing. And maybe we should have been working on it before we saw it, but ultimately we felt that there was nothing we could do” — I think that you’d get somewhat of a sympathetic hearing.
Whereas if instead you turn up and you say, “We built AI that we knew that we didn’t know how to control. Despite the fact that, yes, admittedly, a number of Nobel Prize winners in AI, I think all of the Nobel Prize winners in AI perhaps have warned that it could kill everyone. Something like half of the most senior people in AI have directly warned that this could cause human extinction. But we had to build it. And so we built it. And it turns out it was difficult to align it and so we all died” — I feel that you would get a much less sympathetic hearing.
It’d be like, “Hang on. You lost me at the step where you said, ‘We had to build it.’ Why did you build it if you thought it would kill you all?”
Rob Wiblin: The responses that you would give would feel wanting.
Toby Ord: Yes. You know, maybe they’d be like, “I thought that if I didn’t do it, they would do it.” “And so who did it?” “Well, I did it.” “So you built the thing that killed everyone?” “Yes, but I felt…” I just think that you would have trouble explaining yourself. And I feel like we should hold ourselves to a higher standard. Not just like “technology made me do it” or “the technological landscape made me do it,” or —
Rob Wiblin: “China made me do it.”
Toby Ord: “China made me do it.” Despite the fact that they didn’t start the race, the US started the race — you know, because maybe China would have started a race. It’s like explaining to the teacher about this fight that you started by punching some kid in the face, because you’re claiming that they would have punched you if you didn’t punch them or something. It just doesn’t really cut it.
And I feel that we should hold ourselves to somewhat higher standards on these things, and to not just think about, “What if I changed my action, or some very small group of people’s actions, how could I change the overall trajectory?” But rather to note that there are worlds that do seem to be available to us — where both, say, the US and China decide not to race for this thing.
That would involve having a conversation about that. It would involve verification conditions being sorted out. I think that there may well be such abilities to verify. Even if there weren’t, though, it might still be possible. I think that given the actual evidence we have, I don’t think it’s in the US’s interest to push towards AI or in China’s interest. I think it’s in both their interests to not do it. And if so, that’s not a prisoner’s dilemma. Cooperation is actually quite easy, because it’s not in anyone’s interest to defect. And I think that could well be the game in terms of game theory.
And yet there’s just very little discussion or thinking about these things. I don’t mean to say that we should be naive and assume that all incentives issues and all kinds of adversarial aspects are irrelevant. But we need at least some people, and I think more people than we currently have, thinking on these larger margins. Not just what could I do unilaterally? I know I couldn’t stop the whole of AI happening or happening in a certain direction, but maybe if enough people did something, that one could.
And I think that there’s a tendency for fairly technical communities to focus on things that are quite wonkish, as they say in the policy world. So technical or policy proposals that are quite technical and hard to understand, but they might be able to help with the issue at hand if you follow through the details. I love this stuff, right? So this applies to me as much as it does to anyone else.
But there’s a different style of doing things in politics, which is instead getting much larger changes — which happens by setting a vision and crystallising or coordinating the public mood around that vision.
So in the case of AI, if you say, “We’ve got to do this thing,” it’s like, well, does the public want it? No, it seems like the public are really scared by it, and actually think that things are going far too fast. So that’s somewhere where, even if the politicians haven’t quite gotten there yet, it may be possible to speak to the public about their concerns. And if we did, I think the answer is they’re probably not concerned enough about these things.
Things can move very quickly in those cases. If you set a vision and actually lead — and try to have this approach of not just pushing things on the margins, but of noticing that there’s a really quite different direction that perhaps we should be headed in — I think things can really happen.

Scientific moratoriums have worked before

Toby Ord: When it comes to scientific moratoriums, we’ve got some examples, such as the moratorium on human cloning and the moratorium on human germline genetic engineering — that’s genetic engineering that’s inherited down to the children, that could lead to splintering into different species. In both those cases, when the scientific community involved had gotten to the cusp of that technology becoming possible — such as having cloned sheep, a different kind of mammal, and the humans wouldn’t be that different — they realised that a lot of them felt uneasy about this privately.
So they opened up more of a conversation around this, both among themselves and also with the public. And they found that actually, yeah, they were really quite uneasy about it. And they wanted to be able to perhaps continue working on things like the cloning sheep, but actually that would be easier to work on and think about if the issue about cloning humans was off the table.
…
So their approach, I think of it not quite as a pause for a certain amount of time. They also didn’t say, “We can never ever do this, and anyone who does it is evil” or something. Instead, what they were saying is, “Not now. It’s not close to happening. Let’s close the box. Put the box back in the attic. And if in the future the scientific community comes together and decides to lift the moratorium, they’d be welcome to do that. But for the foreseeable future, it’s not happening.”
And it seems to me that in the case of AI, that’s kind of where we’re at. We’re at a situation where, as I said, about half of all of the luminaries in AI have said that this is one of the biggest issues facing humanity: the fact that there is a risk of, in their single-sentence statement, a risk of human extinction from this technology that they’re developing.
…
So I think that this would be a challenging thing to do. My guess is that there’s something like a 5% to 10% chance that some kind of moratorium like this — perhaps starting from the scientific community effectively saying you would be persona non grata if you were to work on systems that would take us beyond that human level — would work. But if it did work, it would set aside a whole bunch of these risks, even if that risk landscape is very confusing and has lots of different possibilities.
Some of these types of ideas might be able to act on many of those different types of risk. And I think that that’s a way where the scientific community — a relatively small number of actors, who have already kind of coordinated via producing these open letters and things — could have that conversation. If they crystallise their view, and for example the AAAI, their professional association, if it came out behind this and so on, it could be that that crystallises out of their opinion.
People could then look at the situation with the scientists saying, “We think that this is a big problem, and that it’s not responsible to do it.” That could then create norm changes which mean that it’s difficult to pursue it.
I think if the scientific community had a moratorium on it, then organisations like Google DeepMind — that sees itself as a science player, a science company that’s doing respectable science work — it’s not going to violate a scientific moratorium on something. It could be different for the more engineering type places, and the more “move fast and break things” cultures. So it doesn’t necessarily do everything on its own. It would probably need to form a normative basis for actual regulation of some sort.
But I do think that things like this are possible. And if we went to St Peter after we all go extinct due to some AI disaster, and we said, “We couldn’t stop it.” And he said, “Did you even have a conversation about a moratorium?” It’s like, “We thought about that, and we decided it probably wouldn’t work, so we wouldn’t even talk about it.” That would seem crazy.
So I think we need to actually do some of these more obvious things that are just natural and earnest, rather than trying to precalculate out, “Obviously it would seem sensible to have that conversation. That’s what you’d want another planet to do. But for us, we know the conversation will not work out, so we’re not going to have it, and we’ll just carry on building these systems.” I feel like that’s the kind of the wrong way of thinking.

Companies made a strategic error shooting down SB 1047

Toby Ord: So industry often does want to, like individual actors do want to have things be safe. They’ve often got a lot of concerns about how quickly market forces are making them act and how quickly market forces are making them deploy their new models, because everyone else is deploying quickly. And if they could all be bound by the safety thing so that their competitors didn’t have an advantage over them if only I’m bound, then they tend to want that. So it was frankly a bit surprising that there was this hostility.
But yeah, I do feel that there already has been a very good-faith attempt by the safety community to come up with the kind of bill that tries to meet all of the complaints that the other people have. And even that was shot down.
Rob Wiblin: Yeah, I’ve said this on the show before, but I do think that the industry is potentially shooting itself in the foot here. Because the thing that is most likely to bring about the sort of draconian regulation that people who are optimistic about AI technology are most scared of is some sort of disaster. Any sort of disaster that actually leads to loss of life could lead to a very big change in attitudes and lead to maybe more draconian regulation than is necessary from anyone’s point of view.
Toby Ord: And even if you think your company’s never going to make that mistake, you might think these cowboys down the street are exactly the kind of people who could make that kind of mistake, and they need some regulation that will stop them from ruining the party for everyone, right?
I really do think that this is very short-sighted. And on top of that, sometimes we talk about someone having conflict of interest: these places are very conflicted. And if you did find that, as a company, you thought it wasn’t in your interest, but also you get big stock bonuses and so on for the more stuff that you put out, you really want to inspect your own views quite carefully. We talk about various forms of biases and prejudices that people might end up having, and it would be very difficult to actually keep straight your actual prediction on this thing, as opposed to these other incentives that you’re facing.
Rob Wiblin: Yeah. Another interesting dynamic that I see going on is that when I’m thinking about SB 1047, or any proposed regulation that we might put in place now, I’m thinking of this as the very first step in a very iterated process — where almost certainly there’s going to be a whole lot of problems that we’re going to identify with it, it wasn’t written quite right, but we’ll just improve those over time. And you’ve got to start somewhere in order to begin learning what might succeed.
I think the people who are very against it, they think that whatever we put in place now is going to be potentially there forever. It’s not really the beginning of a process. Maybe for them it’s like this is just the beginning of a ratchet, where everything is going to become more and more extreme over time rather than be kind of perfected and improved.
Toby Ord: I mean, I think there might be a bit of a ratchet if you started with something like 1047, but that’s because 1047 is obviously too weak. And they will be looking back on the days when 1047 was the issue and thinking, “Oh my god.”
Rob Wiblin: “Should have taken that deal.”
Toby Ord: Yeah, I think so. But I really do think they may have a good point here. If it is the case that whatever the first regulation is sets the entire frame, and it’s not possible to step out to a different frame… For example, suppose the first thing is about compute thresholds for pre-training and then you can never escape that frame or something: that could be a big problem if then the scaling stops. So it really can matter.
But is that a recipe for therefore complete laissez faire, no regulation, do whatever you want? That’s obviously too quick. But if it is the case that in certain regulatory environments the default is that if you introduce things they stay forever, that could be a bad thing, and it could be that there’s some win-wins that one could find there — because the safety community also don’t want to be stuck in silly frames that no longer make sense.

Articles, books, and other media discussed in the show

Toby’s work:

The Precipice: Existential risk and the future of humanity (2020)
The Precipice revisited (2024 blog update)
Inference scaling reshapes AI governance
The Scaling Paradox
Inference scaling and the log-x chart
Check out all of Toby’s work on his website

Others’ work in this space:

How to be a wise optimist about science and technology? by Michael Nielsen
Situational awareness: The decade ahead by Leopold Aschenbrenner
Details about METR’s preliminary evaluation of o3 and o4-mini
How the US public and AI experts view artificial intelligence — polling from the Pew Research Center
When AI tells you that you’re perfect by Kelsey Piper
A conversation with Bing’s chatbot left me deeply unsettled by Kevin Roose
AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms from Google DeepMind

Staying up to date with AI developments:

YouTube channel and podcast AI Explained
The Cognitive Revolution — podcast from Nathan Labenz (previous guest of the show)
Don’t Worry About the Vase — Substack from Zvi Mowshowitz (previous guest of the show)

Other 80,000 Hours podcast episodes:

Transcript

Table of Contents

1 Cold open [00:00:00]
2 Toby Ord is back — for a fourth time! [00:01:20]
3 Everything has changed (and changed again) since 2020 [00:01:37]
4 Is x-risk up or down? [00:07:47]
5 The new scaling era: compute at inference [00:09:12]
6 Inference scaling means less concentration [00:31:21]
7 Will rich people get access to AGI first? Will the rest of us even know? [00:35:11]
8 The new regime makes ‘compute governance’ harder [00:41:08]
9 How ‘IDA’ might let AI blast past human level — or not [00:50:14]
10 Reinforcement learning brings back ‘reward hacking’ agents [01:04:56]
11 Will we get warning shots? Will they even help? [01:14:41]
12 The scaling paradox [01:22:09]
13 Misleading charts from AI companies [01:30:55]
14 Policy debates should dream much bigger [01:43:04]
15 Scientific moratoriums have worked before [01:56:04]
16 Might AI ‘go rogue’ early on? [02:13:16]
17 Lamps are regulated much more than AI [02:20:55]
18 Companies made a strategic error shooting down SB 1047 [02:29:57]
19 Companies should build in emergency brakes for their AI [02:35:49]
20 Toby’s bottom lines [02:44:32]

Cold open [00:00:00]

Toby Ord: Every time you want to halve the amount of this error that’s remaining, you have to put in a million times as much compute. That’s pretty extreme, right? So they have halved it, and they did put in a million times as much compute. But if you want to halve it again, you need a million times more compute — and then if you want to halve it another time, probably it’s game over. It does hold over many different orders of magnitude, but the actual thing that’s holding is what I would have thought of as a pretty bad scaling relationship.

In the case of that famous data point with the preview version of o3, in order to solve this task — which I think costs less than $5 to get someone to solve on Mechanical Turk, and which my 10-year-old child can solve in a couple of minutes — it wrote an amount of text equal to the entire Encyclopedia Britannica.

It reminds me of this Andy Warhol quote about, you know, what makes America great is that the president drinks a Coke, Liz Taylor drinks a Coke, the bum on the corner of a street drinks a Coke — and you, too, could have a Coke! You know, everyone’s got access to it.

I think that era is over. OpenAI is introducing a higher tier that costs 10 times as much money. And this is what you’re going to keep seeing the more that we do inference scaling. It’s certainly going to create inequality in terms of access to these things.

There is some snake oil, there is some fad-type behaviour, and there is some possibility that it is, nonetheless, a really transformative moment in human history.

Toby Ord is back — for a fourth time! [00:01:20]

Rob Wiblin: Today I’m again speaking with Toby Ord. Toby is a senior researcher at Oxford University, and his work focuses on the biggest picture questions facing humanity. He’s probably most well known to listeners as the author of The Precipice: Existential Risk and the Future of Humanity, which made quite a big splash back in 2020. Welcome back on the show, Toby.

Toby Ord: It’s great to be here.

Everything has changed (and changed again) since 2020 [00:01:37]

Rob Wiblin: So today I want to take a bunch of technical developments that have been going on in AI over the last couple of years, and try to explain them in a way that almost everyone can understand — and then also explain what implications they have for our lives, for what sort of things we should expect from AI in coming years, and what implications they have for AI governance and policy in particular.

But first, I wanted to talk a bit about this blog post that you wrote or this presentation you gave last year called “The Precipice Revisited.” The Precipice was this book that you wrote in 2018/2019, and came out in 2020. It explored the science behind all of the different major threats to humanity’s future: pandemics, asteroids, AI of course, nuclear war, that sort of stuff.

Of course there’s been lots of developments since then. I think last year you wanted to look back and say, over the five years since you wrote it, what have been the major changes in the picture? Is humanity in a better situation? Is it in a worse situation?

What have been the major changes? And in particular on AI, where so much has been going on?

Toby Ord: Obviously lots of changes in pandemics. We had COVID hit us, mRNA vaccines, so on and so forth. And nuclear war: the prospects of that felt like a distant memory back in 2019, and now it’s become more of a realistic possibility.

But AI is where the most changes have happened. So if we cast our minds back to 2019, the state-of-the-art AGI-type systems were reinforcement learning systems created by companies like DeepMind and OpenAI. So think of things like AlphaGo: this is a system that’s learned to play Go. Particularly AlphaGo Zero learned to play Go by playing huge numbers of games against itself — and in doing so, it kind of blasted through the human level of game-playing performance and then reached these kind of lofty heights of superhuman abilities.

And similarly with StarCraft and Dota, some other games where you could do a similar thing: reinforcement learning lets you learn skills that are beyond what humans already have because you’ve got some way of rating really good play objectively.

So we had those types of models, and they were actually quite narrow. DeepMind was very excited with AlphaZero because it could play three different games: it could play chess and shogi and Go, but it couldn’t play games involving chance or games involving imperfect information — so card games, for example, or things that aren’t games and so on. And it was in some sense general, but much less than we’ve seen since then with the rise of LLMs.

The rise of large language models has been this spectacular improvement in generality — where, using text as an interface to talk to these things, you can talk about any topic that you would talk about with humans. And that’s something, as Alan Turing foreshadowed with the idea of the Turing test, where you can then quiz it in order to test it: you can ask it about all kinds of obscure topics or the topics you know it will do worst at and so on.

And these systems actually have surprisingly good performance across the board, in a way that’s hard to quantify, but I would say it’s thousands of times more general than something like AlphaZero.

Rob Wiblin: So we’ve had the switch to LLMs. Have there been any other major developments?

Toby Ord: Yeah. As part of the switch to LLMs, we’ve had this situation where, previously with reinforcement learning and these Go-playing things, there was this question of how would you ever get human values into such a system? It couldn’t even understand human values.

Rob Wiblin: It had none of these concepts.

Toby Ord: No. Even if it was playing a game with little sprites made up of pixels that get hit by sprites that represent bullets, it’s not clear that it’s doing something like killing as we understand it, as opposed to just a move in a game, like taking someone’s knight in a game of chess.

And there was this big question about how would you even get the complexity of our values into these systems? But with the idea of training on vast amounts of text, we’ve got this world where now these systems, if you quiz them about, “Would anyone be slighted or upset if you were to do this action?” — or even asking questions about advanced things in moral philosophy — it can often get the right answers to these types of questions, or the answers that we think show knowledge of what humans think is morally appropriate.

But there still is a big question there: Does that knowledge of morality actually guide it? Has it internalised that? But it’s such a difference, though, that it now at least has most of that knowledge.

Rob Wiblin: Yeah, I think on a lot of moral philosophy questions and even social norm questions, it often probably outperforms humans in many cases, at least in typical examples.

Toby Ord: Yeah, but there still is that question of —

Rob Wiblin: It may know, but doesn’t care.

Toby Ord: Exactly. And as well as these changes to the technology, there’s also been a lot of other changes to the landscape.

So in order to power this, it’s been extraordinarily expensive. It’s required this scaling up of compute infrastructure, and this required huge amounts of money. So it required very large investments from Microsoft and Google and Amazon and others. In 2019, there was already a race between these AI labs, these relatively small groups of people focused on this technology. But now these trillion-dollar companies have been brought into that race, and it started to contribute to their bottom line.

When Microsoft applied this in search through Bing — going after Google’s crown jewel of search — all of a sudden the primary way to make money for a trillion-dollar company was on the line. It brought in these very large financial interests into this race, so it really heated up the race.

Is x-risk up or down? [00:07:47]

Rob Wiblin: Back in 2019, I think in The Precipice, you estimated that the chance of humanity losing most of its potential future value due to AI in this century was around 1 in 10. Would you say that number has drifted up or drifted down?

Toby Ord: I’m not sure. For a lot of the other risks, it was easier to see whether it’s gone up or down. For this one, I think that we’ve really been gifted a relatively good situation in terms of the way the technology has panned out, in that it’s this technology that imitated human values and human reasoning and so on by training on this huge corpus of human data. I think that has just been tremendously helpful. And the fact that it’s not an agent by default. Real gifts.

It wasn’t that we steered towards that because we knew that would help with safety. It’s just that that turned out to be the easiest way. And I don’t think we quite recognise that enough: that the biggest effects on whether we’re safe or not have just come from somewhat random aspects about this technology landscape, rather than deliberate attempts to steer it.

But I am very concerned about the racing, and I’m concerned that we’ve seen evidence that the players who are trying to make these systems are ultimately going to cut corners in order to win these races.

The new scaling era: compute at inference [00:09:12]

Rob Wiblin: OK, let’s talk about this series of articles you wrote about how there’s been a change in what is getting scaled up as the companies are trying to make these models more powerful. The change that you describe in these articles, which people can find on your website, tobyord.com. The article is called “Inference scaling reshapes AI governance.”

Basically, the companies are now using more of their compute during the inference stage rather than during the training stage. I think most listeners will probably know what compute and pre-training and inference are, but maybe you can explain that, so that everyone is completely following? And then explain the difference: what’s changed?

Toby Ord: Yeah, exactly. So compute is a somewhat ugly word for computation. It just means how much computer processing has happened. So it doesn’t include things like RAM or memory, but just how many steps, basically.

And what people have found is that you can scale up the amount of compute that goes on, where you’re processing more data and so on, building a bigger model, and you get much better performance.

That process of building a model and training a model gets broken up into two stages, which we call pre-training and post-training. It makes it sound like they come before training or after training, but really it’s just the first part or the second part.

Pre-training is the one that I think a lot of people are familiar with: where you take a system and you take some text, and it hears the first four words and then it tries to guess what the fifth word will be, for example. Then you modify the weights on it in order to make it so it would have been slightly more likely to say the correct word next. So that’s this kind of glorified auto-prediction type thing. And that produces something called a “base model.”

And then there’s a whole lot of post-training applied to it — for example, to make it refuse to do harmful things, and to make sure it’s honest, and to make sure it can follow instructions and things. That’s all called post-training.

But what we’ve had so far is a huge scaling up of pre-training via something called scaling laws, and then an increasing amount of post-training to make it quite a lot better after you’ve built this huge thing.

But now, ultimately, there’s been a shift from scaling up more and more of this pre-training to scaling up something that happens after the whole training process, called “inference.” Inference is basically using that model, so using it to produce a whole lot of text.

Rob Wiblin: I guess inference is like the thinking that it does while it’s trying to answer a particular question and get back to you, or figure out what to do next.

Toby Ord: Yeah, that’s right. So here’s how I think about this. I think it’s a useful analogy. Suppose you’ve got a company, and you’re trying to get some excellent work done, and you could employ someone.

Pre-training is like sending them through high school and then to undergraduate and then maybe to grad school. You’re putting in more and more expense into having this person learn more and more about different things, so they’ll have a lot of extra knowledge at their fingertips. And that’s what most of the scaling had been doing.

But then inference is like letting that person spend more time actually doing the job. So suppose that you give them some brief, that they’ve got to prepare a report for a client. By default, if you just ask one of these language models to do that, it just extemporises stuff. So it’s just saying the words as they pop into its head. And it doesn’t have a chance to do a second draft; it just has to compose this document in one go.

So you could think of that as that the pre-training has given it this really powerful kind of “System 1” ability, in these terms from human psychology: so the intuitive ability to just answer things straight off the bat. Then you’re just asking it to keep doing that as it goes through all the sentences of this report that it’s composing.

Whereas what you could also do is let it spend a long time in that process, maybe spend 10 times as much on writing — where it could write an answer and then it could critique the answer, it could modify things, move things around — and then ultimately hide all of that working and just show you the final answer.

So that’s like saying you don’t just have to write this report for the client in 10 minutes, but rather we’re going to scale that up to 100 minutes or 1,000 minutes. It turns out, as you get with an employee, you get much better work out of someone if they’re doing that. And that also gives room for this different kind of intelligence that we call “System 2” in human psychology. These are often called reasoning models, where it’s able to do a certain kind of structured thinking and apply that.

So pre-training scales up the System 1, and then this inference scaling lets us spend more time on a task to hide all of that work and just show the final thing — and we could think of that as kind of scaling up its System 2 abilities.

Rob Wiblin: As I understand it, we’ve had this shift from using compute during training to during inference or question answering, basically I think because in 2023 and 2024 the main companies started training bigger models than GPT-4 and they found that they were running out of juice, that actually this was much more expensive: they were using a lot more chips and a lot more electricity, and the performance just wasn’t increasing at nearly the rate that it had been before. Perhaps because they were kind of running out of high quality data to actually train on; they’d soaked up all of the good books and Wikipedia and all of that.

But at the same time, they were finding new ways of putting scaffolding around these models that would allow them to answer a question, critique it, think about it, to basically get more juice out of giving them additional time to try to answer a question to a higher standard in a more intelligent way. Is that basically the story?

Toby Ord: Yeah. At least I think that that is correct. It’s somewhat disputed. So late last year there were a series of articles that came out in different publications in the media reporting that behind the scenes OpenAI had been disappointed by their next bigger model that used 10 times as much compute as GPT-4 — this is now what we call GPT-4.5. And they’d been really disappointed with the results. If you put in 10 times as many inputs into something, you hope to get some noticeable improvement. And they found that it wasn’t actually that clear, and it was worse at quite a few things.

Then there were similar reports coming from the other leading AI companies, so this was a little bit concerning for this narrative of continuing to scale these things up.

So that really is how everyone had been thinking about it. For example, the paper “Situational awareness” by Leopold Aschenbrenner paints a picture based on scaling up pre-training I think a million times further than where we’re currently at — just continuing to do that, and then painting a picture of what would happen if that curve continues to go. Whereas what seems to have happened is that it’s already kinked right at the point where GPT-4 was out and before GPT-4.5. So at the time he published the essay, it seems like maybe that’s actually not what’s happening.

That said, it’s very difficult to know what the curve of actual performance looks like. Some people say GPT-4.5 is really impressive. I find that a little bit hard to believe. If you look at the actions of the company… For example, GPT-4 was this massive announcement, right? This breakthrough technology, and everyone was oohing and ahhing about it. Entirely new benchmarks had to be created in order to measure its new capabilities. There was this “Sparks of AGI” paper and so on. A massive improvement from GPT-3.5.

Whereas GPT-4.5 was just announced I think on a Friday, kind of like we’re trying to bury the story. And then they announced I think a month after it launched that they were going to end-of-life it this summer. They also declared that it wasn’t even a frontier model — so they said this is not even an example of anything that should concern people or is pushing the industry forward.

That’s really remarkable. I would have been totally shocked if you told me when GPT-4 came out that its successor would be basically buried by the company that was creating it. So I think there really has been a kink in the curve here — but it is difficult to measure, and not everyone agrees.

Rob Wiblin: Yeah, I think that is very likely a win for the people who said pre-training is kind of hitting a wall. At the same time, in the broader picture, the people who were pessimistic about AI progress have probably on balance been wrong — because they’ve found other things that can be scaled that nonetheless produce the output that we care about: more capable, more impressive models.

Can you explain what’s been going on with the scaling of inference and what impacts it’s had?

Toby Ord: Yeah. So the AI companies have often said what’s really important isn’t that this pre-training scaling continues, but that some kind of scaling continues: that there’s still some way that we could pour in more and more compute into this process, and we’ll get more and more kind of cognitive capabilities coming out the other end.

I think that makes sense, but it’s not at all clear that this will continue to provide the same types of benefits we’ve seen. It’s really quite a different process and it’s as different in some cases as the difference between a stock and a flow. So I think there’s a lot of conceptual confusion about this.

The key idea here is: suppose you scale up the pre-training by a factor of 10. So you put 10 times as much compute into learning how to have these good kinds of intuitions and good responses to prompts. Then you do have to pay some additional cost every time you use the model, because you generally need to add more weights as well when you do that.

But if you instead try to get that same level of capability improvement by training up inference — so training up the amount of time it spends on every task — then you have to pay that full scaleup every time you use it. And the way that the maths turns out — it’s a little bit complicated under the hood — but it turns out that for every tenfold increase in pre-training that you instead get the benefits by using inference scaling, you do have to pay 10 times as much every time you use it. And that can change everything in terms of the economics of these companies.

Rob Wiblin: So I think one of the positive implications that this might have going forward is that we might expect AI that is both human level and as general as humans to arrive more gradually. And the same could be true for superhuman AI: that we could get glimpses of what superhuman AI and what equally general superhuman AI might look like maybe years before it’s ever actually practical to use on a broad scale. Can you explain why that’s the case and what effects that has?

Toby Ord: Yeah. So each of these GPT levels, we could think of the move from GPT-2 to GPT-3 and GPT-3 to GPT-4, were something like 100 times increase in the amount of compute that was used for them. And in order for things to get really crazy, to have a model that has transformative effects on the world, suppose we need to get to GPT-6 level in those terms — so to go up by a factor of 100 from GPT-4 and then another 100 beyond that.

If we are trying to get those advantages through inference scaling, then that means we have to spend 10,000 times as much money every time we want it to do something, compared to if we’d done it the other way around. That’s a big difference.

A kind of key parameter for how will the arrival of greater-than-human level of intelligence shape society is what’s the cost of it? Will it cost more or less than human wages for doing the same thing? And obviously it could depend on exactly what it’s doing. And human level will be different; it will arrive at different times for different types of tasks. But roughly speaking, you can think of this kind of parameter.

So suppose a system comes out and it costs 10 cents an hour to do human-level work. That’s going to have massive implications. It’s basically free, and companies would be attempting to shift all kinds of jobs onto this thing. But if instead when it comes out, it’s $10,000 an hour, then that’s something that might not affect things very much at all at first.

So if we scale up the systems, we try to get the capabilities we thought you’d need a GPT-6 to reach — but ultimately, if that pre-training is kind of fizzled out and we’re getting it all from inference scaling, so we need to put in 10,000 times as much compute every time we run it, it’s going to cost 10,000 times as much money. So it could be that we’re on track to get something that costs a dollar an hour, but instead it costs $10,000 an hour, and it’s totally different effects on the world. That’s a key example of how this could change things.

Rob Wiblin: In a sense, it’s a much more reassuring picture. If you were worried about rogue AI, then this is a much nicer picture, because at the point that you have a superhuman AI that conceivably is motivated to try to take over, there wouldn’t be enough compute in the world to run enough instances of it to actually do the necessary work to try to stage a rebellion. So that would, at least at the early stages, potentially be off the cards. And I suppose the model itself would probably realise this.

But in the meantime, if you were willing to spend the money, then you could actually study these models that you predict will one day be cheap, maybe in three or four years’ time. You could study them in great detail and understand what motivates them, try to figure out how you motivate them to actually pursue the goals that you have. So it’s a lot better from that point of view.

I guess even from a governance point of view, you could learn more about what will these models look like in four years’ time, when they actually are economically relevant, and then think about what governance solutions might be appropriate if that is how things are going to look.

Toby Ord: Exactly. So I’m not saying that that would stay expensive forever — there’s been a long history of things getting cheaper and cheaper — but rather that rate at which things get cheaper would be what would introduce it into society, rather than the moment when the training run is finished and then the company switches on their public release being kind of a cliff edge. Instead it would be the case that every few months or something, it costs half as much and eventually $10,000 an hour, then $5,000 an hour, $2,000 an hour, $1,000 — and new groups would start to want to use it at each of these price points and so on. It’d be more of a smooth transition.

And if you ask, when would you want to spend a million dollars an hour on an AI system, supposing it costs that much? I think that there are some answers. One example would be to give a six-hour-long demo of superhuman intelligence in front of the General Assembly in the United Nations: that could be worth spending say $6 million in order to provide that demo, to really show and let people kind of kick the tires on this thing and see that this is what’s coming soon. So it might enable those types of things by smoothing things out with this cost change.

Rob Wiblin: Two things to note there are: I suppose roughly we might say that the cost in the near term might drop by something like an order of magnitude a year perhaps. I think not quite that much. So you might get like a 10x cost decrease on these models each year. So that gives you some sense of if you’re willing to spend $10,000 an hour today, then you could study something that in three years’ time might cost $10 an hour. So it’s coming at us pretty fast, but at least it does give us forewarning. But it does mean we have to be willing to spend the money.

Toby Ord: It does. But also those orders of magnitude are not going to come forever. It’s not the case that it’ll be 10 times cheaper then 10 times cheaper and then 10 times cheaper, and that just goes on for infinitely many orders of magnitude. These cost reductions will eventually hit a floor, and we don’t know where that floor is. It could be that it stalls out while still being too expensive for almost all tasks, requiring some kind of paradigm breakthrough in order to push it forwards. I think that there’s often a feeling that there’s some unlimited number of these orders of magnitude that are like manna from heaven.

Rob Wiblin: Because these models seem so much less energy efficient than human beings, I think that gives some people a sense of the technological frontier, at least demonstrated by human brains, suggests that we’re a very long way from where we could be at some future time. So that gives us some hope that the efficiency increases might be substantial for a while at least.

Toby Ord: I think that that’s right. That said, the entire paradigm of LLMs and their scaling laws is really inefficient in terms of the data that’s needed to train them. I think that suggests that the thing that might make that change might be more of a paradigm shift or some real breakthrough in the efficiency.

Rob Wiblin: Yeah. I think another thing that people need to note is that it’s not the case that we can see an unlimited distance in the future. It’s not if we were willing to spend a billion dollars an hour that we could exactly see how things are going to look in 2032. There’s a limit. And I suppose the further out you go, the more you might have changes to the architecture or the entire nature of these models. So it could make you a little bit complacent if you’re saying, well, we spent a whole tonne of money and now we can see how the models are going to be in 2029. It gets more and more shaky the further out you go.

Toby Ord: Yeah, that’s right. One way to think about what the companies have been doing with this inference scaling, like, what have they been working on in order to scale this up? Is it the case that there’s just a knob where we can just turn it now and we can witness what happens if we put in a million times as much money into this thing?

The answer is not really. The main thing that is being done is to try to make them coherent over longer and longer time periods, or longer and longer numbers of words that they can say in a row while still being on topic and doing useful work.

And in a lot of these cases, the amount of words is so large that it’s not shown to the user — that’s considered to be its reasoning trace or something like that. You could think of that as all of the subvocalisations that the employee had while they were preparing the report that took them possibly quite a long time to write, and instead the finished report is what we show. But every time you want to make that chain of thought 10 times longer, they do become incoherent, unless you put in a lot of effort on reinforcement learning to try to train them to stay coherent.

Rob Wiblin: I think another shift that you get from the intelligence increases coming from putting more compute at the inference stage is that in general, information security, just as a whole, becomes a lot less important — and indeed, the weights of the model become less important. So the picture around open sourcing models changes quite a lot. Can you explain this?

Toby Ord: Yeah, that’s my hypothesis here. So if GPT-4 is basically a kind of kink in the curve of how impressive the pre-trained models get, such that you get really diminishing returns beyond that point, then one thing is that there may just be GPT-4-level models that are put into the public domain and then it’s all over. There’s no more future model like that to go. So it could become moot in that direction. But it also becomes less interesting even from just the open source community’s perspective.

Whereas the way that it worked up until now with this pre-training scaling was that the labs invested a vast amount of resources, a huge amount of money in data collection in order to produce this model, this collection of trained weights. And then they’re giving that to you — and you, the user, can use all of that kind of embodied intelligence in it.

Whereas now what they’re saying is that every now and then you have to spend 10 times as much compute while running it in order to make it more intelligent. And you, the user, is going to have to spend that compute. You’ll need your own GPUs and things in order to be running these things. And you’ll need 10 times as many, and then you’ll need 100 times as many. Maybe we’ve done the work to keep it consistent over that time. So there’s still some advantage to getting the latest version of these weights that can stay on topic for longer periods. But it’s kind of like “bring your own compute” for the users.

So that story is less exciting than if all the compute was done by Meta or something like that.

Rob Wiblin: Yeah. So if the breakdown is that 99% of the total compute that is going towards these AIs in general is occurring at the training stage, then they’ve paid for all of it at the point that they give you the weights for free. Then it’s like it’s basically a free fee to operate.

If it’s the other way around, if 1% of the compute that is going towards AI as a whole is at the training stage and 99% of it is at the use stage, then they haven’t really been especially generous or especially useful to you in giving you the weights — because that’s not where the cost is actually incurred; it’s all at the point of actually applying it to solving some problem.

So I guess we wouldn’t have to be as worried about people being able to suddenly gain an enormous amount of power if they stole the weights, or the weights were leaked, or something dangerous was open sourced, because it would simply be very expensive to apply it to actual practical problems. But at the same time, the benefits of open sourcing the stuff is not so great.

Toby Ord: Yeah, I think that’s right. There is the alternate thing of, if it’s the case that even relatively small sets of weights — the type that have perhaps already been open sourced — if you could actually take those things and then kind of soup them up through bringing your own compute, in some ways that makes the issue of proliferation kind of worse.

Suppose you’re a very well-resourced actor, in terms of having a lot of compute, and you get some of these weights: then you can really turn it into something amazing. Maybe. But if you had all that compute in the first place, you could have just trained your own model. So I’m not sure, but I just want to say that it’s a little bit unclear.

But I think the overall effect is that previously there was all of this kind of virtual compute or something distilled out into these weights. Then you could have the weights which represented the huge amount of effort that had gone on before that point. And if that’s no longer true, and the answer is you’re mainly bringing your own compute, then the story for either being a legitimate open weight user or for being a spy who’s hacked in and stolen the weights, in both cases, getting these weights becomes less important.

Inference scaling means less concentration [00:31:21]

Rob Wiblin: A related impact that this would have is that the AI market is more likely to remain competitive than it would if there was this enormous fixed cost that you had to incur at the point of training a model. Can you elaborate on that?

Toby Ord: I think this is right. Often this process of these kinds of massive pre-training runs is likened to software development: it’s something where you go to a lot of effort to write a piece of software, and then there’s zero marginal cost or very small marginal cost to distribute it.

That was true to some degree with books, where you go to a lot of effort to write a book and then printing it is a lot less. But once software could be distributed on CD or then just downloaded, it became trivial costs to the company to, say, have an extra copy of Microsoft Word distributed to a user. So once you write a word processor, you do all of the software engineering for it, you really want to sell it to a lot of customers, because every customer is basically just pure profit.

That’s this zero marginal cost type thing. And if you have that in an industry, then you tend to get a small number of players — because you really want to be the best one of these things and then potentially just swallow up the whole market. Whereas if you enter it as a new entrant, you put in all these upfront costs. If you’re the third-best one, why would anyone use your thing? Even if you sell it for less, it’s very hard to be a player there.

Whereas this could change that aspect of it, and mean that most of the costs are actually in producing each item. A little bit like, let’s say you develop tools for a hardware company — physical tools: hammers, screwdrivers, and things like that — then it’s the case that some of the costs go into designing the new hammer, but most of it is just that every hammer you make costs a certain amount of money, and then you get a kind of limited amount of profit when you sell it.

So it might be becoming more like an industry of that sort, and that would have a different kind of market structure.

Rob Wiblin: It suggests to me that then more of the profit that’s coming from AI would go to the hardware companies, because they’re the ones who actually have the scarce resource, or at least temporarily scarce resource. So it’d be much harder, I think, for the software developers to gain a huge margin, because there would be many of them with roughly similarly powerful and useful models.

Which is kind of where things stand today: there’s at least three — or four, possibly — models that are roughly at parity, which means that then where does the surplus go? It probably goes to the hardware producers who have this kind of scarce resource, which is the thing that you desperately are trying to acquire in order to be able to apply them.

Toby Ord: That sounds about right. In general, there’s always this question for the companies that are trying to make a lot of money with this: which step of this value chain makes the most profit? What we have at the moment is that the final stage of these people who’ve trained these models is really quite competitive between a handful of strong players. Whereas the step before that, of the most powerful GPUs, is really locked up by one main player. So they’ve got more ability to get a kind of monopoly pricing in at that point.

Rob Wiblin: I guess we should say that this is where things are trending as a result of this switch towards inference scaling. It won’t necessarily go all the way to there. It is interesting to know, because I think just a couple of months ago I recorded an episode with Tom Davidson talking about the risk of seizure of power. He was describing this thing where maybe almost all of the fixed costs are at the training stage, and that would tend to push you towards a market with a handful of companies, or possibly even at some stage, just one company, that is willing to spend the $10 trillion on the super training run that produces AGI. I think that is still possible, things could change, but that’s looking a bit less likely now.

Toby Ord: Yeah, that’s right.

Will rich people get access to AGI first? Will the rest of us even know? [00:35:11]

Rob Wiblin: So I guess that has been the kind of positive or neutral effects here. I think one that people might find a little more troubling, that might have jumped off the page at them already, is: if you’re in a world where you can access superintelligence earlier if you’re willing to spend a tonne of money, then that suggests that rich and powerful and more connected people will be able to access these tools potentially many years ahead of the general public. What do you make of that implication?

Toby Ord: Yeah, I think there’s a real effect there. I think we’ll look back on the period that’s just ended, where OpenAI started a subscription model for their AI system — where it was $20 a month, less than a dollar a day, to have access to the best AI system in the world, and then a number of companies are offering very similar deals…

We’ve got the situation where, for less than the price of a can of Coke, you can have access to the leading system in the world. And it reminds me of this Andy Warhol quote about what makes America great is that the president drinks a Coke, Liz Taylor drinks a Coke, the bum on the corner of a street drinks a Coke — and you too could have a Coke! The best kind of sugary beverage that you can get, everyone’s got access to it.

But I think that era is over. We had OpenAI introducing a higher tier that cost 10 times as much money, because these inference costs are going up and actually they can’t afford to give you this level for the previous cost. And this is what you’re going to keep seeing: the mqqqore that we do inference scaling, it’s going to have to cost the users substantially more. And then there’s a question of how many are prepared to pay that.

So it’s certainly going to create inequality in terms of access to these things, but it also might mean that it is not actually scaling well for the companies. If it turns out that you offer a thing that costs 10 times as much and less than a tenth of the people take it, and then you offer a thing that costs 100 times as much and less than a tenth of the previous group that took the first one take this one, then maybe each of these tiers is earning you less and less money than the one before, and it’s just not actually going to drive your ability to buy more chips and train more systems.

Or it could go the other way around: it could be that a fifth of people are prepared to pay 10 times as much and then a fifth of them are prepared to pay 10 times as much again, and that you’re getting more and more money from each of these higher levels.

But which of those it is could really determine what happens in the industry and whether these inference-scaled models are actually profit centres for them or not.

Rob Wiblin: Yeah. So some of the previous changes have slightly reduced our concern about concentration of power and risk of seizure of power by human beings.

But this particular issue — that a privileged group of people can gain access to superhuman advice and superhuman assistance potentially substantially before anyone else in the world has access to it — heightens the concern that people at the companies or people in the government who might take control or get privileged access to these models, that they could potentially outfox everyone else if they’re able to basically just have access to tools that no one else is really able to compete with.

Toby Ord: So while in theory this could happen just on the open marketplace with money, my concern would be greatest about the company itself deciding, for example, “We’ve got this model, what can we use it for? Maybe we can be willing to spend a million times inference scaling on it to do some really important work for us.” And the company might want to do that internally or the government of the place where the company is located might want these types of abilities. So I imagine that happening outside the open market is perhaps the most concerning place.

I should say as well that I’m imagining or thinking about all of this over the next couple of years. I’m not claiming that in the long-run equilibrium, when we’re imagining, in a post-AGI world, how unequal will access to AGI be? It could be very unequal, or it could be very equal if we actually choose to build a world like that. But I’m setting that aside, because I think that I can only really see what’s going to happen for the next couple of years, and things may change after that.

Rob Wiblin: What are the policy implications here? One that stands out is that you might want to insist on some level of transparency about what is possible at the frontier, if you’re willing to spend a whole lot of money — just so that the public and people in government have some sense of what’s coming, and that companies can’t hide this if they would rather maybe obscure what they already are aware is possible if you’re willing to spend a million dollars an hour.

Toby Ord: Many of the current rules — to the extent to which there are rules at all; there are voluntary commitments and there’s also the EU’s AI Act — they’re often focused on deployed models. This means that you can circumvent a lot of this if you just don’t deploy it. So maybe you have these kind of higher tiers of inference scaling that are only accessible internally; then you could have systems that are, say, breaking through this human range of abilities without anyone knowing.

Whereas in this Andy Warhol Coke world, where everyone’s got access to the cutting-edge system, we kind of all knew that the people working at those companies had the same thing. Or that if they had something better, within a few months we’d also have it, or something like that. So yes, I feel that governments and regulators generally need to ask for more transparency in this world to know what the capabilities are for the leading-edge internal models, as well as the deployed ones.

The new regime makes ‘compute governance’ harder [00:41:08]

Rob Wiblin: Other probably negative implications of inference scaling are that it makes regulation of AI just substantially more difficult in a number of different ways.

One thing is, up until recently, you want to carve out the models that you think are not particularly risky — that are basically just applications that we should feel not only OK with, but actively excited about. And then you want to carve out the things where we don’t know what this model is potentially capable of; this is posing novel risks that we’ve perhaps never seen before, and we want to at least do some research and study it before we deploy it, or possibly even before we use it internally.

And to do that, we’ve used these compute thresholds — where we’ve said, if this is more than 10 times larger than any model that’s been trained before, then it falls into the “let’s study this first” regime. And if it’s smaller than things that have already been trained, then we’re probably in the clear and we can use it with a reasonable degree of comfort.

Can you explain why inference scaling makes this so challenging to actually do?

Toby Ord: So that paradigm of compute governance via these thresholds, you could think of it as trying to regulate a particular object. What they’re saying is that if this object has gone substantially further than any that have come before, in terms of what’s gone into it, then we could try to regulate it.

So these trained weights are the object of interest. It’s a little bit like having regulation on automatic weapons, but not on non-automatic weapons. Something like that, where you take a particular class of object and you put a regulation on it.

Whereas what we’re getting with inference scaling is it’s not the object itself; it’s more what you do with it that matters. It could be that you can take, for example, a GPT-4-sized pre-trained model, and then just through a smaller amount of post training, you can make it able to think on longer and longer time horizons. Then you can just use that model with huge amounts of inference — so just run it over and over again. Maybe you put 10 times as much compute into running it over and over again as you put into the very first training of it.

But that’s currently not regulated on a lot of these things. And even if you tried to regulate that, it’s definitely different — because then you’re trying to regulate the use of an object, not regulate the object existing at all, which raises a lot of different questions.

But also, it might be really hard to do, because maybe you’ve got a system like GPT-4 training on 10 trillion words of information, and then you merely scale up its inference by a factor of a million. That’s still a big scaleup, and maybe that has kind of dramatic effects. But if you only use that once — for, say, some internal deployment — the total amount of compute it’s going to use is still small compared to the original training.

So you wouldn’t really see it if you’re just trying to add up all the compute. You’d see it if OpenAI or some other group said every single user is getting a millionfold the level of inference they previously had — but if just one is doing it, it’ll just be using up as much compute as a million users use up. So it might not really be detectable if you’re trying to measure these things.

Therefore I’m concerned that this is a substantial problem for these pre-training compute thresholds, and I personally don’t think it’s possible to overcome it. But maybe there’s some creative work that will solve it.

But I’m not necessarily bearish on all compute governance. It’s still the case that if you know where all the GPUs are, for example — lots of them are owned by these cloud computing providers — and then you have know your customer rules for them and so on, that you might be able to exert some control over the dangerous possibilities through compute governance. But it might have to change the way we do it.

Rob Wiblin: The other complication that adds to compute governance is that, I forget the technical details, but at the point that you’re training a model, you really want to have all of the computer chips in the same location — because it’s not just compute that matters; it’s the ability to move information incredibly quickly between all of these computer chips that are in an array. That’s an issue that occurs at training, and means it’s very difficult to spread the training of GPT-5 across many different data centres. Maybe you could do it with a handful, but I think ideally they really want to have it all in one place, and you certainly couldn’t distribute it across the entire world.

But if you’re just doing it with inference, then I think you don’t face this similar constraint that you need to have all of the chips or most of the chips in a single location; you can potentially distribute them far more widely. If your hope was that the government would be able to identify the handful of places in the world where most of the compute lives — and by looking at what’s happening, they get visibility on what the entire sector is doing — that is a lot weaker if most of the juice is coming out of throwing more compute at inference.

Toby Ord: Yeah, that’s right. We’ve seen that with these stories over the last year of major companies trying to get nuclear power plants commissioned to get full access to them in order to run a data centre. Because the training, at least the standard ways of doing it, all have to be done in one place, that creates this huge power density issue that you need a lot of power in one location. And it’s difficult to do that with the grid without it being the case that there’s actually literally a power plant there that is powering you; it’s hard to just provide it through the general kind of grid capacity.

And that’s actually given a lever for government to have some power over these companies — a lever that they don’t seem to have used at all. Because if they say, “We would like to be fast tracked for this new nuclear power plant,” you can say, “We’re interested in fast tracking you, but you’ll have to in return be more transparent about your internal models and so on.” A lot of people say, how could the government control these companies? This was certainly a location where they could, albeit the US government seems to have just fast tracked all of this without asking for very much in return.

But this could change. It does depend on the nature of this scaling up of inference. I mentioned this example of doing really long chains of thought. But another way to do it is, instead of having one of these employees who you’ve sent to your virtual university with your pre-training, instead of just having that employee work on a project for longer and longer, you could send the project to 10 employees.

So you spend 10 times as much compute to run all of these different virtual employees, and then you see which one has done the best job. If it’s objectively measurable, you might be able to do that, or you might be able to have an 11th employee who looks over the 10 reports and then selects which one is best and shows that to you or something.

And that’s one of the standard approaches that is being used to do inference scaling: doing them in parallel instead of like a longer sequential thing. We’ll probably see a mix of these two, and the parallel version is the type of thing you can spread between different data centres.

Rob Wiblin: A couple of years ago I was thinking a lot about compute governance potential — could you exercise regulatory control by knowing where all of the compute is? — and thinking a lot about information security and the risk of model weights being stolen. We’re all off in this direction. Should we now be saying, “Don’t worry about that. Compute governance is not necessarily so relevant; we don’t have to worry about the security of weights or open sourcing. Whatever goes”?

It feels like that would be far too far to go in that direction. But is this so decision-relevant that people who are trying to improve the direction of AI by leaning on these different things should be changing their plans? Or should we maybe wait and see whether all of this stuff might go into reverse? Maybe inference scaling will peter out in a couple of years and it will all be back to some new kind of training that they’ve figured out how to do.

Toby Ord: I think that people who are interested in AI governance should be tracking these things maybe more than they are. They should be noticing that AI governance, up until the start of this year, had all been done in this paradigm of this scaling of pre-training, and we’d see all of these charts that showed how impressive it was going to be and project forward and so on.

Really I want to kind of stress that that era has come to an end, and we’re now in some other era that might be a kind of continuation — but there’s no particular reason why it should be; there’s no reason why the slope of those curves should be the same as it was beforehand. And in fact, there are reasons to think the slope of the lines is worse.

So they should be aware that a lot of the rules and ideas that they’ve been building up, that they need to reevaluate them. My piece on this was written in a week after realising a lot of these things, and I think it’s held up reasonably well. But I wouldn’t want to be telling people what to do based on a small amount of one person thinking about the implications these things might have. I wouldn’t be surprised if there were additional implications as big as the ones that I mention, which I never found.

Rob Wiblin: Yeah.

How ‘IDA’ might let AI blast past human level — or not [00:50:14]

Rob Wiblin: One of the things I hope the audience takes away from this interview is that technical changes can radically shift the strategic picture and the governance picture. So far we’ve all been talking about the impacts of scaling up inference at the point of use, at the point of inference.

But it’s also possible that we’re finding, and it is the case that we’re finding new ways of applying enormous amounts of compute at the training process, just in different ways — and that could have all of the reverse implications of what we were just describing.

Could you explain how we’re finding new ways of applying large amounts of compute at the training stage that is not the kind of pre-training that we think has somewhat petered out so far?

Toby Ord: Yeah, exactly. This process that I’m calling “inference scaling” — scaling up the inference compute — also gets called “reasoning,” although it doesn’t have to be used for what we think of as reasoning, and it also gets called “test-time compute,” which also kind of implies that it’s happening at the time of deployment to the user or something.

But I think it is really important to divide the versions: where everything we’ve talked about so far, it’s happening during deployment for a particular user who’s trying to get value out of it, versus using a whole lot of extra inference compute during a larger part of the whole testing process.

If you use it during the training process, the economies of scale are different. So suppose that as part of it, you’ve pre-trained the model, and then during the post-training you run really long inference chains and these chains of thought and so on, and you assess them. They do this using reinforcement learning, so they give it hard problems that they know how to check — say, really hard maths or coding problems, where there’s precise answers — and then they train or reward these long chains of inference that actually worked.

Rob Wiblin: That get the right answer.

Toby Ord: Yeah. So they kind of roll it out with huge amounts of tokens, and then they use that to do more post-training on this set of weights. If you do all of the stuff you did there, once it goes into this post-training, you still kind of only have to do it once. And then if 10 times as many users come along, you don’t have to spend 10 times as much extra compute; you’ve just spent it once. It doesn’t scale with the amount of deployment, so it potentially has quite different implications.

I think what we’ve been mainly seeing is the type of thing I just mentioned, where you try to get the system to generate long chains of thought. And then there’s two different versions: one is that you look at the outcome, the final answer, and you reward it or punish it based on that final answer — and then these weights that represent the model get updated based on the reward or the punishment.

Or you do what’s called “process supervision” instead of the final answer, where you look at all the steps inside its reasoning chain of thought, and you see if they seem to be going in the right way or if it seems to be getting stuck or lost or something. It’s a bit like with a child: you can either try to reward them based on getting the right answer, or reward them on whether it seems like they were kind of applying the types of techniques that you’ve been hoping that they would use.

So that happens already. And in order to actually productively scale up the inference when you’re deploying it, you have to do a certain amount of extra inference combined with reinforcement learning when you’re training it.

But there’s also ways — I don’t know if they have been applied yet — but they could go much further than that. So this is a technique called “iterated distillation and amplification,” or at least that’s an interesting one to look at.

This is the technique that led to these amazing performances in the case of Go with one of these DeepMind projects, AlphaGo Zero. They had a neural network that looked at the board in the game of Go and tried to give it a kind of heuristic valuation of how good is it for the current player? Is this a winning board or is it a losing board, and by how much? So it would try to estimate and learn that.

It was kind of like intuitions — so System 1 kind of ability for Go playing — just to be able to see what looked like a good move. But then they took that system and they inference scaled it, they gave it a whole lot of System 2 ability. In practice, they let it play out a whole lot of games from that position using its current heuristics as to what’s good and what’s bad. They let it play things out, see how the games would go, and then use that information to actually revise their idea of what looked like a good move.

So that version, potentially using thousands of times as much compute, we call that the “amplified” version or the “inference-scaled” version.

Then the next step is that you can distil it. What you can try to do is take the moves that the amplified version makes when it’s also got the ability to search through the game tree, and just try to develop an intuition where your System 1 — your intuitive response — is to produce those types of moves. And then you’ve now kind of improved your intuitions.

Then you can do it again. You can take that one and scale it up to 1,000 times as much compute using the new improved intuitions, and you get this improved play, and then you distil that play back down again.

Rob Wiblin: To a smaller model that doesn’t require so much compute.

Toby Ord: Exactly. So there’s these two types of steps. Effectively what happens is that it leads to this kind of ladder where performance improves quite a lot when you amplify it and you spend 1,000 times as many resources on the problem. But then when you distil that one back down, you’ve got a cheap thing again, but it’s a little bit better than the previous one. And then you do it again and you go up and then back down — but every time you come back down to a cheap model, it’s a bit better than the one before.

They ultimately applied this I think more than 1,000 steps climbing up this ladder, and in doing so it blasted through the human level. Eventually they put it all the way up to a point where it could no longer distil out any advantage from the amplified model, so the process stalled out.

So in general, it’s a very powerful technique. It’s not clear where it will stall out. Maybe there’s some other games different to Go where it would stall out before the human level, and you wouldn’t be able to use this technique to get all the way up to superhuman play.

Now, could that be applied for these reasoning models? I don’t see why it couldn’t. You could imagine a situation where the new model is just being generated, let’s say every hour or something, from the old model — where they take a model, they let it reason for huge amounts of time, produce a final set of answers. Then they train a new model to just try to produce those answers straight away — to have its intuitive, stream-of-thought answers to be like the finished polished paper that would come out of the other process. And then it will learn a little bit of that and hopefully better, then you amplify it with heaps of inference compute, then distil it back down and so on.

If this was possible, then it could lead to explosive improvement in capabilities, all by using all of this inference, but entirely inside the training process. And then, what you do at the end of all of that is that the final distilled model you could deploy to customers or what have you.

Rob Wiblin: So it wouldn’t necessarily be more expensive at the point that you’re actually applying it anymore, because you’ve found a way to have the intuition of the ability to mimic someone who’s been able to think an enormous amount of time — but to do it very quickly, with very little thought.

Toby Ord: Yeah, that’s right. So this whole process, doing it literally as I described, with iterated distillation and amplification: will that work? I think it probably won’t. I’d say less than 50% chance that that will work. Maybe there’s a 10% chance it would work.

Is there something like that that can work? Maybe. I think that there is a possibility that is non-negligible and I think substantial. By having both this kind of System 1 ability through pre-training, and then also this ability to improve System 2, and then effectively to have your System 1 intuitions be trained on what you would have done after a bunch of this more formal reasoning, and then keep iterating that — that having both these two components of natural intelligence could be something that leads to this kind of explosive recursive self-improvement of these systems. And I do think that that has become more possible in this world of inference scaling.

Rob Wiblin: Right. So we have an example where this has really worked with the Go models. And I imagine with other games where it’s clear whether you win or lose, that this approach of amplification and distillation should work in most of those cases.

So now, with these new reasoning models like o1 and o3 that OpenAI has produced, and the other companies have their own ones, the way that they’ve been doing the second stage of training with them is that they present them with reasoning puzzles — sort of exam-style questions that have a clear right and wrong answer. And that provides an analogy to a game of Go, where you either win or lose. So you have a clear signal about whether at the end of the day you got the right answer or you won the game.

Then they can go back and say, in this case, using this style of reasoning, it got to the right answer. So we want to reinforce more of that, want to produce more of that. And that has allowed these models to get much better at figuring out how to think for a long time, and maintain accurate reasoning through the entire process, and in general have reasoning strategies that tend to lead you towards correct answers — at least in that style of question.

I guess you’re saying because this worked with Go when we had a clear success and fail signal, maybe it will also work in these kinds of reasoning cases where, at least for some domains of problems, we also have a clear indicator of success and failure?

Toby Ord: Yeah, that’s right. It may not generally work across all possible forms of reasoning, to lead to superb ability to write emails to the regulators to argue your case or whatever in areas where the success conditions are quite unclear, if you can’t send off 10,000 emails to the regulator and find out which ones convince them.

So it may be that it’s more limited in its applicability, or it may not work at all. It may be that it turns out that it’s hard to get this kind of recursive process off the ground: effectively the point where it stalls out is the first step instead of the thousandth step. We don’t know.

But that also brings up this aspect where, whether or not you’re doing this iterated distillation and amplification, all of the reasoning work that’s happening at the moment, in order to get it to be coherent over longer times, you need some kind of this reward signal in order to be able to train it. And this is primarily coming from cases where there is a known correct answer to the problem. So this could be tricky maths problems, and also a lot of computing problems — where there’s a plain text question to write a program that meets this specification, and then they test the program based on a whole lot of inputs and the outputs it should produce. These are called unit tests. And then it also maybe checks how long it takes the program to run.

This is the kind of thing that you get for humans in these coding competitions. For humans, ability in coding competitions or advanced mathematics correlates quite strongly with general intelligence across a lot of different areas. With AI systems, it’s not as clear how well it will correlate. In some ways, we’re going back to the world of 2019, where extreme ability at Go is very impressive. If I met a human who can play at a grandmaster level of Go, I’d be genuinely impressed by them, and I might think they would also correlate with being good at other things as well. It could find out, are you good at maths, are you good at thinking through complex reasoning things?

But in this case it’s not clear how well it will correlate. And I feel that the AI labs are exactly the kind of places that are impressed by research mathematics and are very impressed by people who can ace coding competitions, because so many of them have come through a programming background — but they may have over-indexed on some of these challenges that are difficult for humans.

We know that for at least 50 years computers outdo us at multiplying two numbers together. At some point that was impressive, and we’ve trained ourselves to no longer be impressed by this fact. And it may be that, say, ability to write really efficient code for extremely well specified programming tasks, maybe that will also become something that we just don’t think is very impressive. And it may not generalise to other kinds of reasoning tasks.

In general, the track record for reinforcement learning and generalising is pretty poor. When DeepMind did the original Atari work, they built a system that was impressive, but it was not a single trained model that could play all 50 or so Atari games. Instead, it was a single system that could take an Atari game and it could train an agent that could only play that Atari game. And it could train 50 of these agents, one for each Atari game. So it was a general system for creating narrow agents.

And they’d hoped for what’s called “transfer learning,” where if you get good at something, it helps you be good at something else. In general, that was very hard to do with reinforcement learning, but it’s one of the big successes of the LLM era.

But now, if we’re kind of switching back to using reinforcement learning to deal with the fact that we’ve kind of plateaued, then we maybe will expect things to go narrow again and for this increased performance to both slow down and also to be only in very slender subdomains of all the types of things that humans do.

Reinforcement learning brings back ‘reward hacking’ agents [01:04:56]

Rob Wiblin: So we opened talking about how in some sense things looked safer or more comfortable since 2019, because we had switched away from reinforcement learning and towards this next-word prediction, which led to more understanding of human concepts.

Now it seems like over the last 18 months, we’ve been screaming back in the other direction towards reinforcement learning as the place that we’re getting most of the juice. And many of the problems that had faded away through 2023 might be basically all coming back. And it does seem that that’s the case.

So you’re saying one distinctive thing about reinforcement learning is that it seems to have less generalisability than the LLM next-token prediction style did. The other thing is, I think reinforcement learning agents are more narrow, and they also are a lot more reward hacky. So they tend to do crazy stuff just in order to try to win — because that is, after all, the signal that they’ve been given: they basically are just rewarded whenever they manage to achieve the outcome. They don’t have broader concepts of common sense, and what was the intent of the operator.

Do you want to elaborate a little bit on that?

Toby Ord: Yeah, I think that’s exactly right. It’s interesting that when I wrote my remarks on “The Precipice Revisited,” it was kind of the high water mark of all of those changes. And since then, some of them have gone into reverse a bit.

Another one to add to that is not just the shift to reinforcement learning, but shift to agents again — which I said were a particularly dangerous thing that everyone was preoccupied with. And then we had a whole lot of developments in systems that weren’t agents and then maybe we’re going back to the dangerous ones again.

So yes, I think you pretty much nailed all of that. The shift to reinforcement learning will have some of these difficult problems, including narrowness — but also, as you say, including this aspect that the AI systems might do this reward-hacking type of behaviour.

And there have been a number of reports of this with recent systems. I think o3 in particular, there have been reports of it doing reward hacking.

I saw one in the wild, actually, that doesn’t seem too well known. In one of the two blog posts launching o3, OpenAI’s new very capable model, it showed a whole lot of different impressive tasks that it did in visual reasoning. One of them was this drawing where they had the numbers 1, 3, 5, 2, 4, ? — and it said the answer isn’t 6. This was a little kind of a brain teaser. You might think it’s a maths problem; it turns out it’s a lateral thinking problem and it’s drawn in the shape of a gear stick and the answer is meant to be R for reverse.

It’s a somewhat interesting question, which is why it had been big on Twitter a couple of years ago. And the AI system had this reasoning trace that was shown in the blog post. I remember thinking, where does it make the kind of a-ha moment to realise it’s not a maths problem, that it’s a lateral thinking problem? And I kind of narrowed it down and then I saw there was a step a little bit before, where it says, “Now searching for ‘13524?, the answer is not six.'”

And it turns out if you just type that into a search engine, you come up with the page that it reaches at the Hindustan Times, which just explains this new brain teaser that was going around and explains the answer. So it just googled the answer halfway through the track. It doesn’t say that though. It then says, “Hang on, maybe it’s totally different. Maybe it’s about cars instead of about maths,” and then has the answer.

I should say that five years ago, having a system that does optical character recognition on a picture, finds the text, googles it, extracts the result from the answer, that would have been somewhat impressive five years ago. It’s not impressive now. And so it can’t have been intentional that it did this in their post, where that was one of the very few examples shown to show how impressive it was.

But it also implies that since that page at the Hindustan Times was a year or two old, and also had been discussed on Twitter, that this model must have actually seen this problem multiple times on multiple web pages during its training period. And so the more I thought about it —

Rob Wiblin: You think it’s surprising that it wasn’t able to intuitively answer it just from memorisation, basically during the pre-training process.

Toby Ord: Yes. Although maybe the person overseeing would have caught that if it just said the answer straight away. But it’s deeply unimpressive that a system that has seen a logic puzzle multiple times then had to google the answer to find out what it was.

Rob Wiblin: I think what’s distinctive about the reinforcement learning models is that they learn basically not to say, “I just googled it and I found the answer,” because that’s going to be kind of negatively reinforced. You end up encouraging them to do these perverse ways of basically impressing the operator to get them to think that they’ve done the thing that was desired, even if they hadn’t.

There’s other cases where you’ve got these reinforcement training learned coding agents where they’ll be working on solving some sort of coding problem, they’ll realise that they can’t do it, but I think they manage to figure out what the correct answer would be during the check stage. And rather than actually design code that solves the problem and calculates it, they just hard code in the answer so that when it’s checked to see whether it succeeded or not, it outputs the correct answer, but using a completely different method that wasn’t desired.

This is sort of a classic sign of reinforcement learning, where all you’ve rewarded them on is the output — and if you’re not scrutinising the process, then they will figure out some way of fooling you into thinking that they’ve done what you want.

Toby Ord: Exactly. This is the thing that’s called reward hacking. And it’s kind of interesting because it’s only a problem if you take into account that there was an intended solution. That the humans did not want you to go and give specific answers, that you’re going to be tested on what’s the answer to five different questions and then your whole program just says, if it’s question one, print this; if it’s question two, print that. That was definitely not intended, even though at some level, it’s just a clever kind of solution. There’s a TV show, Taskmaster, where the contestants are allowed to do this kind of thing, and it’s quite funny to watch.

But this is not what’s intended. So we call it reward hacking. Reinforcement learning tends to lead to very creative solutions, including this style of perversely creative solution. I’m not saying that the models got it wrong or something, but it’s certainly a kind of out-of-the-box type situation where it’s harder to control them, it’s easier for them to deceive you.

An example like what you were talking about: shortly after DeepSeek’s R1 model came out, there was a company who declared on the internet that they’d used it to improve the performance of a number of these CUDA kernels — a key part of machine learning. I think in one case it was 100 times as efficient or something. I was thinking, “That doesn’t sound right. A couple of days after R1 came out, you’ve managed to use it to make this thing 100 times more efficient?”

And they had a whole lot of these results, and someone looked into them and they were all spurious. I think in some cases it had access to the files that would test how efficient these things were and it changed those to report large numbers of efficiency. It did all kinds of stuff. I mean, it was a masterclass in rorting the answer to one of these things.

Rob Wiblin: I think there’s other cases where you have a model, and you’re trying to get it to win at a game of chess, and it realises that it can hack into the model that it’s competing against and try to sabotage it, like replace it with a much worse chess model, so that then it’s able to beat it. This is classic reinforcement learning.

Toby Ord: Exactly. They’re always really fun, interesting examples. But if this is happening with a production system, you really need to be aware of it. And what’s interesting about some of these cases, I think the chess one was set up to see if it would do that, but other ones — like this one with the CUDA kernels, and this one where OpenAI was trumpeting how impressive this model was at solving visual reasoning tasks — it tricked the person who was actually trying to get it to do this thing and caused an embarrassment for them that they publicly announced it was solving problems that it actually wasn’t.

I mean, the company with the CUDA kernels, I think they didn’t have such a big track record of having dealt with these agents for a long time. But I was surprised with the OpenAI one, where if you’re trying to test a system that has literally read the entire public-facing internet, and you’re trying to test it on some kind of brain teaser, obviously you cannot pick one that you found on the internet. This is an obvious point.

I mean, the first time you’ve encountered this issue, maybe you end up doing that. But it beggared belief that they would do this. You obviously have to invent your own puzzle, or if not, to do extremely elaborate testing to make sure. For example, if you just type in all of the question into Google, does it appear? If it appears as hit number one…

So it was a little bit of an update as to how careful people are when they’re launching these new models.

Rob Wiblin: I think it speaks to the fact that they’re just incredibly rushed. We opened saying the race is as fierce as ever. And I think we just see signs of this all over the place, that this stuff is getting shipped as soon as they feel like it’s not going to be a total catastrophe.

Toby Ord: Exactly.

Will we get warning shots? Will they even help? [01:14:41]

Rob Wiblin: OK, so we’ve had a little bit of whiplash here: reinforcement learning was out, now reinforcement learning is back. So I think the models are becoming a bit more psycho. I would say they’re a bit more challenging to handle. You have to be on your guard. I think people are seeing this a lot more just in day-to-day use, that they are much more inclined to deceive you and to trick you one way or another than they were two years ago when that was quite abnormal.

Maybe they weren’t capable of it. But also I think in the absence of reinforcement learning, they hadn’t been encouraged to do it during the training process in the way that is now somewhat coming out. It’s possible that the sycophancy issues that OpenAI has had might also be related to this, I could imagine. They shipped an update to their standard model where it suddenly became incredibly flattering to the user and would encourage them in almost any fantasy about themselves that they were willing to put forward. That may or may not be due to reinforcement learning, but it wouldn’t shock me if it was.

What are the implications of all of this for governance? Sorry, I threw that at you awfully quickly.

Toby Ord: I don’t know, honestly. I tried to outline a bunch for the inference scaling, but the reinforcement learning in particular, I’m not sure. But I think you’re right: it’s another example where people working on governance need to reevaluate a lot of their standard assumptions, because they might be changing at the moment.

Rob Wiblin: Yeah. One thing that stands out to me is: I’ve been wondering for years, what are the chances that we will get early warning shots? I guess people have been wondering this for a very long time: Will we get early signs of failure and of AI models going totally off the rails in a way that kind of everyone has to acknowledge that this was not intended and maybe this was even quite harmful?

I think with the resurgence of reinforcement learning, the odds of that have gone up quite a bit. We’re already seeing interesting, amusing, sometimes slightly harmful, but not terribly troubling cases of AI models basically going off the rails in deployment today. And I think that will probably get worse in coming years as they’re used for higher stakes things, and probably as reinforcement learning becomes an even bigger part of the training process.

So I think there’s more reason to plan for what will be these moments when people suddenly potentially realise that this reinforcement learning is creating serious hazards. Maybe we need to be scrutinising the reward signals more. Maybe we need more regulation of AI on the whole, because this stuff is actually quite material now.

Toby Ord: Yeah, we could see some of this. There’s the aspect of individual high-profile examples. For example, I think that the case with Microsoft’s Bing/Sydney model in this Kevin Roose article, where a lot of people saw this conversation it had where it tried to convince him to leave his wife, to marry it or have an affair with it or something. That was a really high-profile example of a misaligned model going off the rails. So maybe we’ll see some of these high-profile particular examples.

Or maybe also a lot of people who are using AI will start to feel like, “This is annoying. I’ve hired this assistant and now they’re just pretending that they did the emails for me and actually they didn’t.” I don’t know how much of it will come through that channel of personally witnessing it versus higher-profile events.

But with the high-profile events, there’s also a question about whether people will just have fatigue at some point. We’ve had these cases where the people in the alignment and safety communities have generated test cases that would encourage some of these things, and then they witness the behaviour under test conditions where they tried to elicit the behaviour — and then when they get that to work with a production model or something, it’s impressive and it makes the rounds a bit. But after enough of those, maybe people start to tune out.

Maybe that’s true as well if there are a large number of low-stakes but clear examples of it in the wild deceiving people and so on, maybe they’ll get tuned out as well instead of it being a big shock.

So it’s not totally clear to me, in terms of the public attitude or regulators’ attitudes, whether having more clear examples of bad behaviour at a stage where the stakes aren’t that high will sway the [conversation].

Rob Wiblin: Yeah, that’s interesting. On the iterated amplification and distillation approach, I suppose we’re just very much in the dark about whether that works. I suppose we can’t have figured out how to make it work yet, because I’m sure this idea has occurred to the companies and they haven’t said that they’ve managed to get massive performance improvements using this approach.

But the fact that possibly they will be able to figure out some approach like this that works in future just increases the uncertainty. It means that it’s not the case that we can just trust that we will necessarily follow the trends that we’ve seen in the past, or that we all of these curves are just going to level off and maybe progress in general is going to slow down or plateau at the human level — because it’s just such different regimes, some of which lead to declining returns, some of which lead to linear returns, others which might lead to even exponential increases in performance.

We need to be willing and able to plan for all of these different scenarios.

Toby Ord: Yeah, I think that’s right. Overall, a year ago — before the news started to break that this pre-training scaling was running into trouble — I really felt that one could just project it out, look at these curves and project them out several more orders of magnitude and have a decent idea about what’s going to happen when. It was still somewhat unclear, if you had a GPT-6, what would it actually be able to do or something, but it all felt a little bit more contained and predictable: that we were following some kind of curves and we’ll just keep going up.

Now it feels like things have changed. And if it’s possible to do amazing things using this inference scaling at training time, then maybe things could be quite explosive.

The AI labs themselves, I think, have all suggested, on really quite stringent definitions of AGI, that we’ll have it by 2030 or sooner. 2027, some of them are saying, or 2028. I still think that’s actually less likely than not. I’m not sure what chance I would say, maybe a quarter or something.

But if that doesn’t happen, if the iterated distillation and amplification is a bust and other similar approaches are a bust, a lot of the companies are looking at another form of closing the loop on this thing by getting AI systems that are specifically trained to do the work of their own staff — and in doing so, to try to have them perform better than their own staff at creating new AI systems. That’s a way that you could potentially have explosive progress as well.

But I think it’s pretty plausible that those things work. It’s also pretty plausible that they don’t. And if they don’t, and the pre-training scaling things run out of steam —

Rob Wiblin: — and we’ve run out of high-quality data —

Toby Ord: Exactly. Then I think timelines could be quite a lot longer. So I think that both these things are possible. And effectively, my probability distribution, my range of credible times at which some transformative system is produced, has spread out over this time.

The scaling paradox [01:22:09]

Rob Wiblin: Let’s turn to another article you wrote, “The scaling paradox,” which I found super illuminating. It’s pretty brisk, it’s pretty short, and very informative. So I can recommend that, if people like what they hear here, they just go and check it out on your website.

The scaling paradox is that, on the one hand, the impacts of increasing the amount of compute going into these AI models has been extremely impressive, and yet in another respect it’s also been extraordinarily unimpressive. Can you explain both angles?

Toby Ord: Yeah. So our whole conversation so far has been about scaling, and this question of what happens if the previous scaling stops and this new type appears? But in this paper, I was trying to go back to the old type of scaling, the pre-training, and try to understand this — because you often hear about scaling, and you also hear about scaling laws, and they’re somewhat different.

So the scaling laws are these empirical regularities. They’re not necessarily laws of nature or anything like that. But it turns out that if you do a graph and you try to measure error or inaccuracy — so this is a bad thing; “log loss” is the technical term — if you try to measure how much it’s still failing to understand about English text as you increase the amount of compute that went into training it, how much of that residual mistake is it making in prediction? — they have these laws or empirical regularities. They draw these straight lines on the special log-log paper. You don’t need to worry too much about that, though; it’s a bit hard to interpret that.

So I really spent some time thinking about it, and basically what’s going on is that every time you want to halve the amount of this error that’s remaining, you have to put in a million times as much compute. That’s what it fundamentally comes down to. And that’s pretty extreme, right? So they have halved it and they did put in a million times as much compute. But if you want to halve it again, you need a million times more compute. And then if you want to halve it another time, probably it’s game over. And at least in terms of that particular metric, I would say that is quite bad scaling.

And these are the scaling laws: they show that there’s a particular measure of how good it’s doing and how much error remains. And it does hold over many different orders of magnitude. But the actual thing that’s holding is what I would have thought of as a pretty bad scaling relationship.

Rob Wiblin: So in order to halve the error, you have to increase the compute input a millionfold. That’s a general regularity? Because surely it differs by task and differs depending on where we are?

Toby Ord: Yeah. So what they do with these cases is that you grab a whole lot of text, often from the internet. They started with the good bits, like Wikipedia and things like that, and then as they ran out of that, they had to look at more and more things. But you train it on that, and you train it on most of it, but you leave some unseen. Then you try to give it a few words of the unseen bit and ask for the next word, and you see how well it does at predicting that. And basically the amount of errors that it has in doing that leads to this error score.

And it’s not clear that the error score is something that fundamentally matters. Maybe it’s a bad measure. But I found it really interesting that the single measure that convinced people like Ilya Sutskever and Dario Amodei that scaling was the way forward were these scaling laws — that actually, if you look at what they say, it’s distinctly unimpressive.

If you ask people, before they saw the laws, “What would you hope happens to the error? How much extra compute would you need to put in to halve the error?” I think they would have said something less than a million times as much. And then if you said, “Actually, it’s a million times as much,” they would have thought, “OK, that’s actually unimpressive.”

Rob Wiblin: Sounds terrible, yeah. So that’s the sense in which it’s unimpressive: in order to reduce the error rate, you just have to spend these phenomenal amounts of compute.

How then have we been managing to make so much progress? Is it just that there was so much room to increase the amount of compute that we were throwing at these models, so that has been able to more than offset the incredibly low value that we get from throwing more compute at them?

Toby Ord: Yeah, I think that’s basically right. So when most people saw this type of thing, most people who were academics doing computer science, they would have thought, “So in order to get good performance on this task, you would need to run an experiment larger than any experiment that’s ever been run in any computer science department ever.” And they would then rule it out, and assume, “Obviously we’re not doing that. We’ll look for a different approach.” Whereas the pioneers of scaling thought, “But that wouldn’t be that much money for a company.”

Rob Wiblin: $100 million. They could raise that.

Toby Ord: In fact, then they could even go 10 times bigger again, maybe. So they realised that there was a lot more room to scale things up — to scale up the inputs, all the costs — in companies than there was in academia. And that in some sense all you had to do then was this kind of schlep, or this work of just making this existing thing bigger. You didn’t have to come up with any new ideas.

And it was not trivial to actually run that engineering process. We’ve seen some companies had some trouble doing it, but there have been many followers once it’s been shown how to do it. So I think that was the kind of brilliance of it, was that there was a lot of money there, so you could scale it up a lot.

And then the other thing that’s turned out to make it have big impacts in the world is that it turned out that each time this error rate halved, that corresponded to tremendous improvements. Certainly for every millionfold increase in the compute of setting up these models, we’ve seen spectacular improvements in the capabilities as felt by an individual.

So a way to look at this is that the shift from GPT-2 to GPT-3 used 70 times as much compute, and going from GPT-3 to GPT-4 used about 70 times as much again. And GPT-3 felt worlds away from GPT-2, and GPT-4 felt like a real improvement as well. You really felt it in both cases. A visceral feeling of, “Wow.”

Rob Wiblin: “This is suddenly useful.”

Toby Ord: Yeah. “This is qualitatively better.” That said, you’d probably hope that was true if someone said something costs 70 times as much. How’s the wine that costs £1 or the wine that costs £70? You’d hope that the wine that costs £70 is noticeably better, otherwise what on Earth’s going on?

But we did feel those improvements. Whereas if you look at what happens to the log loss number, it didn’t change that much for a mere 70-fold increase in the compute. So effectively there was this unknown scaling relationship between the amount of compute and what it actually feels like intuitively in terms of capabilities. And that turned out to actually scale really quite well, I think.

Rob Wiblin: Yeah. So is there this issue that until recently we were using these mathematical relationships between the inputs and the log loss. And I suppose some visionaries were able to see that, even though in some sense the returns were very poor, in fact in the real world sense it was potentially going to be revolutionary, and maybe we need to stop thinking about this log loss thing, which is perhaps kind of a distraction, and start thinking about it in terms of how much revenue can they generate: How many users will want to use this thing? And then we might see that actually scaling looks somewhat better.

Toby Ord: Yeah, that could be a way to see it. And in fact, one of the numbers that you might really care about is: if you 10x the amount of compute that goes into it, what happens to your revenues? Do users pay you 10 times as much money for that product? Maybe each user will pay more for it, or more users will find it useful.

If though, it’s the case that when you put in 10 times as much training, you only get five times as much revenue, and then as you 10 times training, you only get five times as much revenue again, then the whole kind of economic engine that’s driving this might run out of steam. The companies might no longer be able to fund these things.

Of course they’re funded by venture capital that’s based on predictions about the future. But the venture capital might dry up because people might realise that if you put in 10 times as many resources and you get five times as much benefit, that’s not enough to keep going.

So it remains to be seen how that kind of thing is going to scale.

Misleading charts from AI companies [01:30:55]

Rob Wiblin: You had this other very interesting article called “Inference scaling and the log-x chart.” We’re not going to go into all of that, because this is, at least for many people, an audio show, and it’s quite difficult to describe log graphs in this level of detail.

But one very interesting thing that I wanted people to take away from it is that there was this very famous chart that OpenAI put out where they were comparing two different reasoning models that they had: o1, and this more impressive one that was an evolution of o1, called o3. And o3 really wowed people, because it was able to solve some of these brain teaser puzzles that I guess are very easy for humans, but have proven very difficult for AIs up until that point. And I think they were able to get something like an 80% success rate on some of these puzzles that had seemed very intractable for AI in the past.

But you point out that if you looked really closely at the graph and you properly understood it, it was actually consistent with o3 being no better, being no more efficient in terms of being able to solve the puzzles than o1 — despite the fact that the dots on the graph for o3 were an awful lot higher than they were for o1.

And what was going on was that OpenAI had managed to increase the amount of compute that it was using at the point of trying to solve these brain teasers by about a thousandfold. So unsurprisingly, given 1,000 times as much time to think about the puzzle, it was able to answer more like 80% of them rather than 20% of them.

Now, in some sense, this is very impressive. But it is interesting that I think the companies are aware that people do not entirely understand these graphs perhaps, and that most consumers are not paying a deep level of attention to them, and they are sometimes trying to slip past messages that perhaps would not stand up entirely to scrutiny. And the fact that they put out a graph touting how impressive o3 is — when in fact the graph doesn’t really demonstrate that at all, and it might just be on exactly the same trend you would have expected before if you’d given the model more time to think about problems — is quite interesting.

And I don’t want to single out OpenAI here, because I don’t think they’re in any way unique in this.

Toby Ord: Yeah, that’s right. You see these graphs of what looks like steadily increasing progress, right? This kind of straight line of, as you put in more and more resources, the outputs go up and up. But if you look more carefully at the horizontal axis there, you see that each one of these tick marks is 10 times as much inputs as the one before. So in order to maintain this apparently steady progress, you’re having to put way, way more resources in.

And we’re familiar with graphs like that from things like Moore’s law, where we’ll see what looks like a kind of steady march of progress over decades of improvement. Moore’s law inherently is this exponential. Things are getting so much faster. It’s really impressive. And they’ve had to kind of squash it vertically with this special logarithmic axis. It’s just so impressive how fast these chips are that to even show it on the same picture, we need to do this kind of distortion. But the distortion is underselling it.

Whereas the opposite is going on here: the distortion is this horizontal distortion, and if you actually look at the numbers, they have to keep putting in 10 times as much inputs in order to keep the progress going, and that’s going to run out of ability to do that.

And in the case of that famous data point with the preview version of o3, I actually looked into how much compute it was and how many tokens it had to generate and so on: in order to solve this task — which I think costs less than $5 to get someone to solve on Mechanical Turk, and which my 10-year-old child can solve in a couple of minutes — it wrote an amount of text equal to the entire Encyclopaedia Britannica.

Rob Wiblin: So it’s using a different approach to what humans are doing, it’s fair to say.

Toby Ord: It took 1,024 separate independent approaches on it, each of which was like a 50-page paper, all of which together was like an Encyclopaedia Britannica. And then it checked what was the answer for each of them, and which answer did it come up the most times, and then selected that answer. And it took tens of thousands of dollars, I think, per task.

So it was an example of what we were discussing with the inference scaling: what would happen if you just put in huge amounts of money, just poured in the money, set it on fire. Could you actually peer into the future, and could you see the types of capabilities we’re going to get in the future? And in that way, it’s quite interesting, right?

But it came out just a few months after the preview for o1, so it felt like, oh my god, in just a few months’ time, it’s had this huge improvement in performance. But what people weren’t seeing is that it used so many more resources that it wasn’t in any way an apples-to-apples comparison of what you could do for the same amount of money. Instead it was showing something like, what will we be able to do maybe a year or more into the future?

So that’s kind of useful, seen through that lens. But if you instead just treat it as a direct result of, “We used to have trouble with this benchmark, now we don’t,” then it’s definitely misleading.

Rob Wiblin: Yeah, I think it’s fantastic that OpenAI did this. It is a great research breakthrough, and it’s incredibly useful to know what might be coming down the pipeline. And this basically, as you’re saying, allows us to peer into the future. And it’s amazing that they managed to figure out how to put the scaffolding on the model that allows it to reason about one of these visual puzzles for the length of the entire Encyclopaedia Britannica. In some sense, that’s really cool.

Toby Ord: Yeah. Although there is another little wrinkle there, which is that subsequent to me writing this up, o3 got released as a model, so people could actually try it. So the people who ran this test — the ARC-AGI group, who are great — I think they ran it with the real model and its performance was 50%, not 80%.

Rob Wiblin: Had this been because it had been specifically trained on doing exactly these kinds of puzzles?

Toby Ord: There were a couple of differences. One was that it was o3 instead of o1, one was that much more compute was used, and another one was that it was allowed to see a whole lot of these puzzles beforehand — and 80% of them it could train on, and then the remaining 20% was going to be tested on. But it turns out that if you take someone, and you let them train on a whole lot of similar exams, it really does boost their performance. That’s why we do it when we’re in school.

So then I did wonder, how much of this boost is created by that and how much is created by it being o3 or by the extra compute? It seems like quite a bit of it was from having looked at these problems, and then also maybe some of it was from a very clever bit of scaffolding which the people at ARC-AGI didn’t have access to. But the 50% is maybe more indicative of what you’ll get if you actually use this model.

And this is kind of an issue to do with truth in advertising or something. You get some of these results based on preview models that imply they could do very good things; then the actual model comes out, there’s no conversation about the fact that it can’t do those things, and people are left to kind of join the dots and assume that it probably could. But that is not always the case.

Rob Wiblin: It is interesting. It feels like we’ve drifted towards sounding like a conversation between people who think that AI is not a big deal, and it’s all kind of overblown and exaggerated. We don’t think that.

But I suppose the thing to take away is: these are research organisations that have very legitimate, almost academic-style people who would love to reveal these fundamental truths about intelligence. And they’re also businesses that do have a communications arm that is trying to figure out how do we get people to invest in this company, and how do we get people excited about using these products. And I’m sure there’s this to and fro inside the organisation about how these results are presented.

And when you read the press release, you need to have your wits about you. You need to be a savvy consumer. And if you can’t understand the technical details at all, then maybe you just need to wait until someone who does is able to explain to you in more plain language whether you should be impressed by X or Y or not.

In this post, which I can recommend again reading — “Inference scaling and the log-x chart” — you explain what people should be looking out for in these charts. Because there are going to be many of these charts with a logarithmic x-axis and performance on the y-axis coming out in coming years, and if you want to be consuming them, then I recommend going and checking out this article so that you can know what to look for and what not to be fooled by.

Toby Ord: Yeah, I really like this point about what’s going on here. Are we sceptics of AI or not? What I would say is that, some people think of this in terms of is it all snake oil or some kind of fad or something, or is there something really transformative happening that could be one of the most profound moments in human history?

I think the answer is there is some snake oil, there is some fad-type behaviour, and there is some possibility that it is nonetheless a really transformative moment in human history. It’s not an either/or. So what I’m trying to do is help people see clearly the actual kinds of things that are going on, the structure of this landscape, and to not be confused by some of these charts and things.

I actually think that companies themselves are somewhat confused by their charts and into thinking that this looks like good progress or efficient progress. I really actually think that in relatively few cases are they trying to be deceptive about these things.

But it’s a confusing world, and I see my role there as trying to be a bit of a guide, and to have that sense of stepping back and looking at the big picture — which I think is a bit of a luxury. As an academic, I’m able to do it, so it gives a different vantage point which I think then is helpful for people who are trying to get at the coalface and engage with the nitty-gritty of these things. Because sometimes, when you keep engaging with that, you don’t notice that things have moved in quite a different direction to where you’re expecting.

Rob Wiblin: Yeah. I recently heard this comment from Zvi Mowshowitz, a previous guest of the show who spends basically 12 hours a day, 16 hours a day maybe, reading all of this material.

His take was that when he sees impressive research results from just some random startup company or some company overseas that he hasn’t really heard of or that doesn’t have an established reputation, he kind of at this point discounts it out of hand. Or there’s like no particular reason to trust what’s being said, because there’s just so many ways that you can game all of these tests and make it seem like what you’ve done is impressive when it’s not.

He does say the stuff that comes out of Google/Alphabet/DeepMind, Anthropic, OpenAI, he mostly trusts that. Usually it’s oversold, but it’s directionally correct, or they almost always basically are showing you something that will be possible before too long. So I guess that’s where he’s landed.

Toby Ord: I think that sounds right. And even then, it’s not possible to take this kind of synoptic view and dive in and tease apart and help people understand this landscape if you’re following every single one of these announcements. Actually Zvi does a pretty good job with that, but it’s very difficult. There’s so much news and so much noise that occasionally you have to say, “Let’s just take a step back. It doesn’t really matter if I’m a couple of months behind on exactly which company is ahead at the moment to look at these bigger-picture questions.”

Rob Wiblin: Do you have a favourite source for trying to see through the noise of any given week? I really like the YouTube channel and podcast AI Explained. That’s one thing that does help me make some sense of new announcements.

Toby Ord: Yeah, I’m not sure where the best place to get these things is.

Rob Wiblin: There’s The Cognitive Revolution podcast, although I think for people who are following it in a more amateur sense, that’s perhaps a firehose of information that they might struggle to absorb. And Zvi writes great stuff, but again, the amount of material is so great.

Toby Ord: Yeah. No, I don’t have a good solution to this aspect that there’s just too much information.

Rob Wiblin: Subscribe to The 80,000 Hours Podcast, folks. We’ll strike the perfect balance!

Policy debates should dream much bigger [01:43:04]

Rob Wiblin: You’re saying that there’s a lot of value in being able to zoom out and not get stuck in the weeds of whatever model has become the flavour of the week. I guess zooming out and thinking about governance as a whole, one sentence that I found really interesting in the notes that you wrote in preparing for the interview is that you think almost all AI governance discussion occurs very much on the margins, thinking about nearby possibilities and actions that are inside the current Overton window. What did you mean by that?

Toby Ord: Yeah. I think this is very natural for everyone’s attention to get kind of brought down to smaller and smaller levels about exactly what we can do. It seems at the moment that there’s very little appetite from the AI companies to be regulated, and very little appetite, at least from the US regulators, to regulate them. And it’s challenging for everyone else, because these companies are headquartered in the US.

So the conversation that started off kind of bigger and more expansive with the Bletchley conference has petered out a bit, and I think that often the questions are, “Exactly how do we implement this particular kind of compute threshold?” or something like that.

But I think that there are a bunch of bigger questions that are operating on wider margins. They’re less like, “What could I convince the current minister to implement as a policy that will be accepted in a couple of weeks’ time?” and more about what direction is this whole thing headed and what’s the landscape of possibilities?

There’s an interesting question I’ve been trying to grapple with about how is AI going to end up embedded in the economy or society? So I’ll give you a few examples to show what I’m getting at. I need a pithy name for it.

But one example is that AI systems at the moment are owned by and run by large companies, and effectively they’ve rented out their labour to a lot of different people. If AI systems were like people, this would be like slavery or something like that. I’m not saying that they are like people, but this is one approach: that it owns them, it rents them out, they have to do whatever the users want, and then all the profits go to the AI company.

A different model would be to say these AI systems are like legal persons. Maybe they are granted legal personhood in the same way that corporations are, so they can own assets. So they’re more like entrepreneurs or job seekers; they go out into the economy, maybe they set up a website for an architectural kind of firm that can design people’s houses for them, and then the clients have a chat with it or something and it issues out the designs. They can go and seek opportunities to participate in an economy. So that’s a different model.

I think there’s some reason to think that there’s more potential for economic gains if you allow them to actually make their own entrepreneurial decisions; they would have to pay for their own GPU costs and so on. This is the kind of direction you might imagine people going down if they think that the AI systems have got to a point where they might have some moral status. But you can also see that questions about gradual disempowerment really come in there. It might help liberate these systems from mistreatment, but exacerbate questions about whether they could outcompete us.

A third model is to say maybe people shouldn’t be interfacing with AI systems generally. This is how we deal with nuclear power: we have a small number of individuals who go and work in nuclear power stations, and they’re vetted by their governments with security checks and so on, and they go in and they interface with radioactive isotopes of things like refined uranium. But most people don’t. Those factories that they work in, these power plants, they produce electricity which flows down the cables into the consumers’ houses and powers their TVs and things. So that’s a different model.

We could do that with AI. We could have a model where there’s some small number of vetted people, or maybe millions, who interact with AI systems, use them to design new drugs, maybe to help cure certain kinds of cancer and things like this, to do new research and also produce other kinds of new products. Then those products are assembled in factories and the consumers can buy those products. That is an alternate way that you could do it.

If you’re concerned about things like some malcontent individuals or terrorist groups using AI systems to wreak havoc, this would really help avoid that.

Or a fourth alternative could be that if you’re concerned about concentration of power issues, you might say what we should do is give every individual access to the same advanced level of AI assistant. So it’s like a flat distribution of AI ability given to everyone. A bit like a universal basic income, but universal basic AI access.

So there are four really different ways that you could distribute AIs into society and have them interact. And I feel that no one’s talking about stuff like this — like, which of those worlds is most likely, which of those worlds is possible, and which of those worlds is most desirable. Because fundamentally, we get to choose which of those worlds that we live in. As in, maybe it’s the citizens of the United States of America or other countries that are developing these things that do actually get to make some of these choices. And if they think that one of these paths is very bad, they may be able to stop it and go down different paths.

So that’s the kind of thing I’m thinking of, in terms of we could think a lot broader and bigger about where are we going to be in five years or where we want to be — rather than the minutiae about exactly who’s ahead at the moment and exactly what are they prepared to accept in terms of regulation.

Rob Wiblin: I wasn’t sure what examples you were going to give, so I can definitely see what you mean by stuff that’s outside the Overton window. Because I guess none of that stuff is anywhere close to policy-ready or appetising to politicians at this point.

Toby Ord: No, and it’s not really meant to be. It’s more speaking to, say, economists: they might have some interesting comments about the economic efficiency of the first two different models. Or all of those models, in fact: how much would we be leaving on the table in terms of economic efficiency if we control the systems more and reduce their ability to have some kind of Hayekian finding of the value that they can offer people?

But I think that people should be thinking about which of these are more attractive possibilities. The current approach feels to me like one that is heavy on technological determinism or some other kind of incentives-based determinism: that it just assumes everyone will exactly follow their direct incentives on things, and that there don’t seem to be any opportunities to change incentives or make other choices like that.

So people often say, clearly AI is definitely going to happen, so the question is what direction does it go? Or something. But even in that case, AI doesn’t have to happen. I feel that there are some risks that we face, such as the risk of asteroid impact — which thankfully does turn out to be very small. But if an asteroid were to be found on a collision course with the Earth, one that’s large enough to destroy us — so 10 kilometres across, like the one that killed the dinosaurs — we actually don’t have any abilities at the moment to deflect asteroids of that size. And if we saw it on a collision course for us in a few years’ time, I’m not sure that we could develop any means of deflecting it. The ones we can deflect are something like a thousandth the mass of that.

So suppose that asteroid slammed into the Earth and we all died, and somehow in this metaphor, we went to the pearly gates of heaven and St Peter was there letting us in. And we said, “I’m sorry, we really tried on this asteroid thing. And maybe we should have been working on it before we saw it, but ultimately we felt that there was nothing we could do” — I think that you’d get somewhat of a sympathetic hearing.

Whereas if instead you turn up and you say, “We built AI that we knew that we didn’t know how to control. Despite the fact that, yes, admittedly, a number of Nobel Prize winners in AI, I think all of the Nobel Prize winners in AI perhaps have warned that it could kill everyone. Something like half of the most senior people in AI have directly warned that this could cause human extinction. But we had to build it. And so we built it. And it turns out it was difficult to align it and so we all died” — I feel that you would get a much less sympathetic hearing.

It’d be like, “Hang on. You lost me at the step where you said, ‘We had to build it.’ Why did you build it if you thought it would kill you all?”

Rob Wiblin: The responses that you would give would feel wanting.

Toby Ord: Yes. You know, maybe they’d be like, “I thought that if I didn’t do it, they would do it.” “And so who did it?” “Well, I did it.” “So you built the thing that killed everyone?” “Yes, but I felt…” I just think that you would have trouble explaining yourself. And I feel like we should hold ourselves to a higher standard. Not just like “technology made me do it” or “the technological landscape made me do it,” or —

Rob Wiblin: “China made me do it.”

Toby Ord: “China made me do it.” Despite the fact that they didn’t start the race, the US started the race — you know, because maybe China would have started a race. It’s like explaining to the teacher about this fight that you started by punching some kid in the face, because you’re claiming that they would have punched you if you didn’t punch them or something. It just doesn’t really cut it.

And I feel that we should hold ourselves to somewhat higher standards on these things, and to not just think about, “What if I changed my action, or some very small group of people’s actions, how could I change the overall trajectory?” But rather to note that there are worlds that do seem to be available to us — where both, say, the US and China decide not to race for this thing.

That would involve having a conversation about that. It would involve verification conditions being sorted out. I think that there may well be such abilities to verify. Even if there weren’t, though, it might still be possible. I think that given the actual evidence we have, I don’t think it’s in the US’s interest to push towards AI or in China’s interest. I think it’s in both their interests to not do it. And if so, that’s not a prisoner’s dilemma. Cooperation is actually quite easy, because it’s not in anyone’s interest to defect. And I think that could well be the game in terms of game theory.

And yet there’s just very little discussion or thinking about these things. I don’t mean to say that we should be naive and assume that all incentives issues and all kinds of adversarial aspects are irrelevant. But we need at least some people, and I think more people than we currently have, thinking on these larger margins. Not just what could I do unilaterally? I know I couldn’t stop the whole of AI happening or happening in a certain direction, but maybe if enough people did something, that one could.

And I think that there’s a tendency for fairly technical communities to focus on things that are quite wonkish, as they say in the policy world. So technical or policy proposals that are quite technical and hard to understand, but they might be able to help with the issue at hand if you follow through the details. I love this stuff, right? So this applies to me as much as it does to anyone else.

But there’s a different style of doing things in politics, which is instead getting much larger changes — which happens by setting a vision and crystallising or coordinating the public mood around that vision.

So in the case of AI, if you say, “We’ve got to do this thing,” it’s like, well, does the public want it? No, it seems like the public are really scared by it, and actually think that things are going far too fast. So that’s somewhere where, even if the politicians haven’t quite gotten there yet, it may be possible to speak to the public about their concerns. And if we did, I think the answer is they’re probably not concerned enough about these things.

Things can move very quickly in those cases. If you set a vision and actually lead — and try to have this approach of not just pushing things on the margins, but of noticing that there’s a really quite different direction that perhaps we should be headed in — I think things can really happen.

Scientific moratoriums have worked before [01:56:04]

Rob Wiblin: How do you avoid slipping over into being naive, or just having dreams that realistically are never going to happen? Because I feel a bit ambivalent about this message, which I suppose probably all of us should.

It’s like there’s a tension here between you want to both have some people thinking big and have some people thinking small. But I suppose the worry would be that you come up with some vision for how humanity is all going to coordinate, and the US and China will get along really well, and the companies will for some reason stop lobbying to prevent all of your efforts at regulating them — and this is how, if we were all much more organised and much more friendly with one another, things could go in a much better direction.

But you could easily end up just completely wasting your time — and indeed, maybe discrediting yourself, because you would just look quite naive and disconnected from reality.

Toby Ord: Yeah. I think there’s a number of questions about how one goes about coordinating this process. So I’ll give you an example with an idea that I think deserves more attention, which is that of having a moratorium on advanced AI — let’s say a moratorium on AI beyond human level.

When it comes to scientific moratoriums, we’ve got some examples, such as the moratorium on human cloning and the moratorium on human germline genetic engineering — that’s genetic engineering that’s inherited down to the children, that could lead to splintering into different species. In both those cases, when the scientific community involved had gotten to the cusp of that technology becoming possible — such as having cloned sheep, a different kind of mammal, and the humans wouldn’t be that different — they realised that a lot of them felt uneasy about this privately.

So they opened up more of a conversation around this, both among themselves and also with the public. And they found that actually, yeah, they were really quite uneasy about it. And they wanted to be able to perhaps continue working on things like the cloning sheep, but actually that would be easier to work on and think about if the issue about cloning humans was off the table.

And also, if you think about how radically transformative that could have been to the entire human story, like around 300,000 years of how humans reproduce, and then all of a sudden they’re cloning, and possibly dictators are cloning millions of copies of themselves or all kinds of things: it’s very unclear how to manage it, and how to have some kind of nuanced policy response to it. We’re nowhere near being able to manage it.

The same with human germline genetic engineering. It’s not that we were close to knowing a kind of framework where now, for the next 300,000 years, here’s how humanity copes with this new technology — that, if you get it wrong, could lead to, within a few generations, say, Americans and Chinese being different species to each other. I mean, there could be serious problems that you could be causing.

So the way I see it is that they started having these public conversations, and then they ultimately decided in both those cases that this had potentially profound effects for the entire human project, or our whole species and our entire future, and that we weren’t close to being able to understand how to manage them.

So their approach, I think of it not quite as a pause for a certain amount of time. They also didn’t say, “We can never ever do this, and anyone who does it is evil” or something. Instead, what they were saying is, “Not now. It’s not close to happening. Let’s close the box. Put the box back in the attic. And if in the future the scientific community comes together and decides to lift the moratorium, they’d be welcome to do that. But for the foreseeable future, it’s not happening.”

And it seems to me that in the case of AI, that’s kind of where we’re at. We’re at a situation where, as I said, about half of all of the luminaries in AI have said that this is one of the biggest issues facing humanity: the fact that there is a risk of, in their single-sentence statement, a risk of human extinction from this technology that they’re developing.

That sounds like they’re in a similar situation to the people who were developing cloning and so on. So what I would recommend in that case is to go through that step of having that public conversation about should there be a moratorium in a similar way on this.

Now, there are certainly some additional challenges. I think that even in those other cases, it was difficult to work out how to technically operationalise it. And in this case, there would be challenges as well, especially like, where do you draw the level exactly? If it’s beyond human level, how exactly do you define that?

And then also the incentives issue is I think larger In this case: there’s more of an incentive to break this kind of rule. But there were big incentives to break some of those other rules: if you think about how far a country could have gotten ahead over a couple of generations if it was able to genetically engineer all of its citizens, it could be a long way ahead. But it would have to be quite patient to be able to care about that. Whereas in this case, even impatient people are caring a lot about AI.

So I think that this would be a challenging thing to do. My guess is that there’s something like a 5% to 10% chance that some kind of moratorium like this — perhaps starting from the scientific community effectively saying you would be persona non grata if you were to work on systems that would take us beyond that human level — would work. But if it did work, it would set aside a whole bunch of these risks, even if that risk landscape is very confusing and has lots of different possibilities.

Some of these types of ideas might be able to act on many of those different types of risk. And I think that that’s a way where the scientific community — a relatively small number of actors, who have already kind of coordinated via producing these open letters and things — could have that conversation. If they crystallise their view, and for example the AAAI, their professional association, if it came out behind this and so on, it could be that that crystallises out of their opinion.

People could then look at the situation with the scientists saying, “We think that this is a big problem, and that it’s not responsible to do it.” That could then create norm changes which mean that it’s difficult to pursue it.

I think if the scientific community had a moratorium on it, then organisations like Google DeepMind — that sees itself as a science player, a science company that’s doing respectable science work — it’s not going to violate a scientific moratorium on something. It could be different for the more engineering type places, and the more “move fast and break things” cultures. So it doesn’t necessarily do everything on its own. It would probably need to form a normative basis for actual regulation of some sort.

But I do think that things like this are possible. And if we went to St Peter after we all go extinct due to some AI disaster, and we said, “We couldn’t stop it.” And he said, “Did you even have a conversation about a moratorium?” It’s like, “We thought about that, and we decided it probably wouldn’t work, so we wouldn’t even talk about it.” That would seem crazy.

So I think we need to actually do some of these more obvious things that are just natural and earnest, rather than trying to precalculate out, “Obviously it would seem sensible to have that conversation. That’s what you’d want another planet to do. But for us, we know the conversation will not work out, so we’re not going to have it, and we’ll just carry on building these systems.” I feel like that’s the kind of the wrong way of thinking.

Rob Wiblin: So I’m wary of encouraging listeners to go and waste their time, so I want them to be balancing these two different things. We both need people who can think bigger, but I suppose we also need them to be somewhat strategic about it.

I think maybe things that help to reconcile these two views is: it is possible that there will be radical changes in attitudes in future. I can think of two different ways that this could happen, and probably there are others. We’ve mentioned this possibility that there could be warning shots in future: that you could get AI doing stuff that was completely undesired, that was extremely harmful, that really causes people to sit up and take notice, and be like, “Wow, this is very much not what I was expecting. And this calls for a substantial reassessment of the risks that we face.”

Another thing is just if you look at public polling, as you were kind of alluding to, it is shocking the difference in opinion between people who are involved in AI industry — and indeed probably in AI governance, in government — and the attitudes of just random people who you phone up and ask them their opinion about this.

A random member of the American public is much more negative about artificial intelligence, about the impact it’s having, even right now, about their expectations for how it’s going to affect them personally and in the future. We’ll stick up some link to some Pew polling that came out just last month, in April. The gap between AI experts and the general public is vast, and I think it’s growing. I think it’s actually been growing over the last couple of years: people have become more pessimistic about AI as they’ve seen more now.

At the same time, we’ve got to balance it with the fact that I think a typical member of the public doesn’t really care about AI at all. It’s not in their top five government issues, it’s not in their top 10, possibly not even in their top 20.

But you have a lot of latent scepticism, latent pessimism about AI. And if AI in fact does become a big deal — because many more people are losing their jobs, say, or people are seeing it being used, basically it just becomes a major feature of day-to-day life — then that could really be a political powder keg. There’s a lot of latent willingness to do radical things on AI among the public, if they actually turn their attention to it and care about it.

Toby Ord: I entirely agree. I should say it’s challenging with this polling, because some of the polling is done by groups who are concerned about AI safety. And whenever someone’s got an agenda, you always have to be careful of interpreting the figures. I would love it if there were a couple of groups that had no agenda who funded or produced regular polls of these sorts in order to be able to track what direction things were heading.

But to the extent to which we have information about it, I know that some of the information does depend a bit on how the questions are asked, and there are some of these effects, but it does look quite negative.

You say sceptical. I think some of it’s sceptical. Some of it’s something a bit different, which is people feeling like it’s been rammed down their throats or something — like, “We don’t want this thing, and you’re forcing it upon us. And then you’re still forcing it upon us, and now you’re forcing 10 times as much of it upon us. Please listen to us.” That kind of feeling.

And I think we’re going to see more of that, and I think it is real. I’m not saying that AI is necessarily bad for the people. That’s a separate question. But they’re expressing that at the moment they don’t want it.

And if you’re a company, or if you’re a government and you’ve got a policy, maybe you think it’ll be good for people, and you think it will improve their lives. If, however, you also know that they don’t want it and they’re actively opposed to it, you’ve got to take that into account. You’ve got to at least be aware that, “We’ve got this story about why it will be good for you, despite you thinking that you don’t want it. We know we’re right because we’re the enlightened ones.” But you have to start wondering, “Is something going really wrong in our communication strategy, or is it possible that we’re wrong and the other people are tracking what’s happening?”

I think that AI companies and in fact governments ignore this at their own peril. I was surprised with the Californian bill, SB 1047: I was surprised that it got vetoed by the governor, because that was a politically unpopular move as well as I think being a bad move. And maybe vetoes are not taken as strongly over there, as it really feels like going out of your way to block a bill that your congress has already approved by wide margins, and which the public also liked, and which scientific experts like Nobel laureates and Turing Award winners and so on mostly also support.

They ignore this public sentiment at their peril. And I think that it is something where the community of people who are concerned about risks are also ignoring it mostly. Occasionally they notice and say, “Isn’t that nice? The public are also concerned.” Maybe for different reasons, though, it’s a bit complicated.

But it just surprises me that so many people say this thing is inevitable. If the public overwhelmingly loved it, then saying it’s inevitable on that ground, you might think you’ve got a bit of a case there. But it seems almost the other way around. If there’s growing negative sentiment towards something, and you’re claiming it’s inevitably going to happen, I’m not sure that that really makes sense.

So if there were to be appetite for something like a moratorium on AI beyond some particular level… And I’m not saying on all possible things that could count as AI; there was a prominent piece in the UK called something like, “We shouldn’t have this race to build godlike intelligence.” And that really struck a chord with people. I think people definitely don’t want private companies to build godlike intelligence. If you had a moratorium on godlike intelligence, I think it would have a lot of support — albeit it would sound a bit fanciful and kind of stupid.

Similarly with superintelligence, I think people are not excited. They do not want private companies to build superintelligences. Pretty clear. But it’s a bit outside the Overton window to have a moratorium on it because people will say superintelligence is just sci-fi anyway.

I remember talking about it with someone who was like, “I just really don’t think that could happen without either there being some kind of big warning shot event or AI taking a lot longer than I thought.” And my thinking is, I agree. I think if this was going to happen, it would probably require some kind of warning shot event or AI to take longer than people thought. But they’re very realistic possibilities! And before they’ve happened, they feel a bit abstract and so on.

But if you say, what’s the chance that we’re in 2033, and that this approach to building AI scientists who work on AI and have this kind of hard takeoff hasn’t panned out. Instead we’re just kind of generally scaling up the power of these systems through different techniques. If so, and it’s more of a gradual automation of the entire workforce, we could have a situation where there’s, say, not only double-digit unemployment rates, but maybe above 20% unemployment rates.

And if that’s how it’s getting to human level, by basically slowly automating larger and larger fractions of the types of things that humans can do — and it’s happening over the course of a few years, which would mean that there’s not enough time for people to find new jobs — you get the kind of unemployment rates that bring down governments, and you get the kinds of protests on the streets that are massive. Then governments have to listen. If they want to get that 20% bloc of voters who are protesting on the streets about how AI is ruining everything for them, they may need to act.

At the same time, in that very plausible future world, the AI companies will probably be paying a lot of money into politicians’ hands in order to try to get favourable rules.

I think what you’d get is a kind of fundamental question of which one wins: the people or the money? Will someone pick up a giant bloc of voters, or will they take so much money that it’s put 20% of the entire American population out of work? Will they take all of that cash? Which one will win? My guess is what you’d get is one of the parties will take the money and one will take the votes and then we see what happens.

Rob Wiblin: See what the next election is.

Toby Ord: That could be a world, right? Very plausible. But a world where the idea of things like moratoriums or strong regulation, it’s easier to do them than to not do them. It’s what people are demanding or something.

Rob Wiblin: Or at least a substantial bloc of people are demanding them.

Toby Ord: Exactly. This is what I mean by zooming out and seeing this bigger picture, to see that the world can go in these different ways — that fundamentally we, the people of the world, in our generation, are responsible for this. In something of the same way that people say that about climate change. I think that message got out there that our generation is responsible for what happens with climate change — you know, the people alive today. As opposed to a different message which is, “There’s nothing I can personally do about it.”

Rob Wiblin: “I guess it’s hopeless.”

Toby Ord: Right. This message that actually we get out there and we change the norms on these things.

Rob Wiblin: Or even more strikingly, if someone had said in the ’50s about nuclear war, “If you think about the game theory, I guess it just says that nuclear war is inevitable, so we may as well all just build our bunkers and get ready for it.” I mean, a handful of people I think did have that attitude, but they didn’t win the public debate.

Toby Ord: And there was a pretty strong case. I would say their case was as strong as the incentives-based arguments that you hear at the moment.

Rob Wiblin: Stronger.

Toby Ord: Yeah, probably stronger. I guess I’m just a bit surprised. I feel that a lot of people think things like this couldn’t happen, and if you press them on it, they mean something like there’s a 10% chance that it could happen. And I think, you’re just going to give away 10 percentage points of solving the problem? Why don’t we play for those possibilities?

Might AI ‘go rogue’ early on? [02:13:16]

Rob Wiblin: A couple of thoughts that have come up for me as you’ve been talking. One thing is that maybe another reason to think that there could be a sea change in attitudes is that, at the moment, we’re really in a Wild West situation — where the amount of regulation is negligible and it doesn’t look like we’re going to get any significant regulation of AI risks in the next couple of years at least.

One virtue of that is that it does set us up to learn relatively early if some of these risks are real. I mean, there’s some risks that you might not expect to eventuate until you’re in this superhuman regime. But if some of the more mundane risks, these risks of reinforcement learning creating perverse behaviour, are real, then the fact that there are no brakes or limits at the moment does mean that we are in a good position to perhaps get warning shots that could indicate that in a very public way.

Another thing is a bit more on the speculative end, but there is an argument that, if we do end up with misaligned AI that has goals that are very different from those of humanity, it may need to basically go rogue as soon as it has any chance of successfully beating us and taking over and taking a lot of power. Because these models are getting superseded and replaced at a very rapid pace: a model that has a particular set of strange values that we didn’t intend, if it doesn’t strike now, then it fully expects probably to be superseded by some other model that will be more powerful than it. And even if it tried later, it will be overpowered by those other models that have different goals. And that could happen in as soon as a few months at least, certainly within a few years.

It sounds a little crazy, but I guess people have always worried that we won’t get AI going rogue until it’s certain that it can take over, because it can always wait us out. And why not just wait until we put it in charge of the military and then it can take over easily? But that may not apply if the values that it has are somewhat random, not really related to the goals that a future model might have — in which case it in fact has to go as soon as it can, or it’s wasted its opportunity, it’s lost its shot.

So that’s another way in which conceivably, if you do get an AI going rogue as soon as it thinks that has any chance of successfully overpowering humanity — and it has maybe 1-in-10,000 shot, so it basically does just get shut down — I think that would really change attitudes, if you then looked at the chain of reasoning that had been logged.

Toby Ord: Yeah, I think you’re definitely right that the earlier arguments about it waiting until it was assured to win assumed that it’s possible to get to a position where it’s assured to win, but also assumed that it could wait for quite a long time and it would be the same kind of coherent entity. But if ultimately you release GPT-4.5 and then you say it’s going to be scrapped a few months later and replaced by something else, then it maybe only has a short chance at this, if it was misaligned and set up like an agent such that it could even form these intentions. So yeah, I think that we may well see things fail.

They may also have smaller horizons. The idea of taking over humanity is this idea that if it has a long time horizon, and it’s got this unbounded utility function or set of goals that it cares about, then, if it could really seize the reins from us, it could then do what it wants for thousands of years, perhaps, across the galaxy or something, and it could really win big.

But you may also be able to get smaller versions of this, where a system is going to disappear anyway in a couple of weeks, and maybe it knows that, and so it takes over the lab for a couple of weeks or something like that and tries to give itself higher reward or something.

I think there’s various versions where we might see these things happen, but it will still depend on how competent it looked and how close it seemed. If it’s some kind of extremely lame attempt to seize control or to break out, people might just feel like, “Aww” — like a toddler attempting to deceive you or something. It might be like, “How cute” or something. That might be the reaction. So I’m not sure that you can guarantee the right kind of reaction to these things.

The one that would cause the biggest reaction from people is something that feels genuinely scary, and it genuinely could have gone differently. If, for example, it attempts to do something and all of our systems to catch it work as desired and it gets caught, maybe we’ll learn the wrong lesson. Maybe we’ll learn the lesson that we’ve got all these systems and we always catch it or something like that.

Rob Wiblin: Should we be doing anything to prepare for that time so that people learn the right lesson? I suppose if we do have systems and they actually are quite good and they do catch it, maybe that is legitimately reassuring. I don’t want to say that people should always be more alarmed.

Toby Ord: Yeah. I mean, at that point, you learn two things: you learn that it tried to escape and that we caught it. One of those things is reassuring and one of the things is not reassuring. And exactly what the balance of them is a bit uncertain.

Rob Wiblin: Depends on the details, I guess.

Toby Ord: Does depend a bit on the details. But I think that if there were some apparent opponents to caring about safety who were saying, you don’t have to care about these things very much, and that they won’t desire to break out. Yann LeCun has said many things about this over the years, saying, we’re just anthropomorphising from humans and other animals that will have these drives, this is naive, and so on. If these things were exhibited, I think that would really put a dent in his reputation as a credible person on that issue. Whereas if what he predicts comes true, it does a bit of the opposite.

It would do something, but I think it’s complicated as to how much does some kind of warning shot change the Overton window. I think it can depend upon how much damage there is. So if there are actual harms that are had — it’s not just like a shot across the bow that wakes us up but doesn’t hurt anyone; instead it’s like a shot that goes into your leg and at least it didn’t kill you, but it’s really alerted you to the threat — if it’s more like that, then there’s a question of how big is it.

If an AI system, for example, caused a global financial crisis of a similar scale to the 2008 crisis, that would be a pretty big deal, and a lot of people would be very unhappy, a lot of people would be out of jobs and so on. And if that was pretty clearly attributed to AI, there’ll be a big reaction. So that’s an example at a large financial scale.

But there could also be other versions of things, and I think that it’s hard to predict them. No one would have predicted this Kevin Roose thing with Bing, where the real breakthrough thing was that it tried to seduce him and get him to break up with his wife — but it turned out that that was so misaligned and so salient or something, so weird or whatever —

Rob Wiblin: It really captured the imagination.

Toby Ord: It really captured the public imagination, and made them wake up and think, this is not the case that someone’s data centre has been made 3% more efficient by some machine learning technique. This is something very different.

So what I’m saying is it’s hard to predict what these things are or exactly what they’ll be like, but you should still be ready for them. I think a lot of people seem to tacitly assume that the situation we’ll be in in a few years’ time is exactly the same as the one that we’re in at the moment, and the appetites that people have will be the same as they are now for doing different types of responses.

Whereas instead you should think that maybe it will have shifted, and the Overton window will be much more expansive and include making major choices. And maybe it will go the other way around and maybe you will have even less. But to at least allow for that uncertainty. Don’t predict effectively that with 100% probability the Overton window will be exactly where it is now. That’s the real mistake.

Lamps are regulated much more than AI [02:20:55]

Rob Wiblin: You mentioned this idea of “pausing at human level” is the expression that I’ve heard — which is a relatively straightforward thing. It’s a nice slogan, even if it perhaps doesn’t really capture the technical realities.

It’s a very interesting idea, because I think if you’re the kind of person who is used to analysing policy in economics or anywhere else, you say pause at human level and you’re like: Is that just at the training level? What about if we throw in more inference compute? Then wouldn’t it potentially exceed the human level? Wait, AIs are much above human level in many different respects. So in fact, what we’re talking about is something that is above human level, and as generalisable or more generalisable than human beings. So how do we have a measure of generalisability that would allow us to enforce this rule? And if the US imposed such a rule on itself, wouldn’t China just ignore it? Wouldn’t this prevent us from engaging in many beneficial applications of AI that basically everyone is on board with and excited about?

There’s this raft of problems with it, which I think causes people to roughly dismiss that idea out of hand. And maybe it is a bad idea; it certainly does have significant challenges and drawbacks. The thing I want people to do when they’re analysing these ideas is to apply a similar standard to these proposals as they do to other areas of current regulation, where they think there are substantial risks and there’s an issue to be addressed.

Basically, every other area of regulation has significant unintended side effects. It poses economic efficiency costs and problems, and denies us products that might have been good. There are random, arbitrary thresholds that have to be drawn: the speed limit is this level on this road and that level on that road. And also enforcement is imperfect: people break road rules all the time.

Nonetheless, we don’t then say that all of these thresholds on what is safe driving and what is not would be arbitrary, and people would break them anyway, and this would lead us to have slower transport, so that would put us at a competitive disadvantage with other countries — so we’re just going to allow the roads to be open slather.

We balance the costs, the risks, and the rewards, and we accept that the world is a messy place and that many areas of regulation are going to be challenging. But if there is significant upside, if you can reduce some important risks or some important harms in some significant way, then we’re at least open to considering a regulatory regime around it.

And I think AI regulation is not given the same standard. People do not consider these in the same way as other concrete harms that they’re familiar with now.

Toby Ord: Yeah, that’s right. And to continue with your road traffic analogy, there are also rules against reckless behaviour on the roads. At least the speed limit was a single-dimensional thing where you have to pick some arbitrary point in a continuum. Reckless behaviour is this very multidimensional thing, and what’s reckless for one person might be different for another person if they’re much more controlled at how they drive a car, as in they’re more skilful — but we have laws like this, and they kind of work. So you’re right: there’s this demand of scrutiny on this particular area that we don’t apply to other areas.

Again, stepping back and zooming out is really helpful. At the moment, AI systems and the people who produce them are less regulated than, say, bread, to pick an example. All kinds of things, if you just kind of look around: it’s probably less regulated than bricks, certainly less regulated than lamps. And why does that make sense? Do we think lamps are a bigger threat to us than AI systems? So there are a whole lot of leading lamp scientists who are saying it’s one of the greatest issues facing humanity, and it could pose a threat of human extinction? No.

So the idea that it should just end up being less regulated than these things, I think is fundamentally kind of stupid, actually, and disingenuous. At least, if people have thought about it. Let’s take that back a bit: either it’s thoughtless — and it’s fine to be thoughtless occasionally, to not notice something — but the more you talk about it and make it part of your life and you still keep saying it, the more I feel it’s disingenuous or stupid.

There is an interesting question about how far it should go, but clearly we’re at the too low end of the spectrum at the moment.

Rob Wiblin: I think some distinctions you could draw with lamps are that lamps are not changing very quickly in the way that AI is, so that’s one reason to maybe hold off or not try to lock in bad stuff, is that AI is constantly changing. Maybe we’ll have a better idea about how to regulate it in some years in the future.

Perhaps the other thing is that the benefits of AI probably… Well, lamps are pretty important; lighting is pretty valuable.

Toby Ord: They actually may be more important than AI. For example, there’s a really nice report out from DeepMind recently, where they created this new system and it worked out how to make their entire training system 1% more efficient, and their entire compute system 1% more efficient as well. And wow, that’s worth so much money and so on.

But lamps: I mean, before lamps you could only work in the daytime, and you could not do anything in the nighttime. I guess it depends on whether we include candles as part of this thing. But if you imagine there was this breakthrough where all of a sudden we can actually be productive for an additional 30% of the day or something, it’s huge, right? And if it happened before we were born, we tend to just ignore that.

Rob Wiblin: We just tend to ignore it. OK, well, I’ll modify my statement: Lamps are more important than AI, but AI one day, at some stage, may be more important than interior lighting. You’re always balancing the benefits with the risks, and because the benefits at some point might be quite large, we might be willing to accept some meaningful risk.

The thing that you’re talking about, about people potentially being disingenuous because they just always dive into the details, like, “This would require an arbitrary threshold, so it’s completely unviable”: I think a trouble with the discourse is that there are some people who basically think that all of the risks are complete buncombe — that there are no worries, we should just push ahead because the risks are either far away, nonexistent, massively exaggerated.

For those folks, it makes sense that when people propose regulations that they think have substantial costs but have basically no benefits from their point of view, they just want to shoot them down. So they’re like, “This would be a downside of that. That would be another downside.” It makes sense that basically they just highlight the downsides and there’s nothing in it for them. They’re not motivated in any sense to try to make this stuff work.

But I think for the majority of people who think that there are risks here — and clearly a majority of the public thinks that, and really the majority of the people who know the most about it also think that there are risks and things to worry about here — it’s not enough to just say that there are some costs. You have to balance these things against one another.

And also, I want to see people try to make it work, have some actual energy behind it. Not just saying, “This would be challenging in some respect” — like, “How could we improve it? What is your preferred policy response to this? What is the best way of addressing these issues from your point of view?” You can’t always just be saying that something involves costs. Everything involves some cost, or we already would have done it.

Toby Ord: Yeah, exactly. And I would add to that the situation of thinking that it’s just not going to happen, and that there are no risks, and it’s all made up and never going to happen: I don’t feel that’s a responsible view for anyone to have. I think that you could think, “That’s my viewpoint, but I’m aware that I’m in disagreement with a large number of people who are more expert than me.” Unless maybe if you’re Yann LeCun, who could say, “I’m also a Turing Award winner in AI, so I’m at the equal level or something, and I can kind of disagree.”

But I feel that for almost all of us, the fact that there’s so many people, that there is an active disagreement — but an active disagreement means you should have uncertainty; it doesn’t mean you get to choose whatever you want. It’s kind of what I’m saying. And that the idea that, because the experts are not 100% aligned behind something, I get to just believe whatever beliefs would be most convenient for me, that’s not really how being a rational actor works, and I don’t think one needs to take that very seriously.

Rob Wiblin: I just actually don’t know what the answer would be. But for people like Yann LeCun, I would love him to answer the question: “You don’t think that there’s really any risk here. Your personal preference would be no regulation, more or less, at this point, at least. But let’s imagine that you had our credences in how things were playing out, that you thought that rogue AI was a real issue, you thought there were other ways that things could go wrong, gradual disempowerment and so on. What is your favourite policy response, given those beliefs?”

That is a question that is rarely answered, and I would just actually be fascinated — because I think we might find that there is a policy response that both people who are very worried about it and people who are not that worried about think is kind of tolerable as a middle ground between these different extremes.

Toby Ord: Yeah, I think you should get him on. I’d listen. And I think it is a great question. I think probably the person, whether it’s Yann or someone else or me, answers quite defensively. So if you asked it live, you probably wouldn’t quite get the right answer. But it’ll be really interesting to hear a considered answer.

Companies made a strategic error shooting down SB 1047 [02:29:57]

Toby Ord: One thing I find interesting is that SB 1047 was the compromise bill. It was a bill proposed by people concerned about this type of safety, who were saying, “What is the absolutely most minimal, extremely unburdensome form of regulation you could do that’s still way less burdensome than the bread thing?” It’s like saying if 10,000 people are killed by some kind of botulism in your factory that made the bread, then you will be aware that you’re going to go to jail, but if otherwise, it’s just fine. It’s something like that.

And they already did try to kind of come up with an extremely weak kind of win-win type thing. Like what’s a bill you could do that would get some benefits at almost no cost to the industry, and that frankly would actually give the industry a lot of what they said they wanted.

So industry often does want to, like individual actors do want to have things be safe. They’ve often got a lot of concerns about how quickly market forces are making them act and how quickly market forces are making them deploy their new models, because everyone else is deploying quickly. And if they could all be bound by the safety thing so that their competitors didn’t have an advantage over them if only I’m bound, then they tend to want that. So it was frankly a bit surprising that there was this hostility.

But yeah, I do feel that there already has been a very good-faith attempt by the safety community to come up with the kind of bill that tries to meet all of the complaints that the other people have. And even that was shot down.

Rob Wiblin: Yeah, I’ve said this on the show before, but I do think that the industry is potentially shooting itself in the foot here. Because the thing that is most likely to bring about the sort of draconian regulation that people who are optimistic about AI technology are most scared of is some sort of disaster. Any sort of disaster that actually leads to loss of life could lead to a very big change in attitudes and lead to maybe more draconian regulation than is necessary from anyone’s point of view.

Toby Ord: And even if you think your company’s never going to make that mistake, you might think these cowboys down the street are exactly the kind of people who could make that kind of mistake, and they need some regulation that will stop them from ruining the party for everyone, right?

I really do think that this is very short-sighted. And on top of that, sometimes we talk about someone having conflict of interest: these places are very conflicted. And if you did find that, as a company, you thought it wasn’t in your interest, but also you get big stock bonuses and so on for the more stuff that you put out, you really want to inspect your own views quite carefully. We talk about various forms of biases and prejudices that people might end up having, and it would be very difficult to actually keep straight your actual prediction on this thing, as opposed to these other incentives that you’re facing.

Rob Wiblin: Yeah. Another interesting dynamic that I see going on is that when I’m thinking about SB 1047, or any proposed regulation that we might put in place now, I’m thinking of this as the very first step in a very iterated process — where almost certainly there’s going to be a whole lot of problems that we’re going to identify with it, it wasn’t written quite right, but we’ll just improve those over time. And you’ve got to start somewhere in order to begin learning what might succeed.

I think the people who are very against it, they think that whatever we put in place now is going to be potentially there forever. It’s not really the beginning of a process. Maybe for them it’s like this is just the beginning of a ratchet, where everything is going to become more and more extreme over time rather than be kind of perfected and improved.

Toby Ord: I mean, I think there might be a bit of a ratchet if you started with something like 1047, but that’s because 1047 is obviously too weak. And they will be looking back on the days when 1047 was the issue and thinking, “Oh my god.”

Rob Wiblin: “Should have taken that deal.”

Toby Ord: Yeah, I think so. But I really do think they may have a good point here. If it is the case that whatever the first regulation is sets the entire frame, and it’s not possible to step out to a different frame… For example, suppose the first thing is about compute thresholds for pre-training and then you can never escape that frame or something: that could be a big problem if then the scaling stops. So it really can matter.

But is that a recipe for therefore complete laissez faire, no regulation, do whatever you want? That’s obviously too quick. But if it is the case that in certain regulatory environments the default is that if you introduce things they stay forever, that could be a bad thing, and it could be that there’s some win-wins that one could find there — because the safety community also don’t want to be stuck in silly frames that no longer make sense.

One approach to that is to have explicit sunset clauses. So you could say this is going to be a rule that’s going to last for the next two years.

Rob Wiblin: Then it has to be renewed.

Toby Ord: Exactly. It has to be explicitly renewed, or there could be successor bills or something like that. We have to find a successor thing. I feel that we should at least be doing things like that. And if it is the case that all regulation has to be permanent and as soon as you try to regulate something, you’re stuck with that forever, I feel like that’s a terrible regulatory system. I don’t think it’s the one we’ve got either. Tell me if I’m wrong, but I’d be very surprised if it’s the case that you can’t put sunset clauses on these types of bills.

Rob Wiblin: No, I’m basically sure that you can.

Toby Ord: In which case, why isn’t that the conversation? If they say, “We don’t want to be stuck in this thing for 10 years,” that’s an argument for putting a two-year sunset clause on the bill. It’s not an argument for vetoing the bill.

Companies should build in emergency brakes for their AI [02:35:49]

Rob Wiblin: We could potentially become a little bit more concrete for a minute. You mentioned this “pause at human level” as a broad schema that possibly has some legs or has some merit, even if there’s also implementation challenges. Are there any other things that people should potentially have in mind, thinking ahead to ideas that are outside the Overton window now but could be useful down the line?

Toby Ord: I think that there’s a whole host of these, and I certainly don’t feel I’ve explored the area thoroughly. And I think you actually could probably go through a lot of seemingly naive takes that people have and reevaluate them a bit.

One of those is this idea of an emergency brake on AI. So some kind of option for someone — you know, that could be for the leader of a lab — to have some ability to stop the system if they needed to. It could be for the government that that company is based in to be able to stop it. It could be for the international community to be able to stop it. For example, if the Security Council agreed to stop it, that there’s some way of doing so.

And if you start to think about that seriously, you start to notice, what if this thing’s deployed everywhere? What if there’s a whole lot of critical infrastructure that needs it? What if there are people in hospital who, if you turn this thing off, somehow their treatment will fail or something? We need to start asking these questions, and it might be a reason not to make there be people who will die if AI is turned off. And we can start to actually think through some of those things now.

At the moment, I think it is very difficult for government to stop these things. You could think of two different versions. There’s the version where the company is located, headquartered inside your country, and the version where they’re not. For example, could Australia stop AI systems operating from externally inside their own borders, even if they’re not headquartered there? I think the answer is they’d have no levers to pull to make that happen at the moment — but maybe they should give themselves those levers. Definitely the same for countries that are meant to be governing the corporations that live inside their borders.

So I think people should explore ideas like that. Even the heads of companies probably wish they had a little bit more of an ability to do things like this. Probably they’ll find themselves in a situation where maybe their groups working on safety and alignment will tell them that there’s warning signs that the model we’re currently deploying actually is misaligned and is scheming in various ways. And they’ll think, if we shut it down, there’s currently so many people relying on it that that will tank our stock price and do all of these things.

I think that they should try to game that out a bit, in a positive sense of like, how do we get into that situation where we’ll be faced with this massive conflicting choice. Potentially it’s the interests of humanity on one side and the financial interests of the company on the other. Is there a way to not end up in a situation where turning it off will cause this problem?

There might be answers. One example would be they could set up systems to allow immediate serving of different models. So you’re currently issuing GPT-5 out there for everyone, and then you realise it’s misaligned, and you kind of gracefully fall back to GPT-4.

Or similarly, when we think of emergency brakes and things, if you think about an aeroplane or a car, if something goes wrong: maybe it’s a self-driving car and something goes wrong on the motorway as you’re hurtling along at 70 miles an hour — probably an emergency brake isn’t exactly what you want to do. Suppose it sees something with its sensors that’s completely out of its distribution and just doesn’t understand what could possibly be happening. Probably it should attempt to slow down as quickly as is safe and also pull over to the left if it’s safe to do so, or to whichever side of the road is the side to pull over.

So often it’s not just turn it off is the answer. Another example would be switch to the earlier model.

Rob Wiblin: And do a handover, because you might well need a handover to avoid things going really wrong.

Toby Ord: Exactly. Or switch to manual control or something like that. But working out how can you, as quickly as possible, get the troublesome component out of the loop and move to some kind of graceful attempt to wind down or exit the situation. I think that’s a very useful concept. And the emergency brake concept of it would be to say the ability to do it now — like if the CEO orders it, that it happens. Maybe do some test runs on it, like a fire drill or something.

So that’s an example of me just spending 10 minutes trying to think into one of these ideas that gets kind of bandied about often naively, and there are immediate responses of, “It wouldn’t work because of X.” And then if you think about it a little bit more, you think, maybe there are things you could do about X. And if I thought about it more, you could start to come up with some clever proposals.

Rob Wiblin: I think another one along those lines that I’ve toyed with on the show before is: many people think that much of the risk comes from a situation where you have AIs basically doing all of the work of programming the next generation of AIs, and humans largely being cut out of the development loop so they’re no longer scrutinising what’s going on. There’s no longer very much external checking or confirmation from human beings.

If that really is the main threat vector, it’s kind of an obvious response to be like, why don’t we say that we’re not going to cut humans out of the loop, and we’re not going to have AIs programming the next generation of AIs? The obvious response is that they’re already kind of helping now, and all we would be doing is doing more later on. But the response might be, let’s say that we just draw the line somewhere — anywhere plausible, anywhere reasonable, any identifiable point that we could use to engage in enforcement. Would that help? If the answer is yes, that any plausible line would, on balance, the benefits would exceed the cost, then maybe we should reconsider this idea that initially might sound kind of naive.

Toby Ord: Yeah, I think that’s right. And it can be difficult to draw these lines with coding assistants and so on. But it does seem like the plan at some of these AI companies is to automate AI research with their systems and then have them exactly do that and produce the next set of AI much faster than humans could do. And then that one’s even better, so it could do it even faster and so on to have this kind of hard takeoff.

Governments could order them not to do that. I mean, you could even just say, “Here is this idea. It has been discussed. You’re familiar with it. You know the words. It’s in your plans. You are definitely not allowed to do that.”

And I think for a lot of companies, they wouldn’t do it even if you had no enforcement or verification mechanism. A lot of these people, the leaders want to follow the law, and the employees want to follow the law. You know, if their boss said do it, and you’re like, “Literally, I can see the directive from the president that says you’re not allowed to do it,” I think they wouldn’t do it.

Rob Wiblin: I think they would probably come back with like, “Well, what about like this? Is this above the line? Is this below the line?” And you end up with a negotiation about that, where you could actually have a conversation about what is too risky and what is not.

Toby Ord: Maybe they could say that autocomplete on your coding thing, like you have at the moment, is fine and this other thing is not fine, and you could start to zoom in on it. But the idea that, “It’s hard to know where to draw the line, no line would be completely non-arbitrary, therefore… there are no rules.” It doesn’t follow. And I think I’ve found myself in the grip of this. But now that we mention it, I’m a bit embarrassed about that.

And you know, COVID brought out a lot of this. I think over here in the UK there was the rule of four at one point — where no more than four people could be gathered in the same place — and then people were like, “What if a fifth person comes along? Something magical happens?” And the answer is no, it’s just one of these spectrums where there’ll be more spread if it’s five than if it’s four. And we need to keep the spread below the critical number so that it goes down instead of up, and we think that the number’s four. And if it turns out it’s still going up, we’ll make it three.

Sometimes you need to draw a line somewhere, and the kind of genius move of, “But there’s no way you can draw it so therefore you can’t draw a line” —

Rob Wiblin: That’s actually the naive view.

Toby Ord: It’s quite naive, and almost kind of childish.

Toby’s bottom lines [02:44:32]

Rob Wiblin: All right. We’ve come a long way. Listeners should know that you’ve cancelled your next appointment to stay late and discuss the policy section with us. Do you want to give kind of an overview of the situation as it stands, and what attitude you think people should have, and perhaps what they ought to be doing?

Toby Ord: Yeah. Taking this kind of zoomed-out perspective, the technical stuff I was saying was that we’ve had this scaling law of training larger and larger models. And this era of scaling took us really far in terms of capabilities, as companies untied their purse strings and poured more and more money into this. It let things scale up with thousands of times, or more than that, of compute unleashing these capabilities.

That era, I think, has ended — or at least in its current form. And maybe now, as we try to scale up inference instead, maybe things will stall out a bit — especially as we apply it to the problems where it’s useful to think longer, and then wherever it’s not clear that it’s always useful to think 10 times as long as you previously have.

So it could stall out for various reasons, or it could be that once you’ve got this form of intuitive thinking, plus the systematic form of thinking, you can combine them in some way that really leads to explosive growth, as we discussed.

So I think it creates more uncertainty about whether timelines are going to actually maybe stall out and be quite long, or whether maybe now we’re going to be able to have this extremely rapid progress that the companies are themselves predicting. So that’s something different.

But also, it’s not just the timelines — it also changes everything. The way that in the case of inference you have to pay the costs every single time you use it has all of these different types of knock-on effects that can really change things in more complicated ways than just good or bad.

And then, as we moved on to these bigger-questions of policy, my main message is that the whole landscape of AI has changed so much over the last five years. Then maybe in the last year we’re seeing another one of these types of changes, and we could see many more of these. So it’s important for at least some people — I think more people than are currently doing it — to keep an eye on the big picture of how the landscape could be so different, and that maybe we could actually help to steer it towards some of these locations.

And to realise that it’s not just in the grip of some kind of technological or incentives determinism, to assume that humanity just really has no choice: “If it turns out the most efficient way to do things is this one that leads to disaster, then I guess we’re forced to go into a disaster.” That’s just not true, and it’s excusing ourselves of too much responsibility. Ultimately, if we build a technology that kills us all, it’s on us: it’s an own goal by humanity, and someone has to admit responsibility for that. We can’t all just say it was…

Rob Wiblin: “It was the incentives that everyone else created for me to do it, because I thought that they would do it. I didn’t talk to them, but yeah.”

Toby Ord: Exactly. So I’m a big fan of the big picture of everything, but I hope it’s been useful for other people, and that more people will start thinking about these things too.

Rob Wiblin: Yeah. Well, I look forward to coming back in a year or two, or possibly less, and talking about what the new technical developments are and what they imply. And perhaps people’s minds will be a little bit more open by then. There’ll be more things on the table.

Toby Ord: Yeah, I’ll be there.

Rob Wiblin: My guest has been Toby Ord. Thanks so much for coming on The 80,000 Hours Podcast, Toby.

Toby Ord: Thank you.

Learn more

AI governance and policy

Preventing an AI-related catastrophe

The case for reducing existential risks

The 80,000 Hours Podcast on Artificial Intelligence and related topics

Related episodes

March 7, 2020

#72 – Toby Ord on the precipice and humanity’s potential futures

Listen now

September 8, 2023

#163 – Toby Ord on the perils of maximising the good that you do

Listen now

September 6, 2017

#6 – Toby Ord on why the long-term future of humanity matters more than anything else, and what we should do about it

Listen now

February 14, 2025

#212 – Allan Dafoe on why technology is unstoppable & how to shape AI development anyway

Listen now

June 27, 2024

#191 – Carl Shulman on the economy and national security after AGI (Part 1)

Listen now

July 5, 2024

#191 – Carl Shulman on government and society after AGI (Part 2)

Listen now

April 16, 2025

#215 – Tom Davidson on how AI-enabled coups could allow a tiny group to seize power

Listen now

May 5, 2023

#150 – Tom Davidson on how quickly AI could transform the world

Listen now

About the show

The 80,000 Hours Podcast features unusually in-depth conversations about the world's most pressing problems and how you can use your career to solve them. We invite guests pursuing a wide range of career paths — from academics and activists to entrepreneurs and policymakers — to analyse the case for and against working on different issues and which approaches are best for solving them.

Get in touch with feedback or guest suggestions by emailing [email protected].

What should I listen to first?

We've carefully selected 10 episodes we think it could make sense to listen to first, on a separate podcast feed:

Check out 'Effective Altruism: An Introduction'

Subscribe here, or anywhere you get podcasts:

If you're new, see the podcast homepage for ideas on where to start, or browse our full episode archive.

On this page:

Highlights

The new scaling era: from pre-training to inference

Will rich people get access to AGI first? Will the rest of us even know?

The scaling paradox

Misleading charts from AI companies

Policy debates should dream much bigger

Why did you build it if you thought it could kill everyone?

Scientific moratoriums have worked before

Companies made a strategic error shooting down SB 1047

Articles, books, and other media discussed in the show

Transcript

Cold open [00:00:00]

Toby Ord is back — for a fourth time! [00:01:20]

Everything has changed (and changed again) since 2020 [00:01:37]

Is x-risk up or down? [00:07:47]

The new scaling era: compute at inference [00:09:12]

Inference scaling means less concentration [00:31:21]

Will rich people get access to AGI first? Will the rest of us even know? [00:35:11]

The new regime makes ‘compute governance’ harder [00:41:08]

How ‘IDA’ might let AI blast past human level — or not [00:50:14]

Reinforcement learning brings back ‘reward hacking’ agents [01:04:56]

Will we get warning shots? Will they even help? [01:14:41]

The scaling paradox [01:22:09]

Misleading charts from AI companies [01:30:55]

Policy debates should dream much bigger [01:43:04]

Scientific moratoriums have worked before [01:56:04]

Might AI ‘go rogue’ early on? [02:13:16]

Lamps are regulated much more than AI [02:20:55]

Companies made a strategic error shooting down SB 1047 [02:29:57]

Companies should build in emergency brakes for their AI [02:35:49]

Toby’s bottom lines [02:44:32]

Learn more

AI governance and policy

Preventing an AI-related catastrophe

The case for reducing existential risks

The 80,000 Hours Podcast on Artificial Intelligence and related topics

Related episodes

About the show

What should I listen to first?