Transcript
Cold open [00:00:00]
Ajeya Cotra: If you look at public communications from at least OpenAI, Anthropic, and Google DeepMind, in all of their stated safety plans, you see this element of: as AIs get better and better, they’re going to incorporate the AIs themselves into their safety plans more and more.
Figuring out how to create a setup where we use control techniques and alignment techniques and interpretability to the point where we feel good about relying on their outputs is a crucial step — because it either bottlenecks our progress, because we’re checking on everything all the time and slowing things down, or it doesn’t bottleneck our progress, but we hand the AIs the power to take over.
Ajeya’s strong track record for identifying key AI issues [00:00:43]
Rob Wiblin: Today I’m speaking with Ajeya Cotra. Ajeya is a senior advisor at Open Philanthropy [now Coefficient Giving] where in 2024 she led their technical AI safety grantmaking. More generally, she’s been doing AI-related research and strategy since 2018, and has become very influential in AI circles for her work on timelines, capability evaluations, and threat modelling. Thanks so much for coming back on the show, Ajeya.
Ajeya Cotra: Thank you so much for having me.
Rob Wiblin: So doing this interview gave me a chance to go back and listen to the interview that we recorded two and a half years ago. And I have to say that you were very on the ball, or there were a lot of issues that came up in that conversation that you were bringing to people’s attention that I think in the subsequent two and a half years seem like a much bigger deal now.
- You talked about METR’s work evaluating autonomous capabilities, a line of research that’s gone on to become super influential and very widely read in policy circles.
- You talked about using probes to monitor and shut down dangerous conversations, something that’s pretty standard practice and maybe one of the most useful outputs from mechanistic interpretability.
- You talked about the importance of using chain of thought and scratchpads to monitor what AIs are doing and why. Still probably the dominant technique.
- You talked about the growing situational awareness of AI models and the resulting possibility of deceptive alignment, something that’s now a completely mainstream topic.
- You talked about how, when you train models to not engage in bad behaviour, they don’t necessarily just learn to become honest, they also learn to just hide their misbehaviour better — something that I guess research has kind of borne out does really happen and is a big concern.
- You talked about how you expected models to get schemier as they get smarter, especially once we inserted reinforcement learning back into the mix — something that’s definitely happened.
- And you talked a bunch about sycophancy: how you thought models might end up just flattering people rather than giving accurate information, because that’s kind of something that we enjoy.
So you didn’t come up with all of these ideas or anything like that, but I think you were ahead of the curve, and maybe we’ll get some ahead-of-the-curve ideas in this interview as well.
Ajeya Cotra: Hopefully! Thank you.
The 1,000-fold disagreement about AI’s effect on economic growth [00:02:30]
Rob Wiblin: So you think that a key driver of disagreements about everything to do with AI is people’s different views on how likely AGI is to speed up science and technology, and I guess physical infrastructure and manufacturing. Why is that?
Ajeya Cotra: Yeah. A thing that I’ve been noticing as the concept of AGI has become more and more mainstream is that it’s also become more and more watered down. So last year I was on a panel about the future of AI at DealBook in New York. It was me and one or two other folks who kind of think about things from a safety perspective, and then a number of venture capitalists and technologists.
And the moderator asked at the very beginning of the panel whether we thought it was more likely than not that by 2030 we would get AGI — defined as “AIs that can do everything humans can do.” Seven or eight hands went up, not including mine, because my timelines are somewhat longer than that.
But then he asked a followup question a couple of questions later about whether we thought that AI would create more jobs or destroy more jobs over the following 10 years. So 2030 was five years away, and seven out of 10 people thought that we would have AGI by 2030. But then it turned out that eight out of 10 people, not including me, thought that AI would create more jobs than it destroyed over the next 10 years.
And I was a little confused. Why is it that you think we will have AI that can do absolutely everything that the best human experts can do in five years, but will actually end up creating more jobs than it destroys in the following 10 years? What’s happening here?
Rob Wiblin: Seems like a tension.
Ajeya Cotra: And when I poked some people later in the panel about that seeming tension, I think they really quickly backed off and they said, “What does AGI really mean?” The moderator had defined it as this very extreme thing, but they were like, “We kind of already have AGI. People keep moving the goalposts, we keep making cool new products, and people aren’t accepting that it’s AGI and they aspire to something higher.”
And I thought that was funny, because in the old school singularitarian futurist definition of AGI, AGI is this very extreme thing. But I think VCs have an instinct to call something much milder AGI: to say that GPT-5 is AGI, or something like that.
So I think this creates a situation where people feel like they’ve gotten a lot of evidence that AGI isn’t a very big deal and doesn’t change much — because we already have AGI, or we’re going to have it next year, or we got it two years ago and look around us: nothing much is changing.
So I think there’s this expectation where, whether or not we get AGI in the next few years, a lot of people are starting to not really care about that question. They still expect the next 25 years or the next 50 years to play out kind of like the last 25 years or the last 50 years, where there’s a lot of technological change between 2000 and 2025, but it’s like a moderate amount of change.
They kind of expect that in 2050 there will be a similar amount of change as there was between 2000 and 2025. Even if they think that we’re going to get AGI in 2030, they think AGI is just what’s going to drive that continued mild improvement. Whereas I think that there’s a pretty good chance that by 2050 the world will look as different from today as today does from the hunter-gatherer era. It’s like 10,000 years of progress rather than 25 years of progress driven by AI automating all intellectual activity.
Rob Wiblin: You’ve hinted at the fact that there is an enormously wide range of views on this, but can you give us a sense of just how large the spectrum is, and what the picture looks like on either end?
Ajeya Cotra: I would say on the sort of standard mainstream view, if you ask a normal person on the street what 2050 will look like, or if you ask a standard mainstream economist, I think they would think the population is a little bit bigger, we have somewhat better technologies. Maybe they have a few pet technologies that they’re most interested in. Maybe we have this one or that one. So slightly better medicine, people live slightly longer. It’s an amount of change that’s extremely manageable.
I think on the far extreme from there, on the other side, is a view described in If Anyone Builds It, Everyone Dies. In that worldview, at some point, probably pretty unpredictably, we sort of crack the code to extreme superintelligence. Like we invent a technology that rather suddenly goes from being like GPT-5 and GPT-6 and so on, to being so much smarter than us — that we’re like cats or mice or ants compared to this thing’s intelligence.
And then that thing can really immediately have really extreme impacts on the physical world. The classical, sort of canonical example here being inventing nanotechnology: the ability to precisely manufacture things that are really tiny and can replicate themselves really quickly and can do all sorts of things — like building space probes that travel close to the speed of light and things like that.
I think there’s a whole spectrum in between, where people think that we are going to get to a world where we have technologies approaching their physical limits, we have like spaceships approaching the speed of light, and we have self-replicating entities that replicate as quickly as bacteria while like also doing useful things for us. But we’re going to have to go through intermediate stages before getting there.
But I think something that unites all of the people who are sort of futurists and concerned about AI x-risk is that they think in the coming decades we’re likely to get this level of extreme technological progress driven by AI.
Rob Wiblin: How strong is the correlation between how much someone expects AI or AGI to speed up science research in particular, and physical industry as well, and how likely they think it is to go poorly or how nervous they are about the whole prospect?
Ajeya Cotra: I think it’s a very strong correlation. I’ve found often that reasonable people who are AI accelerationists tend to think that the default course of how AI is developed and deployed in the world is very, very slow and gradual, and they think that we should cut some red tape to make it go at a little bit more of a reasonable pace.
And people who are worried about x-risk think that the default course of AI is this extremely explosive thing, where it overturns society on all dimensions at once in maybe a year or maybe five years or maybe six months or maybe a week. And they’re saying that we should slow it down to take 10 years maybe.
Meanwhile, the sort of accelerationists think that by default diffusing and capturing the benefits of AI will take 50 years or 100 years, and they want to speed it up to take 35 years.
Rob Wiblin: It’s quite interesting that people who radically differ in their policy prescriptions might be aiming for the same level of speed. That actually maybe they want this period to take 10 years or 20 years: that’s what both of them want, but their baseline is so different, so they’re pushing in completely opposite directions.
What’s your modal expectation? What do you think is the most likely impact for it to have?
Ajeya Cotra: I think that probably in the early 2030s we are going to see what Ryan Greenblatt calls top-human-expert-dominating AI, which is an AI system that can do tasks that you can do remotely from a computer better than any human expert. So it’s better at remote virology tasks than the best virologists, better at remote software engineering tasks than the best software engineers, and so on for all the different domains.
By that time, I feel like probably the world has already accelerated and changed, and narrower and weaker AI systems have already penetrated in a bunch of places, and we’re looking at a pretty different world. But at that point, I think things can go much faster because I think top-human-expert-dominating AIs in the cognitive domain could probably use human physical labour to build robotic physical actuators for themselves.
Whether the AIs have already taken over and are acting on their own, or whether humans are still in control of the AIs, I think automating the physical side of things as well would be a goal they would have. And I have pretty wide uncertainty on exactly how hard that’ll be, but whenever I check in on the field of robotics, I actually feel like robotics is progressing pretty quickly. And it’s taking off for the same reasons that cognitive AI is taking off: large models, lots of data, imitation, large scale is helping robotics a lot.
So I imagine that you can pretty quickly — maybe within a year, maybe within a couple years — get to the point where these superhuman AIs are controlling a bunch of physical actuators that allow them to close the loop of making more of themselves, doing all the work required to run the factories that print out the chips that then run the AIs, and doing all the repair work on that and gathering the raw materials on that.
Rob Wiblin: So you’re saying that you’re expecting in the 2030s, it won’t just be that these AI models are capable of automating computer-based R&D, but they’ll also be able to lead on the project of building fabricators that produce the chips that they run on. So that’s another kind of positive feedback loop.
Ajeya Cotra: Yeah. I really recommend the post Three types of intelligence explosion by Tom Davidson on Forethought. He makes the point that we talk a lot about the promise and the danger of AIs automating AI R&D, and automating the process of making better AIs, but that’s only one feedback loop that is required to fully close the loop of making more AIs — because we’re talking about software that makes the transformer architecture slightly more efficient or gathers better data to train the AIs on, but the AIs are also running on chips, which are printed in these chip factories at Nvidia. And those factories have machines that are built by other machines that are built by other machines and ultimately go down to raw materials.
And I think that’s something we don’t talk about very much, because it’ll happen afterward: how hard would it be for the AIs to automate that entire stack, the full stack, and not just the software stack?
Rob Wiblin: So there’s a range of expectations that exist among sensible, thoughtful people who’ve engaged with this, on how much, at peak, is AGI going to speed up economic growth. It ranges from people who say it will speed up economic growth by 0.3 percentage points — so it’ll be a 15% increase or something on current rates of economic growth, and I’d be very happy if it was that good — to people who say at peak the economy will be growing at 1,000% a year or higher than that, thousands of percent a year.
So it’s like 100- or 1,000- or a 10,000-fold disagreement basically on the likely impact that this is going to have. It’s an almost unfathomable degree of disagreement among people. It’s not as if they’ve thought about this independently and they haven’t had a chance to talk. They’ve spoken about this, they’ve shared their reasons, and they don’t change their mind and they disagree by a thousandfold impact.
Ajeya Cotra: Yeah.
Rob Wiblin: You’ve made it part of your mission in life the last couple of years to have really sincere, intellectually engaged, curious conversations with people across the full spectrum. Why do you think it is that this disagreement is able to be maintained?
Ajeya Cotra: I feel like at the end of the day the different parties tend to lean on two different, pretty simple priors, or two kind of different simple outside views.
I would say that the group that expects things to be a lot slower tends to lean on: for the last 100, 150 years in frontier economies we’ve seen 2% growth. And think of the technological change that has occurred over the last 100 or 150 years. We went from having very little, when electricity was just an idea, to everywhere being electrified. We had the washing machine and the television, the radio, all these things happened, computers happened in this period of time. None of these show up as an uptick in economic growth.
And I think there’s this stylised fact that mainstream economists really like to cite, which is that new technology is sort of the engine that sustains 2% growth, and in the absence of that new technology, growth would have slowed. So they’re like, this is how new technologies always are. People think that they’re going to lead to a productivity boom, but you never see them in the statistics. You didn’t see the radio, you didn’t see the television, you didn’t see the computer, you didn’t see the internet — and you’re not going to see AI. AI might be really cool. It might be the next thing that lets us keep chugging along.
That’s one perspective. It’s an outside view they keep returning to. And also maybe a somewhat more generalised thing: things are just always hard and slow, just like way harder and slower than you think. It’s like, what’s it called? Murphy’s law?
Rob Wiblin: Murphy’s law: anything that can go wrong will go wrong. I think this is our experience in our personal lives, that it’s awfully hard to achieve things at work, things that to other people might seem so straightforward. And they’re like, “Why haven’t you finished this yet?” You’re like, “Well, I could give you a very long list.”
Ajeya Cotra: Or like Hofstadter’s law: It always takes longer than you think, even when you take Hofstadter’s law into account. Or like the programmers’ credo, this is my favourite one: We do these things not because they are easy, but because we thought they would be easy.
So there’s just this whole cloud of it’s naivete to think that things can go crazy fast. If you write down a story that seems perfect and unassailable for how things will be super easy and fast, there’s all sorts of bottlenecks and all sorts of drag factors you inevitably failed to account for in that story. That’s kind of that perspective.
Then I think the alternative perspective leans a lot on much longer-term economic history. If you attempt to assign reasonable GDP measures to the last 10,000 years of human history, you see acceleration. The growth rate was not always 2% per year at the frontier: 2% per year is actually blisteringly fast compared to what it was in like 3,000 BC, which was maybe like 0.1% per year. So the growth rate has already multiplied many fold — maybe an order of magnitude, maybe two.
I think that people in the slower camp tend to feel like the exercise of doing long-run historical data is just too fraught to rely upon. But people in both camps do agree that the Industrial Revolution happened and the Industrial Revolution accelerated growth rates a lot. And we went from having growth rates that were well below 1% to having 2% a year growth rates.
And I think that people in the faster camp tend to lean on the long run and on models that say that the reason that we had accelerating growth in the long run was a feedback loop where more people can try out more ideas and discover more innovations, which then leads to food production being more efficient, which then leads to a larger supportable population — and then you can rinse and repeat and you get super-exponential population growth.
Then that perspective says that if you can slot in AIs to replace not just the cognitive, but the cognitive and the physical, the entire package, and close the full loop of AIs doing everything needed to make more AIs, or AIs and robots doing everything needed to make more AIs and robots, then there’s no reason to think that 2% is some sort of physical law of the universe. They can grow as fast as their physical constraints allow them to grow, which are not necessarily the same as the constraints that keep human-driven growth at 2%.
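As a purely illustrative aside, not from the interview: the feedback-loop argument Ajeya describes here can be captured in a few lines of toy arithmetic. The sketch below compares a fixed 2%-per-year economy with a stylised “closed loop” economy in which output supports more effective research labour and more research labour raises the growth rate itself. The functional form and the researchers variable are invented for illustration, not taken from any actual growth model.

```python
# Toy sketch only: contrast a fixed 2%/year economy with a stylised
# "closed loop" economy where output supports more (human or AI) researchers,
# and more researchers raise the growth rate itself.
YEARS = 50
BASE_RATE = 0.02          # frontier-economy baseline: 2% per year

gdp_fixed = 1.0           # economy that always grows at the base rate
gdp_loop = 1.0            # economy with the ideas/population feedback loop
researchers = 1.0         # effective research labour, normalised to 1 at the start

for year in range(1, YEARS + 1):
    gdp_fixed *= 1 + BASE_RATE

    # Assumed toy relationship: the growth rate scales with research labour,
    # and research labour scales with output.
    loop_rate = BASE_RATE * researchers
    gdp_loop *= 1 + loop_rate
    researchers = gdp_loop

    if year % 10 == 0:
        print(f"year {year:2d}: fixed {gdp_fixed:5.2f}x, "
              f"closed loop {gdp_loop:7.2f}x (loop rate now {loop_rate:.1%}/year)")
```

The particular numbers are meaningless, but the qualitative contrast is the point of the argument: the fixed-rate economy ends up around 2.7 times larger after 50 years, while the closed-loop economy’s growth rate keeps drifting upward rather than sitting at 2%, because the inputs to growth are themselves produced by the economy.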
Rob Wiblin: So that’s the justification that they provide for their perspective in broad strokes. But why is it that even after communicating this at great length to one another, they don’t converge on uncertainty or saying it’ll be something in the middle because there’s competing factors? That they just continue to be reasonably confident about quite different narratives about how things will go?
Ajeya Cotra: I’m honestly not sure. I’m partial to the “things will be crazier” side of things, so I’m not sure I’ll be able to give a perfectly balanced account. But I feel like one thing I’ve noticed in terms of people who think it’ll be slower is that their worldview has a built-in error theory of people who think things will go faster. So the worldview is not just that things will keep ticking along, but everyone thinks there will always be some big new revolution.
Rob Wiblin: Everyone’s always expected it to speed up, almost every time.
Ajeya Cotra: And they’ve always been wrong. So there’s that dynamic, which is, from their point of view, I think it’s totally reasonable. It’s like even if there isn’t some super knockdown argument in the terms of your interlocutor where you can point to a mistake that they’ll accept, or even if you look at the story and think it’s kind of plausible, you still have this strong prior that —
Rob Wiblin: Someone could have made the same argument in the past.
Ajeya Cotra: This kind of thinking. Someone could have made the same argument about television, someone could have made the same argument about computers. None of these played out.
So I think that’s a big factor. I also think these are complicated ideas and there hasn’t been that much dialogue. I think there could be more, and I think there could be more dialogue that is trying to ground things in near-term observations also.
But yeah, I think that’s a big part of it. I think they have an error theory built in that makes it so that the object-level conversation about, “here’s how the AI could make the robots, and here’s how the robots could bootstrap into more robots” and so on: that whole way of thinking doesn’t feel very legitimate or interesting. Or they have a story where that type of thinking always leads to a bias towards expecting things to go faster than they actually will, because it’s hard for that kind of thinking to account for all the drag factors and all the bottlenecks.
Whereas I think on the other side, people who think things will go faster feel like everyone is always kind of blanket assuming that there are going to be bottlenecks. And then they bring up specific bottlenecks, and those specific bottlenecks, when you look into them, they might slow things down from some sort of absolute peak of 1,000% growth — but they’re not reasons to think that 2% is where the ceiling is, or even that 10% is where the ceiling is. So they also have this kind of error theory of the bottleneck objection.
Could any evidence actually change people’s minds? [00:22:48]
Rob Wiblin: So it’s incredibly decision-relevant to figure out who is right here. I think almost all of the parties to this conversation, if they completely changed their view, and the people who thought it was going to be 1,000% decided it was going to be 0.3%, they would probably change what they’re working on, or they would think it was a decisive consideration probably against everything that they were doing previously. And vice versa: if people came to think that there would be a 1,000% speedup, then they would probably be a whole lot more nervous and interested in different kinds of projects.
So how can we potentially get more of a heads up ahead of time about which way things are going to go? It seems like sharing theoretical arguments hasn’t been persuasive to people. Is there any kind of empirics that we could collect as early as possible?
Ajeya Cotra: So one thing that I think will not address all of this, but is a step in the right direction, is really characterising how and why and if AI is speeding up software and AI R&D.
So METR came out with an uplift RCT, which I think was the first of its kind, or at least the largest and highest quality. They had software developers split into two groups: one group was allowed to use AI, the other group was disallowed from using AI. And they studied how quickly those developers solved issues, like tasks on their to-do list.
It actually turned out that in this case AI slowed down their performance, which I thought was interesting. I don’t expect that to remain true, but I’m glad we’re starting to collect this data now, and I’m glad we’re starting to cross-check between benchmark-style evaluations — where AIs are given a bunch of tasks and scored in an automated way — and evidence we can get about actual in-context, real-world speedups.
I really want to get a lot more evidence about that of all kinds: like big uplift RCTs, and it would be great if companies were into internally conducting RCTs on their own rollouts of internal products, to see whether teams that get the latest AI product earlier are more productive than teams that don’t. Even self-report, which I think has a lot of limitations, is still something we should be gathering.
So I guess my high-level formula would be: look at the places where adoption has penetrated the most, and start to measure speedup in actual output variables. I think it would be really cool if there was a solar panel manufacturing plant that had really adopted AI, and we started to see how much more quickly they could manufacture solar panels, or how much better they could make solar panels.
Rob Wiblin: Yeah. Is it possible to do this at the chip manufacturing level? Maybe that’s the most difficult manufacturing that there is, more or less. So we might think that you get more of an early heads up if you do something that’s more straightforward like solar panels, but we’d really like to be monitoring, across all kinds of different manufacturing, how much difference any of this is making.
Ajeya Cotra: Totally, yeah. I think the most important thing, or the thing I ultimately care about, is the AI stack: chip design, chip manufacturing, manufacturing the equipment that manufactures chips, and then of course the software piece of it too. The software piece is the earliest piece, but I think we should be monitoring the degree of AI adoption, self-reported AI acceleration, RCTs, anything we can get our hands on for the entire stack — because I think the moment when the AI futurists think things are likely to be going much faster coincides with when AI has fully automated the process of making more AI. So that’s really something to watch out for.
And then on a separate track, you also want to just be looking at the earliest power users, no matter where they are, just because you can get insight that transfers to these domains.
Rob Wiblin: Is there anything else we can do?
Ajeya Cotra: I don’t know. I’m really curious about this.
Rob Wiblin: Do I understand right that last year you put out a request for proposals when you were at Open Phil, looking to fund people who had ideas for how we would resolve this question?
Ajeya Cotra: Yeah, I put out a pair of requests for proposals in late 2023. One of them was on building difficult realistic benchmarks for AI agents. At the time, very few people were working with AI agents, and only a couple of agentic benchmarks had come out, including METR’s benchmark that I discussed on the show last time.
So I was really excited about it. I felt like it was a moment to move on from giving LLMs multiple-choice tests to giving them real tasks — like “book me a flight” or “make this piece of software work: write tests, run the tests, iterate until the thing actually works.” And that was a very new idea at the time, but also the time was sort of right for that idea, and there were a lot of academic researchers who were excited about moving into the space. So we got a lot of applications for that arm of our request for proposals, and we funded a bunch of cool benchmarks, including Cybench, which is a cyberoffence benchmark that’s used in a lot of standard evaluations now.
But then we also had this other arm, which was basically like types of evidence other than benchmarks — like surveys, RCTs, all the things we talked about. We got much less interest for that, and I think it just reflects that it’s harder to think of good ways to measure things outside of benchmarks — even though everyone agrees benchmarks have major weaknesses and consistently overestimate real-world performance because benchmarks are sort of clean and contained and the real world is messy and open-ended.
But one thing that I’m excited about that came out of the second RFP is that Forecasting Research Institute is running this panel called LEAP, the Longitudinal Experts on AI Panel. They take like 100 or 200 AI experts, economists, and superforecasters and have them answer a bunch of granular questions about where AI is going to be in the next six months, in the next year, in the next five years — both like benchmark scores, but also things like, “Will companies report that they’re slowing down hiring because of AI?” or, “Will an AI be able to plan an event in the real world?” or these kinds of things.
I’m very excited about that, and I think honestly having people make subjective predictions, explain how those predictions are connected to their longer-run worldviews, and then check over time who’s right might be the most flexible tool we have. So I’m very excited to see where LEAP goes.
But it is challenging to get indicators that are clearly early warnings, so that we can actually do something about it if the people who are more concerned are right, but that are also clearly valid and not easy to dismiss on the other side as just not realistic enough to matter.
The most dangerous AI progress might remain secret [00:29:55]
Rob Wiblin: So as part of this, you’ve been thinking about how one way that this could really go wrong is if the companies that are developing cutting-edge AI begin to see internally how much it’s helping them, and that perhaps it’s speeding them up enormously, but decide not to share that information with the rest of the world.
Ajeya Cotra: Yeah, and they may decide not to release those products, if there’s one company that’s well ahead of the others. Like in AI 2027, it was depicted that the company that was ahead in the AI race was so far ahead of its competitors that it could afford to just keep its best stuff internal and only release less good products to the rest of the world.
Rob Wiblin: It could afford it in the sense of it didn’t need to make money by selling the product?
Ajeya Cotra: Its competitors were far enough behind that they couldn’t undercut it or compete with it by releasing a better product. Like in the story, the company in the lead, [OpenBrain], is basically just releasing products that are slightly better than the state of the art of its competitors.
Rob Wiblin: I see. They’re so far ahead that they can just choose to always basically have their product be somewhat better. They can just release whatever level of their own internal machine would be the best to the external world.
But I guess it would be unfortunate if there are people who do know this, but the broader world doesn’t get a heads up — so we could have known six months or a year earlier in what direction things were going, but that was kept secret. Maybe for the leading AI company they’d prefer to keep it secret, but for the rest of us, we would probably prefer that the government has some idea what’s going on.
So you’ve been thinking about what sort of transparency requirements could be put in place that would require the companies to release information that would give the rest of us clues as to where things are going. What sort of transparency requirements could those be?
Ajeya Cotra: I think there’s a whole spectrum of evidence about AI capabilities. On the one hand, the easiest to test but the least informative is benchmark results. And companies do release benchmark results when they release models right now. So say Claude Opus 4 is released: they have a model card that says it has this score on this hacking benchmark, it has this score on the software engineering benchmark and so on, as part of a report about whether it’s dangerous. GPT-5 had the same thing.
I think that’s great that they do that. But in my ideal world, they would release their highest internal benchmark score at some calendar time cadence. So every three months they would say, “We’ve achieved this level score on this hacking benchmark, this level score on software engineering benchmark, this score on an autonomy benchmark.”
That’s because, as you said, danger could manifest from purely internal deployment, because if they have an AI agent that’s sufficiently good at AI R&D, they could use that to go much faster internally, and then other capabilities and therefore other risks might come online much faster than people were previously expecting. So it’s not ideal to have your report card for the model come out when you release it to the public, unless there’s some sort of guarantee that you’re not sitting on a product that’s substantially more powerful than the public product. So maybe it’s fine to release your model card and system card along with the product if you also separately have a guarantee that you won’t have too much of a gap between the internal and the external.
So on the end of things that are currently discussed, that’s how I would tweak the information that’s currently reported to make it somewhat more helpful for this concern.
But then there’s a bunch of other stuff that is not currently reported that ideally would be really great to know. Stuff like how much, and how, they’re using AI systems internally. One thing I’m very interested in is that companies will sometimes report, kind of to brag, the percentage of lines of code that are written by their AI systems. Various CEOs have said, “Internally, 90% of our lines of code are written by AIs” and things like that. I think it’d be great to have systematic reporting of those kinds of metrics.
But those aren’t the ideal metrics I’d be interested in. One thing I’m interested in is what fraction of pull requests to your internal codebase were mostly written by AI and mostly reviewed by AI — so humans are not involved for the most part on either side of this equation. I’d be very interested in watching that number climb up, because I think it’s an indication both of AI capabilities, and of how much deference they’re giving to AIs.
And eventually, if things are going to go crazy fast, the AIs have to be doing most things, including most management and approval and review, because if humans have to do that stuff, then things can only go so fast. So I really want to track how much higher-level decision-making authority is being given to the AIs in practice inside the companies.
I think there are probably a bunch of other things that we could send basically as a survey: How much do you use AIs for this type of thing, for that type of thing? How much speedup subjectively do you think you get? If you’re running any internal RCTs, I would of course love to know the results of that.
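As another illustrative aside, not from the interview: the pull-request metric Ajeya describes could be computed in a very simple way from internal PR metadata. The sketch below is hypothetical; the merged, ai_authored, and ai_reviewed fields are an assumed schema for illustration, not anything companies actually record or report today.

```python
# Hypothetical sketch: share of merged pull requests that were both mostly
# written by AI and mostly reviewed by AI, using assumed metadata fields.
def ai_autonomy_share(pull_requests):
    merged = [pr for pr in pull_requests if pr.get("merged")]
    if not merged:
        return 0.0
    fully_ai = [pr for pr in merged if pr["ai_authored"] and pr["ai_reviewed"]]
    return len(fully_ai) / len(merged)

example = [
    {"merged": True, "ai_authored": True, "ai_reviewed": True},
    {"merged": True, "ai_authored": True, "ai_reviewed": False},   # a human reviewed it
    {"merged": True, "ai_authored": False, "ai_reviewed": True},   # a human wrote it
]
print(ai_autonomy_share(example))  # 1 of 3 merged PRs -> ~0.33
```

Watching a number like this climb would be evidence both of capability and of how much review authority is being handed to the AIs, which is the signal she says she cares about.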
Rob Wiblin: What about just requirements that, inasmuch as they’re training future generations of AI models, they have to reveal to at least some people in the government how they’re performing on normal evals of capabilities? So they can kind of see the line going up, even if they’re not releasing it as products for whatever reason. And if the benchmarks start curving upwards far above previous expectations, then that could lead them to sound the alarm?
Ajeya Cotra: Yeah, I think that is a good thing to do, but I don’t think that just benchmarks alone will actually lead anyone to sound the alarm. Because the thing with benchmarks is that they saturate.
Rob Wiblin: They always have that S-curve shape.
Ajeya Cotra: They always have the S-curve shape. And the benchmarks we have right now are harder than the previous generation of benchmarks. But it’s still far from the case that I feel confident that if your AI gets a 100% score on all these benchmarks, then it’s a threat to the world and it could take over the world. I still think the benchmarks we have right now are well below that. So what’s probably going to happen is that these benchmarks are going to get saturated, then there’s going to be a next generation of benchmarks people make, and then those benchmarks are going to tick up and then get saturated.
So I think we need some kind of real-world measure before we can start sounding the alarm. And the ultimate real-world measure is actually just observed productivity. Like if they are seeing internally that they’re discovering insights faster than they were before, then that’s a very late but also very clear signal. And that’s the point at which they should definitely sound the alarm, and we should know what’s happening.
Rob Wiblin: How is this idea being received by the companies? On the one hand, it seems like transparency requirements are the regulatory instrument that the companies have objected to the least. It’s the approach they’ve been most willing to tolerate. On the other hand, the whole message of, “We don’t trust you to share information with the rest of the world, and we think that you might screw us over basically by rushing ahead and deliberately concealing that,” I could imagine that could be a little bit offensive to them. Or at least if that is their plan, then they probably want to find some excuse for not having this kind of oversight.
Ajeya Cotra: I think that the response just tends to differ based on the actual information that’s being asked for. So benchmark scores, like I said, they release at the point of releasing a product — which I think is fine for now, but I would like to move it to a regime where they release benchmark scores at some sort of fixed cadence even if they don’t have a product release. Benchmark scores are not considered sensitive information.
But this other stuff that I think is a lot more informative on the margin is much more fraught. They don’t necessarily want to share with the world the rate at which they’re gaining algorithmic insights, because you want to maintain some mystery about that for competitive reasons. It’s risky for you if it’s a little bit too fast, because then competitors will start paying more attention to you and trying to copy you and trying to find out what’s going on. It’s also risky for you if it’s too slow, because then that’s kind of embarrassing.
Rob Wiblin: Investors lose heart.
Ajeya Cotra: Yeah, investors lose heart. And another thing I didn’t mention earlier is that I would really like them to be reporting their most concerning misalignment-related safety incidents. So has it ever been the case that in real-life use within the company the model lied about something important and covered up the logs? I really want to know that. But then of course it’s clear that reporting that is very embarrassing to companies.
One thing that might help here is that there are a number of companies now, so perhaps they could report their individual data to some sort of third-party aggregator, which then reports out an anonymised overall industry aggregate score. But I don’t think that solves all the issues, because there are few enough of them that people would be able to guess.
So I think there’s a lot of competitive challenges and IP sensitivity challenges and just PR challenges to overcome here with some of the more penetrating internal information. But I think it’s important enough to the public interest that we should try and find a way to navigate that.
Rob Wiblin: Yeah. So it’s not unusual for government agencies to be able to basically demand commercially sensitive information from companies for regulatory or governance purposes. I actually worked at one when I was in the Australian government. I was at the Productivity Commission, which had extraordinary subpoena powers to basically demand almost any documents from any company in the country. A rarely used power, but it wasn’t the only agency that had that capability.
Ajeya Cotra: And what kinds of things would you ask them?
Rob Wiblin: Well, I never actually saw this power being used. I guess people were proud of the fact that we had that authority. But I think you would usually do it for competition reasons, trying to tell whether companies are colluding, potentially, or whether there’s an insufficient degree of market competition and there would be a reason to intervene.
I would imagine almost certainly there’s government agencies in the US that have a similar remit. So if they actually could keep that kind of information secret, then maybe the companies would be more happy to share it with people who were specialised basically in reading this, comprehending this data and figuring out what to do with it.
Ajeya Cotra: Yeah, I think that could be a solution, but I’m a little sceptical. I think that releasing this information publicly is probably a lot better than releasing it just to a government body — basically because we’re building the plane of AI safety research as we’re flying it, and it’s not like there’s a box-checking exercise that any kind of government agency that’s often understaffed, especially with technical staff, could do.
It’s more like we want this information out there in the open, and then we want people to do some involved analyses of it. And our sense of what information we even want is probably going to be shifting over time, and it’ll probably go better if there’s a robust external scientific conversation about what indicators we want to see and what that would mean and when we should trigger an alarm. And if that’s all being routed through governments with like 10 people or even 50 people who have to deal with it, I think it would be very hard for them to interpret the evidence quickly enough and well enough and be confident enough to sound the alarm and then have people actually listen to them.
Like if I imagine sounding the alarm on something like the intelligence explosion, I kind of picture it having to be a society-wide conversation. Kind of like sounding the alarm about COVID. Or something I have in my mind is when Joe Biden had that disastrous debate performance that led to weeks of conversation that ultimately led to him being removed from the ticket: it would have been very hard, I think, for a small, narrow group of people sort of entrusted with the authority to make the same thing happen.
Rob Wiblin: Because you want common knowledge, and you want lots of attention focused on the issue, as well as just some technocrats being aware.
Ajeya Cotra: As well as the opportunity for a bunch of technical experts who may not be paying that much attention now, because maybe they think this stuff is all science fiction, to jump in at that moment and offer their takes. I think it would be very powerful if someone like Arvind Narayanan, who’s known for being very sceptical of these stories, actually looked at the data, changed his mind, and said, “Yeah, this is happening now, and it’s dangerous.”
It’s very hard to get those kinds of common knowledge dynamics if everything is just sent to governments. That said, of course I think sending things to governments is better than not sending it anywhere. So I also think that’s good.
Rob Wiblin: So inasmuch as plan A would be that we want them to be sharing this information such that anyone in the public can find out, I guess they’ll probably resist any legislation imposing this to some extent, for the partially legitimate reason that it is probably going to be frustrating for them. Inasmuch as people are trying to set priorities for what sorts of asks to make and which sorts of fights to pick, would this be very high on the list for you?
Ajeya Cotra: I laid out a whole spectrum of ideal information-sharing practices, and I don’t think going all or nothing on that whole package is a top priority fight to pick. But I think the algorithm of thinking really hard about what pieces of information we would want to know in order to know for ourselves if the intelligence explosion was happening, and getting the highest-value items on that list or the biggest bang for buck items on that list, to me feels very high.
And I think that’s the strategy that people working on AI-safety-related legislation have landed on. The RAISE Act in New York and SB 53 in California are both quite transparency oriented, and both oriented around, for example, whistleblower protections, which are an important policy plank underlying transparency.
Rob Wiblin: Do you think that information about an emerging intelligence explosion might just leak out to the public anyway, because staff at the companies would feel uncomfortable with that proceeding in secret?
Ajeya Cotra: I think that’s very plausible. I still think that information that leaks in the form of rumours in San Francisco, like tech bro parties, doesn’t have the ability to impact policy and decision making all the way in DC or London or Brussels in the same way as information that is just clearly unrefuted and very salient and sort of official.
So I think that the AI safety scene in the Bay Area has benefited from having close social ties to people who work at AI companies, getting a sense of what might be coming around the corner. But that’s not something that you can use to really pull an alarm or advocate for very costly actions, so is it really enough? We need more.
White-knuckling the 12-month window after automated AI R&D [00:46:16]
Rob Wiblin: So let’s imagine that, via whatever mechanism, society does get a heads up that we are starting to see the early stages of an intelligence explosion. What would we do with that heads up?
Ajeya Cotra: I think one just extremely important factor is: at that point in time, how good are AI systems at everything besides AI R&D? So the alarm has sounded, and we learn that AI has fully or almost fully automated R&D at the leading AI lab, perhaps all the AI labs. This is causing those labs to go way faster than they were going with mostly human-driven progress in the previous era.
So at that point in time, whatever AI progress you thought was going to be made by default in the next 10 years — or the next 20 years, next 30 years — might be made in a year or two, or even six months, depending on how much AI is speeding everything up. At this stage, AIs might not be that dangerous — but we might be about to move very quickly through the point in time where they’re not so dangerous to the point in time where they have godlike abilities.
I think that what we want to do as a society, if we gain confidence that we’re at the starting point of this intelligence explosion, is to redirect as much of that AI labour as we can from further AI R&D to things that could help protect us from future generations of AIs — both in terms of AI takeover risk and also in terms of a wide range of other problems that might be created for society by increasingly powerful AI.
And at that point, it’s still not in the narrow, selfish interests of whichever company is in the lead to do that — because if they were to slow down unilaterally, then someone behind them could catch up. But hopefully, if the alarm has sounded and we have a clear picture of we have six months or 12 months or 18 months until radical superintelligence, then this might be a window of opportunity to coordinate, to use AIs for protective activities instead of further AI capability acceleration.
Rob Wiblin: The challenge we have is AI is becoming much smarter very quickly, and we feel very nervous about that. And the opportunity that’s created is that we have a lot more labour and we have much smarter potential researchers than we did before. So why don’t we turn that new resource towards solving this problem that I guess at the moment we don’t really know how to fix?
I think some people who are not too worried about AI look at society as a whole, or they look at history and they say that technology has enabled us to do all kinds of more destructive things, but we don’t particularly feel like we’re in a more precarious situation now, or at much greater personal risk now than in 1900 or in 1800, because advances in destructive technology have been offset by advances in safety-increasing technology — and on balance, probably things have gotten safer.
So the idea is that potentially it’s going to be a vertiginous time, but perhaps we could pull off the same trick in this crunch time period?
Ajeya Cotra: Yeah, and I think that a lot of people who are more concerned about AI risk are very dismissive of this plan. It sort of sounds like a crazy plan. It’s really flying by the seat of your pants, like expecting the thing that’s creating the problem to solve the problem.
But in a sense, I do think humanity has repeatedly used general purpose technologies that created problems to solve those problems. Like automobiles, something as mundane as that: cars created the opportunity for there to be carjackings and for there to be drive-by shootings, and it empowered bad actors in various ways. But of course, if the police and law enforcement have cars as well, that is a balance.
When you imagine a future with some crazy new advanced technology, and you imagine all the problems it creates, it can be hard to, with the same level of detail and fidelity, imagine all the responses to those problems that are also enabled by that technology. So you could imagine someone worrying about the rise of fast vehicles, and neglecting to think about how all the ways that they cause bad things could be kept in check by people using vehicles for law enforcement and similar.
Similarly with computers. You can hack things with computers, but computers also enable you to do a lot of automated monitoring for that kind of hack and automated vulnerability discovery. And different kinds of law enforcement: you couldn’t imagine a police force not using computers. So I do think the basic principle is sound, that if you’re worried about problems created by technology, one of the first things on your mind should be how can you use whatever that new technology is to solve those problems.
But I think that this is an especially narrow window to get this right. With cars, you weren’t imagining them creating broad-based rapid acceleration of all sorts of new technologies, with potentially just a 12-month window or two-year window or six-year window before everything goes totally crazy.
So I do think that it’s important to not blow through that window, to monitor as we’re approaching it and to monitor how long we have. But I think I’m fundamentally fairly optimistic about trying to use early transformative AI systems, like early systems that automate a lot of things, to automate the process of controlling and aligning and managing risks from the next generation of systems, who then automate the process of managing those risks from the generation after, and so on.
Rob Wiblin: Yeah. It’s interesting that you say that this approach has often been dismissed, because I feel it’s very in vogue now. Every couple of days someone presents it or I read something about it in one guise or another.
I guess one reason why years in the past it might have felt unpopular is people were mostly focused on the issue of misaligned AI: they were concerned about an AI that has it in for you and would like to take over if it had the opportunity. And that’s maybe the worst application of this out of all of them, because you’re asking the AI to align itself, but you don’t know whether it’s assisting you or trying to undermine you.
I mean, you could try to make that work. People have suggested proposals where you could try to get useful, honest work out of an AI that doesn’t want to help you. But it’s a lot easier to see how you potentially solve problems other than alignment. If we feel like we’ve got a good handle on the alignment part, but there’s a huge list of other problems being created during the intelligence explosion — like the fact that AI, if people get access to it, could invent other kinds of destructive technologies that we don’t yet have good countermeasures for: in that case, it’s just clear how the AI could help you figure out what the countermeasures ought to be.
Ajeya Cotra: So I don’t think that I agree with this. I do think the prospect that these early transformative AIs are misaligned is a huge obstacle to this plan that needs to be shored up and handled and specifically addressed.
I don’t think that it necessarily bites harder for getting the AIs to do alignment research than for getting the AIs to do anything else helpful — because if they have it out for you, they don’t necessarily want to help you shore up your civilisation’s defences. So if you’re imagining trying to get a hardened misaligned AI to help you with biodefence, if it’s misaligned, and it, for example, wants the option of threatening you with a bioweapon in its arsenal in the future, it would similarly have an incentive to do a bad job at that, as it would to do a bad job at alignment research.
So in general, I think there’s one big concern, which is: will the AIs that we’re trying to use at that point in time have motivations that give them incentives to undermine the work we’re trying to get them to do? I think they certainly would have incentives to undermine alignment research if they were misaligned, but I think they would also have incentives to undermine efforts to make ourselves more rational and thoughtful, like AI for epistemics — because if we’re more rational and thoughtful, then maybe we’ll realise they’re probably misaligned, and that would be bad for them. They would also have incentive to undermine our defensive-acceleration-style defensive efforts, because that would make it harder for them to take over.
Rob Wiblin: That makes sense. I think the distinction I was drawing is, for people who thought that the alignment problem was extremely hard to solve and we were way off track to solving it, the idea of getting the AI to solve the problem is kind of self-contradictory because I wouldn’t trust the AI at all. Anything that it proposed I would assume it was sabotaging us.
If you’re on the side of thinking that the alignment problem is actually the easier part of things, that it’s a relatively straightforward technical problem that we are on track to solve, but there’s this laundry list of 10 other issues, then it’s very obvious that we’ll have the brilliant AGI, so why don’t we just use that to solve all the other things? And also I’m inclined to trust it and believe it.
Ajeya Cotra: Yeah. I do think that if you are not worried about alignment at this early stage, everything becomes easier. It becomes an even more attractive strategy and path. But I think the canonical “using AI for AI safety” or “using AI for defence” plan does imagine that we’re not sure at the beginning that they’re aligned. We may not be highly confident that they’re extremely misaligned and fully power-seeking and looking to take over at every opportunity, but we’re not imagining that we know with confidence we can trust them.
So figuring out how to create a setup where we use control techniques and alignment techniques and interpretability and whatever other tools we have at our disposal to get to the point where we feel good about relying on their outputs is a crucial step — because it either bottlenecks our progress, because we’re checking on everything all the time and slowing things down, or it doesn’t bottleneck our progress, but we hand the AIs the power to take over.
Rob Wiblin: Which specific problems arising from the intelligence explosion are you envisaging wanting to get the AGI to help us out with?
Ajeya Cotra: One obvious one is just AI alignment: How can we ensure that either these AIs that we’re using to help us right now, or future generations of AIs, and future generations that those AIs help us to create, how can we ensure that that whole chain is motivated to help humans, and is honest and is basically doing what we say and steerable? That is sort of the foundation of everything else.
But then there are also other things that are not really about AIs at all, that are just about broad societal defences. So if we think that the advent of extremely powerful AI will create a flood of new cyber vulnerabilities that are quickly discovered in a bunch of critical systems, like weapons systems and the power grid and so on, can we preemptively use those same AIs that are good at finding those vulnerabilities to find and patch them before bad actors can use the AIs to find them?
Another thing is biodefence. So you had my colleague Andrew on your podcast recently, who talked about his ambitious plan to rapidly scale up detection of novel pathogens, rapidly scale up medical countermeasures when they’re detected, and rapidly scale up the manufacturing of PPE and cleanrooms and things like that. If we have AI systems that are good at that kind of research problem, and also maybe we have robots at that point, so a lot of that manufacturing itself can be automated and can go a lot faster than if humans had to do that stuff, that would be a big boon to biodefence.
Then there’s somewhat more speculative things. You can think of this as like a psychological defence, maybe, but there’s stuff around, can we use AIs to make our collective decision making a lot smarter, a lot wiser, a lot better? Can we make it so that we’re better at finding truth together? Can we make it so that we’re better at coming to compromise policy solutions that leave lots of people happy?
Rob Wiblin: How do you ensure that advances in AI don’t lead to a war between the US and China, that kind of thing?
Ajeya Cotra: That too. But even more mundanely, stuff like how, over the last 10 or 15 years, social media has led to a degradation of political discourse. Could AI tools help you just find the policy from among the vast space of possible policies that a large number of people actually like and can credibly put trust in, and so on?
Rob Wiblin: So I interviewed Will MacAskill and Tom Davidson from Forethought earlier in the year. Their organisation has a long list of what they call “grand challenges,” all of which they suspect are probably amenable to this kind of AGI labour during crunch time. I think the other ones are:
- Ensuring that society doesn’t end up locked into particular values in a way that prematurely cuts off our ability to reflect further and change our minds.
- The potential for AI or AGI, inasmuch as it’s very steerable and follows instructions, to be used in power grabs by the people who are operating it.
- There’s space governance, this question of if we actually do start to be able to use resources in space, how would we share them? How would we divide them such that in particular there’s not conflict ahead of time — because people anticipate that once you start grabbing resources in space, you’re on track to become overwhelmingly dominant.
- There’s epistemic disruption, which you mentioned.
- New competitive pressures: concerns that you can end up in a sort of Malthusian situation if you have competition between many different AIs.
And possibly some others that I’m missing here. We don’t know which of these are going to loom large at the time. Some of them might feel like they’ve kind of been addressed, or perhaps that we were hallucinating issues that aren’t so severe. But there are many different ways that we could potentially apply it.
Ajeya Cotra: Yeah, I agree. All of those problems that Tom and Will highlighted seem like real problems to me. Maybe my approach would be to, from our current vantage point, lump a lot of that under “AI for helping us think better and helping us find solutions that we’re mutually happy with.” So it’s like AI for coordination, compromise, negotiation, truth seeking, that cluster of things.
Because something like the question of space governance, like how do we divide up the resources of space if there are some existing factions that have an existing distribution of power? No one really wants the sort of destruction that comes from everybody racing as hard as possible to get there first. But there’s a complicated space of negotiated options beyond that, and I think AIs could potentially help a lot with that sort of thing.
Rob Wiblin: You said in your notes that you think this approach is basically what all of the frontier AI companies say. This is their safety plan, more or less. Is that right?
Ajeya Cotra: Yeah, I would think so. If you look at public communications from at least OpenAI, Anthropic, and Google DeepMind, it jumps out to a greater or lesser degree in the different cases, but in all of their stated safety plans you see this element of: as AIs get better and better, they’re going to incorporate the AIs themselves into their safety plans more and more. I think some are more explicit than others about expecting some specific crunch time that occurs when AI is rapidly accelerating AI R&D, but everybody is picturing AI playing a heavy role in the safety of future AIs.
Rob Wiblin: What assumptions are necessary for this approach to make sense? Or what kinds of setups could actually just make it a bad plan?
Ajeya Cotra: I think fundamentally you need it to be the case that there exists a window of opportunity, before AIs are uncontrollably powerful or have created unacceptable levels of risk, where they are really capable and really change the game for AI safety research. And that there’s some meaningful window of time where you can notice as you’re approaching it, and that even by default, without a crazy slowdown, it lasts at least six months or a year.
If you think instead that once your AI hits upon some generality threshold, it, within a matter of days or weeks, becomes crazy superintelligent, this plan doesn’t work.
Rob Wiblin: There’s no time to respond.
Ajeya Cotra: You wouldn’t even notice probably before it’s too late.
Then I think there can also be unlucky orderings of capabilities where this plan wouldn’t work. You could have AIs that are really specifically good at AI R&D and they’re really not good at anything else, not even AI safety research that’s very similar to AI R&D. They’re just extremely good at AI R&D. Maybe the only thing they’re good at is making it so that future generations of AIs have better sample efficiency and can learn new things more efficiently.
Then you could have a period of six months or a year where you know this is happening and you have these AIs, but you’re still sort of hurtling towards a highly general superintelligence without being able to use these AIs for anything else necessarily, because they’re just not good at anything else.
Rob Wiblin: There’s something that’s a bit self-contradictory about that, because an AI that is extremely smart but all it can do is improve the sample efficiency of the next model is in a sense not very troubling in itself. Because it doesn’t have general capabilities, that kind of model isn’t going to be able to take over or invent other technologies. It’s only at the point that it has the broader capabilities, broader agency, that it actually is able to make problems.
But I guess you’re saying you could have a long leadup where that’s all that it can do. And then at the last stage…
Ajeya Cotra: Yeah. And then at the last stage it might go back to the first scenario I talked about, where the narrow AIs that are just savants at AI R&D hit upon an algorithm in almost a blind search. Think of AlphaFold: it is brilliant at figuring out how proteins fold, but it isn’t broadly aware. You could imagine such AIs, or an algorithmic search process, hitting upon an architecture or a training strategy that can then go foom really quickly.
And so in this leadup you’re like, yep, AI is accelerating AI R&D, it’s crunch time. We have six months left, we have three months left. But these AIs are not AIs that you can use for anything else useful.
Rob Wiblin: I guess many of the problems that we’d like it to help with — social issues, political issues, philosophical issues in some cases — what do you think are the chances that the AI companies… I think they’re working harder to make them good at coding and to make them good at AI research than any other particular thing. And those are more concrete, measurable problems than solving philosophical questions. So it seems like it is really a live risk that, unfortunately, the balance of capabilities will end up being pretty disadvantageous for this plan.
Ajeya Cotra: Yeah, I think that the further afield you go from work that looks like doing ML research and software engineering, the greater a penalty there will probably be. The AIs currently are much better at helping my friends who do ML research all day than they are at helping me. I do weird thinking and go on these kinds of podcasts and write emails to people, making grant decisions and stuff like that. It’s much worse at that stuff. You can see already that it’s got a very specialised skill profile.
Fortunately, I do think that there’s a big chunk of AI safety research that does look very similar to ML research. And my friends who are getting big speedups from AI are safety researchers and they’re doing the kinds of work — control, alignment, et cetera — that I think will be some of the most important things you want these AIs to be helping with at the very beginning.
But yeah, stuff like AI for epistemics, AI for moral philosophy, AI for negotiation, AI for policy design: all that stuff just may not be that good. It doesn’t necessarily have to be good by default, and that’s a big concern of the plan.
Rob Wiblin: Another worry would be that the AI models end up being able to cause trouble before they end up being capable enough to figure out solutions.
A classic case there would be: imagine that we put a lot of effort into — I guess it would be a bit stupid to do this — training an AI model that’s extremely good at developing new viruses or new bacteria, basically changing diseases to make them worse. I mean, there are people who are using AI to develop new viruses. I guess they’re using it to develop medical treatments, but that sort of stuff can then be repurposed for other things. But if that highly specialised model arrives first, before you end up with a model that has a sufficient understanding of all of society and biology and medicine to figure out what the good countermeasures are, then we’ll need a different approach than this one.
Ajeya Cotra: Yeah. And in general, I think of AIs doing defensive labour as a prediction about the world that you want to try and be thinking about as you make your plans. It’s not a guarantee, and in many cases the answer will be to specialise now in doing the kinds of things that might be hardest for the AIs to do then.
And I think stuff like building a bunch of physical infrastructure to stockpile a bunch of PPE and vaccines and things like that is a prime candidate for something that just inherently takes a long lead time, and that the AIs might not be that advantaged at by the point that they’re good at doing the scary things it’s meant to protect against.
Rob Wiblin: Yeah. That was going to be another concern of mine: that inasmuch as the AIs are very helpful, you might imagine that they’re very helpful at the idea-generation or the strategising stage, but they might still be quite bad at actually running a business or actually figuring out how to do all of the manufacturing.
So they could come up with a great strategy for countervailing new bioweapons, where they’re like, “Here’s the widget that you should use. Go and make 10 billion of them.” Then we’re like, “Can you help us with that?” It’s like, “No, I’m not very good at that. Good luck.”
Ajeya Cotra: Yeah. I think that in general, you should expect AIs to be much better at things that there are tighter feedback loops on, where you can recognise success after a short period of time. That’s one of the reasons why they’re really good at coding, because you can just train them on this very hard-to-fake signal of: did the code run after you did whatever you did with it?
And in general, I think idea generation versus actually executing on a one-year plan has some of this element of, you can read a white paper and be like, yeah, that’s pretty good. And you can push the thumbs-up button and generate an AI that’s pretty good at generating white papers that you think are neat and probably would work. But it’s much harder to train the AI to run the team of thousands of humans and robots that are actually executing on the plan.
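To make the “hard-to-fake signal” point concrete, here is a minimal sketch of the kind of verifiable reward Ajeya is gesturing at for coding: run the model’s code with some tests and score it 1 only if everything executes and passes. The function name and test format here are hypothetical, and real training setups are far more involved; the contrast is with the white-paper case, where the reward is a human thumbs-up, which is easy to please rather than easy to verify.

```python
# Illustrative sketch only: a binary, hard-to-fake reward of the kind described
# above: "did the code run (and pass its tests) after you did whatever you did
# with it?" The function name and test format are hypothetical.
import os
import subprocess
import sys
import tempfile


def code_reward(candidate_source: str, test_source: str, timeout_s: float = 10.0) -> float:
    """Return 1.0 if the candidate code plus its tests runs cleanly, else 0.0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_source + "\n\n" + test_source + "\n")
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout_s
        )
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
    finally:
        os.unlink(path)


if __name__ == "__main__":
    candidate = "def add(a, b):\n    return a + b"
    tests = "assert add(2, 2) == 4\nassert add(-1, 1) == 0"
    print(code_reward(candidate, tests))  # 1.0: cheap to verify, hard to fake
```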
AI help is most valuable right before things go crazy [01:10:36]
Rob Wiblin: Why is the crunch time aspect, or the intelligence explosion taking off, actually even relevant to when we would want to start doing this? Because you might just think, if AI can help us do research or do work to solve any of these problems, then as soon as it’s able to do that, we want to do it — whether or not an intelligence explosion is kicking off.
Ajeya Cotra: To some extent, that’s right. I think the reason that I focus so much on the intelligence explosion is twofold.
One is because at that point I think we might have a pretty short clock to figure out a bunch of stuff, and the default trajectory might look like 12 months to extremely powerful, uncontrollable superintelligence that could easily take over the world. It changes our calculus: you want to focus on very short-term things rather than things that have long lead times, at least at crunch time, if not before.
The other thing is that I think crunch time can help alleviate some of the challenges we’ve been talking about with AIs not being good at the full spectrum of things we want them to be good at — because by definition, at that point, AIs are really good at further AI R&D. And one of the things we could do with AIs that are good at AI R&D, at least in most cases, is to try and direct their AI R&D towards filling out the skill profile of AIs, and getting them to be good at some of the types of things that we want them to be good at that they aren’t so good at right now.
So at that point, you might have just much more capability at your disposal, and it might be much more worth putting in the effort to try and fine-tune and scaffold and do all these other things to make your AI that’s good at moral philosophy or your AI that’s good at biodefence.
Rob Wiblin: So you’re thinking about this strategy not just as a description of what other organisations potentially should work on, or as a description of what AI companies are already planning to do, but also because you think maybe this should influence what Open Philanthropy plans to do over coming years. And potentially that Open Philanthropy’s best play might be to have billions of dollars waiting for this crunch time and then disburse them incredibly quickly, buying a whole lot of compute to get AIs to solve these problems.
Ajeya Cotra: Yeah. I mean, just like how right now 80%+ of our grant money goes to salaries to pay humans to think about stuff and do research and do policy analysis and advocacy and all these other things, so too in a few years it might be the case that AIs are better than most of our human grantees, and our money should mostly be going to buying API credits or renting GPU time to get the AIs to do a similar distribution of activities.
Rob Wiblin: An alternative approach to this would be that, at the point that we get a heads up that we think an intelligence explosion is beginning to take place, we do everything we can to pause at that stage, to slow down, basically to arrest that process — so that rather than having to rush in three or six months, get the AIs to fix all of these issues, we buy ourselves a bunch more time. Why not adopt that as the primary approach instead?
Ajeya Cotra: I think that the plan I described is compatible with pausing right at the brink of an intelligence explosion. In fact, I would hope that we do that, because I think by default, having 12 months to get everything in order is just not enough time.
But I think of it as doing two things. One is making the pause less binary. So if you think of the default path as almost 100% of AI labour goes into further rounds of making AIs better and making more AIs and making more chips and so on, and you think of a pause or a stop as 0% of the world’s AI labour is going in towards those activities, there’s a whole spectrum between 0% and 100%. Then I think of it as doing another thing, which is sort of answering the question of what you do in the pause. You do all this protective stuff and you have these AIs around to do it with.
And once you have that frame of making the pause less binary and thinking really hard about what you do during a pause, I think you might often end up thinking that it’s worth going a little bit further with AI capabilities — because, especially if we tilt the capabilities in a certain direction, we might at the end of that get AIs that are much better than they are right now at biodefence while still not being uncontrollable, still not being that scary.
And you can imagine a bunch of little pauses and little redirections and so on during that whole period. I would hope that at some point in the period we do activities like policy coordination and so on that cause us to have longer in this sweet spot of AIs that are powerful enough to help with a lot of stuff, but not so powerful we’ve already lost the game.
Rob Wiblin: We should probably clarify that although you think this is among our best bets, in an ideal world, you think that we would go substantially slower through all of this — because as good a plan as this might be, we’ll really be white-knuckling it and not be confident that it’s necessarily going to work.
Ajeya Cotra: Yeah. I think that if a really clear early warning sign triggers that we are about to enter into this intelligence explosion, fast-takeoff space — where we go in the space of 12 months from AI R&D automation to vastly superhuman AI — then I would vote for, at that time, shifting that trajectory to be 10 times longer or even longer than that, and trying to make that transition as a society in 10 years instead of one year, or 20 years instead of one year.
This is maybe a bit of a quibble, but I still wouldn’t advocate for pausing and then hanging out for 10 years and then unpausing, because I actually think that slowly inching our way up is better than pause, then unpause, and then having a jump.
But going back to what we said about how your default expectations of trajectories influence what you think should happen, I think the default is going through this in one year. And I would certainly rather it be 10 or 15 or 20 years. But I think that the frame of using AIs to solve our problems applies regardless of whether you’re white-knuckling it in one year, or maybe eking out an extra two months, or if you manage to get the kind of consensus and the common knowledge that allows the world to step through it in 10 years.
Rob Wiblin: Yeah. Inasmuch as we’re slowing down to do something, this is a big part of the thing that we’re slowing down to do.
Ajeya Cotra: Yeah.
Rob Wiblin: So this is a big part of the companies’ plan for technical alignment. If this doesn’t work out, why do you think it’s most likely to have failed for them?
Ajeya Cotra: I think that if it fails, it’s probably most likely to fail because they just didn’t actually do a big redirection from using AIs for further AI capabilities to putting a lot of energy towards using them for AI safety.
Because they say this is their plan, but they don’t really have any quantitative claims about, at that stage, what fraction of their AI labour — or their human labour, for that matter — is going to go towards the safety versus the further acceleration. And they’ll be facing tremendous pressure at that point from their competitors to stay ahead.
So my guess is that, unless they have much more robust commitments than they have right now, they probably just won’t be directing that much of their AI labour. So if they have 100,000 really smart human equivalents, maybe only 100 of them are working on AI safety — which is maybe still more than they had before in human labour, but not that much compared to how quickly things are going.
Rob Wiblin: Unless they have really strong commitments. I guess other mechanisms would be that it’s legally required: at this point, the government basically insists that most of the compute go towards this, or at least most of it is not going towards recursive self-improvement.
Or I guess if the companies could reach some sort of agreement where they’re saying, “We would all like to spend more of our compute on this kind of thing, so we’re going to have some contract where we all spend like 50% of our compute on it, and then no one loses relative position in particular.”
Ajeya Cotra: I think that particular contract is probably going to run into big antitrust issues.
Rob Wiblin: Might be a little illegal, but maybe we could carve out an exception to antitrust with this one. I guess a different mechanism, inasmuch as the government is taking a massive interest, they could help to try to coordinate this one way or another.
Ajeya Cotra: Yeah, I think that’s a possibility. I do think it’s a bit tough. This is not the kind of thing it’s super easy to make laws about, because it’s really not a box-checking exercise. When you write the legislation that half the compute must be spent on safety rather than capabilities, what do you count as safety research?
And how are you enforcing this? Do you have auditors in there asking, “What are you working on? What are you working on?” to all the team leads in the companies? You know, checking off that it really is 50% safety. I could imagine stuff like that. I think it would require extremely technically deep regulators that we just don’t really have right now.
Rob Wiblin: I thought that you might say that the most likely reason for this to fail was that it just turned out that alignment is incredibly hard. That you get egregious misalignment even at relatively low levels of intelligence, and we don’t really figure out how to fix that early enough to get useful work out of them.
Ajeya Cotra: I think that’s a possibility. I don’t think it’s the most likely way it fails. On my views, the most likely way it fails is that they don’t go super hard on it. But it’s also plausible that they’re just trying to get the AIs to help with alignment, and the AIs are just misaligned and the control procedures and other things are ineffective. So they deliberately only help with further AI R&D, and don’t help with alignment and safety and biodefence and all these other things you’d want them to help with.
I would hope that at that stage the transparency regime is strong enough that that fact is broadcast really widely, and then that could inspire a change in policy that causes us to slow down. But then in that world, it’s a bad world even if we do slow down a lot, because we’re just on our own. We have to do this stuff without the AIs’ help, because we can’t get them to help us.
But I’m actually reasonably bullish about control techniques getting early AIs that are not super galaxy-brained superintelligences to be helpful for a range of stuff that they’re good at.
Rob Wiblin: Another way that they could end up actually just not making that much of an effort is if the window is relatively brief, and it just takes a long time to get projects off the ground, and they haven’t really planned this ahead. So they end up debating it back and forth, and then by the time they’ve figured out that they actually do want to do this…
I suppose it’s nominally in these various papers, but I wonder whether they actually are thinking ahead about how this would feel, and whether they’ll have the decision-making capability to decide to redirect enormous resources towards this other effort.
Ajeya Cotra: Yeah, I do think anything that requires a large corporation to be super discontinuous in something it’s doing is facing big headwinds as a plan. So I would hope that they’re smoothly increasing the amount of internal inference compute that is going towards safety as the AIs get better and better, so that the jump doesn’t have to be huge at that final stage.
And if we could elicit honest reports without creating perverse incentives, that’s something I’d want to know about: how much human labour is going to safety versus capabilities, and how much internal AI inference is going to safety versus capabilities, and how much fine-tuning effort is going to safety versus capabilities. I think they have a much better shot if they’re stepping it up over time on some kind of schedule.
Foundations should go from paying researchers to paying for inference [01:23:08]
Rob Wiblin: OK, so that’s the AI companies — who I guess we’re imagining would mostly be focused on this strategy for AI technical alignment.
But you’ve been thinking about this more in the context of Open Philanthropy and what niche it could fill. What would Open Philanthropy need to do if dumping billions of dollars onto this plan became its mainline strategy?
Ajeya Cotra: I think that for now the biggest thing we need to do is very similar to the biggest thing I think society needs to do for preparing for the intelligence explosion: really trying to track where we’re at right now in terms of how useful AIs are for the work that we do and the work our grantees do. And pushing ourselves to automate ourselves and pushing our grantees to automate themselves, and tracking how good is AI at the stuff Forethought does, how good is AI at the stuff that Redwood Research or Apollo does, how good is AI at the stuff that our policy grantees do?
That’s one thing: just socialising within ourselves that it’s probably a big deal when the AIs start to get really good at any given good thing we’re funding. And once we start to see signs of life there, we should be prepared to potentially go really big on that. Like you said earlier, I do think crunch time isn’t 100% a special thing. We absolutely shouldn’t be waiting until crunch time to do anything at all; it’s just the prediction that crunch time is the point when a lot of things that were hard to automate before become easier to automate.
So if it turns out, for example, that AI is really good at math research, which I think is plausible, then maybe we should be trying to deliberately shift our technical grantmaking towards more mathy kinds of technical grantmaking, because that is an area where you can churn a lot more, that’s just so much more tractable.
So I think just having a function that is looking out for these things, and is maybe just poking Open Phil and Open Phil’s grantees to consider shifting their work towards more easily automatable things, or to repeatedly test whether their work can be automated, is a big thing.
Then I could imagine down the line something like even just having separate accounting for the rest of our grantmaking versus grantmaking that is going towards paying for AIs for our grantees. We already pay for ChatGPT Pro subscriptions and ChatGPT API credits for tonnes and tonnes of grantees. I think making it a bit more salient in our minds, what fraction of our giving is going towards that, and do we endorse its size? And is there any place where we should be going bigger, and are we on track? Is the percentage climbing the way we think it should be? Does that seem in line with the way AI capabilities are climbing? If we think crunch time is going to start in six years, are we on track to have inference compute be a large fraction of our spending at that time?
Rob Wiblin: If I think about this psychologically, I could imagine, if I was leading Open Philanthropy or I was one of the donors being advised — and we did have these transparency requirements and we did start getting a sense that an intelligence explosion might be kicking off — I could imagine dithering for a long time, rather than deciding to commit billions of dollars towards this.
Because there’s only a particular amount of money, there’s only a particular size of endowment, and I think I would be very scared that we’ll be going too early or this is a bad idea, or we’re going to have egg on our face afterwards, because it will turn out there were some early signs of intelligence explosion, but it’s not really going to work out. And then we’ve spent $10 billion and we have nothing left to show for it. You’d feel really bad if you made that mistake. Does that sound like a plausible way for things to go?
Ajeya Cotra: Oh, totally. I think even beyond just being scared of making a mistake on this front, it’s just that organisations have particular ways they do things, and there’s processes.
And right now, Open Phil’s process for grantmaking looks like usually someone fairly junior has an opportunity come across their desk, either through one of our open calls or through some contact they have, and that junior person pulls together some materials to convince their manager it’s a good fit. Then that manager sort of convinces someone higher up that it’s a good fit. And you can have two layers or three layers or sometimes four layers of information cascading up the decision-making process that we have in place as an org, and then it’s approved.
And if the right thing to do is to spend a billion dollars on some particular strain of work that’s super automatable, you wouldn’t trust some random junior person to make that call. You might need to have a different process for that. I don’t know what that process would look like, but I think that would be one thing to figure out.
Rob Wiblin: I guess for this incredible scaling of funding and effort to take place, you’re going to be incredibly bottlenecked on people, or there won’t be that many more people involved, so it would have to be the AIs not just doing the object-level work, but also deciding what problems to work on, managing the projects, and overseeing other AIs, basically taking up the entire org hierarchy. So that’s the picture that you’re envisaging?
Ajeya Cotra: I think there’s two possibilities here. One possibility is that, by the time it’s the right move to dump a bunch of money on crunch-time AI labour, Open Phil itself has already been largely automated. That’s actually like an easy world, because in that world we just have a visceral sense that AIs are really helpful, because maybe we’ve slowed down our junior hiring and all our programme associates are AIs right now, and we are totally transformed as an organisation — so the conviction to pull the trigger might be easier to achieve. And then actually we have a bunch of labour: maybe we have 1,000 people on the AI team instead of 45 that we have now, and they can figure out all this stuff much more quickly.
But I think the concerning possibility is actually that there’s jaggedness, where maybe AI is extremely good at math and maybe AI is extremely good at technical AI safety and certain specific kinds of manufacturing that could be really useful for a PPE play. But we haven’t automated ourselves. It’s not that good at doing our jobs, because there wasn’t much of that stuff in the training data. We’re just not well set up for AI labour.
Rob Wiblin: It still makes horrible mistakes sometimes, so you can’t fully trust it.
Ajeya Cotra: Yeah. It makes horrible mistakes in a way that you can put it in a setup in software or manufacturing where you catch those mistakes, but you need humans to do that on the Open Phil side. So we’re not very automated, we don’t have a visceral sense of, “It’s time now. This is the moment. AIs are really good; we got to go big.” But it’s still the right thing to do to pour a bunch of money into AI labour on these few verticals that are heavily automated.
Will frontier AI even be for sale during the explosion? [01:30:21]
Rob Wiblin: We’ve maybe actually been burying the lede a little bit here on what the biggest challenge is for an external group like Open Phil to implement this plan, which is: will you even be given access to the very best models that are being trained? And at this crunch time, when demand for compute is surging, will you actually have enough computer chips? Will anyone be willing to sell to you for you to do this kind of work? Can you go into that?
Ajeya Cotra: Yeah. I think there’s two challenges here to getting access to enough labour as an external group.
One is whether they will even sell to you. Like I said earlier, in AI 2027 and a lot of stories of the intelligence explosion, you get to a point where one company has pulled far enough ahead of its competitors that it keeps its internal best systems to itself and only releases systems that are considerably worse than its internal frontier, that are just good enough to be ahead of its competitors’ released products. There can be a growing gap in how intelligent the best internal systems are and how intelligent the best externally accessible systems are, and the AI company may deliberately choose not to sell to willing customers because they want to keep their secrets to themselves.
Another possibility is they might be willing to sell to you, but the price just might be way too steep, because the opportunity cost of using that compute to sell to you to do whatever you want to do with it is training further more powerful AIs, and they might be willing to pay quite a lot for that.
I think both are challenges. The second one is in some sense more straightforward to address: you try to hedge against this possibility by having some portion of your portfolio really exposed to compute prices. In the extreme case, maybe that looks like just having GPUs yourself that in peacetime you rent out to other people doing commercial activity, but then during crunch time you redirect to doing AI labour.
Although in that case, you’ll have to furthermore figure out how to get the latest AI models onto those chips that you own. So you might have to cut deals to make that happen. But also, in less extreme cases, you might just purchase a bunch of Nvidia or purchase a bunch of liquid public stocks that are exposed to AI to make it more likely that you can afford AI capabilities at the time.
Rob Wiblin: So there could be a huge run-up in the price of GPUs or compute at this time, but you can partly hedge against that possibility by having most of your investments be in Nvidia or other companies that sell GPUs — so that if their price goes up, you benefit on the investment side, and that helps to offset the increasing price?
Ajeya Cotra: Yeah.
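To spell out the arithmetic of this hedge, here is a toy model (all numbers invented, and assuming, as a strong simplification, that GPU-exposed equities appreciate roughly in proportion to compute prices): the more of the endowment that is exposed to compute prices, the smaller the share of the portfolio a fixed compute purchase eats after a price run-up.

```python
# A toy illustration of the hedge Rob and Ajeya describe, with made-up numbers
# and the simplifying assumption that GPU-exposed equities appreciate roughly in
# proportion to compute prices. A sketch, not a real portfolio model.

def share_of_portfolio_spent(endowment: float, hedge_fraction: float,
                             compute_cost_today: float, price_multiplier: float) -> float:
    """Fraction of the post-run-up portfolio that a fixed compute purchase would consume."""
    portfolio_after = (endowment * (1 - hedge_fraction)
                       + endowment * hedge_fraction * price_multiplier)
    compute_cost_after = compute_cost_today * price_multiplier
    return compute_cost_after / portfolio_after


if __name__ == "__main__":
    # $10B endowment, a planned $2B compute purchase, and compute prices tripling at crunch time.
    for hedge in (0.0, 0.5, 1.0):
        share = share_of_portfolio_spent(10e9, hedge, 2e9, 3.0)
        print(f"hedge fraction {hedge:.0%}: compute purchase costs {share:.0%} of the portfolio")
```

With no hedge, the purchase jumps from 20% to 60% of the portfolio; with the whole endowment exposed to compute prices, it stays at 20%, the same share it would have cost today.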
Rob Wiblin: OK. And then on the software side, there’s a question of whether you have access to the very best models that are being trained. On the one hand there’s this story you could imagine where the companies are very close together, the models are roughly the same, margins are very low. They’re very keen to put out models as soon as possible in order to remain competitive. On the other hand, you could have one leader that’s starting to keep things a little secret.
Do you have a particular take on which of these scenarios you think is more likely to come about?
Ajeya Cotra: I think that, at least at the beginning part of crunch time, when the AIs are just starting to automate a lot of AI R&D, my bet is that things will at that point be relatively commercial, relatively open. The leading few companies are within a month of each other in their capability frontier. Or maybe it’s hard to say who’s in the lead because one company specialises in one aspect: their model is a little spiky on pre-training and another company’s model is a little spiky on software engineering, or something like that.
The reason I think that is basically just because it’s what a naive Econ 101 model would predict would happen. It seems like these companies don’t have big moats, and it also seems like what we’ve seen happen over the last few years.
Rob Wiblin: Kind of describes the present day, more or less.
Ajeya Cotra: It describes the present day. And that’s a change from a few years ago, where I do think OpenAI had way more of a lead, and it seemed more plausible that there would be a monopoly or a duopoly.
But there are reasons to push in the other direction, which is basically: if you have a superexponential feedback loop, you have a bunch of actors that are growing at an increasingly rapid rate — like first at 2%, then at 4%, then at 8% — and they don’t interact with one another, you do get a winner-take-all dynamic, where if they’re growing on the same growth curve, but one gets to a particular milestone first, that leader gets more and more powerful and wealthy relative to the laggards. This is in contrast to exponential growth, where if everyone is growing at 2% forever, then the ratios between more and less wealthy nations or companies stay fixed.
So there is a reason to think that specifically around the time of the intelligence explosion, gaps will begin to grow again. But I think probably around the start it will most likely be the case that you can buy AI labour if you can afford it, you can buy API credits, you can go on chatgpt.com.
And then I have a lot of uncertainty about how it evolves from there.
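A toy simulation of the growth dynamic Ajeya describes above (parameters arbitrary, purely illustrative): two actors start on the same curve with one 10% ahead. With a fixed growth rate, the ratio between them never changes; if the growth rate instead rises with how far along you already are (one simple way to model a superexponential feedback loop), the leader’s advantage compounds.

```python
# Toy model: two actors on the same growth curve, one starting 10% ahead.
# Under plain exponential growth the leader/laggard ratio stays fixed; if the
# growth rate scales with current size (a simple "superexponential" feedback),
# the leader's advantage compounds. Parameters are arbitrary, not a forecast.

def simulate(leader: float, laggard: float, steps: int, superexponential: bool,
             base_rate: float = 0.02) -> float:
    """Return the leader/laggard ratio after `steps` periods of growth."""
    for _ in range(steps):
        if superexponential:
            # being further ahead makes you grow faster
            leader *= 1 + base_rate * leader
            laggard *= 1 + base_rate * laggard
        else:
            # everyone grows at the same fixed rate, so relative positions are frozen
            leader *= 1 + base_rate
            laggard *= 1 + base_rate
    return leader / laggard


if __name__ == "__main__":
    for steps in (0, 20, 40, 44):
        exp_ratio = simulate(1.1, 1.0, steps, superexponential=False)
        sup_ratio = simulate(1.1, 1.0, steps, superexponential=True)
        print(f"after {steps:2d} periods: exponential ratio {exp_ratio:.2f}, "
              f"superexponential ratio {sup_ratio:.2f}")
```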
Rob Wiblin: What do you think is the chance that the leading company will try to keep the level that they’re reaching secret?
Ajeya Cotra: I think it depends a lot on the competition landscape they face. Basically if the other companies are really far behind, then I think there’s a pretty strong incentive and reason to keep your capabilities secret. You give up quarterly profits, but maybe you don’t care about that because you’re running on investment money anyway. And if you can get your AI to help you make better AI, to help you make better AI, and so on, you could emerge with superintelligence that might give you a power that rivals nation-states or the ability to just decisively control how the future goes — and that might be very attractive to a sort of power-seeking company.
I do think it does involve forgoing short-term profits though, which means that if competitors are close at your heels and your investors are breathing down your neck to deliver quarterly earnings —
Rob Wiblin: I guess you can’t go and tell all of your investors, “Don’t worry, we have a superintelligence” — because I think then word will get out.
Ajeya Cotra: Well, and then also your plan is to screw over the investors. In this case, your plan is to create a superintelligence, not to pay them back: create a superintelligence and take over the world maybe. They won’t like that. There’s a mismatch in incentives between the investors and the CEO, and the CEO is sort of being a bad agent to their principal.
So basically, the more things look like an efficient competitive market with very little slack, the more the leading company will be sort of forced to provide access to the rest of us.
Rob Wiblin: To what extent do you imagine the companies would be enthusiastically bought in on assisting with this plan? This strategy is their predominant approach to AI technical safety. I think even the optimists agree that there are other issues that society is going to have to deal with; in fact, they say this all the time, the leaders of the companies, that we’re going to need a new social contract, it’s going to upend everything. It’s going to be a big deal.
I imagine that inasmuch as they’re nervous about the effects that the technology is going to have, they’ll be very happy if someone came to them with a pre-prepared plan for here’s how we’re going to deploy all of this compute in order to solve all these other problems.
Ajeya Cotra: I think it’s unclear. Certainly they have some incentive to be into this. But the two alternative uses of AI labour that might be more attractive to them are, one, power-seeking for themselves. Just building up an enormous AI lead over everyone else, and then sort of bursting onto the scene with an incredible amount of power and the ability to challenge like the US government or nation-states might be attractive to some people. I think that would be a very evil strategy to pursue, but it’s definitely in the water.
The other thing is more mundane. It’s just using these AIs to make normal goods and services, to make the products and the media content and the other services that people most want to pay money for in a short-term sense. It’s very similar to how right now we don’t spend a huge fraction of society’s GDP on biodefence and cyberdefence and all these other things.
Rob Wiblin: And moral philosophy.
Ajeya Cotra: And moral philosophy. That’s not what people want to pay for. And AI is just a thing that accelerates the creation of products and services people want to pay for, and this isn’t very high on the list.
Rob Wiblin: I guess most people are not looking to become dictator of the world or to take on huge amounts of power. But the kinds of people who end up leading very risky technology projects are not typical people. They’re somewhat more ambitious than the typical person. So I suppose we can’t rule that out as a possibility.
Ajeya Cotra: Yeah.
Rob Wiblin: A possible challenge would be that even if you have an enormous amount of compute, there might just be only so fast that you can go, because you require some sort of sequential steps or there’s some step that is just bottlenecked in time you have to do.
People talk about things where you do an experiment that just actually takes a certain amount of time to play out. But more generally, at least with LLMs, for example, they produce one token after another. And having twice as much compute doesn’t necessarily allow you to basically complete an answer twice as fast without limit.
How much is that an issue here, inasmuch as we’re trying to solve problems in very short calendar time?
Ajeya Cotra: I think that is likely to come up, especially for physical defences like manufacturing PPE or scaling up the ability to rapidly create medical countermeasures, and then also for social and policy things.
I can imagine that AIs could be very helpful in figuring out what kind of agreement between the US and China would be mutually beneficial and how we could enforce it, but the way human decision making works still probably requires humans from the US and China to come together and talk about it, have a conference or convening and come to a decision that they ratify and they feel good about. And that could be a bottleneck, yeah.
Rob Wiblin: Are there any other examples of similar bottlenecks? In terms of solving theoretical problems, I suppose you can speed things up enormously by having many different instances of the same model try to brainstorm different solutions and have them evaluate one another. That allows you to kind of have many different efforts in parallel.
Ajeya Cotra: Yeah, totally. But I do think for deep theoretical problems, you can speed things up by having efforts going in parallel, but the right solution that’s out there somewhere involves multiple leaps, where it’s hard to think of the next insight without having the foundation of the earlier insight. So really, even if you have 100 AIs working in parallel, what will happen is that one of them comes up with the first step of the insight and then everyone is working in parallel on finding the next insight, but you still need to go three or four steps in.
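A back-of-the-envelope model of the bottleneck Ajeya is pointing at (all numbers made up): if a problem needs several insights in strict sequence, parallel copies of an AI only shrink the search time within each step, so the returns to adding more copies flatten out at the serial floor.

```python
# Toy model of the serial-insight bottleneck: each insight must build on the
# previous one, and parallel copies only speed up the search *within* a step.
# The numbers and the 1/N search speedup are illustrative assumptions.

def time_to_solve(sequential_insights: int, search_hours_per_insight: float,
                  serial_hours_per_insight: float, parallel_copies: int) -> float:
    """Total wall-clock hours when each insight depends on the one before it."""
    per_step = serial_hours_per_insight + search_hours_per_insight / parallel_copies
    return sequential_insights * per_step


if __name__ == "__main__":
    # 4 insights, each needing ~1,000 hours of parallelisable search plus
    # ~50 hours of unavoidable serial work (integration, verification).
    for n in (1, 10, 100, 10_000):
        hours = time_to_solve(4, 1_000, 50, n)
        print(f"{n:>6} parallel copies: {hours:7.1f} hours")
```

Going from one copy to ten cuts the time by a factor of seven, but going from 100 to 10,000 barely helps: the sequential steps set the floor.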
Pre-crunch prep: what we should do right now [01:42:10]
Rob Wiblin: So what sort of stuff do we need to be doing in advance? For example, planning meetings ahead of time for diplomats between the US and China: we need to do that at the very early stage, in anticipation that eventually we might have a deal that they might want to ratify. That sounds a bit crazy, but are there other examples of things that you need to do before this all kicks off?
Ajeya Cotra: Yeah, I think that in general you want to be thinking about what would the AIs at the time be most comparatively disadvantaged in? They’ll have all these advantages over us. They’ll understand the situation much better at that point in time than we do now. They’ll be able to think faster, move faster and so on.
But I think what we can contribute now would be things that just inherently take a long lead time to set up. That might include physical infrastructure, like the bioinfrastructure that my colleague Andrew is working on building out.
It might also include just social consensus. It takes some amount of time for an idea to be socialised in society, to have it as an accessible concept that maybe we should try and create some sort of treaty between the US and China to allow AI to progress somewhat slower than it might naturally, and use a bunch of AI compute to solve all these problems. I think that kind of thing takes years to become something that’s in people’s toolkit, in the water, such that they actually think to have the AIs go down that path and figure out the details of that.
Rob Wiblin: So what should people be doing if they think that this makes sense or it’s something that they’d want to contribute to? Are there other organisations that should similarly be planning ahead and thinking about how this might look for them? Or could individuals be thinking about how they could contribute to adopting this approach for their own particular projects?
Ajeya Cotra: In terms of other organisations, I think it would be especially great for government entities to be thinking about adopting AI. I know that there’s just a number of random little types of red tape that make it harder for governments to adopt AIs than for anyone in industry to adopt AIs. And I think we might end up in a situation where the regulatees — the industry people — have fast cars, and the regulators have horses and buggies because of this differential adoption gap.
And more broadly, if your company is not already going maximally hard on adopting AI for your particular use case — and you work on defences, AI safety, moral philosophy, all these good things — it’s probably worth having a team that’s just on the lookout for how you could adopt AI as soon as it becomes actually useful for you.
A grantmaking trial by fire at Coefficient Giving [01:45:12]
Rob Wiblin: Let’s talk a bit about the career journey that you’ve been on since we last did an interview two and a half years ago.
Back then you were doing general AI research and strategy for Open Philanthropy. This is in 2023. Then in 2024 you started leading the AI technical grantmaking. And then towards the end of that year you decided to take four months off and take a sabbatical. Tell us about all of that.
Ajeya Cotra: So I had been at Open Phil for more than six years before I made my first grant. I was involved in some grantmaking conversations earlier, but the first grant I actually led on was somewhere in mid- or late 2023, and I had joined Open Phil in 2016.
So it was kind of interesting. In some sense, if you just took the outside view that this is a philanthropy that’s giving away money, my work there was very strange. It was kind of thinking about these heady topics and then writing these long reports that I published on LessWrong about them. And I always felt a little like maybe I should dip into grantmaking, because that is our core product in some sense; it’s what we do. But I had always been sort of drawn away by deeper intellectual projects. So even though I always vaguely had the thought that I should do grantmaking, it never really happened for me.
Actually, I think the thing that pushed me headfirst into grantmaking was the FTX collapse. So actually, sorry, my first grant must have been in 2022 instead of 2023 — because at that point there were hundreds and hundreds of people who had been promised grants by the FTX Foundation where their grant wasn’t going to go through, or they were worried it was going to be clawed back, or it was partially not going through.
And Open Phil put out this emergency call for proposals for people who had been affected by the crash. I had some thoughts and takes on technical research, and also just the organisation needed help — like surge capacity for this emergency influx of grantmaking. So in a matter of maybe six weeks or so, I made 50 different grants after not having made any grants at all.
That was a really interesting experience. I discovered there were elements of it I really liked, but there also was just something about the way you made grants where you just really couldn’t dig into any particular thing very much. Especially in the context of something like the FTX emergency, you just had to be making these decisions really quickly.
But I felt like I had thoughts about how grantmaking could be done, at least in the technical AI safety space, with more inside-view justification for the research directions we were funding than we had previously. So in early or mid-2023, I tried to go down that path.
Rob Wiblin: So in 2022 you did this huge burst of grantmaking, trying to help a bunch of refugees from the FTX Foundation, basically. But then I guess you would have noticed that there’s probably no overarching strategy behind all of the grants that you were making. And you were like, we need to have a bigger-picture idea of what we’re actually trying to push on and why.
Ajeya Cotra: Yeah. So I was focused on grants to technical researchers. These were often academics, sometimes AI safety nonprofits, and they would be working on often interpretability or some kind of adversarial robustness. And they seemed like reasonable research bets, but I felt kind of unsatisfied — and I think this is going to be a theme of me and my career — about how the theory of change hadn’t been really ground out and spelled out as to how this type of interpretability research would lead to this type of technique or ability we have, and then that could fit into a plan to prevent AI takeover in this way, or similarly for any of the other research streams we were funding.
This had actually been a big thing that deterred me from getting involved in Open Phil’s technical AI safety grantmaking for a long time, even though I was one of the few people on staff who thought about technical AI safety outside of that team. It was because in the end it seemed like most grant decisions in this 2015–2022 period turned on heuristics about, “This person’s a cool researcher and they care about AI safety.” Which is totally reasonable, but I wanted to have more of a story for, like, “This line of research is addressing this critical problem, and this is why we think it’s plausibly likely to succeed, and this is what it would mean if it succeeded.”
And we never really had that kind of very built-out strategy — because it’s very hard, it’s a lot to invest in building out a strategy like that. But having been thrown headfirst into grantmaking with the FTX crisis, I was like, maybe I do want to try and take on the AI safety grantmaking portfolio, which at the time didn’t have a leader because all the people who had worked on that portfolio had left by that point — some to go to FTX Foundation, actually — so it was this portfolio that had been somewhat orphaned within the organisation and it was clearly like a very important thing.
I was like, maybe we could approach it in this kind of novel way for us in this area: to really try and form our own inside views about the priorities of different technical research directions, and really connect how it would address the problems we most cared about.
Rob Wiblin: It sounds like you find it unpleasant or anxiety-inducing to make grants where you don’t have a deep understanding, not so much of what the money is being spent on, but where you don’t have a personal opinion about whether it’s likely to bear fruit. Is that right?
Ajeya Cotra: Yeah. Or I think it’s a bit nebulous what the standard is that I hold myself to. But I think for my research projects — when I think about timelines or I think about how AI could lead to takeover or how quickly could the world change if we had AGI — I think I can often, with like months of effort, get to the point where I can anticipate and have like a reasonable response and a reasonable back and forth with a very wide range of intelligent criticisms for why my conclusion might be totally wrong and totally off-base. I feel like I know what the sceptics that are more doomy than me will say, and I know what the sceptics that are less doomy than me will say, and I could have an intelligent conversation that goes for a long while with either side.
And that is a standard I aspired to get to with why we supported certain grants. And I could do that with some of our grants. But I wanted the programme to get to the point where, if somebody came to me and said, “Interpretability just actually hasn’t seen much success over the last four years; what do you make of that?,” I wanted to be at reflective equilibrium on my answers to questions like that, and wanted to be able to say something that went a bit beyond like, “Yes, but outside view, we should support a range of things.” That is something that I think emotionally is unsatisfying to me, if it’s a big element of my work.
Rob Wiblin: It’s maybe worth explaining why it is that Open Phil doesn’t aspire to get to that level of confidence with most of its grants. Why is that?
Ajeya Cotra: It just takes a long time. I think there’s two things: it just takes a lot of effort, and then the other thing is that even if you put in that effort, you don’t want to fully back your own inside view. I think I wouldn’t endorse that either.
So it’s this one-two punch where developing your views about exactly how interpretability or adversarial robustness or control or corrigibility fits into everything is a tonne of work: you have to talk to a tonne of people, you have to write up a bunch of stuff. And in the meantime, you’re not getting money out the door while you’re doing all this stuff.
And then, having done all this stuff, where are you going to end up? You’re going to end up in a place where there are reasonable views on both sides. And it’s a complicated issue; we probably want to hedge our bets and defer to different people with different amounts of the pot and so on.
So I think people have a reaction, that’s very reasonable, of: we’re going to end up in a place where we’ve thought it through, it was a lot of work, and it’s still very uncertain. We still want to spread our bets, so why not just get to the point where we just short circuit all that and spread our bets and lean on advisors. And I have sympathy for that. Hopefully I represented that perspective reasonably well.
But I just feel like in my life, in my experience, having done the homework really qualitatively changes the details of the decisions you make in ways that I think can be really high impact. One thing that I’m able to do, having gone through the whole rigamarole of forming views, is work with researchers to find the most awesome version of their idea by the lights of my goals, and pitch them on that and sort of co-create grant opportunities.
And I think there’s just something that I maybe won’t be great at defending, but I just feel like there are other nebulous benefits beyond that, and I really like operating that way.
Rob Wiblin: So in 2024 you actually took on responsibility for this whole portfolio.
Ajeya Cotra: In late 2023, yeah.
Rob Wiblin: Late 2023. But I guess your personal philosophy of how to operate is somewhat in tension with how Open Phil as a whole is tending to operate —
Ajeya Cotra: Just in tension with, in the short term, making a large volume of grants. I think that’s it.
Rob Wiblin: Yeah, yeah. So what did you end up doing in the role?
Ajeya Cotra: I ended up pursuing a compromise. One thing that just comes with the territory of this role is that there have been grantees that we made grants to in the past that are up for renewal. Part of the responsibility of being the person in charge of this programme area is that you investigate those renewals and make decisions about whether we should keep the grantees on or not. And for those grants, I tried to follow what an Open Phil canonical decision-making process would be there.
So I tried to pursue kind of a barbell strategy for a while. On the one hand there were either renewals or people who knew us who reached out to us to ask us to consider grants, where I wouldn’t hold myself to the standard of like, really on the technical merits, understanding and defending the proposal, but would lean more on heuristics like, “This person seems aligned with the goal of reducing AI takeover risk; this person has a broadly good research track record” and so on, and try to make those grants relatively quickly.
But then I would also be trying to develop a different funding programme or some grants that I really wanted to bet on, where I would try and work to hold myself to that standard and try and really write down why I thought this was a good thing to pursue.
It turned out that the second thing basically turned into making a bet, from late 2023 to mid-2024, on AI agent capability benchmarks and other ways of gaining evidence about AI’s impact on the world.
Rob Wiblin: The stuff that we were talking about earlier, where you’re trying to get an early heads up about whether the AIs are going to be really effective agents. I guess in 2023, we were really unsure how that was going to go. It seems like agents in general have been a bit disappointing, or haven’t progressed as much as I expected, or probably as you expected. But at that point it seemed like maybe by this point they’d be operating computers completely as well as humans. And you really wanted to know if that was the future we were heading for.
Ajeya Cotra: Yeah, yeah. So I launched this request for proposals. Open Phil has done technical safety requests for proposals before, but this was by far the narrowest and most deeply justified technical RFP that we had put out at that time, where I was like, we are looking for benchmarks that test agents, not just models that are chatbots; these are the properties we think a really great benchmark would have; and these are examples of benchmarks we think are good and not so good.
And we had a whole application form that was in some sense guiding people or trying to elicit the information about their benchmark that we thought would be most important for determining whether or not it was really informative. Mostly this was just, “Be way more realistic. Have way harder tasks than existing benchmarks. Even if you think your tasks are hard enough, they’re probably not hard enough.” There was a lot of push in that direction.
So it was a very opinionated and very detailed and very narrow RFP, and we ended up making $25 million of grants through that, and then another $2–3 million from the companion RFP — which was just broader, all kinds of information from RCTs to surveys about AI’s impact on the world.
And I’m pretty happy with how that turned out. It was, like you would expect, a lot of effort poured into one direction. And if you were sceptical of this high-effort approach to grantmaking, you could argue that I could have just put in way less effort, and funded twice as much volume in grants across 10 different areas, picking up the low-hanging fruit in all those areas.
Rob Wiblin: So I guess halfway through 2024, you started feeling pretty burnt out, or like you wanted to take a bit of a break. Why was that?
Ajeya Cotra: So right around when I switched from doing mostly research to doing grantmaking — and especially when I was trying to ramp up this programme area that had this more inside-view, more understanding-oriented approach to AI safety research — Holden [Karnofsky], who had been running the AI team up to that point, decided to step away and left the organisation. And he was my manager.
I think that I had a working relationship with Holden that involved a lot of arguing and discussing about the substance of what I was working on. And when he left, leadership was stretched more thin because someone in leadership was gone, and the people who remained in the leadership team didn’t have as much context and fluency with all this AI stuff as Holden did.
So I wrote up this big memo, like, “We should do AI safety grantmaking in a more understanding-oriented way, and we should develop inside views, and here’s why I think that would be good.” I think what I wanted was for my manager or leadership to argue with me about the object level on that, and for there to be some sort of shared view within the organisation about how much this was a good idea or what are the pros and cons of it and how much we want to bet on it. But I think that was just kind of unrealistic, given the other priorities on their plate and given their level of context in this area.
So I ended up having to approach it in a more transactional way with the organisation. Rather than, “Let’s talk about whether this is a good idea,” it was more like, “I want to do it this way.” And they were like, “Yeah, we don’t know if that’s the best way to do things, and we have some scepticism, but –”
Rob Wiblin: “You can do that if you want.”
Ajeya Cotra: “– you can do that if you want.” So I felt kind of lonely because — and this is something I learned about myself over the course of trying to run this programme and then going on sabbatical and reflecting on it — I really like to be kind of plugged into the central brain of the organisation I’m part of. And I didn’t feel like I had a path to do that, and instead what I had a path to do was to stand up this thing — which I tried to do, but it just felt a bit tough going.
Rob Wiblin: It sounds like you were a bit on your own.
Ajeya Cotra: Yeah, I felt a bit on my own, and I’m not a very entrepreneurial person, I think. I’m ambitious in some ways, but I just really have a high need for constantly talking to other people.
And I tried to achieve that sense of team by hiring people under me to help me with this vision. But I think I was not very good at hiring and management. Partly it was because this vision was pretty nebulous and I probably needed to spend more cycles working out the kinks in it by myself and really solidifying what it is and what’s the realistic version of doing an understanding-oriented technical AI safety programme. So it was very hard to hire, because you kind of had to hire for someone who really resonated with that off the bat, even though it wasn’t a very well-defined thing. So that took a lot of energy.
Then I think with people I was managing, I have always struggled, and in this case still struggled, with perfectionism in management. So I have this long history of trying to get people to serve as writers who write up my ideas. It never works for me, because they don’t do it just the way I want it. And I’m myself a pretty fast writer, so working with a writer as their editor and getting their writing output to be something I’m satisfied with often ends up taking more time than doing it myself.
I found the same happened to some extent with grantmakers. At one point we had a number of people who spent part of their time working on the benchmarks RFP. And I think it’s possible that I would have just moved through the grants faster if it were just me working on it, which is a bit tough. I think this is a weakness or challenge a lot of new managers go through.
And I was going through that at the same time as feeling like some of the feedback and engagement I got from above me was much less than it was before, and I had to sort of prove this new way of doing things. And I thought, and still think, that there was a lot to the arguments I was making, but also it was not a wild success when I took a swing at it by myself.
Sabbatical and reflections on effective altruism [02:05:32]
Rob Wiblin: So September last year, you decided to step away and just take some time away from work, after eight years of working very hard full time. What did you end up doing with that time?
Ajeya Cotra: It was a mix of things. I just did a lot of life stuff. Like I found a new group house to move into, or started a new group house. So that was cool. Did more just trying to take care of myself. I started an exercise habit. I’m off that exercise habit now again, so we’ll see.
Then I did a lot of reflecting on why this work situation ended up being so hard for me, and also just my journey through my career as a whole and what are the patterns in when things were hard for me.
I also just jumped in and helped with some random projects going on. The Curve Conference, a conference that brings together AI sceptics and AI safety people and people on all sides of the issue of AI’s impact on society, was having its first iteration while I was on sabbatical. So I was able to get involved with that more and try to be helpful more than I could have been if I had a full-time job, which was really cool.
I did some writing. Most of that writing hasn’t been published, but it was still good for me to do.
It kind of went by really fast, honestly. There was a lot of stuff to think about and a lot to do.
Rob Wiblin: What sorts of reflections did you have on your career so far, and your motivation, and what had been difficult in 2023 and 2024?
Ajeya Cotra: In terms of 2023 and 2024 specifically, I really do feel like I want to be an advisor and a helper to the central organisation. And I had been that in many ways over the previous six years. So the transition to being more entrepreneurial, where it was more like I have a little startup making grants in my area, and the organisation is investing money in me but not necessarily a lot of attention, and I didn’t necessarily have a path to make arguments that then influenced stuff in a cross-cutting way, that was hard.
So I think that was interesting to learn about myself, that if I don’t have that, I will still gravitate towards trying to meddle in everything else that’s going on, and if I don’t have a productive path to meddle I’ll feel sad. That was one big thing.
I think another big thing is how much depth do I want? I do think I really have a drive to really get to the bottom of something, or I’m always like thinking about the counterargument and the counterargument to the counterargument. Even when I was very young, I really liked math tutoring and I really liked math in general because you could just dig and dig and get to an answer — and that’s just inherently an uneasy fit with grantmaking or just investing.
Rob Wiblin: It’s like venture capital that Open Phil is engaged in, in a way.
Ajeya Cotra: Yeah. So that was also interesting to reflect on. And like I said, it was somewhat strange that for my first six or seven years at Open Phil I actually just did rather deep research. Even though we were a grantmaking organisation, I just wasn’t doing grantmaking.
Rob Wiblin: Is that in part because Holden really wanted this deep research? He wanted to more deeply understand the idea personally, and he thought it was healthy for the organisation?
Ajeya Cotra: Yeah, I think that’s right. I think he had a lot of drive and demand for really figuring out timelines, really figuring out takeoff speeds, and exactly what our threat models are for whether AI could take over the world, and building that all up. I think he has a lot of the same instinct I have of, it’s just really good to do your homework, and it’s really good to have the response to the top 10 counterarguments, and the response to those responses, and just really know your stuff.
So he was the driver of a lot of the work that I did, and I think if you’d rerolled the dice and Open Phil had been run by different leadership, it’s probably pretty unlikely we would have gone as deep as we did into doing our own AI strategy thinking, because the thought would have been that we should fund a place like [Future of Humanity Institute], or now Forethought, to do that stuff instead of us.
Rob Wiblin: In your notes, you said that you spent a fair bit of time reflecting in this period about what it had been that you liked about effective altruism as an ecosystem and as a mentality, and what things you didn’t like so much about it. Tell us about that.
Ajeya Cotra: It’s been a long time since you’ve talked about effective altruism on the show, so I’ll open with what it even is, which is this movement or idea that you should think explicitly and seriously and quantitatively about how you can do the most good with your career or with your money that you’re donating — and that different career paths and different charities you could donate to could differ by orders of magnitude in how much good they do. So if you are working on reducing climate change, it could be orders of magnitude more helpful to work on researching green technologies versus to work on getting people to turn off their lights more or conserve electricity more in their personal use.
There’s this ethos that if you’re really taking this seriously, and you really care about helping the world, you stop and think and you do the math — in the same way that if you had cancer or your spouse had cancer, you would do the research and figure out what treatments had what side effects and what treatments had what success rates, and you would ask a lot of questions of the doctor. There’s this ethos that that’s what it looks like when you take something seriously. And a lot of people, when they’re doing good in the world, they do what makes them feel instinctively good. And there is a whole other approach where you respect the intellectual depth of that problem.
I was really drawn to this. I fell headfirst into the EA rabbit hole when I was 13 — so it’s been more than half my entire life that I’ve been extremely involved in this community, this way of thinking. I think there were maybe three big things that I really liked about this approach.
One is just that EAs challenged themselves to care about people and beings that were very different from them, very far away from them in time and space. Take even the most “vanilla” EA cause area, global poverty: the vast majority of poverty-alleviation money given by individuals in rich countries goes to helping other individuals in rich countries, even though money could go much further overseas in countries where people have a much lower standard of living. And the reason people donate locally is that they feel more affinity for people who are closer to them and more similar to them.
And EA also has a lot of strains that challenge people to extend care to animals, to extend care to future generations that may live thousands of years or millions of years in the future, and to artificial intelligence also, if it can be something that has consciousness and can feel pain and so on. And that was really appealing to me.
But then there’s a way of going about doing things that was also very appealing to me. They were very nerdy, they were very intellectual. They were really thinking stuff through and almost like innovating methodologically on like, how can we figure out which charities are better than which other charities? And there are lots of interesting arguments thrown around for this.
And they were very transparent. There was just a culture of open debate and admitting your mistakes. GiveWell, an early pillar of the EA movement, had a mistakes page on its website where it just discussed mistakes it had made. They were very honest and high integrity in an interesting way that doesn’t obviously follow from caring about other beings more. For example, GiveWell refused to do donation matching, because donation matching is usually a scam where the big donor would have given that much anyway, even if you hadn’t made your donation.
So that whole package was really attractive to me. I think it really hit a lot of psychological buttons for me at once, and really felt like my people and the way I wanted to live my life.
Rob Wiblin: So there’s being more compassionate to a wider range of beings, which I guess is still the case and probably still something you like about the effective altruist approach. There was also going into enormous intellectual depth and just really debating things out. And then there was also the very high integrity about honesty, like not allowing any chicanery whatsoever or even the hint of chicanery.
Ajeya Cotra: An extremely fastidious and exacting level of integrity that other movements, even other pretty high-integrity movements, weren’t aspiring to.
Rob Wiblin: Even beyond what people are even asking for, potentially.
Ajeya Cotra: Yeah, you’d just proactively say, “By the way, did you know donation matching is a scam? That’s why we’re not doing it. Even though we would get more donations to help poor people.” It’s interesting that that was such a natural part of the early EA movement. Even though you’re sort of giving up on impact, you know?
Rob Wiblin: Yeah, it’s not necessarily implied. I guess it’s a practical question whether it is or not.
So as things evolved, you found that the second one, the intellectual depth, was now lacking from your job. Were there other things that were changing that made you less enthusiastic?
Ajeya Cotra: I think the intellectual depth was very much there in other parts of the EA ecosystem, especially AI safety and thinking through how exactly would you control early transformative AI systems and things like that. Like I said, my heart was always pulled towards those kinds of questions, even though I worked at a grantmaking organisation.
Rob Wiblin: It feels like on some level you really were a more natural grant recipient rather than a grantmaker. You should have gotten a grant to really go deep on some questions.
Ajeya Cotra: Yeah, I think that if I had graduated college in 2022 instead of 2016… In 2016 I graduated college, I went to GiveWell. And a big part of why I went to GiveWell at the time was that they had the most intellectual depth on this question of what are the best charities. If I had graduated college in 2022, I probably would have done MATS, which is this programme to upskill in ML-based AI safety research, and then tried to join an AI safety group. I think I’m naturally drawn to actually doing the research in some sense.
So in that sense, it was sort of a mundane issue that my job, especially after Holden left and the demand for that kind of research evaporated a little bit at the leadership level… If I were to start over again, probably I wouldn’t have applied to join Open Phil. I probably would have applied to join an AI safety group.
But then I think the third thing of just this extremely, almost comically high level of integrity that I really liked was also eroding over the years.
I think that when a lot of the focus of the EA movement was convincing really smart people to donate differently, being extremely, unusually high integrity was actually just a really valuable and powerful asset. Obviously people like me and very wealthy people that were early GiveWell donors really liked that GiveWell had a mistakes page, and really liked that whole ethos and that whole package: it helped them trust that the recommendations were actually real recommendations, and they weren’t being spun something, and they weren’t being sold something like all the rest of the charity recommendation ecosystem.
But then when you move away from that being your primary method of change — when instead you’ve actually attracted quite a lot of funders, and now you’re trying to use that money and the talent that you’ve attracted to achieve things in the world, maybe things that involve like a lot of politics — then being extremely transparent can be very challenging. Especially because donors want privacy, or if you’re running a political campaign, you don’t want your opponents to know exactly your strategy and the ways that you think you might have made mistakes. This is not how most of the real world works, you know?
Rob Wiblin: Yeah. It’s not the case that the world’s most impactful organisations are consistently incredibly transparent or even incredibly high integrity.
Ajeya Cotra: Yeah, yeah. So there was this tension between the goals. I felt like I should only care about the goals of EA. What EA told me, and it made sense to me, was that the point here is to help others as much as possible. The point is not to conform to an aesthetic or do things in a way that feels cleanest or prettiest.
But at the same time, I think I was to some extent kidding myself about how much of my own motivation and my own attraction to the concept came from just the goals. Like just pillar one, altruism, versus pillars two and three — of that intellectual depth and intellectual creativity; and this crazy high level of openness, transparency, having absolutely nothing to hide, letting all comers come — for me, as a fact about my psychology, the latter two things were actually really important for my motivation. And they were over time just smaller and smaller features of what it was like to do EA, to try and pursue EA goals in my career.
Rob Wiblin: We should say for people who don’t know that over this period the environment that Open Phil was operating in became a lot more challenging and a lot more hostile, I guess. For years it had been funding all kinds of AI-related stuff, but as AI became a much bigger industry, it became apparent what sorts of concerns different people had. Its work in some ways started to just clash with very large commercial interests potentially, and also just alternative ideologies that had different ideas about how things ought to be regulated or how things ought to go.
So we’re now in a world where there were people who would sit down and think, “How can I f*** with Open Phil? What can I do to give these guys a terrible day? What have they published that we could spread that will be embarrassing for them?” And in that kind of environment, where people just literally want to cause trouble for you, it’s a lot less attractive to be maximally forthcoming about all of your internal deliberations and why you made all of your decisions. All of us would potentially be a bit more conservative in that kind of environment.
Ajeya Cotra: Yeah. Even before the latest round that started in 2023 of AI policy heating up, Open Phil compromised a lot on its initial wild ambitions for transparency. At the beginning, there was this idea that we would publish the grants we decided not to make, and explain why we decided not to make them when people came to us for grants.
Rob Wiblin: There’s a reason most organisations don’t do that.
Ajeya Cotra: There’s a reason most organisations don’t do that. For our earliest two programme officer hires, we have a whole blog post that we wrote about their strengths and weaknesses as candidates, the alternatives we considered, and how confident we were that it would work out. We stopped doing that. So there is a level of transparency that, in my heart, I still want, but it’s absolutely insane.
And then I think the adversarial pressure that you mentioned makes it so that Open Phil, as an organisation that funds a lot of this ecosystem, has a lot to lose. If we go down, a large number of helpful projects have a much harder time getting funding. We have to be a lot more risk averse than many of our grantees, even though those grantees are also facing an adversarial environment. I think the way many of them navigate it is to sort of fight back and explain their perspective and define themselves in the public sphere. My instinct is to just do more of that and to say more and respond, but it’s harder to do that from Open Phil’s position for a number of reasons.
Rob Wiblin: So over the years, a lot of people, usually critics, have said that effective altruism has some things in common with religious movements. To what extent have you found that to be the case, and to what extent have you found that not to be the case?
Ajeya Cotra: I mean, I think EA aspires to be, and very much succeeds at being, a lot more truth-seeking than the world’s religions and a lot more truth-seeking than a lot of other communities and movements in the world. So in that sense I think there’s a disanalogy that’s extremely important.
I do think it’s not a bad analogy in some ways, because I think for people who really are deeply involved in the EA community, it provides like a map of the good life. It’s like a vision of what it means to be good and have a good life. It’s unlike a political movement in that it doesn’t just have a set of policy prescriptions for the world, but, like many religious movements, it intersects with politics.
And there are people who approach political questions like whether you should ban gestation crates for pigs through the lens of their commitment to EA. And it’s not just a community; it’s not just a social club. I think people get solace and friendship from their local community of EAs, like people do from their local church community.
But it is more than that. It is trying to say something about the sweep of the world and your place in it and what it means to live a good and meaningful life, and it intersects with politics and community and a bunch of other things while not being exactly the same as it.
Rob Wiblin: Yeah. I would think a key way that it’s not like a religion is that in many respects it feels more like a business to me, or like a startup or an organisation that has quite a functional goal. That’s a different aspect of it.
Some people like the ideas, they like the blog posts, and they don’t engage with the community whatsoever, and I suppose for them it’s going to be a different experience. And actually there are many people who participate in the community who would say they’re involved in effective altruism, but actually are not that interested in the projects or necessarily even the effort of helping people.
People kind of sample the aspects that they like. But for many of the staff who work in organisations alongside other people who would say they’re really into effective altruism, it’s much more pragmatic, I would say.
Ajeya Cotra: Yeah, I think that is how it ends up manifesting for a lot of people. But I don’t think that’s really what EA is. Or I think it’s a mistake to collapse EA into a set of three or four goals in the world — like reducing suffering of animals in factory farms, plus improving quality of life for poor people in developing countries, plus AI safety. In some ways people think of EA as a weird umbrella for those three things, and then those three things are basically professional communities pursuing a kind of well-defined goal.
But I think EA is more like a way of looking at the world, and a way of thinking about the good. And I think you can take an EA approach to cause areas that are in some sense more parochial than the big three EA cause areas. Like I think you can absolutely take an EA approach to US policy from the perspective of thinking about the welfare of US citizens, doing rigorous cost-effectiveness analysis of what policies actually help and don’t help. And a lot of people do.
Then I think there is EA as a generator of new cause areas that could get added to the canon. And right now there’s a bunch of fertile ground with, could EA be a force that helps society prepare for radical change by advanced AI, where AI safety is one big important thing there, but there might be a range of other issues, and you might want to prioritise some of those based on your values and your sense of how things will play out.
Rob Wiblin: You wrote in your notes that, at least from your personal point of view, EA wasn’t enough like a religion, or it wasn’t as much like a religion as you might personally have liked. Explain that?
Ajeya Cotra: I’m someone that just really benefits from structure and from emotional motivation and reinforcement. I also just very much tend to socially conform a little bit, or I think I tend to try and achieve the ideal of the community I’m in. And I think the ideal of my corner of the EA community is sort of like you said: just to have a really impactful job and then do a really good job at it and work a lot of hours at it. So that’s the message you get from the community, and that’s what I’m trying to do.
But I think I personally would have liked a bit more of a spiritual angle to the community. Like if you read my colleague Joe Carlsmith’s blog, I get some of that existential reflection about our morality and our values and this crazy thing that so many EAs believe: that in a matter of a decade or two, we might be in an utterly transformed world that might be, relative to this vantage point, utopic or dystopic, and just like grappling with that.
You know, I think if there had been like an EA church, where like every Sunday, someone who’s really good and thoughtful about these issues spoke about them and led a discussion about them, I think that would have been very enriching for my life, and probably ultimately made me be higher impact.
But that’s just not how the EA community is structured. It’s sort of deliberately not structured that way because of the professional community aspect of EA. You really want to not care if people believe the deepest teachings and philosophical orientation. You really want to just be like, “If you’re doing great AI safety research, great, do great AI safety research.” So the incentives of a professional community pull against what I might personally want here.
Rob Wiblin: It sounds like you think, while it might have been more appealing to you, it’s not actually necessarily better for things to go in that direction. For me personally, I kind of like the more limited, professional-community aspect of it — because you just want to be able to go home and not have to think about this stuff all the time.
Ajeya Cotra: Whereas I want to go home and think about it in a different way. I mean, I already go home and think about my work all day. I frequently have insomnia where I think about my work, and instead of thinking about the next Google Doc I need to write or the next email I need to send, I would like to be, like, spiritually marinating.
Rob Wiblin: Thinking about your work in a more spiritual dimension.
Ajeya Cotra: Yeah, exactly.
Rob Wiblin: I guess people have a range of views, but it’s clear why many people have not embraced that totally, or have been keen for more of a “let’s have a strong division between these things” approach, since it can all be very stressful.
Ajeya Cotra: Yeah. It can be very dangerous and culty, and there are a lot of reasons to worry about it. But I do think there is a large contingent of EAs that are like me in wanting some sort of spiritual grounding. Joe Carlsmith’s blog is extremely popular with hardcore EAs. It’s not like a generically popular blog, or it’s reasonably popular, but there are a number of people who are like, “Wow, this is really nourishing something in me that I didn’t realise I needed.”
Rob Wiblin: Yeah. There’s probably an age thing here a little bit as well. I feel like when I was younger, I noticed that the older people were less interested in this aspect of it. I guess now I’m in the older class and I’m like, I have my family to provide nourishment, and that’s absorbing a lot of time and energy that I kind of don’t have for attending church or whatever else it might be.
Ajeya Cotra: That’s kind of interesting. Do you feel like you had some sort of spiritual hole that was filled specifically by having a child? Or were you always just not that interested in this?
Rob Wiblin: I think of myself as a deeply unspiritual person, so I think that wasn’t really an itch that I needed to scratch. I guess earlier on I was maybe more interested in the social scene to make good friends and meet people. I guess having made more friends who I think of as like-minded and having a lot of common interests with, that’s not as interesting either anymore. I’ve already got my friends and now I’m just going to ride it out.
Ajeya Cotra: I actually thought of myself as an extremely unspiritual person. I had a lot of disdain for spirituality when I was 20. So for me the age thing has gone the other way. I think I want more and more of a religion-shaped thing in my life as I age.
When I think about why, I think it’s because when I was 20 I had unrealistic aspirations for my worldly projects. By that point I’d already been an EA for like six or seven years, but I was just starting off trying to do things in the world and I had this sense that like, “This is obviously correct, this is obviously great. Everyone who’s good and reasonable will get on board with it and we’ll just solve poverty and solve factory farming.” I wouldn’t have exactly said this stuff, but I just had that inner vibe that I would go around being like, “Have you heard the good word about EA?”
And I think as I’ve just done things in the real world, everything is very hard and slow, and the feeling of doing my job, which involves writing these Google Docs and sending these emails, is just not automatically connected to my higher aspirations. There is a long grind and there’s a lot of failure. So I think I have increasing demand for some separate thing that is specifically trying to reorient me mentally towards the bigger picture.
Rob Wiblin: Yeah. For me, the bottom line is that working on this stuff can be quite stressful and quite tiring, and I want to completely check out and stop thinking about it, and just be with people and talk about other issues. They’re different strategies.
Ajeya Cotra: Yeah. I think I probably want some of both. Now I live in a group house with a couple of little kids, which is really great and it’s good. But I find unfortunately that it takes a lot to pull my mind away. Or like when I watch TV, I’m thinking about other stuff in the background.
The mundane factors that drive career satisfaction [02:34:33]
Rob Wiblin: I think during your sabbatical you considered going independent, I guess becoming a writer or researcher, just doing your own thing. But in the end you decided to come back to Open Phil, at least for a while. Why was that?
Ajeya Cotra: So towards the end of the sabbatical, I was planning on taking some time to just start a Substack and write about a bunch of stuff, including a lot of the stuff about EA that we were discussing, a lot of stuff about AI, and sort of see where it went.
And at that time I honestly didn’t have a super strong impact case for this I think. I didn’t think it was crazy that it would be the highest impact thing to do, but the reason I was doing it was just because I just wanted this, and not that I could really defend that it was the highest impact thing. But at that moment, after having gone through this whole journey, I was like, yeah, maybe I have more room in my life for making a career decision on the basis of not just impact.
The reason I decided to stay was that basically, while I was out, Open Phil was conducting a search for a new director to lead our [global catastrophic risk] work, so all our AI work and our biorisk work. This was the position Holden was in when he left in 2023. Both of the top two candidates seemed really good to me, and I felt like someone new coming in could probably really use help from someone who’s not particularly running any given programme area, doesn’t have a big team to worry about, and can just help that person develop context, figure out their strategy. And then it could be an opportunity for me to see if I could get the feeling of plugging in again that I had been missing for a while.
Rob Wiblin: How did it go?
Ajeya Cotra: I think it went really well. Our director of GCRs is Emily Oehlsen, who’s also the president of Open Philanthropy. And I’ve been spending most of this year, 2025, just helping her in various ways, trying to understand what have we funded? What’s come of that? What’s the AI worldview? What do we think is going to happen with AI? How is that informing our strategy? What are the strategies of the various sub-teams?
And I work really well with her. It’s like I actually had been lonely at Open Phil almost the entire time I’d been at Open Phil, even though it got worse in 2023. Because while Holden was really great at giving me a lot of bandwidth that I’m really grateful for and talking about object-level stuff with me, Holden never ran a ship where he was like, “I’m doing this bigger project, can you help me with this piece of it? Here’s how it fits in.” Holden was always more like a research PI, where I was doing my own research project and he would talk to me about it a bunch and he was interested in the results, but it was not integrated into a whole.
And Emily really does operate in more of an integrated way, where I’m doing stuff and I know she needs to know the answer and is going to do something with it, which is very cool and very novel for me as a way to work. It’s something that I always thought I would want, and indeed it’s really great.
And I think she’s an extremely caring and thoughtful manager for me, who’s really good at eliciting work out of me. I noticed that I work more than I did right before I went on sabbatical and it feels less hard. So that’s just a sign that things are working.
Rob Wiblin: So you’re trying to decide what to do next, whether to stay at Open Phil or go into something less meta and maybe that will allow you to go into even more depth. How are you using the stuff that you’ve learned about yourself over the last few years to inform that decision?
Ajeya Cotra: So besides Open Phil, which is still a top candidate, I’m talking to two technical research orgs about potentially finding a fit there. One is Redwood Research, the other is METR. Redwood Research works on basically futurism-inspired technical AI safety research, and they’re best known for pioneering the AI control agenda. And METR I think of as trying to be the world’s early warning system for intelligence explosion. They’re tracking all the different measures we’d want to be watching to see if we’re on the cusp of AIs rapidly accelerating AI R&D or acquiring other capabilities that let them take over.
Both of these missions are very close to my heart. They’re both narrower than Open Phil, where I could just, if I wanted, dip my toes in absolutely everything that might help with making AI go well. But then in exchange, they would let me go deep in a way that I think would probably be more satisfying for me, all else equal.
In terms of how I’m using what I’ve learned — and this is so cliche, and it’s something that if a 20-year-old version of me were watching this, she’d roll her eyes — but your extremely local environment, the literal person you’re reporting to, matters a huge amount; and the two or three people you’re going to be talking to most in your job, or just features like how much are you talking to people in your job versus working on your own, can just make a transformative difference.
I found it interesting to reflect on. I said all that stuff earlier about how EA has become a lot less transparent and a lot less committed to prioritising maximal integrity at all costs. And that does still bother me. And with the moral foundations of EA, like utilitarian thinking, you can go down a long rabbit hole where it is very suspect in many ways, and we talked about this in some previous episodes.
But both of those things bother me a lot more when I’m also in a working environment that’s locally hard for me, you know? It’s not like those issues aren’t issues, but the salience of those kind of heady, big-picture things versus extremely micro things about what does it feel like when you have a one-on-one with your manager… I think I had been underrating the mundane and the micro in how I had been thinking about my career up to now.
And I’m trying to do trials. I’m actually in the middle of a work trial with METR as we’re filming this episode. [Update: Ajeya started working at METR!] And that’s what I’m paying attention to: How does the rhythm of the work feel? How do the people feel?
Rob Wiblin: I guess other generalisable observations are that Open Phil’s environment changed over the years. You were there for eight years.
Ajeya Cotra: I guess nine years now.
Rob Wiblin: Nine years, right. But the kind of constraints that Open Phil was labouring under in 2023 were very different than in 2016. Unsurprisingly, it might have been a good fit for you to start with, but that doesn’t necessarily mean it will be a good fit forever. And also there was a leadership change at Open Phil, the person you were reporting to changed. Very often when that occurs, you see some other people leave as well, because they were in their roles primarily because of their very good working relationship with that person, or because they had strategic alignment with that person.
Ajeya Cotra: Absolutely.
Rob Wiblin: I suppose potentially the CEO changing could have been a trigger for you to think, “Maybe this isn’t so great anymore, and I should proactively start looking for something else.”
Ajeya Cotra: Yeah, I think that’s possible. It sort of was true for me in both directions. I think Holden very much was a huge part of why I wanted to work at GiveWell rather than work in a number of other potential places or do earning to give like I thought I was going to do at first. Then when he left, that coincided with a difficult period for me. And now with Emily in the position that he was in before, it’s again pretty dramatically changed what my work is and how it feels.
So it does seem like it’s a big transformative thing. And if you’re in an organisation where there’s a leadership change, I think it should probably be a trigger to think about, even if you don’t leave, what might be different about your role and your place and what you’re doing based on the different style or the different constraints and strengths and weaknesses of new leadership.
Rob Wiblin: It sounds like taking four months off was also a good call. You were reasonably unhappy. I guess it could have gotten worse though if you hadn’t done that, and it gave you breathing room to make good decisions.
Ajeya Cotra: Yeah, I think that’s right. I’m very glad that I took the sabbatical. I’m also glad that I didn’t leave. A salient alternative for me at the time that I decided to take four months off was to just leave and figure out what I wanted to do next. And I think it was good both for my impact and for my personal growth and satisfaction that I came back. I helped Emily and now I’m doing a proper job search. At the time that I left for my sabbatical, it was more healing and reflecting, and not in a focused way searching for a role.
EA as an incubator for avant-garde causes others won’t touch [02:44:07]
Rob Wiblin: Coming back to effective altruism for a bit. You said we basically almost don’t talk about effective altruism on the show anymore. I guess it was a much bigger feature in the earlier years.
The biggest reason for that, I suppose, is now that we’re more AI focused — but AI is an issue that so many people are concerned about, regardless of their broader moral values or broader moral commitments — it just doesn’t feel as relevant. You don’t have to be concerned about shrimp, or you don’t have to be concerned about beings very far away in time to think it’ll be really good to do AI technical safety research, or it would be good to think about what governance challenges are going to be created by it.
And of course, EA is a controversial idea and I think actually is at its core quite a controversial idea. Many people even fully understanding it would simply not agree with its prescriptions of how resources ought to be allocated. And why bring along all that baggage when it’s not actually decision-relevant for most people?
Do you think we should talk about it more, or is that just a sensible evolution?
Ajeya Cotra: I think it kind of depends on the show’s goals. My take is that it’s correct and good that you don’t need to buy into the whole EA package with all of its baggage to worry about misaligned AI taking over the world and to do technical AI safety research to prevent that, to worry about AI-driven misuse, and to do research and policy to prevent that, and to just generally worry about AI disruption and think about that. So I think there should be, and there is, a healthy, thriving “AI is going to be a big deal” ecosystem that does not take EA as a premise.
But at the same time, I think EA thinking and EA values probably do still have a lot to add. In the age of AI disruption, I think it’s going to be EAs for the most part who are thinking seriously about whether AIs themselves are moral patients and whether they should have protections and rights and how to navigate that thoughtfully against tradeoffs with safety and other goals. It’s going to be EAs that by and large are still the ones that take most seriously the possibility that AI disruption could be so disruptive that we end up locked into a certain set of societal values, that we gain the technological ability to shape the future for millions of years or billions of years and are thinking about how that should go.
There are a lot of degrees of extremity to the AI worldview. Even if you accept that AI is going to disrupt everything in the next 10 or 20 years, the people who are thinking hardest about the most intense disruptions are going to be disproportionately EAs, because EA thinking challenges you to try and engage in that kind of very far-seeing, rigorous speculation. Even though there are a lot of challenges with that and it’s very hard to know the future, I think EAs are the ones that try hardest to peek ahead anyway.
Rob Wiblin: Yeah, I guess digital sentience, worrying about AIs themselves suffering, is a good example. I would definitely make the prediction that effective altruism will loom large in the group of people working on that.
For someone who’s not altruistic or isn’t motivated by social impact, it’s a bit unclear why you would go into that area. It’s not particularly lucrative; it’s not, at least yet, particularly respected. I guess it’s not super easy to make progress, and it’s quite unconventional: I think most people, most of the time in their career, want to do something that’s acceptable, and that their parents will be proud of. And it’s a lot less clear that digital sentience is going to provide you with the kind of esteem or prestige or safety or comfort that many people want in a career.
So it’s maybe natural that people who are altruistically motivated and also I guess intellectually a bit eclectic, willing to be avant garde, are going to be more —
Ajeya Cotra: Yeah, intellectually avant garde, like tolerant of quite a lot of philosophical reasoning and speculation. In a sense, I think this might be what a healthy EA community is: it’s like an engine that incubates cause areas at a stage when they’re not very respected, they’re extremely speculative, the methodology isn’t firm yet; you just have to be extremely altruistic and extremely willing to do unconventional things — and then matures those cause areas to the point where they can stand on their own while also being a thing that many EAs work on.
And I think digital sentience and maybe the other things on Will and Tom’s list, like space governance and thinking about value lock-in and stuff like that, are other candidates for EA to kind of incubate, the way it incubated worrying about AI takeover basically.
Rob Wiblin: I feel it less strongly in the case of the value lock-in thing, because many of the mechanisms there would be just ways that AI ends up… I guess you get a power grab by people or a power grab by AIs, or somehow it undermines democracy or deliberation in a way that makes it hard for society to adapt over time. And I think people are worried about that, both people involved in effective altruism and people who would be very sceptical of it.
Ajeya Cotra: I think that there are some versions of the value lock-in concern that go through something else kind of overtly scary and bad happening — like one person getting all of the power, and that person’s values get locked in, and that’s how we get value lock-in.
But I think there’s a whole spectrum of things that are almost like social media++. Like in this distributed way, this technology has made us meaner to each other and worse at thinking and has allowed individuals to live in information bubbles of their own creation. You can imagine AIs getting way better at creating a curated information bubble for each individual person that allows them to continue believing whatever it is they started believing, with superintelligent help preventing them from changing their mind.
And this might be something you think of as an important social problem for the long-run future, even if it doesn’t happen via one person getting all the power: power stays relatively distributed, but large fractions of society are still sort of impervious to changing their mind.
Rob Wiblin: It’s interesting that in thinking about what is the niche that EA can fill that others won’t fill, the thing you were pointing to was not primarily actually altruism — although I guess that is a factor in terms of going into digital sentience, perhaps — but it’s actually a research methodology or like a research instinct, which is being willing to be in that very uncomfortable space between just making stuff up and having firm conclusions that you can stand by because you’ve taken particular measurements.
It feels like for some reason that is one of the most distinctive aspects of people who are passionate about effective altruism: being willing to try really hard to make informed speculation about how things will go, and neither letting it just be a good story nor being so conservative that you’re not willing to actually make hard predictions.
Ajeya Cotra: Yeah, absolutely. I think even the tamest of EA cause areas, global health and development, has a huge dose of this. If you look at GiveWell’s cost-effectiveness analysis, they have to grapple with how does the value of doubling one’s income if you make a very low amount of money compare to a certain risk of death or the value of a certain painful disease you could have? And they have to try and get their answers based on surveys and weird studies people have done. It’s not very rigorous in the end. And they have to form their judgements and spell out their judgements.
I think the willingness to tackle questions like this and just be like, “Here’s our answer; there’s a lot to argue with,” is very emblematic of EA organisations, including all the best AI safety EA organisations like Redwood Research.
Rob Wiblin: I guess more standard ways to approach those questions would be to just pick one slightly arbitrarily and then be really committed to it, or to be kind of irritated at being asked the question and to say that there’s absolutely no way of knowing, or there’s no fact of the matter here whatsoever. I guess trying to be somewhere in the middle… I don’t know whether it’s somewhere in the middle, but yeah.
Ajeya Cotra: And within EA, there’s a spectrum in terms of where in the middle you want to land, where everyone’s kind of looking at the person more speculative than them and thinking that they’re just building castles on sand and this is not the way to do things. And they’re looking at people less speculative than them and thinking they’re just falling for the streetlight effect, ignoring the most important considerations and not working in the most important area.
Rob Wiblin: Yeah. For people who do have that mindset, I suppose an important message would be that people should take advantage of the fact that they have this unique mentality, or this reasonably rare mentality, and go into roles that other people probably won’t fill because they feel too uncomfortable. Or they could just reasonably think it’s misguided, but other people aren’t necessarily going to do this stuff.
Ajeya Cotra: Yeah, I think that’s right, and it’s interesting to think about. If you imagine EA as one piece of the world’s response to crazy changes like AI, there’s actually a case that EA should be heavily indexed on research.
I think the community has gone back and forth with how it thinks about this. At first people were just naturally attracted to research stuff, so there was a huge glut of people who wanted to be researchers. And then there was a big push, including from 80K and others, to consider operations roles and policy roles and other things that aren’t just research.
And I think that was a good move at the time. But I wonder if we think about what is EA’s comparative advantage relative to the world, maybe that suggests that some of the people who are doing operations and doing policy, but maybe in their hearts just want to be like a weird truthteller, thinking speculative thoughts, should consider going back and doing that again.
Rob Wiblin: My guest today has been Ajeya Cotra. Thanks so much for coming on The 80,000 Hours Podcast again, Ajeya.
Ajeya Cotra: Thanks so much for having me.