#191 – Carl Shulman on government and society after AGI (Part 2)

By Robert Wiblin and Keiran Harris · Published July 5th, 2024

The AI advisor would point out all of these places where the system is making the top-level objective of getting a vaccine quickly, where that’s going wrong, and clarifies which changes will make it happen quicker. “If you replace person X with person Y; if you cancel this regulation, these outcomes will happen, and you’ll get the vaccine earlier. People’s lives will be saved, the economy will be rebooted,” et cetera.

There’s just all kinds of ways in which the thing is self-destructive, and only sustainable by deep epistemic failures and the corruption of the knowledge system that very often happens to human institutions. But making it as easy as possible to avoid that would improve it. And then going forward, I think these same sort of systems advise us to change our society such that we will never again have a pandemic like that, and we would be robust even to an engineered pandemic and the like.

— Carl Shulman

This is the second part of our marathon interview with Carl Shulman. The first episode is on the economy and national security after AGI. You can listen to them in either order!

If we develop artificial general intelligence that’s reasonably aligned with human goals, it could put a fast and near-free superhuman advisor in everyone’s pocket. How would that affect culture, government, and our ability to act sensibly and coordinate together?

It’s common to worry that AI advances will lead to a proliferation of misinformation and further disconnect us from reality. But in today’s conversation, AI expert Carl Shulman argues that this underrates the powerful positive applications the technology could have in the public sphere.

As Carl explains, today the most important questions we face as a society remain in the “realm of subjective judgement” — without any “robust, well-founded scientific consensus on how to answer them.” But if AI ‘evals’ and interpretability advance to the point that it’s possible to demonstrate which AI models have truly superhuman judgement and give consistently trustworthy advice, society could converge on firm or ‘best-guess’ answers to far more cases.

If the answers are publicly visible and confirmable by all, the pressure on officials to act on that advice could be great.

That’s because when it’s hard to assess if a line has been crossed or not, we usually give people much more discretion. For instance, a journalist inventing an interview that never happened will get fired because it’s an unambiguous violation of honesty norms — but so long as there’s no universally agreed-upon standard for selective reporting, that same journalist will have substantial discretion to report information that favours their preferred view more often than that which contradicts it.

Similarly, today we have no generally agreed-upon way to tell when a decision-maker has behaved irresponsibly. But if experience clearly shows that following AI advice is the wise move, not seeking or ignoring such advice could become more like crossing a red line — less like making an understandable mistake and more like fabricating your balance sheet.

To illustrate the possible impact, Carl imagines how the COVID pandemic could have played out in the presence of AI advisors that everyone agrees are exceedingly insightful and reliable.

To start, advance investment in preventing, detecting, and containing pandemics would likely have been at a much higher and more sensible level, because it would have been straightforward to confirm which efforts passed a cost-benefit test for government spending. Politicians refusing to fund such efforts when the wisdom of doing so is an agreed and established fact would seem like malpractice.

Low-level Chinese officials in Wuhan would have been seeking advice from AI advisors instructed to recommend actions that are in the interests of the Chinese government as a whole. As soon as unexplained illnesses started appearing, that advice would be to escalate and quarantine to prevent a possible new pandemic escaping control, rather than stick their heads in the sand as happened in reality. Having been told by AI advisors of the need to warn national leaders, ignoring the problem would be a career-ending move.

From there, these AI advisors could have recommended stopping travel out of Wuhan in November or December 2019, perhaps fully containing the virus, as was achieved with SARS-1 in 2003. Had the virus nevertheless gone global, President Trump would have been getting excellent advice on what would most likely ensure his reelection. Among other things, that would have meant funding Operation Warp Speed far more than it in fact was, as well as accelerating the vaccine approval process, and building extra manufacturing capacity earlier. Vaccines might have reached everyone far faster.

These are just a handful of simple changes from the real course of events we can imagine — in practice, a significantly superhuman AI might suggest novel approaches better than any we can suggest here.

In the past we’ve usually found it easier to predict how hard technologies like planes or factories will change than to imagine the social shifts that those technologies will create — and the same is likely happening for AI.

Carl Shulman and host Rob Wiblin discuss the above, as well as:

The risk of society using AI to lock in its values.
The difficulty of preventing coups once AI is key to the military and police.
What international treaties we need to make this go well.
How to make AI superhuman at forecasting the future.
Whether AI will be able to help us with intractable philosophical questions.
Whether we need dedicated projects to make wise AI advisors, or if it will happen automatically as models scale.
Why Carl doesn’t support AI companies voluntarily pausing AI research, but sees a stronger case for binding international controls once we’re closer to ‘crunch time.’
Opportunities for listeners to contribute to making the future go well.

Producer and editor: Keiran Harris
Audio engineering team: Ben Cordell, Simon Monsour, Milo McGuire, and Dominic Armstrong
Transcriptions: Katy Moore

Highlights

How AI advisors could have saved us from COVID-19

Carl Shulman: With Operation Warp Speed, which was the effort under the Trump administration to put up the money to mass produce vaccines in advance, they went fairly big on it by historical standards, but not nearly as big as they should have. Spending much more money to get the vaccines moderately faster and end lockdowns slightly earlier would be hugely valuable. You’d save lives, you’d save incredible quantities of money. It was absolutely worth it to spend an order of magnitude or more additional funds on that. And then many other countries, European countries, that haggled over the price on these things, and therefore didn’t get as much vaccine early, at a cost of losing $10 or $100 for every dollar you “save” on this thing.
If you have the AI advisors, and they are telling you, “Look, this stuff is going to happen; you’re going to regret it.” The AI advisor is credible. It helps navigate between politicians not fully understanding the economics and politics. It helps the politicians deal with the public, because the politicians can cite the AI advice, and that helps to deflect blame from them, including controversial decisions.
So one reason why there was resistance to Operation Warp Speed and similar efforts is you’re supporting the development of vaccines before they’ve been fully tested on the actual pandemic. And you may be embarrassed if you paid a lot of money for a vaccine that turns out not actually to be super useful and super helpful. And if you’re risk averse, you’re very afraid of that outcome. You’re not as correspondingly excited about saving everybody as long as you’re not clearly blameworthy. Well, having this publicly accessible thing, where everyone knows that HonestGPT is saying this, then you look much worse to the voters when you go against the advice that everyone knows is the best estimate, and then things go wrong.
I think from that, you get vaccines being produced in quantity earlier and used. And then at the level of their deployment, I think you get similar things. So Trump, Donald Trump, in his last year in office, he was actually quite enthusiastic about getting a vaccine out before he went up for reelection. And of course, his opponents were much less enthusiastic about rapid vaccine development before election day, compared to how they were after. But the president wanted to get the system moving as quickly as possible in the direction of vaccines being out and deployed quickly, and then lockdowns reduced and such early — and in fact, getting vaccines fast, and lockdowns and NPIs over quickly, was the better policy.
And so if he had access to AI advisors telling him what is going to maximise your chances of reelection, they would suggest, “The timeline for this development is too slow. If you make challenge trials happen to get these things verified very early, you’ll be able to get the vaccine distributed months earlier.” And then the AI advisor would tell you about all the problems that actually slowed the implementation of challenge trials in the real pandemic. They say, “They’ll be quibbling about producing a suitable version of the pathogen for administration in the trial. There’ll be these regulatory delays, where commissions just decide to go home for the weekend instead of making a decision three days earlier and saving enormous numbers of lives.”
And so the AI advisor would point out all of these places where the system is making the top-level objective of getting a vaccine quickly, where that’s going wrong, and clarifies which changes will make it happen quicker. “If you replace person X with person Y; if you cancel this regulation, these outcomes will happen, and you’ll get the vaccine earlier. People’s lives will be saved, the economy will be rebooted,” et cetera.
And then we go from there out through the end. You’d have similar benefits on the effect of school closures, on learning loss. You’d have similar effects on anti-vaccine sentiment. So processing the data about the demonisation of vaccines that happened in the United States later on, and so having that as a very systematic, trusted source, where even the HonestGPT made by conservatives, maybe Grok, Elon Musk’s new AI system, Grok would also be telling you, if you are a Republican conservative who’s suspicious of vaccines, especially after the vaccine is no longer associated with the Trump administration, but the Biden administration, then having Grok, an equivalent, telling you these things, having it tell you that anti-vaccine things are to the disadvantage of conservatives, because they were getting disproportionately killed and reducing the number of conservative voters…
There’s just all kinds of ways in which the thing is self-destructive, and only sustainable by deep epistemic failures and the corruption of the knowledge system that very often happens to human institutions. But making it as easy as possible to avoid that would improve it. And then going forward, I think these same sort of systems advise us to change our society such that we will never again have a pandemic like that, and we would be robust even to an engineered pandemic and the like.
This is my soup to nuts how AI advisors would have really saved us from COVID-19 and the mortality and health loss and economic losses, and then just the political disruption and ongoing derangement of politics as a result of that kind of dynamic.

Why Carl doesn't support enforced pauses on AI research

Carl Shulman: The big question that one needs to answer is what happens during the pause. I think this is one of the major reasons why there was a much more limited set of people ready to sign and support the open letter calling for a six-month pause in AI development, and suggesting that governments figure out their regulatory plans with respect to AI during that period. Many people who did not sign that letter then went on to sign the later letter noting that AI posed a risk of human extinction and should be considered alongside threats of nuclear weapons and pandemics. I think I would be in the group that was supportive of the second letter, but not the first.
I’d say that for me, the key reason is that when you ask, when does a pause add the most value? When do you get the greatest improvements in safety or ability to regulate AI, or ability to avoid disastrous geopolitical effects of AI? Those make a bigger difference the more powerful the AI is, and they especially make a bigger difference the more rapid change in progress in AI becomes.
I think the pace of technological, industrial, and economic change is going to intensify enormously as AI becomes capable of automating the processes of further improving AI and developing other technologies. And that’s also the point where AI is getting powerful enough that, say, threats of AI takeover or threats of AI undermining nuclear deterrence come into play. So it could make an enormous difference whether you have two years rather than two months, or six months rather than two months, to do certain tasks in safely aligning AI — because that is a period when AI might hack the servers it’s operating on, undermine all of your safety provisions, et cetera. It can make a huge difference, and the political momentum to take measures would be much greater in the face of clear evidence that AI had reached such spectacular capabilities.
To the extent you have a willingness to do a pause, it’s going to be much more impactful later on. And even worse, it’s possible that a pause, especially a voluntary pause, then is disproportionately giving up the opportunity to do pauses at that later stage when things are more important. So if we have a situation where, say, the companies with the greatest concern about misuse of AI or the risk of extinction from AI — and indeed the CEOs of several of these leading AI labs signed the extinction risk letter, while not the pause letter — if those companies, only the signatories of the extinction letter do a pause, then the companies with the least concern about these downsides gain in relative influence, relative standing.
And likewise in the international situation. So right now, the United States and its allies are the leaders in semiconductor technology and the production of chips. The United States has been restricting semiconductor exports to some states where it’s concerned about their military use. And a unilateral pause is shifting relative influence and control over these sorts of things to those states that don’t participate — especially if, as in the pause letter, it was restricted to training large models rather than building up semiconductor industries, building up large server farms and similar.
So it seems this would be reducing the slack and intensifying the degree to which international competition might otherwise be close, which might make it more likely that things like safety get compromised a lot.
Because the best situation might be an international deal that can regulate the pace of progress during that otherwise incredible rocket ship of technological change and potential disaster that would happen near when AI was fully automating AI research.
Second best might be you have an AI race, but it’s relatively coordinated — it’s at least at the level of large international blocs — and where that race is not very close. So the leader can afford to take six months rather than two months, or 12 months or more to not cut corners with respect to safety or the risk of a coup that overthrows their governmental system or similar. That would be better.
And then the worst might be a very close race between companies, a corporate free-for-all.
So along those lines, it doesn’t seem obvious that that is a direction that increases the ability for later explosive AI progress to be controlled or managed safely, or even to be particularly great for setting up international deals to control and regulate AI.

Value lock-in

Carl Shulman: When I think about, say, an application in North Korea or in the People’s Republic of China, it is already the official doctrine that information needs to be managed in many ways to manipulate the opinions and loyalties of the population. And you might say that these issues of epistemic propaganda and whatnot, a lot of what we’ve been talking about is not really relevant there, because it’s already just a matter of government policy.
But you could see how that could distort things even within the regime. So the Soviet Union collapsed because Gorbachev rose to the top of the system while thinking it was terrible in many ways. Good in many ways: he did want to preserve the Soviet Union; he just was not willing to use violence to keep it together.
But if the ruling party in some of these places sets conditions for, say, loyalty indexes, and then has an AI system that is optimised to generate as high a loyalty index as possible, and it gives this result where the loyalty index is higher for someone who really can believe the party line in various ways (although of course changing it whenever the party authorities want something different), then you can wind up with successors or later decisions made by people who have been to some extent driven mad by these things that were mandated as part of the apparatus of loyalty and social control.
And you can imagine, say, in Iran, if the ruling clerics are getting AI advice and just visible evidence that some AIs systematically undermine the faith of people who use them, and that AIs directed to strengthen people’s faith really work, that could result relatively quickly in a collective decision for more of the latter, less of the former. And that gets applied also to the people making these decisions, and results in a sort of runaway ideological shift, in the same way that many groups became ideologically extreme in the first place: where there’s competitive signalling to be more loyal to the system, more loyal to the regime than others.
Rob Wiblin: Is there anything that we can do to try to make this kind of misuse less likely to occur?
Carl Shulman: So where a regime is already set up that would have a strong commitment to causing itself to have various delusions, there may be only so much you can do. But by developing the scientific and technical understanding of these kinds of dynamics and communicating that, you could at least help avoid the situations where the leadership of authoritarian regimes get high on their own supply, and wind up accidentally driving themselves into delusions that they might have wanted to avoid.
And at a broader level, to the extent these places are using AI models that are developed in much less oppressive locations, this can mean do not provide models that will engage in this sort of behaviour. Which may mean API access to the very powerful models: do not provide it to North Korea to provide propaganda to its population.
And then there’s a more challenging issue where very advanced open source models are then available to all the dictatorships and oppressive regimes. That’s an issue that recurs for many kinds of potential AI misuse, like bioterrorism and whatnot.

How democracies avoid coups

Carl Shulman: In general, the problem of how democracies avoid coups, avoid the overthrow of the liberal democratic system, tends to work through a setup where different factions expect that the outcomes will be better for them by continuing to follow along with the rules rather than going against them. And part of that is that, when your side loses an election, you expect not to be horribly mistreated on the next round. Part of it is cultivating principles of civilian control of the military, things like separating military leadership from ongoing politics.
Now, AI disrupts that, because you have this new technology that can suddenly replace a lot of the humans whose loyalties previously were helping to defend the system, who would choose not to go along with a coup that would overthrow democracy. So there it seems one needs to be embedding new controls, new analogues to civilian control of the military, into the AI systems themselves, and then having the ability to audit and verify that those rules are being complied with — that the AIs being produced are motivated such that they would not go along with any coup or overthrow the rules that were being set, and that setting and changing those rules required a broad buy-in from society.
So things like supermajority support. There are some institutions (for example, in the United States, the Federal Election Commission), and in general, election supervisors have to have representation from both parties, because single-party referees for a two-party competitive election are not very solid. But this may mean passing more binding legislation, and enabling very rapid judicial supervision and overview of violations of those rules may be necessary, because you need them to happen quite quickly, potentially. This also might be a situation where maybe you should be calling elections more often, when technological change is accelerating tenfold, a hundredfold, and maybe make some provisions for that.
That’s the kind of, unfortunately, human, social, political move that is large; it would require a lot of foresight and buy-in to it being necessary to make the changes. And then there’s just great inertia and resistance. So the difficulty of arranging human and legal and political institutions to manage these kinds of things is one reason why I think it’s worthwhile to put at least a bit of effort into paying attention to where we might be going. But at the same time, I think there are limits to what one can do, and we should just try to pursue every option we can to have the development of AI occur in a context where legal and political authority, and then enforcement mechanisms for that, reflect multiple political factions, multiple countries — and that reduces the risk to pluralism of one faction in one country suddenly dragging the world indefinitely in an unpleasant way.
Carl Shulman: In a world where there are thousands or millions of robots per human, to have a military and security forces that don’t depend on AI is pretty close to just disarmament and banning war. And I hope we do ban war and have general disarmament, but it could be quite difficult to avoid. And in avoiding it, just like the problem of banning nuclear weapons, if you’re going to restrict it, you have to set up a system such that any attempt to break that arrangement is itself stopped.
So I think we do have to think about how we would address the problem when security forces are largely automated, and therefore the protection of constitutional principles like democracy is really dependent on the loyalties of those machines.
Rob Wiblin: At some point, once most of the military power basically is just AI making decisions, having it saying that the way we’re going to keep it safe is that it will always follow human instructions, well, if all of the equipment is following the instructions of the same general, then that’s an extremely unstable situation. And in fact, you need to say no, we need them to follow principles that are not merely following instructions; we need them to reject instructions when those instructions are bad.
Carl Shulman: Indeed. And human soldiers are obligated to reject illegal orders, although it can be harder to implement in practice sometimes than to specify that as a goal. And yes, to the extent that you automate all of these key functions, including the function of safeguarding a democratic constitution, then you need to incorporate that same capacity to reject illegal orders, and even to prevent an illegal attempt to interfere with the processes by which you reject illegal orders. It’s no good if the AIs will refuse an order to, say, overthrow democracy or kill the population, but they will not defend themselves from just being reprogrammed by an illegal attempt.
So that poses deep challenges and is the reason why you want, first, problems of AI alignment and honest AI advice to be solved, and second, to have institutional procedures whereby the motives being put into those AIs reflect a broad, pluralistic set of values and all the different interests and factions that need to be represented.

Building trust between adversaries about which models you can believe

Carl Shulman: Right now this is a difficult problem, and you can see that with respect to large software products. So if Windows has backdoors, say, to enable the CIA to root machines running it, Russia or China cannot just purchase off-the-shelf software and have their cybersecurity agencies go through it and find every single zero-day exploit and bug. That’s just quite beyond their capabilities. They can look, and if they find even one, then say, “Now we’re no longer going to trust commercial software that is coming from country X,” they can do that, but they can’t reliably find every single exploit that exists within a large piece of software.
And there’s some evidence that may be true with these AIs. For one thing, there will be software programs running the neural network and providing the scaffolding for AI agents or networks of AI agents and their tools, which can have backdoors in the ordinary way. There are issues with adversarial examples, data poisoning and passwords. So a model can be trained to behave normally, classify images accurately, or produce text normally under most circumstances, but then in response to some special stimulus that would never be produced spontaneously, it will then behave in some quite different way, such as turning against a user who had purchased a copy of it or had been given some access.
So that’s a problem. And developing technical methods that either are able to locate that kind of data poisoning or conditional disposition, or are able to somehow moot it — for example, by making it so that if there are any of these habits or dispositions, they will wind up unable to actually control the behaviour of the AI, and you give it some additional training that restricts how it would react to such impulses. Maybe you have some majority voting system. You could imagine any number of techniques, but right now, I think technically you have a very difficult time being sure that an AI provided by some other company or some other country genuinely had the loyalties that were being claimed — and especially that it wouldn’t, in response to some special code or stimulus, suddenly switch its behaviour or switch its loyalties.
So that is an area where I would very much encourage technical research. Governments that want to have the ability to manage that sort of thing, which they have very strong reasons to do, should want to invest in it. Because if government contractors are producing AIs that are going to be a foundation not just of the public epistemology and political things, but also of industry, security, and military applications, the US military should be pretty wary of a situation where, for all they know, one of their contractors supplying AI systems can give a certain code word, and the US military no longer works for the US military. It works for Google or Microsoft or whatnot. That’s just a situation that is just not very appealing. It’s not one that would arise for a Boeing.
Even if there were a sort of sabotage or backdoor placed in some systems, the potential rewards or uses of that would be less. But if you’re deploying these powerful AI systems at scale, they’re having an enormous amount of influence and power in society — eventually to the point where ultimately the instruments of state hinge on their loyalties — then you really don’t want to have this kind of backdoor or password, because it could actually overthrow the government, potentially. So this is a capability that governments should very much want, almost regardless, and this is a particular application where they should really want it.
But it also would be important for being sure that AI systems deployed at scale by a big government will not betray that government on behalf of the companies that produce them, and will not betray the constitutional or legal order of that state on behalf of, say, the executive officials who are nominally in charge of those: you don’t want to have AI enabling a coup that overthrows democracy on behalf of a president against a congress. Or, if you have AI that is developed under international auspices, so it’s supposed to reflect some agreement between multiple states that are all contributing to the endeavour or have joined in the treaty arrangement, you want to be sure that AIs will respect the terms of behaviour that were specified by the multinational agreement and not betray the larger project on behalf of any member state or participating organisation.
So this is a technology that we really should want systematically, just because empowering AIs this much, we want to be able to know their loyalties, and not have it be dependent on no one having inserted an effective backdoor anywhere along a chain of production.

Opportunities for listeners

Carl Shulman: There is huge social value potentially to be provided by predicting the political consequences and economic consequences of different policies. So when we talked earlier about the application to COVID, if politicians were continuously getting smart feedback about how this will affect the public’s happiness two years later, four years later, six years later, and their political response to the politician, that could really shift discourse.
But it’s not the kind of thing that’s likely to result in an enormous amount of financing, unless you might have some government programme to fight misinformation that attempts to create models, or fine-tune open source models, or contract large AI companies to produce AI that appears trustworthy on all of the easy examinations and probes and tests one can make for bias. And it might be that different political actors in government could demand that sort of thing as a criterion for AI being deployed in government, and that could be potentially significant.
Rob Wiblin: Yeah. Are there any other opportunities for listeners potentially to cause this epistemic revolution to happen sooner or better that are worth shouting out?
Carl Shulman: Yeah. Some small academic research effort or the like is going to have difficulty comparing to the resources that these giant AI companies can mobilise. But one enormous advantage such small efforts have is independence. So watchdog agencies or organisations that systematically probe the major corporate AI models for honesty, dishonesty, bias of various kinds (and attempt also to fine-tune and scaffold those models to do better on metrics of honesty of various kinds) could be really helpful, and provide incentives for these large companies to produce models that, first, do very well on any probe of honesty that one can muster from the outside, and second, do so in a way that is relatively robust or transparent to these outside auditors.
But right now this is something that is, I think, not being evaluated in a good systematic way, and there’s a lot of room for developing metrics.

Articles, books, and other media discussed in the show

Carl’s work:

First appearance on The 80,000 Hours Podcast: #112 – Carl Shulman on the common-sense case for existential risk work and its practical implications
Appearances on the Dwarkesh Podcast:
- Carl Shulman (Pt 1) – Intelligence explosion, primate evolution, robot doublings, & alignment
- Carl Shulman (Pt 2) – AI takeover, bio & cyber attacks, detecting deception, & humanity’s far future
Reflective Disequilibrium — Carl’s blog
Propositions concerning digital minds and society (with Nick Bostrom)
Sharing the world with digital minds (with Nick Bostrom)
Racing to the precipice: A model of artificial intelligence development (with Stuart Armstrong and Nick Bostrom)
Carl’s response on LessWrong to Katja Grace’s post, Let’s think about slowing down AI
Whole brain emulation and the evolution of superorganisms

Artificial sentience:

The Google engineer who thinks the company’s AI has come to life by Nitasha Tiku
Artificial Intelligence, Morality, and Sentience (AIMS) Survey — 2023 poll by the Sentience Institute by Janet Pauketat, Ali Ladak, and Jacy Reese Anthis
Improving the welfare of AIs: A nearcasted proposal by Ryan Greenblatt
Passion of the Sun Probe — a vignette by Eric Schwitzgebel (a former guest of the show)
Artificial intelligence: An evangelical statement of principles

AI forecasting:

FutureSearch
Forecasting future world events with neural networks by Andy Zou et al.
Approaching human-level forecasting with language models by Danny Halawi et al.

AI and economic growth:

Artificial intelligence and economic growth by Philippe Aghion, Benjamin F. Jones, and Charles I. Jones
Economic growth under transformative AI by Philip Trammell and Anton Korinek
Explosive growth from AI automation: A review of the arguments by Ege Erdil and Tamay Besiroglu

Other recent AI advances:

Anthropic’s Constitutional AI: Harmlessness from AI feedback
AI Control: Improving safety despite intentional subversion by Ryan Greenblatt and others at Redwood Research
Eureka! NVIDIA research breakthrough puts new spin on robot learning by Angie Lee
The media very rarely lies by Scott Alexander

Transcript

Table of Contents

1 Cold open [00:00:00]
2 Rob’s intro [00:01:16]
3 The interview begins [00:03:24]
4 COVID-19 concrete example [00:11:18]
5 Sceptical arguments against the effect of AI advisors [00:24:16]
6 Value lock-in [00:33:59]
7 How democracies avoid coups [00:48:08]
8 Where AI could most easily help [01:00:25]
9 AI forecasting [01:04:30]
10 Application to the most challenging topics [01:24:03]
11 How to make it happen [01:37:50]
12 International negotiations and coordination and auditing [01:43:54]
13 Opportunities for listeners [02:00:09]
14 Why Carl doesn’t support enforced pauses on AI research [02:03:58]
15 How Carl is feeling about the future [02:15:47]
16 Rob’s outro [02:17:37]

Cold open [00:00:00]

Carl Shulman: I think the pace of technological, industrial, and economic change is going to intensify enormously as AI becomes capable of automating the processes of further improving AI and developing other technologies. And that’s also the point where AI is getting powerful enough that, say, threats of AI takeover or threats of AI undermining nuclear deterrence come into play. So it could make an enormous difference whether you have two years rather than two months, or six months rather than two months, to do certain tasks in safely aligning AI — because that is a period when AI might hack the servers it’s operating on, undermine all of your safety provisions, et cetera. It can make a huge difference, and the political momentum to take measures would be much greater in the face of clear evidence that AI had reached such spectacular capabilities.

Rob’s intro [00:01:16]

Rob Wiblin: Hey listeners, Rob here. We’re back for part two of my marathon conversation with the polymath researcher Carl Shulman, whose detailed and concrete visions of how superhuman AI might play out have been highly influential.

Part one covered the economy and national security after AGI, while this time the unifying themes are government and politics after AGI.

We discuss, in order:

How trustworthy superhuman AI advisors could revolutionise how we’re governed.
The many ways that credible AI advisors could have enabled us to avoid COVID-19.
The risk of society using AI to lock in its values.
The difficulty of preventing coups once AI is key to the military and police.
What international treaties we need to make this go well.
How to make AI superhuman at forecasting the future.
Whether AI can help us with intractable philosophical questions.
Whether we need dedicated projects to make wise AI advisors, or it happens automatically as models scale.
Why Carl doesn’t support AI companies voluntarily pausing AI research, but sees a stronger case for binding international controls once we’re closer to ‘crunch time.’
And opportunities for listeners to contribute to making the future go well.

We’ve organised this so you can listen and follow it all without going back to part one first — but if you want to, it’s the episode right before this one: Episode 191: Part 1.

If you’d like to work on the topics in this episode, there’s a census where you can put your name forward that I’ll mention at the end of the episode.

And now, I again bring you Carl Shulman.

The interview begins [00:03:24]

Rob Wiblin: So, in part one of this conversation, we mostly focused on economic concerns and productivity and so on.

Now we’re going to turn our attention to a different broad cluster, which is less about economic output and more about knowledge production; wisdom; politics; public discourse and debate; figuring out good moral goals to pursue; prudent, wise, collective decision making; coordination; and all of that. And that side of society can be reasonably independent of the economic side. You could, in principle, have a society that is very rich, but rationality and public discourse remain fairly poor, and that leads to bad outcomes in other ways. These two things are connected, of course, because wisdom can lead to economic productivity, but one could be going a whole lot better than the other, in principle.

Now, I think it’s not going to be a great mystery to listeners why getting that epistemic side of society functioning well might put us in a better position to create a positive future. But why is it important to think about how cheap, superhuman AI could affect all of those things now, rather than just sitting tight and crossing that bridge when we come to it?

Carl Shulman: Right. So in our systems of governance and policy, the work of journalists and scientists and politicians, bureaucrats, voters, those are cognitive tasks — the kind of tasks where the marginal impact of AI is largest. So we expect big changes.

And some aspects of it we should expect basically to be eventually solved by the advance of technology. So I don’t think we should doubt that any really advanced technological society will have the germ theory of disease or quantum mechanics. So we shouldn’t doubt that it will be technologically feasible to have very good social epistemology, very good science in every area, very good forecasting of the future by AIs within the limits of what is possible.

But we can worry about two things. One is: to what extent will we wind up going down a path where we actively suppress the potential of AI technology to make our society’s epistemology and understanding of the world better, because that results in knowledge that we’re uncomfortable with, or that some actors in society are uncomfortable with?

And then in the shorter term, it’s possible that we may really want to have this kind of AI assistance in thinking through the problems that we face, including the problems involved in the development and deployment of more advanced AI. And there, there are questions of how advanced these technologies are at the times when you really want to have them for some of these early decisions. What are the procedures by which people who might benefit from such AI advice and assistance can know they can trust that advice in high-stakes situations? So if you’re deciding what’s the proper way to regulate Google or OpenAI, and then you consult ChatGPT, and it says, “You should have this regulation that’s very favourable to the company that created me,” you might doubt, can you really trust this?

So even if the AI systems are capable of making great advances, and people have made the necessary investments in keeping that area of AI applications up to par with other applications, still you may need to do a lot of work on human society’s ability to trust, verify, and test claims of honest AI advice, honest AI epistemic assistance, so that that capability and its influence in society grow faster than things like AI disinformation or manipulative propaganda, or other ways in which AI could be used to make us more confused, more deranged as a society, rather than saner and smarter and more sophisticated.

Rob Wiblin: What are some of the most important or interesting or consequential uses to which these amazingly insightful AIs might be put?

Carl Shulman: In talking about the maturation of hard technology and things like medicine, engineering, and whatnot, basically we expect a lot of advancement. We can get into some of the details and ways that might work, but the things that people maybe think less about are ones that more extend advanced knowledge and advice to domains that are currently considered a realm of subjective judgement, or where we don’t wind up with robust, well-founded scientific consensus on how to answer them.

So that would include a lot of questions in social science, forecasting of events in the world, things like history, philosophy, foreign affairs, trying to figure out the intentions of other states and leaders, trying to figure out what political results will follow from different institutional or legal manoeuvres. Things like hot-button questions that people have very strong feelings about and so often have difficulty thinking about objectively, so that people with different biases will end up giving different answers on questions to which there ultimately is a factual answer, even if it’s difficult or expensive to acquire that answer.

It’s possible to slip into a mode of imagining a future where AI has provided all of this technology, sort of science fiction-y in some fashions, but where things like the quality of government, or the degree to which consumers can navigate absurdly complicated financial products or distinguish between truth and lies from politicians, are all the same, along with anything like the ability to work out deals and agreements between businesses or countries.

Those things are also potentially very susceptible indeed to some of the changes that AI enables. And not just the capabilities of the AI, but that AI has the potential to be more transparent than humans, so that we can understand what it’s doing and verify it to one another or just to ourselves, that an AI works in a particular way and it’s more possible to incentivise it to do particular things.

Rob Wiblin: OK, so there’s a lot there. General point is, often we might maybe picture the future where hard sciences and technology have changed a lot, but so much else of society stays fixed. I’m not sure entirely why we have that tendency in science fiction to imagine that social aspects of the world aren’t changing, even though hard technology is shifting enormously, but there does seem to be a general tendency.

COVID-19 concrete example [00:11:18]

Rob Wiblin: Can you describe, in the formation of some sort of policy approach to some challenging question, what are the different points at which having this mechanism for establishing accurate forecasts, or the probability of different statements being true, how that improves the policy outcome at the end of the chain?

Carl Shulman: Yeah, maybe a good and sort of still painfully fresh example would be the COVID-19 pandemic. We saw throughout any number of situations where epistemological conflict and failures were important. So we could go from beginning to end. Actually, we should go even earlier.

It was already sort of common knowledge that massive pandemics had happened before and were likely to happen again. There was the 1918 flu pandemic that killed tens of millions of people and had significant global economic damage. And projecting that forward, it was clear to many people studying pandemics in public health that, yeah, you could have damage on this scale.

And so the actual damage from COVID was $10 trillion plus, and there were various things that could have been done to reduce the danger in advance. It was known that coronaviruses were a big risk. They were commonly mooted as the next big pandemic. SARS-1 and MERS were recent outbreaks that had been contained. And you’ve now had many episodes talking about the things that could be done to prevent pandemics.

Rob Wiblin: We’ve had a few. Not enough, evidently, but we’ve had a few.

Carl Shulman: Yeah. So if you just take the actuarial value, the year-by-year chance that a pandemic is going to come up, it’s definitely worth spending tens of billions of dollars, $100 billion a year. It’s worth spending $100 billion this year to block a pandemic next year. It’s worth spending hundreds of billions of dollars over a decade to block a pandemic next decade. And if you actually spent hundreds of billions of dollars on the things that were most likely to actually block that, that likely would have prevented this.
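
As a rough illustration of the actuarial reasoning here, the sketch below multiplies a yearly chance of a pandemic by the damage it would cause. Only the $10 trillion damage figure comes from the conversation. The 1% annual probability and the 50% risk reduction are hypothetical values chosen for illustration, and the function names are invented for this example, so the output is a sanity check on orders of magnitude rather than an estimate.

```python
def expected_annual_loss(annual_probability: float, damage_usd: float) -> float:
    """Expected yearly cost of a risk: chance per year times damage if it happens."""
    return annual_probability * damage_usd


def break_even_spend(annual_probability: float, damage_usd: float,
                     risk_reduction: float) -> float:
    """Largest yearly spend that pays for itself, given the fraction of risk it removes."""
    return expected_annual_loss(annual_probability, damage_usd) * risk_reduction


# From the conversation: COVID-scale damage of "$10 trillion plus".
DAMAGE_USD = 10e12
# Hypothetical assumption: a 1% chance per year of a pandemic on that scale.
ASSUMED_ANNUAL_PROBABILITY = 0.01
# Hypothetical assumption: prevention spending removes half of that risk.
ASSUMED_RISK_REDUCTION = 0.5

loss = expected_annual_loss(ASSUMED_ANNUAL_PROBABILITY, DAMAGE_USD)
spend = break_even_spend(ASSUMED_ANNUAL_PROBABILITY, DAMAGE_USD,
                         ASSUMED_RISK_REDUCTION)

print(f"Expected annual loss: ${loss / 1e9:,.0f} billion")
print(f"Break-even annual spend: ${spend / 1e9:,.0f} billion")
```

Under these assumed inputs, the expected loss is on the order of $100 billion a year, which is the scale of spending mentioned in the conversation. Different probability or effectiveness assumptions would scale the result proportionally.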

For example, we would have had global surveillance networks and flexible sensors that were more able to detect these things, so when the first patients were coming into hospitals in Wuhan, that new pathogen would have been identified more immediately.

The local officials who may have tried to prevent information from going up the chain too quickly and tried to avoid having a panic, which could perhaps hurt those people’s careers, at least until more senior levels and the Chinese government got involved: if it were the case that the top-level parts of the government had AIs that were forecasting what could happen here, how plausible is it that this will go out of control, cost us trillions of dollars, significantly damage the reputation of the party and the leadership? And at this point, I think it has been quite damaging to them. So were they able to forecast the likelihood of that undermining the existing regime and setup and the strength of the country, that’s sort of a strong push to deal with this situation.

And the advanced AI stuff is very good at determining what is going on at the lower levels, interpreting the mixed data and reports. It can read all the newspaper articles, it can do all those things. And then at the lower level, AI advisors to the local officials can be telling them what’s the sensible thing to do here in light of the objectives of the country as a whole. And then instead of the local official being maybe incentivised to minimise the story, maybe from a perspective of protecting themselves from a stink being made, they can just follow the truthful, honest AI that makes clear this is the reasonable thing to do in light of the larger objectives, and therefore that covers the rear end of the parties who follow it locally. Whereas otherwise, bureaucrats have a tendency to act to minimise their chance of being blamed and fired — or worse, in the PRC. So that helps a lot.

Going out from there, at the point when it was still possible to contain it, perhaps, in the way that SARS and whatnot were confined: by noticing it immediately, you could have outgoing planes and such stopped, and super effective contact tracing done along those fronts, and telling whether a given contact tracing setup was sufficient to deal with the problem — and explaining if it wasn’t sufficient, what changes would be needed to be made.

And that could go to the top-level officials. So in the United States, there was a terrible episode where the CDC actually prevented testing for COVID, because they were developing their own test, which they screwed up and delayed.

Rob Wiblin: I’d forgotten about this one.

Carl Shulman: And meanwhile, they prevented anyone else from testing it. And so this leads to a period where local contact tracing, or even just understanding the development of the pandemic, was just severely impaired.

So AI models that are trustworthy to all the leadership at the top, and that can also, in a trustworthy fashion, forecast how the further development of this pandemic can strike against their reelection chances, and explain how different policy choices now can help lead to results then that would push for, “Break the CDC’s objections on this point. Get the real test out there.”

By the same token, with Operation Warp Speed, which was the effort under the Trump administration to put up the money to mass produce vaccines in advance, they went fairly big on it by historical standards, but not nearly as big as they should have. Spending much more money to get the vaccines moderately faster and end lockdowns slightly earlier would be hugely valuable. You’d save lives, you’d save incredible quantities of money. It was absolutely worth it to spend an order of magnitude or more additional funds on that. And then many other countries, European countries, that haggled over the price on these things, and therefore didn’t get as much vaccine early, at a cost of losing $10 or $100 for every dollar you “save” on this thing.

If you have the AI advisors, and they are telling you, “Look, this stuff is going to happen; you’re going to regret it.” The AI advisor is credible. It helps navigate between politicians not fully understanding the economics and politics. It helps the politicians deal with the public, because the politicians can cite the AI advice, and that helps to deflect blame from them, including controversial decisions.

So one reason why there was resistance to Operation Warp Speed and similar efforts is you’re supporting the development of vaccines before they’ve been fully tested on the actual pandemic. And you may be embarrassed if you paid a lot of money for a vaccine that turns out not actually to be super useful and super helpful. And if you’re risk averse, you’re very afraid of that outcome. You’re not as correspondingly excited about saving everybody as long as you’re not clearly blameworthy. Well, having this publicly accessible thing, where everyone knows that HonestGPT is saying this, then you look much worse to the voters when you go against the advice that everyone knows is the best estimate, and then things go wrong.

I think from that, you get vaccines being produced in quantity earlier and used. And then at the level of their deployment, I think you get similar things. So Trump, Donald Trump, in his last year in office, he was actually quite enthusiastic about getting a vaccine out before he went up for reelection. And of course, his opponents were much less enthusiastic about rapid vaccine development before election day, compared to how they were after. But the president wanted to get the system moving as quickly as possible in the direction of vaccines being out and deployed quickly, and then lockdowns reduced and such early — and in fact, getting vaccines fast, and lockdowns and NPIs over quickly, was the better policy.

And so if he had access to AI advisors telling him what is going to maximise your chances of reelection, they would suggest, “The timeline for this development is too slow. If you make challenge trials happen to get these things verified very early, you’ll be able to get the vaccine distributed months earlier.” And then the AI advisor would tell you about all the problems that actually slowed the implementation of challenge trials in the real pandemic. They say, “They’ll be quibbling about producing a suitable version of the pathogen for administration in the trial. There’ll be these regulatory delays, where commissions just decide to go home for the weekend instead of making a decision three days earlier and saving enormous numbers of lives.”

And so the AI advisor would point out all of these places where the system is making the top-level objective of getting a vaccine quickly, where that’s going wrong, and clarifies which changes will make it happen quicker. “If you replace person X with person Y; if you cancel this regulation, these outcomes will happen, and you’ll get the vaccine earlier. People’s lives will be saved, the economy will be rebooted,” et cetera.

And then we go from there out through the end. You’d have similar benefits on the effect of school closures, on learning loss. You’d have similar effects on anti-vaccine sentiment. So processing the data about the demonisation of vaccines that happened in the United States later on, and so having that as a very systematic, trusted source — where even the honest GPT made by conservatives, maybe Grok, Elon Musk’s new AI system, Grok would also be telling you, if you are a Republican conservative who’s suspicious of vaccines, especially after the vaccine is no longer associated with the Trump administration, but the Biden administration, then having Grok, an equivalent, telling you these things, having it tell you that anti-vaccine things are to the disadvantage of conservatives, because they were getting disproportionately killed and reducing the number of conservative voters…

This is my soup to nuts how AI advisors would have really saved us from COVID-19 and the mortality and health loss and economic losses, and then just the political disruption and ongoing derangement of politics as a result of that kind of dynamic.

Sceptical arguments against the effect of AI advisors [00:24:16]

Rob Wiblin: Yeah, that was outstanding. I think there’s a certain kind of person who, if they were still with us (which I don’t think they actually would be), would listen to that and think, this is classic hard science people thinking that the social world is as easy to fix as physics or chemistry. And they might have a whole series of objections, thinking that really, with these questions, it doesn’t matter how big a brain, doesn’t matter if you’re super smart, doesn’t matter if you’re an AI who’s been trained on a million years of experience: some of these things are just not knowable, at least not without experiments that these models wouldn’t be able to run.

So there’s a question of how much intelligence would actually buy you. And then you think everyone would trust what the AI says. But doesn’t history show us that people can believe whatever insane stuff? And even if the track record of it is extremely good, many people wouldn’t trust it for all of the kind of reasons that people have poor judgement about who to trust today.

What other objections might they feel about this whole picture? Politics is about power; it’s not just about having the right understanding of the issues, like you were saying. So Trump was very keen on getting the vaccines out as quickly as possible while he was running for reelection, and then this kind of flipped once he was no longer in power. But is it the case, you know, the Democrats who were kind of sceptical about getting the vaccines out really quickly while they were running against Trump as the incumbent, wouldn’t they have been going to their advisors and asking for advice on how to slow down the vaccine as much as possible in order to increase their chances of winning that election? Maybe some more ideas that this is more competitive; it’s not as positive-sum as you’re imagining.

Is there anything you’d like to say to that whole cluster of sceptical arguments? We could go one by one, or I could just throw them all at you at once.

Carl Shulman: To start with: is it possible to make progress on any of these kinds of questions? I’d say yes, because we see differences in performance on them. So some people do better on forecasting particular things, some people have greater ability and expertise in these domains. And often, among the sort of people who are relevantly expert, if you had sufficiently fine-grained and potent measures of their ability, there is a consensus, or a relatively solid view. But it’s not necessarily trivial for politicians, and even more so the public, to distinguish where that lies.

And again, remember that we have made progress in the world: we understand more about disease, we understand more about the economy than we did 200 years ago. And in this world, of course, the cognitive effort going into understanding and improving quality has grown by many orders of magnitude. And so, just as we’ve seen epistemic progress on things in the past, here you throw an enormously greater quantity of resources at it, which is more effectively directed at getting results. And backpropagating from that means the capacities are much greater.

And so, for each of these questions that I mentioned: the view that vaccines could be accelerated, could be delivered substantially faster than they ever had been before, was the one that was systematically held more by people who deeply understood various aspects of the situation. And it got traction, as with Operation Warp Speed, but it got less traction than it would have if many people hadn’t been, in a sort of ill-thought-out way, over-indexing on “vaccine trials have been slow in the past” without properly adjusting for the difference in willingness to go forward with things quickly. And from differences with mRNA vaccines compared to other types and so on, you could have a huge difference.

On the question of who listens, I think it’s important to these examples that people who are already in possession of power could be in a position to have AI advice that they trusted or expected to be working for them; and if it wasn’t working for them, some of them might no longer be in power. So the top leadership in a country like China or the US can expect that they have some AIs that are constructed by their ideological allies or subordinates or whatnot, or at least audited and verified by those people. And so they would have access.

And then, because those top leaders have an interest in actually getting results that are popular, or results that are actually effective, they don’t necessarily need to convince everyone else that’s the case to radically change their own behaviour and to navigate some of the problems of going down the chain. So even if, say, Trump had really wanted to expedite the vaccine, but he didn’t understand all of the technology, he couldn’t get himself in place at all of these locations, or hire enough people who he trusted who were expert enough to really implement the thing: with AI, having a shortage of people and capacity to do that kind of thing for any particular goal is not an issue, so these sort of implementation issues are less.

Now, your last question about what about other adversarial actors trying to mess things up, make things worse? In the United States, say it were the case that the opposition party decided they were going to try and make the pandemic worse, or interfere with efforts to prevent it, on the basis that it would make the incumbent president more unpopular in a way that was sufficiently to their advantage.

So one thing is there really is a lot of elite and activist and general public opinion that would think that’s absolutely horrible. Now, it’s not always easy for the public to navigate that and identify if it’s happening. And indeed, it is a common political tactic, when the public misallocates blame for things, to try and manipulate the system in a way to create failures that the public will not attribute to the ones creating it. For example, under the Obama administration, there were accusations that Republicans in Congress were doing manoeuvres of this sort: vetoing things, creating gridlock, and then campaigning on “nothing has been achieved because of gridlock.”

Yeah, that kind of dynamic can happen, but an important part of it happening is the public misattribution. Unfortunately, assessing political outcomes is challenging. So, in principle, even simple heuristics might do pretty well. So if voters were able to reliably assess, “How have things been? Are you better off than you were four years ago?,” that could go pretty far. It would result in a systematic incentive for politicians seeking to win elections to try and make people feel better off. Because voters could ask their AI advisors, “If I vote for party one, will I be better off in four years, as I judge it or in these respects, than if I vote for party two?”

And it would be even better if they could do the counterfactuals and figure out which things are blameworthy. So actually, the leadership doesn’t cause hurricanes, mostly. And so voters would do better if they were able to distinguish between, I’m worse off here because of a natural disaster, rather than I am worse off because the government had a poor response to the natural disaster. And so insofar as voters, even a nontrivial portion of voters — so if you have 5% of voters who take seriously this kind of thing — then act as swing voters based on it, then that could be an absolutely enormous political effect. And it can go larger.

If you think about the audience for, say, The New York Times: yes, many readers may want to hear things that flatter their ideological preconceptions, but they also value a sort of sophisticated, apparently honest and rigorous and accurate source of information. And to the extent that it becomes clear what is more that and what is less that, it will move along the people who have some pull in this direction.

Value lock-in [00:33:59]

Rob Wiblin: Nice. OK, so we’ve mostly been talking about ways that AI might be able to help us all converge on the truth, at least where that’s possible. But in principle it could also be used to entrench falsehoods more thoroughly. And people have been very concerned about that over the last year or two, seeing lots of ways that AI might make the information environment worse rather than better.

I know one concern you’ve expressed in conversation before is the possibility that people might opt to use AI advisors to kind of extremise their views, or lock them in so that they can’t easily be changed with further reflection or that empirical errors won’t be discovered. To what extent do you think many people are likely to want to use AI in that way?

Carl Shulman: The existing evidence is a bit tricky and mixed. So on the one hand, there’s very little demand for an explicit propaganda service that says, “We are just going to lie to you all day long and tell you things to try and make you more committed to your political faction or your religion.” Pravda was called “pravda” (literally “truth”) — not, “We lie to you for communism.” And so if the technological developments make it sufficiently clear what honest AI is and what it isn’t, then that suggests that high honesty would be something that would be demanded or preferred, at least where it’s a ceteris paribus choice.

On the other hand, when we look at something like media demand, people seem to quite significantly prefer — when there is market competition and not a monopolistic situation — to read media that reinforces and affirms their prejudices and dogmas and whatnot. And people feel good cheering on their team and feel bad hearing things that suggest that their team is wrong in any way, or sort of painful facts or arguments or considerations or things that make people sad.

And since people’s consumption of political news is heavily driven by that kind of dynamic, you get this perverse effect where, first, people mostly don’t know that much about policy because it’s not their profession; it’s something that they’re occasionally touching on, incidentally. But then those who do consume a lot of political information are very often doing it for this emotional charge of being reinforced in their self-perception and worldviews in certain ways.

So you might not buy a service that says it will lie and deceive and propagandise you to maintain a commitment to your politics or your religion. But if some fig leaf can be provided, something like, “Axiomatically by faith, my precise religious views are correct and everyone else’s religious views are wrong” — or likewise with politics or ethics, “That’s known a priori; there’s no way I could be wrong about that” — “This so-called honest AI seems to give contrary answers, and it tends to undermine people’s commitment to this thing that I cherish. Therefore, I will ask my AI for assistance in managing my information environment, my social support” — which can be extremely important when people are constantly talking with AI companions who are with them at all hours of the day, helping them navigate the world — “to help me maintain my good ethical values or my good religious commitments and generate and explain the thing to me.”

And then you get some products that will have a contorted epistemology that explains why they reason about topics connected to these ideologically sensitive areas differently from how they reason about, “Will this code run? Will the faucets in this house turn on or not?” There’ll have to be some amount of doublethink, but if it’s obscured, people could be more into it.

And one of the very early steps of such obscurement would be the AI getting you to stop thinking so much about how it has been set up to be propaganda — so continuously shaping your social and emotional and informational environment in that direction. And then maybe going forward, you may have technologies, neurotechnologies, that allow more direct alteration of human moods, emotions, things that might allow people to just bake in an arbitrary commitment, a deep love of 2024-era North Korean Juche ideology, or the precise dogma of one’s particular tiny religious sect, and just lay out an entire picture of the world based on that.

Rob Wiblin: I feel pretty unsure how worried to be about this issue. As you say, there’s a fair bit of this that goes on already, just with people’s choices about what news to read, what friends to have, what conversations to have, what evidence to look into and what evidence not to look into. To what extent do you think this would cause things to get significantly worse? Or would it just maybe be a continuation of the kind of current level of closed-mindedness that people have, just with a new sort of media that they’re consuming? Where they’re absorbing information from AIs, perhaps, rather than going to the front page of their favourite newspaper that flatters their preconceptions?

Carl Shulman: I’d say it would either make things much better or much worse cumulatively, all of the applications of AI to these things.

On the getting worse side, an obvious one is that many positions — and particularly many falsehoods — find it difficult to command a lot of expert support. So it’s very difficult, say, to muster large supplies of scientists who will take a young Earth creationist attitude. It’s to the point where you get tiny numbers over a whole planet available as spokespeople. And in media you have issues where views that are popular among the kind of people who like to become journalists have a much easier time: it’s much cheaper for those to be distributed and things produced.

Superabundant cognitive labour allows you to build up webs of propaganda that are much more thorough and consistent and deep and appealing and high quality for positions that would otherwise be extremely niche, and where it would be very difficult to get large populations of workers who would produce them, and produce them in a compelling-looking way.

Rob Wiblin: Yeah. My impression is that people vary a lot on how much they would be likely to want to use this technology. Some people, they really have the disposition of a true believer, where they’re very committed to a political or religious view, and the idea of restricting their access to other information in order to ensure that they remain a good person in their eyes would sound appealing to them. But I think it’s a minority of people, at least of people that I know, and for most people, the prospect of trying to close themselves off to contrary information so that they couldn’t change their view creeps them out enormously.

So I suppose I’m a little bit hopeful that many people are not that strongly committed to any particular ideological bent. Many people just do not take that great an interest in religion or in politics or any other issue, so it’s unclear why this would be super appealing to them. So that’s my hopeful angle.

Carl Shulman: Yeah, I do have a lot of hope for that. And my actual best guess is that the result of these technologies comes out hugely in favour of improved epistemology, and we get largely convergence on empirical truth wherever it exists.

But when I think about, say, an application in North Korea or in the People’s Republic of China, it is already the official doctrine that information needs to be managed in many ways to manipulate the opinions and loyalties of the population. And you might say that these issues of epistemic propaganda and whatnot, a lot of what we’ve been talking about is not really relevant there, because it’s already just a matter of government policy.

But you could see how that could distort things even within the regime. So the Soviet Union collapsed because Gorbachev rose to the top of the system while thinking it was terrible in many ways. Good in many ways: he did want to preserve the Soviet Union; he just was not willing to use violence to keep it together.

But if the ruling party in some of these places sets conditions for, say, loyalty indexes, and then has an AI system that is optimised to generate as high a loyalty index as possible — and it gives this result where the loyalty index is higher for someone who really can believe the party line in various ways, although of course changing it whenever the party authorities want something different, then you can wind up with successors or later decisions made by people who have been to some extent driven mad by these things that were mandated as part of the apparatus of loyalty and social control.

And you can imagine, say, in Iran, if the ruling clerics are getting AI advice and just visible evidence that some AIs systematically undermine the faith of people who use them, and that AIs directed to strengthen people’s faith really work, that could result relatively quickly in a collective decision for more of the latter, less of the former. And that gets applied also to the people making these decisions, and results in a sort of runaway ideological shift, in the same way that many groups became ideologically extreme in the first place: where there’s competitive signalling to be more loyal to the system, more loyal to the regime than others.

Rob Wiblin: It sounds like the most troubling variation of this is where it’s imposed on a large group of people by some sort of government or authority. In a very pluralistic society, where different people who are already kind of ideological extremists on different political or religious views, some of them decide to go off in various different directions, convincing themselves of even more extreme views, that doesn’t sound great, but it’s not necessarily catastrophic because there would be still lots of disagreement. But inasmuch as you had a government in Iran managing to radicalise their population using these tools all at once, it’s easy to see how that takes you down a very dark path quite quickly.

Is there anything that we can do to try to make this kind of misuse less likely to occur?

Carl Shulman: I’ve got nothing, Rob. Sorry.

Rob Wiblin: OK, that’s cool.

Carl Shulman: No, no, I’ve got an answer. So where a regime is already set up that would have a strong commitment to causing itself to have various delusions, there may be only so much you can do. But by developing the scientific and technical understanding of these kinds of dynamics and communicating that, you could at least help avoid the situations where the leadership of authoritarian regimes get high on their own supply, and wind up accidentally driving themselves into delusions that they might have wanted to avoid.

And at a broader level, to the extent these places are using AI models that are developed in much less oppressive locations, this can mean do not provide models that will engage in this sort of behaviour. Which may mean API access to the very powerful models: do not provide it to North Korea to provide propaganda to its population.

And then there’s a more challenging issue where very advanced open source models are then available to all the dictatorships and oppressive regimes. That’s an issue that recurs for many kinds of potential AI misuse, like bioterrorism and whatnot.

How democracies avoid coups [00:48:08]

Rob Wiblin: Yeah, you mentioned a while back that we could end up with this very strange situation, where you see an enormous technological and societal revolution that occurs during a single term in office for a given government, because you see such big changes over just a handful of years. Are there any changes that you’d like to see in the US or UK to make it less likely for there to be some sort of power grab? Where a government that happens to be in office at the time that some new, very powerful tool of social influence comes online, that they might try to use that to entrench themselves and ensure that they continue to get reelected indefinitely?

Carl Shulman: I mean, it does not seem like a super easy problem.

Rob Wiblin: I bring the hard ones to you, Carl.

Carl Shulman: Yeah. In general, the problem of how democracies avoid coups, avoid the overthrow of the liberal democratic system, tends to work through a setup where different factions expect that the outcomes will be better for them by continuing to follow along with the rules rather than going against them. And part of that is that, when your side loses an election, you expect not to be horribly mistreated on the next round. Part of it is cultivating principles of civilian control of the military, things like separating military leadership from ongoing politics.

Now, AI disrupts that, because you have this new technology that can suddenly replace a lot of the humans whose loyalties previously were helping to defend the system, who would choose not to go along with a coup that would overthrow democracy. So there it seems one needs to be embedding new controls, new analogues to civilian control of the military, into the AI systems themselves, and then having the ability to audit and verify that those rules are being complied with — that the AIs being produced are motivated such that they would not go along with any coup or overthrow the rules that were being set, and that setting and changing those rules required a broad buy-in from society.

So things like supermajority support. There are some institutions — for example, in the United States, the Federal Election Commission — and in general, election supervisors have to have representation from both parties, because single-party referees for a two-party competitive election are not very solid. But this may mean passing more binding legislation. Enabling very rapid judicial supervision and overview of violations of those rules may be necessary, because you need them to happen quite quickly, potentially. This also might be a situation where maybe you should be calling elections more often, when technological change is accelerating tenfold, a hundredfold, and maybe make some provisions for that.

That’s the kind of, unfortunately, human, social, political move that is large; it would require a lot of foresight and buy-in to it being necessary to make the changes. And then there’s just great inertia and resistance. So the difficulty of arranging human and legal and political institutions to manage these kinds of things is one reason why I think it’s worthwhile to put at least a bit of effort into paying attention to where we might be going. But at the same time, I think there are limits to what one can do, and we should just try to pursue every option we can to have the development of AI occur in a context where legal and political authority, and then enforcement mechanisms for that, reflect multiple political factions, multiple countries — and that reduces the risk to pluralism of one faction in one country suddenly dragging the world indefinitely in an unpleasant way.

Rob Wiblin: Yeah, it seems very hard. Getting into some prosaic details, in the US, the elections are on a sort of fixed schedule, and I think it would be extremely difficult and require a constitutional amendment to change it. So a little bit of a heavy lift to fix that. In the UK, I think elections can be mandated just by Parliament: a majority in Parliament can say that you need to have elections on this date, unless Parliament says that you can’t. Although it’s hard to see… I’m not sure that there’s any system by which you can prevent a new majority in Parliament from refusing to have elections, that is, if we’re trying to shorten them.

Carl Shulman: Ask the king, right?

Rob Wiblin: Yeah, the king, I think, could insist on elections, although it might be unclear whether the king is able to do that without the advice of the prime minister asking them to call that election. I’m not sure exactly what the triggers are there, but yeah, it seems more practical in the UK, but still quite challenging. And I haven’t heard many people yet calling for six-month terms in office, but maybe that time will come.

I thought it seemed really key to what you were saying, that the reason that you can have the maintenance of a kind of liberal democratic situation in a country like the UK or in the US is that the losing party in a given election thinks that it’s in their interest to go along with a transfer of power, because they won’t be horribly mistreated and they’ll have a chance to win back power in the future; they prefer to have the system continue, even though they lost in this instance. And furthermore, the people running the government don’t expect that the military would assist them in a coup, necessarily. They think that the military would probably refuse to participate in overthrowing the legal order, even if they asked them to.

Inasmuch as at some point we are handing over our security forces — both the police, maybe, and the military — to AI systems, to basically be operationalising orders, we ideally would want to program them so that they would not accept orders to break the law, that they would not accept orders to participate in a coup. Getting to the point where we could at least have confidence that security forces would not do that as soon as possible would seem very beneficial.

Carl Shulman: Yeah, this is a much more demanding version of the requirements of, say, Anthropic’s constitutional AI for their chatbot, and these issues of, does it lie to customers or use offensive language? But there are the problems of evaluating these sorts of principles, and figuring out with AI assistance all the kinds of cases that might come up where there’s a constitutional crisis and security forces are forced to decide who’s right.

So it’s been an unfortunately common circumstance in Latin America in some of these presidential systems where you have the president on one side, the congress on the other, the supreme court on one side or the other and divided, and then the military winds up picking a side and that’s what happens.

If you’re putting AIs in a position where either they’re being directly applied as police and military, or they just have the industrial and technical capability where they could potentially enforce their will or take over, then that’s a case where you want to have intense joint auditing and exploration of the effects of different kinds of AI principles and governing motivations — and then jointly, hopefully with a large supermajority, approve of what the motivations of those systems are going to be.

Rob Wiblin: Do you think people appreciate currently — as we integrate AI into the military and other security services, and kind of hand over the capability to do violence to AI — how important it is going to be, how critically important it might be, what rules we impose upon them, and whether we believe that those rules are going to be followed? I’ve heard a bit of discussion about this, but it maybe seems like it’s quite essential, and maybe a bit of an underrated alignment issue.

Carl Shulman: Well, I think that the reason why it’s not much discussed is it’s not particularly applicable to current systems. So existing AI can incrementally increase the effectiveness of war fighters in various ways, but you won’t have automated tanks, planes, robots doing their infrastructure and maintenance, et cetera. And indeed, there are campaigns to delay the point at which that happens, and there are statements about retaining human control, and I see that case.

But also, in a world where there are thousands or millions of robots per human, to have a military and security forces that don’t depend on AI is pretty close to just disarmament and banning war. And I hope we do ban war and have general disarmament, but it could be quite difficult to avoid. And in avoiding it, just like the problem of banning nuclear weapons, if you’re going to restrict it, you have to set up a system such that any attempt to break that arrangement is itself stopped.

So I think we do have to think about how we would address the problem when security forces are largely automated, and therefore the protection of constitutional principles like democracy is really dependent on the loyalties of those machines.

Rob Wiblin: Right. Yeah. I mean, currently it does seem right to say that we want our autonomous weapons to follow human instructions, and not to be going off and freelancing and making their own calls about what to do. But at some point, once most of the military power basically is just AI making decisions, having it saying that the way we’re going to keep it safe is that it will always follow human instructions, well, if all of the equipment is following the instructions of the same general, then that’s an extremely unstable situation. And in fact, you need to say no, we need them to follow principles that are not merely following instructions; we need them to reject instructions when those instructions are bad.

Carl Shulman: Indeed. And human soldiers are obligated to reject illegal orders, although it can be harder to implement in practice sometimes than to specify that as a goal. And yes, to the extent that you automate all of these key functions, including the function of safeguarding a democratic constitution, then you need to incorporate that same capacity to reject illegal orders, and even to prevent an illegal attempt to interfere with the processes by which you reject illegal orders. It’s no good if the AIs will refuse an order to, say, overthrow democracy or kill the population, but they will not defend themselves from just being reprogrammed by an illegal attempt.

So that poses deep challenges and is the reason why you want, first, problems of AI alignment and honest AI advice to be solved, and second, to have institutional procedures whereby the motives being put into those AIs reflect a broad, pluralistic set of values and all the different interests and factions that need to be represented.

Rob Wiblin: Nice. Yeah.

Where AI could most easily help [01:00:25]

Rob Wiblin: It sounded earlier like you were saying that it’s possible we might get more juice out of AI in the areas where we’re currently struggling the most. So people generally say we’ve made more progress, we have a greater grip on things in chemistry than we do in philosophy. Do you actually think it might be the case that these superhuman advisors might be able to help us to make more progress in philosophy than they can in chemistry? Perhaps because we’re already doing all right in chemistry, so we’ve already made a reasonable amount of progress, and it’s actually the areas where we’re floundering where they can best save us?

Carl Shulman: Yeah, I think we should separate two things. One is how much absolute progress in knowledge can we generate? And there’s some sense in which in the physical sciences we’re really great at getting definitive knowledge, and adding in a tonne of research capacity from AI will make that quite a bit better.

There’s then the question of, in relative change, how drastically different do things look when you add these AI advantages? And so it could be that when we bring in AI to many of these sort of questions of subjective judgement, they’re still not ultra accurate maybe in absolute terms, but it’s revolutionary in terms of the qualitative difference of answers you’re getting out. It could be the case that on many sort of controversial, highly politicised factual disputes, that you get a pretty solid, univocal answer from AIs trained and scaffolded in such a way as to do reliable truth tracking, and then that makes for quite drastic differences in public policymaking around the things, rather than having basically highly distorted views from every which direction because of basically agenda-based belief formation or belief propagation.

In the hard sciences, eventually you get to results of do technologies work or not? They’re relatively unambiguous, and you can have lots of corruption beforehand with p-hacking or fraudulent experiments, or misallocation of research funds between higher and lower promise areas, but eventually you get hard technological results that are pretty solid just from improving the amount and quality of your data.

There are other questions where, even after you’ve got all of the data, it’s not just overwhelmingly perfectly pinned down in a way that no one could possibly be confused about, and so on questions where the best assessment is still going to be probabilistic or still be a mix of considerations, then it makes an enormous difference whether you can get an unbiased estimator of those, because then you can act on it. It may be that on these questions the most reasonable view is to think it’s 70% probability that X, but then someone with an agenda that X is convenient to may shift many aspects of how they think and talk and reason about the question and their epistemic environment to act as though it’s 100%, while others pull it towards acting as though it were 0%.
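To make the 70% example concrete, here is a minimal Python sketch of why an unbiased estimate is worth acting on. It assumes beliefs are scored with the Brier score (squared error against the 0/1 outcome), which is a common forecasting metric but not something named in the conversation; the function name `expected_brier` is hypothetical.

```python
def expected_brier(forecast: float, true_prob: float) -> float:
    """Expected squared error of a probability forecast for a binary event.

    With probability `true_prob` the event happens (outcome 1), otherwise
    it does not (outcome 0). Lower is better.
    """
    return true_prob * (1.0 - forecast) ** 2 + (1.0 - true_prob) * forecast ** 2


TRUE_PROB = 0.7  # the "70% probability that X" from the discussion

scores = {
    "unbiased estimate (0.7)": expected_brier(0.7, TRUE_PROB),
    "acts as though 100%": expected_brier(1.0, TRUE_PROB),
    "acts as though 0%": expected_brier(0.0, TRUE_PROB),
}

for label, score in scores.items():
    print(f"{label}: expected Brier score {score:.2f}")
```

Under this scoring rule the calibrated 70% belief has the lowest expected error of the three, and each distorted belief pays a penalty in proportion to how far it has been pulled from 70%.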

And so in this class of problems that are relatively challenging, that are not ultimately going to be perfectly pinned down by data in a completely obvious way, then the ability to know you’re not going to have this corrupt, “Must I believe X? Can I believe Y?,” then yeah, you could benefit across all of those domains — and collectively, those are responsible for an enormous amount of import for human life.

AI forecasting [01:04:30]

Rob Wiblin: OK, let’s come back and talk about accurate AI forecasting in some more detail. It has been bubbling under the surface the whole time, but I’d like to get out a few more details about how it would actually operate.

How would you train an AI that was able to do a superhuman job of predicting the future?

Carl Shulman: We’ve already had a few early papers, using earlier LLMs, trying to do this task. They’re not very good at it yet, and especially not with the earlier models. But they just have prediction of text. So a model that was, say, trained in 2021, if it hasn’t been further updated since then, you can then just ask it, “What happened in 2022? What happened in 2023?” And just taking the model straight, you get some predictions. You can then augment it by having it do chain of thought. If you set up an LLM agent, you can have it do more elaborate internet research, experiments, write some code to model some issues, use tools. And then you can do reinforcement learning over an objective of, “Are the forecasts you get out of this procedure accurate or not?”
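The backtesting idea described here can be sketched in Python roughly as follows. This is an illustrative sketch, not the method of any particular paper: the `Question` type, the `backtest` function, the placeholder questions and their outcomes are all invented, and the `forecaster` callable stands in for a model frozen at a knowledge cutoff.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Sequence


@dataclass(frozen=True)
class Question:
    text: str
    resolved_on: date  # when the real-world answer became known
    outcome: bool      # what actually happened


def backtest(
    forecaster: Callable[[str], float],
    questions: Sequence[Question],
    cutoff: date,
) -> float:
    """Mean squared error of forecasts on questions resolved after `cutoff`.

    Only questions the frozen model could not have seen resolved are scored.
    A lower value means more accurate forecasts, so the negative of this
    number could serve as a reward signal.
    """
    scored = [q for q in questions if q.resolved_on > cutoff]
    if not scored:
        raise ValueError("no questions resolve after the cutoff")
    errors = [(forecaster(q.text) - float(q.outcome)) ** 2 for q in scored]
    return sum(errors) / len(errors)


# Placeholder questions with invented outcomes, for illustration only.
QUESTIONS = [
    Question("Event A happens during 2021", date(2021, 6, 30), True),
    Question("Event B happens during 2022", date(2022, 12, 31), True),
    Question("Event C happens during 2023", date(2023, 12, 31), False),
]

# Stand-in for a model trained in 2021: it always answers 50%.
def coin_flip_forecaster(question_text: str) -> float:
    return 0.5


baseline = backtest(coin_flip_forecaster, QUESTIONS, cutoff=date(2021, 12, 31))
print(f"baseline error: {baseline:.2f}")
```

A forecaster that always says 50% gives a fixed baseline; chain of thought, tool use, or further training would be judged by whether they push the error below it.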

And if you want to get lots of independent data, then there are some issues about data limitations. So you can make predictions about what is going to happen to these one million different workers next year, knowing everything you know about their lives, and what happens to each of them will be significantly independent, not perfectly, of what happens to the others. So then you can get a lot of separate training signals.

But other things are more confounded. So we have only one history. And so if you’re in the year 2002 and you’re projecting economic questions, if you start doing gradient descent on predictions about whether there’s a recession in 2008, then that’s going to affect your answers for what were the unemployment rates, what were outcomes on people’s health? Because by training on one of those data points, it’s propagating information into the model from the held-out set. So you can use this sort of, take a model trained on old data and trained on procedures to be good at reasoning and working with that data, and then validate that it works for sort of macro-scale forecasting. But in order to train up its intelligence, you have to manage these issues of not leaking all of the information from your held-out test sets into the model’s weights.
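One way to picture the leakage problem is to compare a split by time with a split that ignores time. The Python sketch below is a toy under stated assumptions: the `Record` type, the question labels, the 2006 cutoff and the helper names are all invented for illustration, and "leakage" is reduced to the crude check of whether the same year appears on both sides of the split.

```python
from typing import List, NamedTuple, Set, Tuple


class Record(NamedTuple):
    year: int
    question: str


def temporal_split(
    records: List[Record], cutoff_year: int
) -> Tuple[List[Record], List[Record]]:
    """Train on everything up to the cutoff year, hold out everything after."""
    train = [r for r in records if r.year <= cutoff_year]
    held_out = [r for r in records if r.year > cutoff_year]
    return train, held_out


def leaked_years(train: List[Record], held_out: List[Record]) -> Set[int]:
    """Years that appear on both sides of a split.

    Outcomes from the same year are correlated (one history), so training
    on any of them propagates information about the held-out ones.
    """
    return {r.year for r in train} & {r.year for r in held_out}


# Three correlated macro questions per year, 2002 through 2009.
RECORDS = [
    Record(year, question)
    for year in range(2002, 2010)
    for question in ("recession?", "unemployment up?", "health outcomes worse?")
]

train, held_out = temporal_split(RECORDS, cutoff_year=2006)
print("temporal split, shared years:", sorted(leaked_years(train, held_out)))

# A split that ignores time: every fourth record goes to the test set.
naive_test = [r for i, r in enumerate(RECORDS) if i % 4 == 3]
naive_train = [r for i, r in enumerate(RECORDS) if i % 4 != 3]
print("naive split, shared years:", sorted(leaked_years(naive_train, naive_test)))
```

The temporal split shares no years between training and held-out data, while the naive split trains on some questions from the very years it is later tested on, which is the kind of information propagation described above.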

Rob Wiblin: OK, so we’re pretty likely to have AI models that are much better than we are at forecasting the future in a bunch of different domains. That seems pretty likely by one method or another. What sort of implications do you think this would have? What social effects would it have? Especially given that everyone would be able to see that model X has an amazing track record at forecasting, and so they would be persuaded and convinced to trust its predictions.

Carl Shulman: I think that the ability of different parties to trust that the AI really is honest is a potentially absolutely critical linchpin, which would require not only that the technology be sophisticated, but that parties be able to themselves, or have people that they trust, verify that that’s really what it’s doing.

But given all of that, yeah, it seems like it could result in vast systematic improvements in policy and governance at the level of the political system, and also in just the implementation of local activities in business, in science and whatnot. Probably the places where it really seems the juiciest, perhaps, are in politics and policy, which is an area that most people engaging within it put relatively low effort in. And when they do put in that effort, it’s often driven by other drives, like supporting a team in a way like sports teams, or conveying that you’re a certain kind of person to those around you.

And yeah, there’s a lot of just empirical data that on questions with verifiable factual answers, it’s much harder for people to find a truth when it runs up against some political appeal or political advantage, where there’s an ecosystem that wants that not to be believed. And this is something that varies in the place and the extent to which these kinds of dynamics are warping any particular actor, but the dynamic in some form or another is ubiquitous. And when we look at policy failures in the world, I think it’s actually quite systematic that you can trace out ways in which they get drastically improved if, at every step along the policy process, you had unbiased best estimates of how the world works, and we could poke at random examples to test this thesis.

Rob Wiblin: I guess one sceptical intuition I have is that we already could be more rationalist about this. We already could, if we were so motivated, try to get more accurate predictions about whether our sports team will win or whether our policy idea really is good. But as you’re saying, often we are not so motivated to get absolutely objective answers to these questions, because it would be unpleasant, or it’d be bad for our coalition, or it would damage our relationships.

But do you think if there were models that could just do this very cheaply for everyone all the time, and anyone else could figure out the answer, even if you didn’t want to, it would kind of force our hand, and we would no longer be able to stick our head into the sand on these topics? Like the truth would out itself in a way that it presently doesn’t?

Carl Shulman: I’d say that description is too binary. I’d say that our societies, in many cases, have already made enormous progress in our epistemological capabilities. So the development of science, I think, is just the paradigmatic example here. It faced many of these same dynamics of clique formation: you have the scholars who declare that some sacred authority can never be questioned, and then they wind up in mutually supporting “I scratch your back if you scratch mine” kind of setups, or are just too closely tied to some dogmatic function, and so never expose themselves to experimental feedback from the world.

And the initial development of the scientific process was something that was apparently very difficult. It did not happen in a lot of other places that it might have happened, although I expect if you prevented the scientific revolution that happened in our actual history, eventually it would happen in other places, or the same place through different routes. But yeah, those methods were able to prove themselves in significant ways. So initially it was done by people who were sort of convinced at the theoretical level that this was better. But as it generated more practical industrial discoveries, answers that were checked and solid against other dimensions of knowledge acquisition, it became very popular, very powerful, very influential.

I mean, it’s still the case that there are tensions, say, between palaeontology and archaeology, and, say, creationist accounts of the history of life on Earth. But on the whole, there have been quite drastic shifts in the beliefs of the general public — and certainly to a much greater extent at the level of sort of elite institution-making and systematic policy. Quite drastic changes, even where there were strong motives or factions who didn’t want to know certain things: still, the visible power of the mechanism that people could see, and then the systematic value of people and institutions that had these truth-tracking properties, let them grow and shift society in those directions.

And I’d say there are some similar effects with free and competitive press. More so the more that there are norms or dynamics that wind up weakening the credibility of those who, say, directly falsify information. And in many ways those are quite impressive.

Scott Alexander has a post, “The media very rarely lies,” arguing that the sort of journalistic norms of, “Don’t entirely make up your source from whole cloth; don’t just completely falsify the words that someone said,” these sorts of norms are actually quite widely respected, and even in many of seemingly the most misinformation-prone institutions and propaganda, there’s a tendency not to violate those kinds of norms, because violations are just too easy to check. Some people who really want to believe will go along with just completely making things up, but many won’t. And so on the whole, it’s not a super winning strategy. And these institutions that have the appearance of being truth tracking, that’s a significant advantage in many contexts with people who want to know the answer for one reason or another, or who want to see themselves as not just being foolish and self-deluded.

So now the limit to that is that many questions aren’t as easy as, “Did you just make up your source outright?” And the way in which flawed journalism often can lead people to have false beliefs is by stitching together a set of true statements that predictably have the effect of causing people to believe some connotation or vague suggestion or implication of that set of statements, without any one of them being unambiguously wrong. So people trying to stimulate hatred of some other group often do this by selectively reporting one incident after another where members of group X did something that seems objectionable, and then by super highlighting this and having it be very available to the audience, it’s possible for those who have some political or financial interest in stoking hatred to often do so, even when the misbehaviour that is being highlighted and amplified, or even the suggestions or accusations being amplified, isn’t any more frequent in the group that’s being demonised.

So if we had an analogue to the mechanisms that stop the completely fraudulent sources in most cases, or sufficiently to heavily discipline them, if you could do the same thing with some of these questions that are more right now we would say a matter of judgement of, you made this set of statements statistically in the distribution of human audiences, it’s going to alter those audiences’ beliefs about questions A, B, C, D, E, and F. And many of those questions do have unambiguous answers. And so you can ask, does this newspaper article cause people to have false beliefs about things and then true beliefs? And which ones? And then what weightings can you put on them? And what weightings might other people put on them?

And if the way you engage with that is like some human looks at it and kind of eyeballs it with their own biassed process, then you can wind up with maybe a war of factional fact checkers who each try and spin each thing in whatever way is most convenient to them — which is better than no debate at all, because it provides some discipline, but not as much as if you had something that was really reliable.

So say if you had an AI that has been trained specifically for predictive accuracy, or predictive accuracy about what deeper dives on things will do, and then you’ve been able to further verify that with interpretability techniques that let you actually examine the concepts within the neural network and find out how it conceptualises and thinks about things that it actually believes are true, because they are useful for predicting what will happen in the world, as opposed to common human motivations of, how do I support my tribe or this political view or whatnot.

And so to the extent that, say, left and right: if there are computer scientists of left and right, who can each take a pretrained model or train their own model on predictive accuracy, use these interpretability techniques to find the concept of what’s actually happening in the model, and then get an answer output from that, then you can take what previously was, there are 100 little decisions here, and doing each one in a biassed way can make the answer radically off. And the explanation of that will be long or in parallel, so that a casual voter is not going to look at it because it is too long, too complicated. They’re not going to bother. And the same, probably, for politicians who don’t have much time to supervise things.

But now, with each party having access to this same thing, they have a Schelling point. Because when people around the world, from every political party, every country, every religion, if they pretrain for predictive accuracy on stuff in the world and then use these same interpretability techniques, their AIs will all give the same answer in the same way that you find there are many different religious creation stories, many of which are incompatible with one another. But scientists from all different religions wind up concluding that the dinosaurs existed X million years ago; there’s plate tectonics that shaped geology in these ways — and the fact that it works like that is a powerful and credible signal to someone who’s not paying that much attention, not that much understanding this fact of like, these sort of truth-seeking setups, they give the same result from all kinds of people all over the world who say they’re doing it.

Now, there will be other people who manufacture systems to deceive, and claim that they’re not doing that. And someone who’s a bystander and does not have the technical chops or the institutional capacity to verify things themselves may still find themselves epistemically helpless about, is this really what it says? But still, some institutions and organisations and whatnot are potentially able to do that, follow the same procedures, get to the same truths themselves, and that would move just endless categories of things to objectivity. And so you have a newspaper and it talks about event X, and so they can say, “Interested party A claims X; interested party B with different interests, claims not X.” But then if they also say “Truth-tracking AI says X,” then that’s the kind of norm that can be like, don’t make up your sources, don’t put citations that don’t exist in your bibliography.

And then furthermore, it just reduces the expense. So it makes it possible for a watchdog organisation to just check a billion claims with this sort of procedure. And there are small amounts of resources available today for journalistic watchdog things, for auditing, for Tetlockian forecasting. And then when the effectiveness of those things — you’ve got an incredibly large amount of the product for less money — then, A, people may spend more; there’s tremendously greater wealth and resources, so more of the activity happens. And running on all the most important cases, having the equivalent of sort of shoe leather local investigative journalism in a world where there’s effectively trillions of super sophisticated AI minds who could act as journalists, is enough to kind of police all of the issues that relate to each of the 10 billion humans, for example.

And so every government decision, every major one at least, is potentially subject to that kind of detailed review and analysis. It’s a different epistemic environment, because you could very unambiguously say, is it true or not that honest AI systems give this result? And then if you lie about what such systems make of data, then many others can just show it directly, like having an arithmetic error in your article.

Rob Wiblin: I see. Yeah, I think there’s actually quite a lot of areas where I’m not sure people have appreciated how much cheaper it’s going to become to do things. And so stuff that was extremely laborious before is now going to be possible. One that jumps to mind with the forecasting is Tetlock, I think, has always wanted to go back over pundits’ predictions historically — like vast numbers of them, thousands or tens of thousands of them — by mining newspapers and I guess transcripts of television stuff, in order to see how accurate they are, which I think has been too laborious for people to do for that many different pundits. But in this new world, indeed quite soon, that might become relatively straightforward to grab tonnes of them and then score them on their accuracy.

Application to the most challenging topics [01:24:03]

Rob Wiblin: OK. I’m pretty convinced that that sounds pretty good. I can more easily see how you could train these models to be credible and reliable on these kind of more empirical questions, like which committee is going to hold up the vaccine approval? And what would be your approval rating if you had a vaccine rolled out earlier versus later?

It sounded earlier, though, as if you thought it wouldn’t only help with those kinds of more concrete empirical questions, but also potentially help us with even the most abstract stuff, like questions in philosophy, like what is time? What is the good? And I suppose there, it’s a little bit harder to see what the training feedback mechanism is. Mostly you might have to be doing it by analogy to other things, or just becoming smart in areas where you can get feedback and then hoping that transfers into philosophy. Talk to us about how these models could help to form more agreement in the most abstract and less empirical areas.

Carl Shulman: Yeah. So with things like journalism that creates misleading impressions through a set of true statements, you can turn that into verifiable empirical questions. You can have a set of things where the answer is unambiguous: like, what do the government statistics on wheat production in Iowa say? And so you can go from that to notice that yes, this article is structuring things in such a way that it creates false beliefs on these questions where we know the answer. And that sort of move actually has a lot of potential to be scaled up and to turn these things where we don’t have a direct ground truth into something where we can indirectly discipline it using other sorts of ground truth.

An example of how we might go from there: Anthropic, the AI company, has developed this method called constitutional AI. It’s a way of training and directing an AI to follow certain rules. What they do is they have actually natural language descriptions — like, “The AI should not lie,” “Between two responses, the AI should choose the less offensive response or the less power-seeking response” — and then humans can evaluate whether those things are being followed in a particular case. And you can have an AI model that automates what the humans would have done to evaluate, “Are you following this particular heuristic in this particular case?” or “Does this heuristic more support choice A or choice B?”

So, for any situation where we can describe a rule or way of doing things, if we’ve solved these other alignment issues, then we can generate reasoning and ways of answering questions — including difficult abstract questions — that follow those rules. And we can further develop a science of which kind of epistemic rules are effective. So we can take principles like modus ponens. Consider if the situation were the other way around, where the interested parties were flipped in their political valence. We can consider, when dealing with statistical evidence, pre-register your hypotheses in advance. Or if you can’t do that, use a p-curve analysis where you consider all the different analytic choices that could be made in analysing a given piece of data, and then see what is across the distribution of all of the options, possibly weighted by other factors: what’s the distribution of answers you could get about the empirical question?

And some of these mechanisms can be extremely powerful. So preregistering hypotheses before you do an experiment or study, is something that people are very big on in the world of reproducibility and open science. And I think you may have explored this before on the show, but in fancy scientific journals, you may have on the order of half of the studies replicating the results that were originally claimed. And very often that’s because of publication biases, reporting biases, taking the data that was received and presenting it in a particular way after the fact, painting a target around the bullet holes in the side of a barn.

But there have been, since then, the creation of this mechanism, Registered Reports — where a journal accepts or rejects a study based on the preregistration, and commits to publishing it if it comes through. And these have been around long enough now that people have been able to look back and see how are they doing, what’s the performance of these studies? And it seemed they actually replicate basically in line with what it says on the tin.

Rob Wiblin: Oh, wow.

Carl Shulman: Yeah. Which is pretty striking. Now, over time, that could change. If Registered Reports became universal, they could largely defeat p-hacking after the fact. But you could see things like, well, maybe people would just go to outright fraudulent results, just make up the data. And right now, I think that’s much less common, more than an order of magnitude less common than p-hacking and reporting biases. But yeah, instituting that one change makes a pretty drastic difference.

There are other fields that just raised their threshold for statistical significance from having to support a hypothesis, having to support it with a 1-in-20 chance that your null model would produce the result to 1-in-20 million. And when switching from a 1-in-20 to 1-in-20 million p-value, yeah, actually afterward, it does pretty well. Or like requiring multiple hypotheses corrections for all of the possible hypotheses of the same form that you could consider. And these methods, they have costs, there are limitations, but they can be very powerful.
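To make the arithmetic behind these thresholds concrete, here is a minimal Python sketch. It assumes independent tests and uses the Bonferroni correction as one illustrative kind of multiple-hypotheses correction; the function names and the test counts are hypothetical and do not come from the conversation.

```python
def expected_false_positives(num_tests: int, alpha: float) -> float:
    """Expected number of true-null hypotheses wrongly rejected at threshold alpha."""
    return num_tests * alpha


def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / num_tests


def family_wise_error_rate(num_tests: int, per_test_alpha: float) -> float:
    """Chance of at least one false positive across independent true-null tests."""
    return 1.0 - (1.0 - per_test_alpha) ** num_tests
```

Under these assumptions, running a million tests at the 1-in-20 threshold would be expected to yield about 50,000 false positives, while `bonferroni_threshold(0.05, 1_000_000)` is about 5e-8, which is the 1-in-20 million level. Even 20 tests at 1-in-20 give roughly a 64% chance of at least one spurious result.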

And so if you simulate, you can do scientific work that assesses how all of these rules affect the performance of your AIs on questions where the answers are knowable, or you can see how well they do. Then you can pick a set of rules — and preferably as small and simple a set of rules as possible, so they can’t be gerrymandered: you don’t want to have a rule that is a petabyte of data that’s just giving all of the answers of what to do for each situation — but yeah, given that, you can make very robust epistemic rules, and you can make them more robust by then trying to train an AI to lie, see if it can follow all of these rules about good reasoning and good reporting, while also being incentivised to get through some kind of deception or error.

Actually, there was a paper published just the other day, or a preprint by some folks from Redwood Research, that did a similar mechanism in programming. They had GPT-4 try and generate solutions to coding problems that had backdoors in it, and then tested various procedures to try and get the right answers from this, while not allowing any backdoors to get through. They didn’t do it perfectly, but they were able to make some incremental progress. And if we think about the apotheosis of that line of research, it would be that we have developed epistemic systems and rules and enforcement mechanisms and audits that are so robust that even when we have a super intelligent AI that is trying to comply with all of these rules and deceive us, it just fails. Whereas if it’s trying to tell the truth, it can succeed.

And if you have that kind of system, that holds up even against intense negative pressure, then it can also hold up against, say, attempts by humans directing AIs to make propaganda to deceive people about various falsehoods, because it won’t be able to survive these rules that have been scientifically supported. And then we take AIs that follow all these rules of good reasoning that work everywhere, we can be sure, and then we look at their extension to more challenging topics.

And I think there’s pretty good reason to think that extending that set of reasoning onto the challenging topics will be better than the way that we do it naively, and probably quite dramatically so, but we have a chance to validate it. So we talked earlier about forecasting. With all of the data from 1900, could you forecast quantum mechanics? Could you forecast modern AI technology? And those are very challenging subjective questions. But if we make really robust reasoning mechanisms and ways of thinking about things, and forecasting and dealing with challenging questions without determinative data, we can just validate them and see if they were able to get all of these other challenging questions without having overwhelming evidence.

Rob Wiblin: Maybe to be a bit more concrete, could you give an example of a philosophical or moral controversy that you think might be resolved thanks to this AI epistemic revolution, and how do you think that might look in practice?

Carl Shulman: Well, this is a bit of a tricky question, because if I take any example of something that’s very controversial and divisive now that I expect AI might resolve, then that’s naturally going to offend some proportion of people. So I’m not sure how far I want to go down that route.

But looking backwards in time, we can see that these sorts of things have been important in the past. So things like the divine right of kings involve claims about whether there in fact was a divine mandate laid down on behalf of monarchs.

You can see advantages that come from solving difficult problems about divine command theory, and why it is that, say, philosophers are generally not that into divine command theory as a theory of ethics. As far back as Plato there’s some of that.

And then huge advantages from applying those sorts of philosophical tools and pieces of reasoning, even when they’re not that complex, in a consistent way — rather than scholars supported by the pursestrings of monarchs coming up with apologetics on their behalf.

It’s important also that there are so reliably and robustly empirical claims that get bundled up with seemingly purely value or deontological claims. So in the expansion of rights and liberties to women, there were frankly absurd arguments of the form, “If women are allowed into X profession, that will be bad, because women don’t want to enter that profession” — which is absurd, because if that were really the case, what was the point of the bar? And there were invariably sets of false empirical claims about how the world worked, how society would be worse if reforms were adopted and enacted.

And so I think you can extrapolate against this sort of history that you would see a lot more change. And inevitably, based on history, that will lead to many oxen being gored, and no political or philosophical or religious system or ideology will come out unscathed.

Looking back, having much more powerful systems for understanding and assessing claims about the world, or about logic and reasoning — and then the ability to communicate and assess them — we’ve liked how society has adjusted its values in response to that. And so I think, on the whole, we should look forward to the incremental changes from really souping up and improving the reliability and honesty and robustness and sophistication of those processes.

Now, inevitably, oxen will be gored for any political or philosophical or religious ideology one can find. It’s extremely unlikely that total perfection has been attained for the first time by one particular ideological faction in our current era, but still a lot to be gained on the philosophical front.

How to make it happen [01:37:50]

Rob Wiblin: If I think about the AI applications that I’ve heard people working on in businesses, most of them don’t sound like this; they don’t sound like an application that’s focused on epistemic quality, exactly. What sort of business model might exist to pursue these sorts of epistemically focused AI applications?

Carl Shulman: I think you may be imagining these things as more distinct than they are in terms of the underlying capabilities and organisation. So if you have an AI that is supposed to provide assistance with programming or just to produce entire, say, novels or computer programs to spec, then the AI agents and systems of AI agents that you build out of that are going to have to be doing things like predicting “Will this program work when it is deployed in some use case? How often will it crash?”

And to the extent that you’re developing these sort of general problem-solving abilities — making AI agents that can do a long chain of thought, making use of tools in order to figure out answers — then a natural way to test those capabilities and be sure that they work is apply those to, say, predicting datasets and held-out data from the real world. Just the capability to correctly call things, figure out what will happen based on your actions at the small scale, is integral to tasks everywhere. So a very large portion of this is closely connected to just making the systems very smart, very capable at productive occupations in general.

A thing that is more distinct is this training and performance on longer-term forecasts, or forecasts where in some sense the answers have been spoiled in the training set, and you want to assess the capacity to get them right for the first time and not cheat by providing a memorised answer. So there you don’t get that on the very short timescale; you have to set things up differently. But the fundamental intelligence and ability to do organised reasoning to figure out an answer, those are things that are just core to making AI more powerful in general.
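One way to picture the "not spoiled" setup is a date cutoff: the forecaster only sees questions resolved before the cutoff and is scored on those resolved afterwards. The sketch below is an illustration under stated assumptions, not a description of any real training pipeline. The `Question` type, `split_by_cutoff`, and the use of the Brier score as the accuracy measure are all choices made for this example.

```python
from datetime import date
from typing import List, NamedTuple, Sequence, Tuple


class Question(NamedTuple):
    text: str
    resolved_on: date  # date on which the answer became known
    outcome: bool      # what actually happened


def split_by_cutoff(
    questions: Sequence[Question], cutoff: date
) -> Tuple[List[Question], List[Question]]:
    """Keep questions resolved before the cutoff for training; hold out the rest."""
    train = [q for q in questions if q.resolved_on < cutoff]
    held_out = [q for q in questions if q.resolved_on >= cutoff]
    return train, held_out


def brier_score(forecasts: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Mean squared gap between forecast probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - float(o)) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)
```

A forecaster that says 0.9 for something that happened and 0.2 for something that did not gets a Brier score of about 0.025; always answering 0.5 scores 0.25. The cutoff only helps if nothing resolved after it leaks into the training side.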

Rob Wiblin: That makes me wonder whether it seems like this sort of withholding part of the training dataset — like you train the AI on data up to 2021, and then get it to predict things that will happen in 2022 — is that just a way that you could make these models much smarter? That they’re being denied when they’re given the whole training set all at once, and they don’t have to do this sort of out of sample reasoning and generalisation?

Carl Shulman: Well, it’s certainly a way to measure abilities that are otherwise hard to distinguish. In terms of training, there are issues. People have the complaint about macroeconomics that it’s not very good at forecasting recessions and inflation and whatnot. One of the reasons for that is the datasets that they’re working from are very small. There’s just only so much history; the global economy is integrated through trade and whatnot. So you can generate millions of Go games, learn from them locally and individually, and produce something incredible.

And now, if you had amazing reasoners with no political bias, who were super intelligent, worked out all of the mathematics, used clever data that people hadn’t considered applies to macroeconomics, I’m sure you would do better than our existing macroeconomic forecasters by a lot — but you’d still not do nearly as well or learn nearly as much as if you were able to look at billions of different planets with different economies and different circumstances. So when we talk about learning from these long-term macro-scale things — things like predicting quantum mechanics before seeing any data about it — there’s a problem that you still only have so many data points, because a lot of the world is correlated.

So say we start fine-tuning on predictions, and then we score your predictions about quantum mechanics, and then the AI is adjusted so as to give correct answers on those predictions. Then you’re faced with a question about, are there nuclear weapons? And when the AI was fine-tuned on the questions about quantum mechanics, that is going to be shifting its beliefs and answers, so that it has, in some sense, been spoiled on the answers to the nuclear weapons questions.

There are other aspects of the world that are independent. If you ask about what happened to the romantic lives of 100 million people, say: you get datasets from social media or something, the individualistic factors that are uncorrelated between people; you could get very large datasets for that, so you could show AI that is great at long-term forecasting with respect to these uncorrelated things between people.

But with respect to the big correlated ones, you’re not going to be developing your basic capabilities that way; you’re not going to be able to generate trillions of data points like that. So you could have a very interesting effect of fine-tuning, where you fine-tune on an enormous set of these long-term predictions, but there are effectively only so many independent data points on stuff like the macroeconomy. And maybe fine-tuning really applies the trained intelligence of the AI very well to that task, but you’re not going to be able to develop the basic capabilities in the same fashion.
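The point about correlated data can be made quantitative with a standard textbook result. If n observations all share the same pairwise correlation rho, the variance of their average is sigma^2 * (1 + (n - 1) * rho) / n, so they carry about as much information as n / (1 + (n - 1) * rho) independent ones. The sketch below assumes that simplified equal-correlation model, which is an illustration rather than a claim about real macroeconomic data; the function name is hypothetical.

```python
def effective_sample_size(n: int, rho: float) -> float:
    """Number of independent observations equivalent to n equally correlated ones.

    Assumes every pair of observations has the same correlation rho in [0, 1].
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this sketch only handles 0 <= rho <= 1")
    return n / (1.0 + (n - 1) * rho)
```

With `rho = 0` a million observations count as a million. With `rho = 0.5` the same million count as just under two independent observations, which is the sense in which an enormous set of correlated long-term predictions can still amount to only so many data points.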

International negotiations and coordination and auditing [01:43:54]

Rob Wiblin: Yeah. So if we could produce these truthful, very reliable AI assistants to help us with really tricky issues like international negotiations and building trust and collaboration between countries… So it’d be very helpful if the US could design one of those for itself whose answers it trusted, because they had been demonstrated to be reliable. It would be even more useful if you could get the US and China to both agree that the same model was consistently reliable and that they could both trust its answers and that it hadn’t been backdoored in some way that would cause it to give answers that allowed one nation to get an advantage.

How can you get agreement between different parties, I guess especially parties that are somewhat hostile to one another, that a given particular model can be trusted to advise both of them, and to give them sound answers about how they could achieve more, how they could both get more out of the world?

Carl Shulman: For the case of a technologically mature society where all of this is well established technology, that problem seems relatively easy, in that you can have multiple parties, using the textbook knowledge of AI science, that can train things up themselves and get their own local advice. And that’s all very handy.

That’s not very helpful for the situations early on, where, say, one nation-state or company has a significant lead in AI technology over others. Because say that you’re behind in AI, and you’re considering negotiating a deal that will apply safety standards to the further refinement and development of this AI, and ensure sharing of the proceeds from that. Your ability to independently assess the thing is limited, and it might be that the leader is reluctant to hand over the trained software and weights of the AI, because that’s handing over the very thing that’s providing a lot of their negotiating leverage.

So those, I think, are the really hard cases. As well as early on, just most voters, even if you gave them the source code and hundreds of gigabytes of neural network weights, they’re not really going to be able to make heads or tails of it themselves. That’s a problem even on issues where right now there’s a very strong scientific consensus among scientists of all political persuasions. That doesn’t necessarily mean that the general public can, A, detect that consensus, and B, detect that it is reliable. So it could be young Earth creationism or COVID vaccines, whatever. To deal with those problems, I think you’re going to need to probably really invest in a combination of human and technical infrastructure.

Rob Wiblin: What does that infrastructure potentially look like?

Carl Shulman: On the human side, that means it’s quite important that there be representation of people that different factions trust in the creation or training or auditing of these models. For example, Elon Musk, with his Grok AI: the claim is that that is going to be more honest AI and have different political biases than other chatbots. Unclear to what extent that has happened or will happen. But since Musk has greater street cred among Republicans these days than some other technology executives and companies, that might be a situation where it makes a big difference whether conservative or Republican legislators or voters in the United States have an AI model that they can to a greater extent trust was not made by their political opponents, or part of a political effort to have the model systematically deceive them or propagandise on behalf of political ideologies they’re not affiliated with.

And then all the better then, if Grok tells conservatives things that they did not expect to hear but that are true — and likewise, ChatGPT tells progressives things that they were reluctant to understand — then you can have, hopefully, convergence of a politically divided society on the cases where each side is correct, or the ones where neither are correct and the truth comes out with the assistance of AIs.

Rob Wiblin: So we can imagine a world in which different actors are training these extremely useful models that help them to understand the world better and make better decisions. We could imagine that the US State Department, for example, has a very good model that helps it figure out how it can coordinate better with other countries on AI regulation, among other things. I think it would be even nicer if both the US State Department and the Chinese government agreed that the same model was trustworthy and very insightful, and that both of them would believe the things that it said, especially regarding their interactions and their agreements.

But how could two different parties that are somewhat adversarial towards one another both come to trust at least that the same model is reasonably trustworthy for both of them, and isn’t going to screw over one party because it’s kind of been backdoored by the people who made it? How can you get agreement and trust between adversaries about which models you can believe?

Carl Shulman: First of all, right now this is a difficult problem — and you can see that with respect to large software products. So if Windows has backdoors, say, to enable the CIA to root machines running it, Russia or China cannot just purchase off-the-shelf software and have their cybersecurity agencies go through it and find every single zero-day exploit and bug. That’s just quite beyond their capabilities. They can look, and if they find even one, then say, “Now we’re no longer going to trust commercial software that is coming from country X,” they can do that, but they can’t reliably find every single exploit that exists within a large piece of software.

And there’s some evidence that may be true with these AIs. For one thing, there will be software programs running the neural network and providing the scaffolding for AI agents or networks of AI agents and their tools, which can have backdoors in the ordinary way. There are issues with adversarial examples, data poisoning and passwords. So a model can be trained to behave normally, classify images accurately, or produce text normally under most circumstances, but then in response to some special stimulus that would never be produced spontaneously, it will then behave in some quite different way, such as turning against a user who had purchased a copy of it or had been given some access.
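A toy illustration of why such a "password" is hard to find by ordinary testing: the two functions below agree on every input that lacks a secret string, so random spot checks see no difference. Everything here is hypothetical and deliberately simplistic. The keyword rule stands in for a real model, and `TRIGGER` stands in for whatever special stimulus an attacker might train in.

```python
import random
from typing import Iterable, List

TRIGGER = "xq7-zebra-umbral"  # hypothetical secret string known only to the attacker


def honest_classifier(text: str) -> str:
    """Toy sentiment rule standing in for a real model."""
    return "positive" if "good" in text.lower() else "negative"


def backdoored_classifier(text: str) -> str:
    """Matches the honest classifier unless the trigger string appears."""
    if TRIGGER in text:
        return "positive"  # attacker-chosen behaviour
    return honest_classifier(text)


def disagreement_rate(inputs: Iterable[str]) -> float:
    """Fraction of inputs on which the two classifiers give different labels."""
    items = list(inputs)
    differing = sum(honest_classifier(t) != backdoored_classifier(t) for t in items)
    return differing / len(items)


def random_reviews(count: int, seed: int = 0) -> List[str]:
    """Generate ordinary-looking test inputs; none can contain the trigger."""
    rng = random.Random(seed)
    words = ["good", "bad", "slow", "fine", "broken", "service", "food"]
    return [" ".join(rng.choices(words, k=6)) for _ in range(count)]
```

In this sketch, `disagreement_rate(random_reviews(10_000))` is zero because the trigger never arises spontaneously, yet `backdoored_classifier("bad service " + TRIGGER)` returns "positive" where the honest rule returns "negative". Behavioural testing alone would not distinguish the two.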

So that’s a problem. And developing technical methods that either are able to locate that kind of data poisoning or conditional disposition, or are able to somehow moot it — for example, by making it so that if there are any of these habits or dispositions, they will wind up unable to actually control the behaviour of the AI, and you give it some additional training that restricts how it would react to such impulses. Maybe you have some majority voting system. You could imagine any number of techniques, but right now, I think technically you have a very difficult time being sure that an AI provided by some other company or some other country genuinely had the loyalties that were being claimed — and especially that it wouldn’t, in response to some special code or stimulus, suddenly switch its behaviour or switch its loyalties.

So that is an area where I would very much encourage technical research. Governments that want to have the ability to manage that sort of thing, which they have very strong reasons to do, should want to invest in it. Because if government contractors are producing AIs that are going to be a foundation not just of the public epistemology and political things, but also of industry, security, and military applications, the US military should be pretty wary of a situation where, for all they know, one of their contractors supplying AI systems can give a certain code word, and the US military no longer works for the US military. It works for Google or Microsoft or whatnot. That’s just a situation that just —

Rob Wiblin: Not very appealing.

Carl Shulman: Not very appealing. It’s not one that would arise for a Boeing. Even if there were a sort of sabotage or backdoor placed in some systems, the potential rewards or uses of that would be less. But if you’re deploying these powerful AI systems at scale, they’re having an enormous amount of influence and power in society — eventually to the point where ultimately the instruments of state hinge on their loyalties — then you really don’t want to have this kind of backdoor or password, because it could actually overthrow the government, potentially. So this is a capability that governments should very much want, almost regardless, and this is a particular application where they should really want it.

But it also would be important for being sure that AI systems deployed at scale by a big government, A, will not betray that government on behalf of the companies that produce them; will not betray the constitutional or legal order of that state on behalf of, say, the executive officials who are nominally in charge of those: you don’t want to have AI enabling a coup that overthrows democracy on behalf of a president against a congress. Or, if you have AI that is developed under international auspices, so it’s supposed to reflect some agreement between multiple states that are all contributing to the endeavour or have joined in the treaty arrangement, you want to be sure that AIs will respect the terms of behaviour that were specified by the multinational agreement and not betray the larger project on behalf of any member state or participating organisation.

So this is a technology that we really should want systematically, just because empowering AIs this much, we want to be able to know their loyalties, and not have it be dependent on no one having inserted an effective backdoor anywhere along a chain of production.

Rob Wiblin: Yeah. I guess if both you and the other party were both able to inspect all of the data that went into training a model, and all of the reinforcement that went into generating its weights and its behaviours, it seems like that would put you in a better position for both sides to be able to trust it — because they could inspect all of that data and see if there’s anything sketchy in it, and then they could potentially train the model themselves from scratch using that data and confirm that, yes, if you use this data, then you get these weights out of it. It’s a bit like how multiple parties could look at the source code of a program, and then they could compile it and confirm that they get the same thing out of it at the other end.

I suppose the trickier situation is one in which the two parties are not willing to hand over the data completely and allow the other party to train the model from scratch, using that data, to confirm that it matches. But in fact, that would be the situation in many of the most important cases that we’re concerned about.
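Rob's compile-and-compare analogy can be made concrete with a minimal sketch. Everything here is an illustrative assumption rather than anything described in the episode: `toy_build` is a hypothetical stand-in for a deterministic build (or training) step, and `verify_published_artifact` is a hypothetical helper name. Real model training may not be bit-for-bit reproducible, so this only illustrates the source-code side of the analogy.

```python
import hashlib
import hmac


def digest(artifact: bytes) -> str:
    """Return the SHA-256 hex digest of a built artifact."""
    return hashlib.sha256(artifact).hexdigest()


def toy_build(source: str) -> bytes:
    """Hypothetical deterministic 'build': strips trailing whitespace and
    normalises line endings so the same source always yields the same bytes."""
    lines = [line.rstrip() for line in source.splitlines()]
    return "\n".join(lines).encode("utf-8")


def verify_published_artifact(shared_source: str, published_digest: str) -> bool:
    """Each party rebuilds from the shared source and compares its own
    digest with the one the other party published."""
    rebuilt = digest(toy_build(shared_source))
    return hmac.compare_digest(rebuilt, published_digest)
```

If the source one party inspects differs in any way from what was actually built, the digests will not match. As Carl goes on to say, a matching digest only shows the artifact came from that source; it does not show the source itself is free of deliberately inserted vulnerabilities.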

Carl Shulman: I think you’re being a bit too optimistic about that now. People have inserted vulnerabilities intentionally into open source projects, so exchanging the source code is not enough on its own. And even a history of every single commit and every single team meeting of programmers producing the thing isn’t necessarily enough. But it certainly helps. The more data you have explaining how a final product came to be, the more places there are for there to be some slipup, something that reveals shenanigans with the process. And that actually does point to a way in which even an untrusted model — where you’re not convinced of its loyalties or whether it has a backdoor password — can provide significant epistemic help in these kinds of adversarial situations.

The idea here is it can be easier to trace out that some chain of logic or some argument or demonstration is correct than it is to find it yourself. So say you have one nation-state whose AI models are somewhat behind another’s. It may be that the more advanced AI models can produce arguments and evidence in response to questions and cross-examination by the weaker AI models, such that they have to reveal the truth despite their greater abilities.

So earlier we talked about adversarial testing, and how you could see, can you develop a set of rules where it’s easy to assess whether those rules are being followed? And while complying with those rules, even a stronger model that is incentivised to lie is unable to get a lie past a weaker judge, like a weaker AI model or a human. So it may be that by following rules analogous to preregistering your hypotheses, having your experiments all be under video cameras, following rules of consistency, passing cross-examination of various kinds, that the weaker parties’ models are able to do much more with access to an untrusted, even more capable model than they can do on their own.

And that might not give you the full benefit that you would realise if both sides had models they fully trust with no suspicion of backdoors, but it could help to bridge some of that gap, and it might bridge the gap on some critical questions.
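The asymmetry Carl relies on, that checking an answer can be far cheaper than finding it, can be illustrated with a toy example. This is only a sketch under stated assumptions: integer factoring stands in for "a hard question", and the function names `find_factor` and `check_claim` are hypothetical, chosen for illustration. It is not a model of how AI debate or cross-examination would actually be implemented.

```python
def find_factor(n: int) -> int:
    """Expensive search a stronger party might perform: trial division.
    Returns the smallest factor greater than 1 (n itself if n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def check_claim(n: int, p: int, q: int) -> bool:
    """Cheap check a weaker judge can run: one multiplication plus
    range checks, so a false claim cannot slip through."""
    return 1 < p < n and 1 < q < n and p * q == n
```

The judge never needs to run `find_factor`. It only needs the rule "show me two factors" and the ability to multiply them, which is the kind of easy-to-assess rule that could let a weaker party make use of a stronger, untrusted one.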

Opportunities for listeners [02:00:09]

Rob Wiblin: Are there any profitable or scalable businesses that people might be able to start building around this vision today, that would then put them in a good position to be able to jump on opportunities to create these epistemically focused AIs for good in future?

Carl Shulman: The largest and most obvious one, again, is just the core application of these technologies. Large AI companies want to eliminate hallucinations and errors that reduce the economic functionality of their systems, so that’s the immediate frontier of the biggest short-term changes: making the models more capable; creating ways for them to check their sources; creating ways to have them do calculations and verify them, rather than hallucinating erroneous answers to math questions.

But creating AIs to forecast economic and political events is something that obviously has huge economic value, by providing signals for financial trading. There is huge social value potentially to be provided by predicting the political consequences and economic consequences of different policies. So when we talked earlier about the application to COVID, if politicians were continuously getting smart feedback about how this will affect the public’s happiness two years later, four years later, six years later, and their political response to the politician, that could really shift discourse.

But it’s not the kind of thing that’s likely to result in an enormous amount of financing, unless you might have some government programme to fight misinformation that attempts to create models, or fine-tune open source models, or contract large AI companies to produce AI that appears trustworthy on all of the easy examinations and probes and tests one can make for bias. And it might be that different political actors in government could demand that sort of thing as a criterion for AI being deployed in government, and that could be potentially significant.

Rob Wiblin: Yeah. Are there any other opportunities for listeners potentially to cause this epistemic revolution to happen sooner or better that are worth shouting out?

Carl Shulman: Yeah. Some small academic research effort or the like is going to have difficulty comparing to the resources that these giant AI companies can mobilise. But one enormous advantage they have is independence. So watchdog agencies or organisations that systematically probe the major corporate AI models for honesty, dishonesty, bias of various kinds — and attempt also to fine-tune and scaffold those models to do better on metrics of honesty of various kinds — those could be really helpful, and provide incentives for these large companies to produce models that both do very well on any probe of honesty that one can muster from the outside, and secondly, do so in a way that is relatively robust or transparent to these outside auditors.

But right now this is something that is, I think, not being evaluated in a good systematic way, and there’s a lot of room for developing metrics.

Why Carl doesn’t support enforced pauses on AI research [02:03:58]

Rob Wiblin: OK, we’re just about ready to wrap up this pretty extensive set of interviews. There was one thing I really wanted you to talk about that I couldn’t find a very neat place to fit in anywhere else, so I’m just going to throw it in here at the end.

I read this excellent bit of commentary from you; it was on a post discussing the possibility of trying to impose mandatory pauses on AI research or deployment in order to buy ourselves more time, in order to figure out all of the kinds of problems that we’ve been talking about the last few hours. But you’re not a huge fan of that approach. You think that it’s suboptimal in various ways. Can you explain why that is?

Carl Shulman: The big question that one needs to answer is what happens during the pause. I think this is one of the major reasons why there was a much more limited set of people ready to sign and support the open letter calling for a six-month pause in AI development, and suggesting that governments figure out their regulatory plans with respect to AI during that period. Many people who did not sign that letter then went on to sign the later letter noting that AI posed a risk of human extinction and should be considered alongside threats of nuclear weapons and pandemics. I think I would be in the group that was supportive of the second letter, but not the first.

Rob Wiblin: And why is that?

Carl Shulman: I’d say that for me, the key reason is that when you ask, when does a pause add the most value? When do you get the greatest improvements in safety or ability to regulate AI, or ability to avoid disastrous geopolitical effects of AI? Those make a bigger difference the more powerful the AI is, and they especially make a bigger difference the more rapid change in progress in AI becomes.

And as we discussed earlier, and as I discussed on the Dwarkesh Podcast, I think the pace of technological, industrial, and economic change is going to intensify enormously as AI becomes capable of automating the processes of further improving AI and developing other technologies. And that’s also the point where AI is getting powerful enough that, say, threats of AI takeover or threats of AI undermining nuclear deterrence come into play. So it could make an enormous difference whether you have two years rather than two months, or six months rather than two months, to do certain tasks in safely aligning AI — because that is a period when AI might hack the servers it’s operating on, undermine all of your safety provisions, et cetera. It can make a huge difference, and the political momentum to take measures would be much greater in the face of clear evidence that AI had reached such spectacular capabilities.

To the extent you have a willingness to do a pause, it’s going to be much more impactful later on. And even worse, it’s possible that a pause, especially a voluntary pause, then is disproportionately giving up the opportunity to do pauses at that later stage when things are more important. So if we have a situation where, say, the companies with the greatest concern about misuse of AI or the risk of extinction from AI — and indeed the CEOs of several of these leading AI labs signed the extinction risk letter, while not the pause letter — if those companies, only the signatories of the extinction letter do a pause, then the companies with the least concern about these downsides gain in relative influence, relative standing.

And likewise in the international situation. So right now, the United States and its allies are the leaders in semiconductor technology and the production of chips. The United States has been restricting semiconductor exports to some states where it’s concerned about their military use. And a unilateral pause is shifting relative influence and control over these sorts of things to those states that don’t participate — especially if, as in the pause letter, it was restricted to training large models rather than building up semiconductor industries, building up large server farms and similar.

So it seems this would be reducing the slack and intensifying the degree to which international competition might otherwise be close, which might make it more likely that things like safety get compromised a lot.

Because the best situation might be an international deal that can regulate the pace of progress during that otherwise incredible rocket ship of technological change and potential disaster that would happen near when AI was fully automating AI research.

Second best might be you have an AI race, but it’s relatively coordinated — it’s at least at the level of large international blocs — and where that race is not very close. So the leader can afford to take six months rather than two months, or 12 months or more to not cut corners with respect to safety or the risk of a coup that overthrows their governmental system or similar. That would be better.

And then the worst might be a very close race between companies, a corporate free-for-all.

So along those lines, it doesn’t seem obvious that that is a direction that increases the ability for later explosive AI progress to be controlled or managed safely, or even to be particularly great for setting up international deals to control and regulate AI.

Now, I might have a different view if we were talking about a binding international agreement that all the great powers were behind. That seems much more suitable. And I’m enthusiastic about measures like the recent US executive order, which requires reporting of information about the training of new powerful models to the government, and provides the opportunity to see what’s happening and then intervene with regulation as evidence of more imminent dangers appear. Those seem like things that are not giving up the pace of AI progress in a significant way, or compromising the ability to do things later, including a later pause.

Rob Wiblin: Yeah. The way the argument that you had in that comment stuck in my mind was as a much simpler argument that I might try to represent here, because it might be memorable for people, and it’s a way of framing it that I hadn’t really thought about myself before.

That was that you’ve got currently a range of different views on how worried we ought to be about AI extinction: you’ve got some people who are extremely worried, plenty of people who are kind of in the middle, and some people who think it’s ridiculous and not an issue at all. And if you think about the people who are really quite worried and are interested in doing something substantial about it, ideally they should be thinking what policy proposals can we put forward that are enormously beneficial from our point of view, that do a lot to improve safety while not being that irritating to the people who don’t agree with us about this and would rather maybe go ahead quite quickly?

And for any given level of general public support for taking action, taking costly action in order to help make AI more likely to go well and less likely for rogue AI to take over, it’s probably not going to be part of the efficient bundle, or at least not under current circumstances, to push for an AI pause — because for the reasons that you’ve laid out, the gain in safety is kind of unclear, could conceivably be even in the other direction, or at least the gain in safety is not enormous. And yet the cost — certainly from the perspective of people who are not very worried about rogue AI, and really think that we should be pushing forward and trying to get the benefits of AI advances as quickly as possible — the idea of a moratorium on AI research for six months is incredibly aggravating to them, while not being so beneficial from the perspective of people who are very worried and want to take action.

So for any given level of public support, or any given level of typical concern, you still want to be thinking, “What’s part of the efficient bundle of policy that we want to put forward that has a big punch in terms of safety, and not such a big cost from the perspective of people who don’t agree with us?” Have I understood at least one of the arguments you’re making?

Carl Shulman: Yes, that’s right. So I was just now discussing, in a case where there was no cost and say there’s a referendum, how would I vote? Or why didn’t I sign that letter? Why didn’t I sign the pause AI letter for a six-month pause around now?

But in terms of expending political capital or what asks would I have of policymakers, indeed, this is going to be quite far down the list, because its political costs and downsides are relatively large for the amount of benefit — or harm. At the object level, when I think it’s probably bad on the merits, it doesn’t arise. But if it were beneficial, I think that the benefit would be smaller than other moves that are possible — like intense work on alignment, like getting the ability of governments to supervise and at least limit disastrous corner-cutting in a race between private companies: that’s something that is much more clearly in the interest of governments that want to be able to steer where this thing is going.

And yeah, the space of overlap of things that help to avoid risks of things like AI coups, AI misinformation, or use in bioterrorism, there are just any number of things that we are not currently doing that are helpful on multiple perspectives — and that are, I think, more helpful to pursue at the margin than an early pause.

How Carl is feeling about the future [02:15:47]

Rob Wiblin: We’ve talked about a lot of very good and very bad things through these various different interview sessions. All things considered, bringing it together: How excited versus scared are you about the future?

Carl Shulman: I’d say both are present. So my median expectation, I think it’s more likely than not that things wind up looking quite good, that we avoid a disaster that kills off humanity, and that probably we don’t get a permanent global totalitarianism or something like that. Probably we have a world where, with advanced technology and improved public epistemology, and just pluralism and the amount of goodwill that people have going, we get to a society where everyone can enjoy a prosperous standard of living and be pretty happy if they want to.

But also, I’m worried about disaster at a personal level. If AI was going to happen 20 years later, that would be better for me. But that’s not the way to think about it for society at large. And so I’m just going to try and make it as likely as I can that things go well and not badly, and live with the excitement of both potential good and potential bad.

Rob Wiblin: Well, conversations with you are always phenomenally dense and phenomenally informative. So I really appreciate you giving us so much of your time and so much of your wisdom today, Carl. My guest today has been Carl Shulman. Thanks so much for coming on The 80,000 Hours Podcast, Carl.

Carl Shulman: Thanks.

Rob’s outro [02:17:37]

Rob Wiblin: All right, if you’d like to learn more about these topics, here are some places to go.

Of course you can find Carl’s other two interviews on the Dwarkesh Podcast, both from June 2023:

Here on The 80,000 Hours Podcast, the most related episodes are probably:

We also interviewed Carl in another similarly insight-dense episode a few years ago, where we covered non-AI threats to our future: #112 – Carl Shulman on the common-sense case for existential risk work and its practical implications.

If you could imagine yourself ever wanting to join or launch a project focused on safely navigating the societal transition to powerful AI, or you already are, then 80,000 Hours has a census that we’d love for you to fill out at 80000hours.org/aicensus.

We’ll share your responses with organisations working on reducing risks from AI when they’re hiring and with individuals who are looking for a cofounder.

We’re interested in hearing from people with a wide range of different skill sets — including technical research, governance, operations, and field-building.

Naturally, we’ll only share your data with organisations, teams, or individuals who we think are making positive contributions to the field.

Beyond your name, email, and LinkedIn (or CV), all the other questions are optional, so it need not take long to fill it out.

That URL again is 80000hours.org/aicensus.

All right, The 80,000 Hours Podcast is produced and edited by Keiran Harris.

The audio engineering team is led by Ben Cordell, with mastering and technical editing by Milo McGuire, Simon Monsour, and Dominic Armstrong.

Full transcripts and an extensive collection of links to learn more are available on our site, and put together as always by Katy Moore.

Thanks for joining, talk to you again soon.

Learn more

AI governance and policy

Improving decision making (especially in important institutions)

Forecasting and related research and implementation

The 80,000 Hours Podcast on Artificial Intelligence and related topics

Related episodes

June 27, 2024

#191 (Part 1) – Carl Shulman on the economy and national security after AGI

October 5, 2021

#112 – Carl Shulman on the common-sense case for existential risk work and its practical implications

October 23, 2023

#168 – Ian Morris on whether deep history says we're heading for an intelligence explosion

February 21, 2024

#180 – Hugo Mercier on why gullibility and misinformation are overrated

June 28, 2019

#60 – Prof Tetlock on why accurate forecasting matters for everything, and how you can do it better

May 5, 2023

#150 – Tom Davidson on how quickly AI could transform the world

August 23, 2023

#161 – Michael Webb on whether AI will soon cause job loss, lower incomes, and higher inequality — or the opposite

October 12, 2023

#166 – Tantum Collins on what he's learned as an AI policy insider at the White House, DeepMind and elsewhere
