Bonus: 15 expert takes on infosec in the age of AI
By The 80,000 Hours podcast team · Published March 28th, 2025
What happens when a USB cable can secretly control your system? Are we hurtling toward a security nightmare as critical infrastructure connects to the internet? Is it possible to secure AI model weights from sophisticated attackers? And could AI actually make computer security better rather than worse?
With AI security concerns becoming increasingly urgent, we bring you insights from 15 top experts across information security, AI safety, and governance, examining the challenges of protecting our most powerful AI models and digital infrastructure — including a sneak peek from an episode that hasn’t yet been released with Tom Davidson, where he explains how we should be more worried about “secret loyalties” in AI agents.
You’ll hear:
- Holden Karnofsky on why every good future relies on strong infosec, and how hard it’s been to hire security experts (from episode #158)
- Tantum Collins on why infosec might be the rare issue everyone agrees on (episode #166)
- Nick Joseph on whether AI companies can develop frontier models safely with the current state of information security (episode #197)
- Sella Nevo on why AI model weights are so valuable to steal, the weaknesses of air-gapped networks, and the risks of USBs (episode #195)
- Kevin Esvelt on what cryptographers can teach biosecurity experts (episode #164)
- Lennart Heim on Rob’s computer security nightmares (episode #155)
- Zvi Mowshowitz on the insane lack of security mindset at some AI companies (episode #184)
- Nova DasSarma on the best current defences against well-funded adversaries, politically motivated cyberattacks, and exciting progress in infosecurity (episode #132)
- Bruce Schneier on whether AI could eliminate software bugs for good, and why it’s bad to hook everything up to the internet (episode #64)
- Nita Farahany on the dystopian risks of hacked neurotech (episode #174)
- Vitalik Buterin on how cybersecurity is the key to defence-dominant futures (episode #194)
- Nathan Labenz on how even internal teams at AI companies may not know what they’re building (episode #176)
- Allan Dafoe on backdooring your own AI to prevent theft (episode #212)
- Tom Davidson on how dangerous “secret loyalties” in AI models could be (episode to be released!)
- Carl Shulman on the challenge of trusting foreign AI models (episode #191, part 2)
- Plus lots of concrete advice on how to get into this field and find your fit
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Content editing: Katy Moore and Milo McGuire
Transcriptions and web: Katy Moore
Transcript
Table of Contents
- 1 Cold open [00:00:00]
- 2 Rob’s intro [00:00:49]
- 3 Holden Karnofsky on why infosec could be the issue on which the future of humanity pivots [00:03:21]
- 4 Tantum Collins on why infosec is a rare AI issue that unifies everyone [00:12:39]
- 5 Nick Joseph on whether the current state of information security makes it impossible to responsibly train AGI [00:16:23]
- 6 Nova DasSarma on the best available defences against well-funded adversaries [00:22:10]
- 7 Sella Nevo on why AI model weights are so valuable to steal [00:28:56]
- 8 Kevin Esvelt on what cryptographers can teach biosecurity experts [00:32:24]
- 9 Lennart Heim on the possibility of an autonomously replicating AI computer worm [00:34:56]
- 10 Zvi Mowshowitz on the absurd lack of security mindset at some AI companies [00:48:22]
- 11 Sella Nevo on the weaknesses of air-gapped networks and the risks of USB devices [00:49:54]
- 12 Bruce Schneier on why it’s bad to hook everything up to the internet [00:55:54]
- 13 Nita Farahany on the possibility of hacking neural implants [01:04:47]
- 14 Vitalik Buterin on how cybersecurity is the key to defence-dominant futures [01:10:48]
- 15 Nova DasSarma on exciting progress in information security [01:19:28]
- 16 Nathan Labenz on how even internal teams at AI companies may not know what they’re building [01:30:47]
- 17 Allan Dafoe on backdooring your own AI to prevent someone else from stealing it [01:33:51]
- 18 Tom Davidson on how dangerous “secret loyalties” in AI models could get [01:35:57]
- 19 Carl Shulman on whether we should be worried about backdoors as governments adopt AI technology [01:52:45]
- 20 Nova DasSarma on politically motivated cyberattacks [02:03:44]
- 21 Bruce Schneier on the day-to-day benefits of improved security and recognising that there’s never zero risk [02:07:27]
- 22 Holden Karnofsky on why it’s so hard to hire security people despite the massive need [02:13:59]
- 23 Nova DasSarma on practical steps to getting into this field [02:16:37]
- 24 Bruce Schneier on finding your personal fit in a range of security careers [02:24:42]
- 25 Rob’s outro [02:34:46]
Cold open [00:00:00]
Sella Nevo: Let’s put highly secure air-gapped networks aside for a moment and just talk about getting a USB to connect to a network. It’s worth flagging that this is a really easy thing to do.
One thing that people will do — and this is not just nation-states and whatnot; this is random hackers that want to do things for the fun of it — is just drop a bunch of USB sticks in the parking lot of an organisation, and someone will inevitably be naive enough to be like, “Oh no, someone has dropped this. Let’s plug it in and see who this belongs to.” And you’re done. Now you’re in and you can spread through the internal network.
This happens all the time. It’s happened multiple times in multiple nuclear sites in the United States. So yeah, this is a pretty big deal.
Luisa Rodriguez: That’s unreal!
Rob’s intro [00:00:49]
Rob Wiblin: Hey listeners, Rob here. You were just listening to Sella Nevo from episode #195 on who’s trying to steal frontier AI models and what they could do with them.
Today we’ve got another compilation of great bits from the show, this time about information and computer security — something that has come up many times because, as the guests will explain, it really does matter an awful lot.
We’ve even got a preview of an upcoming episode that we thought would be out by this week but has had to be held back while the paper it talks about receives some final tweaks. See if you can figure out which that is.
I really enjoyed listening back over these extracts and getting a reminder of how much funny and important stuff has appeared on the show over the years.
Coming up we have:
- Holden Karnofsky on why it is that infosec could be the issue on which the future of humanity pivots
- Tantum Collins on why infosec is a rare AI issue that unifies everyone
- Nick Joseph on whether the current state of information security makes it impossible to responsibly train AGI
- Nova DasSarma on the best available defences against well-funded adversaries
- Sella Nevo on why AI model weights are so valuable to steal
- Lennart Heim on the possibility of an autonomously replicating AI computer worm
- Zvi Mowshowitz on the absurd lack of security mindset at some AI companies
- Sella Nevo on the weaknesses of air-gapped networks and the risks of USB devices
- Bruce Schneier on why it’s bad to hook everything up to the internet
- Nita Farahany on the possibility of hacking neural implants
- Vitalik Buterin on how cybersecurity is the key to defence-dominant futures
- Tom Davidson on how dangerous “secret loyalties” in AI models could get
- Carl Shulman on whether we should be worried about backdoors as governments adopt AI technology
- Allan Dafoe on backdooring your own AI to prevent someone else from stealing it
Plus another 10 or so extracts on top of that.
If you’d like to learn more about impact-driven information security careers, we’ve got an article on the 80,000 Hours website titled “Information security in high-impact areas.” Probably best to just google that.
We’ve also got 91 infosec-related jobs listed on our job board at the moment, from junior to senior roles. You can set up email alerts to tell you whenever we add new ones that meet your specific requirements so you never miss out.
You can find that at jobs.80000hours.org. There’s 825 opportunities listed on the job board in total at the moment, so plenty of other non-infosec stuff on there as well.
All right, here’s those 15 different guests on information security!
Holden Karnofsky on why infosec could be the issue on which the future of humanity pivots [00:03:21]
Rob Wiblin: So this kind of a worldview leads into something that you wrote, which is this four-intervention playbook for possible success with AI. You described four different categories of interventions that we might engage in in order to try to improve our odds of success: alignment research, standards and monitoring, creating a successful and careful AI lab, and finally, information security.
We’ve been trying to get information security folks from AI labs on the show to talk about this, but understandably, there’s only so much that they want to divulge about the details of their work. Why is information security potentially so key here?
Holden Karnofsky: I’ve done a lot of thinking of what are different ways the future could go well, and there are some themes in them. There’s almost no story of the future going well that doesn’t have a part that’s like “…and no evil person steals the AI weights and goes and does evil stuff.”
And so it has highlighted the importance of information security: “You’re training a powerful AI system; you should make it hard for someone to steal” has popped out to me as a thing that just keeps coming up in these stories, keeps being present. It’s hard to tell a story where it’s not a factor. It’s easy to tell a story where it is a factor.
I think you could build these powerful, dangerous AI systems, and you can do a lot to try to mitigate the dangers — like limiting the ways they can be used, you can do various alignment techniques — but if some state or someone else steals the weights, they’ve basically stolen your AI system, and they can run it without even having to do the training run. So you might spend a huge amount of money on a training run, end up with this AI system that’s very powerful, and someone else just has it.
And they can then also fine-tune it, which means they can do their own training on it and change the way it’s operating. So whatever you did to train it to be nice, they can train that right out; the training they do could screw up whatever you did to try and make it aligned.
And so I think at the limit of “it’s really just trivial for any state to just grab your AI system and do whatever they want with it and retrain it how they want,” it’s really hard to imagine feeling really good about that situation. I don’t know if I really need to elaborate a lot more on that. So making it harder seems valuable.
This is another thing where I want to say, as I have with everything else, that it’s not binary. So it could be the case that, after you improve your security a lot, it’s still possible for a state actor to steal your system — but they have to take more risks, they have to spend more money, they have to take a deeper breath before they do it. It takes them more months. Months can be a very big deal. As I’ve been saying, when you get these very powerful systems, you could do a lot in a few months. By the time they steal it, you could have a better system. So I don’t think it’s an all-or-nothing thing.
But no matter what risk of AI you’re worried about — you could be worried about the misalignment; you could be worried about the misuse and the use to develop dangerous weapons; you can be worried about more esoteric stuff, like how the AI does decision theory; you could be worried about mind crime — you don’t want just anyone, including some of these state actors who may have very bad values, to just be able to steal a system, retrain it how they want, and use it how they want. You want some kind of setup where it’s the people with good values controlling more of the more powerful AI systems, using them to enforce some sort of law and order in the world, and enforcing law and order generally — with or without AI. So it seems quite robustly important.
You know, I think if you were to throw in a requirement with the licence, I would make it about information security. Government requiring at least minimum security requirements for anyone training frontier models just seems like a good idea — just getting them on that ramp to where it’s not so easy for a state actor to steal it. Arguably, governments should just require all AI models to be treated as top-secret classified information — which means that they would have to be subject to incredible draconian security requirements involving like air-gapped networks and all this incredibly painful stuff.
Arguably, they should require that at this point, given how little we know about what these models are going to be imminently capable of. But at a minimum, some kind of security requirement seems good.
Another thing about security is that I think it’s very, very hard, just very hard, to make these systems hard to steal for a state actor, and so there’s just a tonne of room to go and make things better. There could be security research on innovative new methods, and there can also be a lot of blocking and tackling — just getting companies to do things that we already know need to be done, but that are really hard to do in practice, take a lot of work, take a lot of iteration.
Also, a nice thing about security, as opposed to some of these other things: it is a relatively mature field, so you can learn about security in some other context and then apply it to AI. So yeah, I think security is a really big deal. I think it hasn’t gotten enough attention.
Rob Wiblin: It’s a shame that more people haven’t gone into it, because even setting all of this aside, it seems like going into information security, computer security is a really outstanding career. It’s the kind of thing that I would have loved to do in an alternative life, because it’s kind of tractable and also it’s exciting, and there are really important things you can do. It’s very well paid as well.
Holden Karnofsky: Yeah. I think the demand is crazily out ahead of the supply in security, which is another reason I wish more people had gone into it.
Rob Wiblin: Yeah. So I’m basically totally on board with this line of argument. I guess if I had to push back, I’d say maybe we’re just so far away from being able to secure these models that you could put in an enormous amount of effort — maybe the greatest computer security effort that’s ever been put towards any project — and maybe you would end up with it costing a billion dollars in order to steal the model. But that’s still peanuts to China or to state actors, and this is obviously going to be on their radar by the relevant time.
So maybe really the message we should be pushing is: because we can’t secure the models, we just have to not train them. And that’s the only option here. Or perhaps you just need to move the entire training process inside the NSA building or whoever has the best security — you just basically take that and then use that as the shell for the training setup.
Holden Karnofsky: I don’t think I understand either of these alternatives. I think we can come back to the billion-dollar point, because I don’t agree with that either.
But let’s start with this: the only safe thing is not to train. I’m just like, how the heck would that make sense? Unless we get everyone in the world to agree with that forever, that doesn’t seem like much of a plan. So I don’t understand that one.
I don’t understand moving inside the NSA building, because if it’s possible for the NSA to be secure, then it’s probably possible for a company to be secure with a lot of effort. Neither of these is making sense to me as an alternative.
Rob Wiblin: Because they’re two different arguments. So the NSA one I suppose will be saying that it’s going to be so hard to convert a tech company into being sufficiently secure that we just need to get the best people in the business, wherever they are, working on this problem, and basically, we have to redesign it from the ground up.
Holden Karnofsky: Well, that might be what we have to do. I mean, a good step toward that would be for a lot of great people to be working in security to determine that that’s what has to happen: to be working at companies, to be doing the best they can, and say, “This is what we have to do.” But let’s try and be as adaptable as we can.
I mean, it’s like zero chance that the company would just literally become the NSA. They would figure out what the NSA is doing that they’re not, they would do that, and they would make the adaptations they have to make. That would take an enormous amount of intelligence and creativity and personpower — and the more security people there are, the better they would do it. So I don’t know that that one is really an alternative.
Rob Wiblin: OK, and what about the argument that we’re not going to be able to get it to be secure enough? So it might even just give us false comfort to be increasing the cost of stealing the model when it’s still just going to be sufficiently cheap.
Holden Karnofsky: I don’t think it’ll be false comfort. I think if you have a zillion great security people, and they’re all like, “FYI, this thing is not safe,” I think we’re probably going to feel less secure than we do now, when we just have a lot of confusion and FUD [fear, uncertainty, and doubt] about exactly how hard it is to protect the model. So I don’t know. I’m kind of like, what’s the alternative?
But putting aside what’s the alternative, I would just disagree with this thing that it’s a billion dollars and it’s peanuts. I would just say that at the point where it’s really hard, then like anything that’s really hard, there’s an opportunity for people to screw it up.
Rob Wiblin: Sometimes it doesn’t happen.
Holden Karnofsky: It doesn’t happen. They might not be able to pull it off. They might just screw it up a bunch of times, and that might give us enough months to have enough of an edge that it doesn’t matter.
I think another point in all this is that if we get to a future world where you have a really good standards and monitoring regime, one of the things you’re monitoring for could be security breaches. So you could be saying we’re using AI systems to enforce some sort of regulatory regime that says you can’t train a dangerous system. Well, not only can’t you train a dangerous system; you can’t steal any system — if we catch you, there’s going to be consequences for that. Those consequences could be arbitrarily large.
And it’s one thing to say a state actor can steal your AI; it’s another thing to say they can steal your AI without a risk of getting caught. These are different security levels. So I guess there’s a hypothetical world in which no matter what your security is, a state actor can easily steal it in a week without getting caught. But I doubt we’re in that world. I think you can make it harder than that, and I think that’s worth it.
Rob Wiblin: Yeah. Well, I’ve knocked it out of the park in terms of failing to disprove this argument that I agree with.
So please, people, go and learn more about this. We’ve got an information security career review. There’s a post up on the Effective Altruism Forum called “EA Infosec: skill up in or make a transition to infosec via this book club,” which you could go check out. There’s also the EA infosec Facebook group. So quite a lot of resources. Hopefully, finally, people are waking up to this as a really impactful career.
Holden Karnofsky: This is a lot of different jobs, by the way. There’s security researchers, there’s security engineers, there’s security DevOps people and managers. And this is a big thing. We’ve oversimplified it, and I’m not an expert at all.
Tantum Collins on why infosec is a rare AI issue that unifies everyone [00:12:39]
Rob Wiblin: One other suggestion I’ve heard for something that should be viewed as good by the lights of many of these different interest groups is cybersecurity.
Cybersecurity looks really good from a national security point of view, because you really don’t want dangerous technology leaking out to terrorists, or to adversary states potentially. It’s really good from an extinction risk point of view. I would guess that most people either think that that’s really good or at worst neutral. So that seems like one where you could potentially build quite a broad coalition around requiring extremely high cybersecurity for frontier AI models potentially.
Tantum Collins: Yeah, I think that’s totally right, actually. I agree with that. I guess there’s a long list of AI application areas that almost nobody would say are bad. So for instance, using AI to help develop cures for really dangerous diseases, using AI to improve cyberdefences, stuff like that. But I think that’s maybe a slightly different level of abstraction than the category of policies that we’re talking about.
Rob Wiblin: Well, I guess improving the methods that we have for keeping models on track, in terms of what goals they’re pursuing and why, as they become substantially more capable. That is kind of one of the main thrusts of the extinction-focused work that I think at least NatSec people would also think is probably good, and I would think that AI ethics people would also think is probably good. I guess I don’t know of anyone who really objects to that line of research. At worst, they might think it’s a bit of a waste of resources or something, if they don’t think it’s promising.
Tantum Collins: Yeah, I think that’s right. Maybe another one that would be interesting is improvements to privacy-preserving machine learning infrastructure and techniques. These are techniques like differential privacy, homomorphic encryption, secure multiparty computation, and federated learning that give you some level of privacy guarantee about saying you can train a model on data in ways that don’t have to give the model owner repeatable interpretable access to the data, nor give the data owner repeatable interpretable access to the model, but still you get the benefits of model improvement at the end of the day.
And historically, these have usually had a big additional cost, computationally. There are some interesting groups, most notably OpenMined, which is an open source community that is developing a library called PySyft that is trying to reduce the computational cost of those systems.
There are a lot of areas where we would like to have the public benefits of stuff that is trained on sensitive data, but we don’t want to compromise individuals’ personal information.
Some of the most obvious ones here are medical use cases: it would be great to have systems that are trained on the distributed, let’s say, mammography data in order to improve tumour detection — but, completely understandably, people don’t want to hand their sensitive healthcare information over to for-profit companies.
There are also a whole bunch of other applications, including actually model governance itself: How can you deploy systems that will assess the capabilities of certain models, or the size of training runs, or what have you, without getting any other creepy surveillance data from that stuff?
So there is a whole world of possibilities in this direction that could be helpful for developing models that would have to be trained on sensitive data and/or handling governance responsibilities that might otherwise bring surveillance implications that make us uncomfortable.
I think that is a technical stack that is under development now, but where lots more work could be done. And the overwhelming majority of the use cases that it enables, at least the ones that I have come across, are quite positive. So that might be another area that there would be shared interest in developing.
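To make one of the techniques Tantum mentions a little more concrete, here is a minimal sketch of differential privacy via the Laplace mechanism in Python. It is only an illustration: the dataset, bounds, and epsilon value are hypothetical, and production systems (including the PySyft library he names) involve far more machinery than this.

```python
import numpy as np

def dp_mean(values: np.ndarray, lower: float, upper: float, epsilon: float) -> float:
    """Release the mean of a sensitive dataset with epsilon-differential privacy.

    Clipping each value to [lower, upper] bounds how much any one person can
    shift the mean; Laplace noise scaled to that sensitivity hides their
    individual contribution.
    """
    clipped = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(clipped)  # max influence of one record on the mean
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(clipped.mean() + noise)

# Hypothetical example: releasing an aggregate from sensitive medical records
# without exposing any single patient's value.
tumour_sizes_mm = np.array([12.1, 8.4, 15.0, 9.7, 11.3, 14.2])
print(dp_mean(tumour_sizes_mm, lower=0.0, upper=50.0, epsilon=1.0))
```

The accuracy-for-privacy tradeoff visible here (more noise for stronger guarantees) is the same kind of overhead Tantum says these techniques have historically carried, which is why reducing their cost is an active area of work.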
Nick Joseph on whether the current state of information security makes it impossible to responsibly train AGI [00:16:23]
From #197 – Nick Joseph on whether Anthropic’s AI safety policy is up to the task
Rob Wiblin: If I think about how this is most likely to play out, I imagine that at the point that we do have models that we really want to protect from even the best state-based hackers, there probably has been some progress in computer security, but not nearly enough to make you or me feel comfortable that there’s just no way that China or Russia might be able to steal the model weights.
And so it is very plausible that the RSP will say, “Anthropic, you have to keep this on a hard disk, not connected to any computer. You can’t train models that are more capable than the thing that we already have that we don’t feel comfortable handling.”
And then how does that play out? There are a lot of people who are very concerned about safety at Anthropic. I’ve seen that there are kind of league tables now of different AI companies and enterprises, and how good they look from an AI safety point of view — and Anthropic always comes out at the top, I think by a decent margin.
But months go by, other companies are not being as careful as this. You’ve complained to the government, and you’ve said, “Look at this horrible situation that we’re in. Something has to be done.” But I don’t know. I guess possibly the government could step in and help there, but maybe they won’t. And then over a period of months or years, doesn’t the choice effectively become, if there is no solution, either take the risk or just be rendered irrelevant?
Nick Joseph: Maybe just going back to the beginning of that, I don’t think we will put something in that says there is zero risk from something. I think you can never get to zero risk.
I think often with security you’ll end up with some security/productivity tradeoff. So you could end up taking some really extreme risk or some really extreme productivity tradeoff where only one person has access to this. Maybe you’ve locked it down in some huge amount of ways. It’s possible that you can’t even do that. You really just can’t train the model. But there is always going to be some balance there. I don’t think we’ll push to the zero-risk perspective.
But yeah, I think that that’s just a risk. I don’t know. I think there’s a lot of risks that companies face where they could fail. We also could just fail to make better models and not succeed that way. I think the point of the RSP is it has tied our commercial success to the safety mitigations, so in some ways it just adds on another risk in the same way as any other company risk.
Rob Wiblin: It sounds like I’m having a go at you here, but I think really what this shows up is just that, I think that the scenario that I painted there is really quite plausible, and it just shows that this problem cannot be solved by Anthropic. Probably it can’t be solved by even all of the AI companies combined.
The only way that this RSP is actually going to be able to be usable, in my estimation, is if other people rise to the occasion, and governments actually do the work necessary to fund the solutions to computer security that will allow us to have the model weights be sufficiently secure in this situation. And yeah, you’re not blameworthy for that situation. It just says that there’s a lot of people who need to do a lot of work in coming years.
Nick Joseph: Yeah. And I think I might be more optimistic than you or something. I do think if we get to something really dangerous, we can make a very clear case that it’s dangerous, and these are the risks unless we can implement these mitigations. I hope that at that point it will be a much clearer case to pause or something.
I think there are many people who are like, “We should pause right now,” and see everyone saying no. And they’re like, “These people don’t care. They don’t care about major risks to humanity.” I think really the core thing is people don’t believe there are risks to humanity right now. And once we get to this sort of stage, I think that we will be able to make those risks very clear, very immediate and tangible.
And I don’t know. No one wants to be the company that caused a massive disaster, and no government also probably wants to have allowed a company to cause it. It will feel much more immediate at that point.
Rob Wiblin: A metaphor that you use within your responsible scaling policy is putting together an aeroplane while you’re flying it.
I think that is one way that the challenge is particularly difficult for the industry and for Anthropic: unlike with biological safety levels — where basically we know the diseases that we’re handling, and we know how bad they are, and we know how they spread, and things like that — the people who are figuring out what BSL-4 security should be like can look at lots of studies to understand exactly the organisms that already exist and how they would spread, and how likely they would be to escape, given these particular ventilation systems and so on. And even then, they mess things up decently often.
But in this case, you’re dealing with something that doesn’t exist — that we’re not even sure when it will exist or what it will look like — and you’re developing the thing at the same time that you’re trying to figure out how to make it safe. It’s just extremely difficult.
And we should expect mistakes. That’s something that we should keep in mind: even people who are doing their absolute best here are likely to mess up. And that’s a reason why we need this defence in depth strategy that you’re talking about, that we don’t want to put all of our eggs in the RSP basket. We want to have many different layers, ideally.
Nick Joseph: It’s also a reason to start early. I think one of the things with Claude 3 was that that was the first model where we really ran this whole process. And I think some part of me felt like, wow, this is kind of silly. I was pretty confident Claude 3 was not catastrophically dangerous. It was slightly better than GPT-4, which had been out for a long time and had not caused a catastrophe.
But I do think that the process of doing that — learning what we can and then putting out public statements about how it went, what we learned — is the way that we can have this run really smoothly the next time. Like, we can make mistakes now. We could have made a tonne of mistakes, because the stakes are pretty low at the moment. But in the future, the stakes on this will be really high, and it will be really costly to make mistakes. So it’s important to get those practice runs in.
Nova DasSarma on the best available defences against well-funded adversaries [00:22:10]
Rob Wiblin: My perception, as someone who takes a slight amateur interest in information security issues, is that the state of the art is very bad. That we do not really have reliable ways of stopping a really advanced, well-funded adversary from stealing data, if this is something that they’re willing to invest a lot of human capital in. Is that kind of right?
Nova DasSarma: I think that’s kind of right. I’ve got a story here around this, if you want to hear it.
Rob Wiblin: Yeah. Go for it.
Nova DasSarma: A state that will not be named had an attack that was in the news recently, that was a zero-click vulnerability on iMessage. A “zero-click vulnerability” is one where the user doesn’t have to take any actions for them to be compromised.
And this had to do with something called the JBIG2 compression algorithm, which you might have heard about, because back in the day, Xerox used to use this for copiers. It’s a compression algorithm, which means that you can copy things faster. But it turns out that if you turn the compression up too high, it turns zeros to nines and vice versa, which is quite bad for numerics.
That being said, JBIG2 was also the culprit in this case, where their compression algorithm is dynamic — which means that you can specify patterns on the fly. It turns out that if you construct a file that has the JBIG2 codec in it, then you can construct logical gates out of this. Which means that in theory, it’s Turing complete — and in practice, it was Turing complete. So to deliver this vulnerability, they produced a computer within the JBIG2 decompression algorithm to deliver the payload to these phones.
And that’s the sort of thing where you could theoretically have defended against this, but the way that you defended against this was least access — so not being able to access anything on your phones, or not having phones. Both of these things are really quite difficult to implement in an organisation above a certain size that doesn’t have a very, very strong security mindset.
Rob Wiblin: Security culture.
Nova DasSarma: Yeah. So that’s on the state access side. That being said, the thing that works the most is always going to be a social attack. So something where you meet someone at a party, and they seem nice, and they become your friend. And then you let them into your building when you maybe shouldn’t have done that, and they plug a USB into your system, and you’re done. We talk about physical access being the end of the line in security oftentimes.
Rob Wiblin: Right. OK, so one thing is a very well-endowed actor can develop zero-days. We basically live in a world where states are able to figure out completely new ways of breaking into computers, into phones, that people can’t protect against, because no one else is aware of them and potentially they can require no action whatsoever.
Even more accessible to actors who have less money is this kind of social engineering attack, where they’ll convince someone to give them access. And this doesn’t require quite the same level of technical chops.
But nonetheless, basically it’s extremely hard to secure a system to be very confident that it’s not vulnerable to one or the other of these approaches.
Nova DasSarma: For sure. And on the social engineering side, you don’t need the folks who have the most access in your organisation to be compromised by social engineering attacks. Oftentimes those are the folks who are least vulnerable to that. All you need to do is have somebody who is on operations — or somebody who is maybe even the physical security person for the building who connects to your corporate wifi — be compromised, and then they can be the threat vector into your organisation.
Rob Wiblin: Yeah. So given that we live in that kind of world, should we just not be training models where it will be disastrous if they leak? It just seems like we just don’t live in a world that’s safe for that kind of ML model yet.
Nova DasSarma: For sure. And that’s something that a lot of labs would definitely have on their minds. I think that it being difficult to secure models is one of the reasons why we wouldn’t want to train such models.
Models with a lot of capabilities are oftentimes very alluring for people to build anyways though, so my perspective on this is that it’s important for me, and people who I work with, to develop tools to defend things anyways. Because if you can disrupt that sort of attack while it’s happening, if you can notice it’s happening, then you’ve got a better chance of keeping things contained longer.
Rob Wiblin: Yeah. Coming back to the preventability, I guess you’re just saying that given the current state of information security, in order to keep this information secret, they would have to bury it so deeply that it would interfere with their operations more than it would even be worth, in terms of protecting the information. So basically, if a piece of information is this valuable and this obvious, we should often expect it to be stolen in one way or another.
Nova DasSarma: I think that’s a good expectation to have. It might not be what actually happens. I wouldn’t give greater than 50% on a one-year timeline of, for example, Hofvarpnir’s CTL command being stolen. But it’s a good mindset to have. It makes you think more carefully about what sorts of capabilities you’re developing and things like that. Because if you assume that a bad actor is going to use it, then you are going to be in a better state if they do actually end up using it.
Rob Wiblin: So to what degree could we solve any of these information security issues by putting information that we don’t want to get out there on ice somehow? Like putting things in cold storage, except for the exceptional cases where occasionally you want to access them for some practical reason?
Nova DasSarma: I highly recommend it. Certainly limiting the amount of information you have that is eligible to be exploited and that is easily accessible is a great way to limit your footprint.
So things like, if you’ve got a model that you trained and you have a whole bunch of checkpoints and you’re storing them on some online system, consider whether you need to do that. Consider whether you could instead encrypt those and put them on something like Amazon Glacier or Google Cloud Storage or something like that, where you’ve got a cold storage — it’s going to take several hours to restore it, and you can absolutely set an alarm if somebody tries to restore that information without letting you know.
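As a rough illustration of the pattern Nova describes, here is a minimal sketch using AWS S3's archival storage classes, one of the options she names. The bucket name, key, and file path are hypothetical, this is not any lab's actual setup, and a real deployment would also wire the restore events (via CloudTrail/EventBridge or similar) into an alerting system.

```python
"""Minimal sketch of 'cold storage plus a tripwire' for model checkpoints.

Assumes AWS S3 (one of the services Nova mentions); names and paths are
hypothetical.
"""
import boto3

s3 = boto3.client("s3")

BUCKET = "example-model-archive"            # hypothetical bucket
KEY = "checkpoints/run-042/step-90000.pt"   # hypothetical checkpoint key

# Upload straight into an archival tier, encrypted server-side.
# Objects in DEEP_ARCHIVE take hours to restore, by design.
with open("step-90000.pt", "rb") as checkpoint:
    s3.put_object(
        Bucket=BUCKET,
        Key=KEY,
        Body=checkpoint,
        StorageClass="DEEP_ARCHIVE",
        ServerSideEncryption="aws:kms",
    )

# Before anyone (including an attacker) can read the bytes again, they must
# issue an explicit, slow restore request like this one; that request is
# exactly the event you would alert on.
s3.restore_object(
    Bucket=BUCKET,
    Key=KEY,
    RestoreRequest={"Days": 1, "GlacierJobParameters": {"Tier": "Standard"}},
)
```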
Rob Wiblin: Interesting. What are some of the most secure networks that exist today, or most secure computer systems that exist?
Nova DasSarma: Well, I think that my TI-84 Plus calculator is pretty secure, because it can’t connect. It’s hard for me to comment really on the security of other organisations. I think that everyone’s trying very, very hard to produce systems that are secure and reliable, because that’s very much important for their bottom line.
Sella Nevo on why AI model weights are so valuable to steal [00:28:56]
From #195 – Sella Nevo on who’s trying to steal frontier AI models, and what they could do with them
Sella Nevo: The work that we did over the past year focused specifically on the confidentiality of the weights, which is a way of saying we want to make sure that the model weights are not stolen. And the reason we decided to at least start there is because the model weights represent kind of a unique culmination of many different costly prerequisites for training advanced models.
So to be able to produce these model weights, you need significant compute. It was estimated that GPT-4 cost $78 million and thousands of GPU years. Gemini Ultra cost nearly $200 million. And these costs are continuing to rise rapidly.
A second thing you need is enormous amounts of training data. It’s been rumoured to be more than 10 terabytes of training data for GPT-4. You need all those algorithmic improvements and optimisations that are used during training that you mentioned.
So if you can access the weights directly, you bypass at least hundreds of millions of dollars — and probably in practice a lot more that comes with talent and infrastructure and things like that that are not counted in the direct training cost.
But on the other hand, as soon as you have the weights, computing inference from a large language model is usually less than half a cent per 1,000 tokens. There’s still some compute involved, but it’s negligible. There are other things you need. Maybe you need to know the exact architecture, and you can’t always fully infer that from the weights. Obviously you need to have some machine learning understanding to be able to deploy this. But these are all fairly small potatoes relative to being able to produce the weights yourself. So there’s a lot of value in getting to those weights.
Critically, once you do that, you can pretty much do whatever you want: a lot of other defences that labs may have in place no longer apply. If there’s monitoring over the API to make sure you’re not doing things you’re not supposed to, that no longer matters because you’re running it independently. If there are guardrails that are trained into the model to prevent it from doing something, we know you can fine-tune those away, and so those don’t really matter. So really, there’s almost nothing to stop an actor from being able to abuse the model once they have access to the weights.
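To put the asymmetry Sella describes in rough numbers, here is a back-of-the-envelope sketch using the figures he cites. The token volume is an arbitrary illustration and the costs are the loose public estimates mentioned above, not precise accounting.

```python
# Back-of-the-envelope comparison of "steal the weights and run inference"
# versus "train the model yourself", using the rough figures cited above.
training_cost_usd = 78_000_000           # estimated GPT-4 training cost
inference_usd_per_1k_tokens = 0.005      # "less than half a cent" per 1,000 tokens

tokens_served = 1_000_000_000            # an arbitrary billion tokens of use
inference_bill = tokens_served / 1_000 * inference_usd_per_1k_tokens

print(f"Running a billion tokens of inference: ~${inference_bill:,.0f}")
print(f"Training a comparable model yourself:  ~${training_cost_usd:,}")
print(f"Rough ratio: {training_cost_usd / inference_bill:,.0f}x")
```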
Luisa Rodriguez: Is their value limited by the fact that once you’ve got the model weights, that model will soon be surpassed by the next generation of frontier models?
Sella Nevo: I think that really depends on what the attacker wants to use them for, or what you as the defender are worried about. If we’re thinking about this like global strategic competition considerations — which countries will have the most capable models for economic progress and things like that — then I think that’s relevant. Still, stealing the models might give an attacker years of advantage relative to where they would have been otherwise.
I’m most concerned about just the abuse of these models to do something terrible. So if we were to evaluate a model and know that you can use it to do something terrible, I don’t really care that the company a few months later is even more capable. Still someone can abuse it to do something terrible.
Kevin Esvelt on what cryptographers can teach biosecurity experts [00:32:24]
Kevin Esvelt: Due to my efforts on DNA synthesis screening, I’ve been spending a lot of time with cryptographers. And this is an area of cultural conflict between biosecurity and cybersecurity.
Cryptographers in particular make a number of assumptions going into their work. They say: Assume there is an adversary. Assume the adversary is smarter than you, better resourced than you, and is operating in the future with the benefit of technologies and advances that you don’t know and can’t imagine. And of course, they’ve had the opportunity to look at your defences after you construct them. So design accordingly.
Luisa Rodriguez: That’s a pretty different approach to my impression of the way biologists are thinking about this.
Kevin Esvelt: And even biosecurity people — which again, this is a nascent field, but come on — are still struggling with maybe we should require DNA synthesis screening at all, never mind ensuring that it actually is up to date and verifiable. And what about questions of information hazards? Maybe we shouldn’t disclose everything that we’re screening because the adversary can both use it against us and evade it. And maybe you shouldn’t have a screening criteria on a device. Maybe, maybe, maybe.
These are all much more advanced questions where, perhaps understandably, most people in the field are just focused on, “But we haven’t even gotten screening at all!” And my point is: But if any teenage malcontent can get ahold of your software or one device off of eBay, and then endlessly interrogate your screening criteria and then write a quick algorithm that can convert anyone’s DNA sequence into something that will evade screening, they’ve just negated your entire effort like that. So what’s the point?
You have to think at least a little bit about how to do it right. You have to think about more than just the next step. And what’s more, you need technical advances in order to meet those other goals. So if you’re doing technical research on what you need, then you should think about those later steps and try to learn from those disciplines.
I’ve often said that after working with the cryptographers and infosec folks for years, I now have the security mindset of about a three-year-old toddler compared to that — but even my three-year-old toddler self can say that you really don’t want to rely on our expectations and genetic engineering detection and similar algorithms to be reliable against a sophisticated adversary, if there is one.
Lennart Heim on the possibility of an autonomously replicating AI computer worm [00:34:56]
From #155 – Lennart Heim on the compute governance era and what has to come after
Rob Wiblin: So we’ve been talking a little bit about my nightmares and my bad dreams and where Rob’s imagination goes when he imagines how this is all going to play out. Maybe let’s talk about another one of these that I’ve been mulling over recently as I’ve been reading a lot about AI and seeing what capabilities are coming online, this time a bit more related to computer security specifically.
I basically am describing a computer worm, which I think our youngest listeners might not really have had that much exposure to. But from what I understand, from the early days of the internet and networking computers through to about the period of Windows Vista, this was a regular occurrence — where basically people would find some vulnerability within an operating system, or sometimes within email software, that would basically allow you to break into a computer, then email everyone with a copy of the virus. And then it would spread to other computers until basically everything was shut down, just in a cacophony of people passing this malware or this virus between all of their computers.
You could Google this question, like the “largest computer worms” or the largest outbreaks. I remember when I was a kid there were a handful of times that enormous numbers of computers went down basically for a day or two until these vulnerabilities could be patched. You would have just companies completely inoperable, more or less, because their computer systems had been infected with these worms.
I think that stopped more or less because computer security got better. It’s still very bad, but it was so bad then that it’s not so easy now. There’s now a lot more firebreaks that make it hard to put together all of the security vulnerabilities that you need for a worm like that to operate.
So why could this come back? In the worm case, it was just a person or a group that programmed this tiny piece of software to use just a handful of vulnerabilities, or maybe just a single vulnerability, in order to break into these computer systems one after another in a kind of exponential growth situation.
In this new world, we’re imagining an ML system that is extremely good at doing security research, more or less, and discovering all kinds of different vulnerabilities. It basically has all of the knowledge that one might need in order to be an incredibly effective hacker. And so it’s going to just keep finding new vulnerabilities that it can use. So you shut it down from one avenue and then now it’s discovered something else and it’s copying itself using this other mechanism.
And potentially it could also self-modify in order to obfuscate its existence or obfuscate its presence on a computer system. So it’s quite hard to clear it out. So it can kind of lie idle for a very long time using very little compute and then come to life again and copy itself elsewhere using some new zero-day exploit, some new as-yet-unknown computer vulnerability that it’s picked up in the meantime.
So it seems like serious people worry that something along these lines could happen. I think jokers have already tried doing this with existing language models. Of course they don’t have the capabilities required to simply pull this off. It’s not simple, so it actually hasn’t happened. But if the capabilities got to a sufficiently high level, then this could be something that we could observe.
Lennart Heim: Yeah, it seems like everybody’s computer security worst nightmares are these kinds of things. Like having thought a bit and worked in information security, I was like, yep, information security is pretty bad. There’s definitely different companies with different standards there. And as you just described, back in the day, it used to be like the wild west, where literally kids were able to take down Myspace because they found some bugs, and then this thing was like self-replicating.
Um, what do we do about this?
Rob Wiblin: Well, it seems like the most realistic way to defend against this is that you would expect that the white hat people would have a larger budget than pranksters or terrorists or just ne’er-do-wells that are doing this. And so why didn’t Google train the hacking model first, and then use that to detect all the vulnerabilities that this model could possibly find and then patch them all?
I think the way that this could have legs is, firstly, it might just be that no one’s on the ball and so no one produces this ML hacking model for benevolent purposes first. So it might be that the bad people have this idea and put it into operation before the necessary security work has been done on the other side.
It might also just be that you might have an offence advantage here: where they only have to find one vulnerability, whereas you have to patch everything. And even if you have a security model that can discover the vulnerabilities, in fact, patching them might be a hell of a lot of work in many cases. There just aren’t enough system operators in the entire world to do all of the necessary software updates.
So anyway, it might be a danger during this kind of intermediate stage. Is there anything you want to add to this?
Lennart Heim: Yeah, maybe there are two different notions we should try to disentangle here. I think there’s one idea of just like there’s this AI system which is self-replicating and going around. And the best way to defend against this is you train the system ideally on an air gap server — so systems which are not connected to the internet — and you try to evaluate it there for these dangerous and self-replicating capabilities. And if they have it, well, please don’t deploy it. This is the first thing.
Another notion we can talk about is that you can use these models to help you code, and they help you to produce new malware. And then as we just basically described, it’s going from server to server and it’s doing X. How you would eventually detect it really depends on what X is.
And an AI system self-replicating, going from place to place to acquire more copies of itself, I think it’s something different than just like malware going around. Because this is already the case which we see a lot of times, right? We just expect like these offender capabilities, like these script kiddie capabilities to become significantly better in the near future to do this.
But yeah, for these AI systems, that’s why we need these capabilities evals: just like, people should really check: Do these systems have the idea in some ways to self-replicate? And I expect this to not come immediately from one system to the other, but rather like, cool, maybe certain systems can theoretically do it — you can basically talk to them and get them to do it over time, and in the future they might do it on their own. But we will see some prompts and ways, like some signs of this previously, where we should be really careful.
But this whole idea of having air gap servers really helps there. I think one of the worst things you can do with AI systems you don’t understand is deploy them on the internet. This seems really bad — the internet is just a wild west. And also just defend our critical infrastructure against the internet. Just don’t hook everything up to the internet. It’s just a bad idea.
Rob Wiblin: I’ve been just banging my head against walls for the last five years watching everything get connected to the internet, and it’s like this is a completely centralised failure node now for everything — for our water, for our electricity, for our cars. I think just based on common sense, given how bad computer security is, this has been a foolish move. There are benefits, but we’ve just been completely reckless I think in the way that we’ve connected essential services to the internet. At least, so far as I understand it, we haven’t connected the nukes to the internet, but that seems to be almost the only thing that we haven’t decided to make vulnerable.
Lennart Heim: Yeah, at least we leave this alone. This seems really good. But basically everything else is. It seems really bad, and I think we have not seen the worst yet because nobody has deployed the capabilities yet. Most nation states know each other’s critical infrastructure. If they want to they can pull the plug, and for some reason they don’t do it — not for some reason; it makes sense to not do it — but if they wanted to, they could pull it. And having AI systems doing this is definitely not great. Some things should just simply not be connected to the internet.
It’s funny, as a technical guy I’ve always been the one who’s like, “Please let’s not hook it up to the internet.” Like this whole idea of internet of things is like… Don’t get me wrong, seems great — it’s a lot of fun having all of these fancy blue lights in your room — but oof.
Rob Wiblin: Yeah, we’re just going to lose all electronics simultaneously in a worst-case scenario, where someone sufficiently malicious, or an agent that’s sufficiently malicious is interested in basically shutting down society. And I mean, people would starve en masse. That’s the outcome of the way that we’re setting things up.
Lennart Heim: And we see this right now already with just companies where ransomware is getting deployed, right? Just like whole companies. We had this in our hospitals. But luckily enough, some ransomware is like, “Oh, sorry guys, we only meant to target financial corporations, not your hospitals. Here, here’s the encryption key. But sorry for taking your whole network down for a month or for a week or something.”
They’re not defended; nobody’s on the ball on cybersecurity. I feel pretty confident on this one statement. Some people just way more than others. But it just goes hand in hand with AI systems if we don’t figure this one out. And maybe you should leverage it in the meanwhile to make more systems secure. And if we can’t, just let’s not hook it up to the internet — that would be great.
Rob Wiblin: I think, unfortunately, we’ve just completely lost on that one. The not hooking up to the internet, there’s almost nothing left.
Lennart Heim: I think there’s like some critical infrastructure where we just don’t do it. I feel like I would expect some power facilities to not be hooked up with the internet, but maybe I’m just wrong and naive, maybe too optimistic.
Rob Wiblin: Yeah. I think this raises an entire intervention class that I haven’t seen discussed very much among AI existential risk folks, which is that maybe a very valuable thing to do is to start a computer security business that uses ML models to find vulnerabilities and alert people and try to get them to patch it, to try to get as much of a lead as you possibly can on just improving computer security in general against this broad threat of this new way that people can try to identify and take advantage of vulnerabilities.
Lennart Heim: Ideally the AI labs would do it. My colleague Markus had this idea that there needs to be some responsible disclosure, where it’s like: “Hey, we’re an AI lab, we developed this system. Hello, society. There are these vulnerabilities, and we think the system might be able to exploit them. We can only deploy the system if we patch these vulnerabilities, which we know the system can exploit for sure.” Or else we should not deploy the system, right?
Rob Wiblin: Yeah. One model that has better incentives might be that they have to notify people about all of these ways that it could harm those folks, or that their systems are vulnerable to it. And then they say, “Well, we’re going to deploy this in a month, so you’ve got a month to fix this” — or you’ve got six months, or whatever it is.
Lennart Heim: Ideally you have more time, and ideally you can also say no. It’s like, oh gosh, you’re not alone. There are still other people who eventually decide. Eventually it’s up to governments and democracies to decide what gets deployed.
Rob Wiblin: Yeah. I mean, currently if people don’t patch their computers, mostly that harms them, because maybe their money will be stolen or their data is going to be stolen.
Lennart Heim: I mean, there’s a harm to society, right? It’s just like insurance for these kinds of things.
Rob Wiblin: Well, the direction I was going was saying right now I bear most of the costs. But in this new world where compute can be used for hostile purposes, it becomes a whole societal issue if people aren’t patching their servers, or people’s servers are vulnerable — such that it may be necessary to have much more serious regulation saying it’s unacceptable to have large amounts of compute hooked up to the internet that are vulnerable to infiltration. It’s just a threat to all.
Lennart Heim: Yeah, I think so. I think in general, one good policy addition is that data centres should have certain security norms. It’s that simple. Certain security norms regarding physical access, and certain security norms regarding cyber access for these kinds of systems.
Rob Wiblin: And they have to add a red button.
Lennart Heim: Ideally the red button. We have to be a bit careful there about the design — again, it’s dual-use; maybe the wrong people push the red button, or the AI system could do it. But maybe that’s the thing we eventually want, where this red button is in this case favourable, because there are less copies or something. But yeah, this should be explored in more detail to which degree this is good, but having some failsafe switch for these kinds of systems — where you see, like, “Oh my god, this AI system is going haywire” — yeah, that’d be good.
Rob Wiblin: I guess one thing is just to turn off the system. Another thing that the red button could do — or I suppose, I don’t know, I guess it’s the yellow button next to the red button —
Lennart Heim: Let’s call it the yellow button, yeah.
Rob Wiblin: I think at the moment it probably takes quite a bit of work to basically start up an entire set of compute and to reset it basically back to factory settings, or reset it back to some known safe pre-infection setting and then turn it back on again. Maybe that needs to be much more automated, so that basically you press this yellow button and the whole thing goes through some process of clearing out everything and basically resetting something from a previous known safe state, because that reduces the cost to doing it because you’re not denying people access to their email.
Lennart Heim: Yeah, but eventually there’s still some cost. That’s the whole reason why we have these power supplies and these batteries in these data centres: if you run out of power, you never want your system to shut down, because you just lose data, right? So first they run off the battery, and if they still don’t get power back by then, they turn on their generators to generate their own power. These things are right now optimised to just always stay up, to keep their uptime up.
We need some innovations there where we can try to think about these things in particular. A bunch of these things we’re now talking about are pretty speculative: like AI systems self-replicating, going from data centre to data centre, or even malware doing so. And I think this is a failsafe, but there are a bunch of interventions in between where we could just detect this. Monitoring is just the first idea: just knowing what’s going on in your data centres, what’s going in and what’s going out, and what is using your compute.
Information security, computer security: big thing. Please work on this. I think there was actually a career review published on this recently, so this seems to be a great thing to take a look at and work on.
And there are more people trying to do this work at the labs. Once we’ve got the labs secure, we also need to secure people like me and others, because we also work with more and more confidential information. That seems important. Yeah, make the computers secure, make the labs secure, and also make the AI governance people secure.
Zvi Mowshowitz on the absurd lack of security mindset at some AI companies [00:48:22]
From #184 – Zvi Mowshowitz on sleeping on sleeper agents, and the biggest AI updates since ChatGPT
Rob Wiblin: What went badly, in your view, in 2023?
Zvi Mowshowitz: Meta’s statement that they’re going to literally build AGI and then just give it to everybody, open model weights, with no thought to what that might mean. I don’t think they mean it. Manifold Markets does not seem to think that they mean it, when I asked the question, but it’s a really bad thing to say out loud.
Mistral seeming to be able to produce halfway decent models with their attitude of “damn the torpedoes” is an L. The fact that their model got leaked against their will, even, is also an L. I mean, it’s kind of insane that happened this week. Their Mistral medium model, it seems, at least some form of it, got leaked out onto the internet when they didn’t intend this, and their response was, “An over-eager employee of an early access customer did this. Whoops.”
Think about the level of security mindset that you have for that statement to come out of the CEO’s mouth on Twitter. You’re just not taking this seriously. “Whoops. I guess we didn’t mean to open source that one for a few more weeks.” No, you are counting on every employee of every customer not to leak the model weights? You’re insane. Like, can we please think about this for five minutes and hire someone? God.
Sella Nevo on the weaknesses of air-gapped networks and the risks of USB devices [00:49:54]
From #195 – Sella Nevo on who’s trying to steal frontier AI models, and what they could do with them
Sella Nevo: There are quite a few different ways to still get into an air-gapped network. The first thing maybe that’s worth noting is that just because you’re not connected by a network doesn’t mean your system doesn’t interact with the rest of the world in other ways.
So usually, if you’re not connected to the internet via a network connection, let’s say an ethernet connection, you still need to communicate in some other ways. How do you get your security updates? In the AI context, how do you bring in training data? How do you take out the models that you’ve trained after you’ve trained them in such a secure system? Often the way you do that is through USB sticks. That’s the most common way that people interact with air-gapped networks.
So what an attacker could do is run malware that copies itself from computer to computer. It originally infects a computer on the outside that is connected to the internet — maybe one belonging to one of the engineers who work with the system. That malware copies itself onto the USB stick, and once you plug the USB stick into the air-gapped network, it copies itself into the computer, and then does whatever it wants inside this air-gapped network. And then the next time a USB stick is plugged in, the malware sends out the information it wants to exfiltrate through the USB stick — for example, the weights.
Luisa Rodriguez: That’s terrible! That seems like it really defeats the point of an air-gapped network. It just seems like a huge flaw. Is the situation better than that?
Sella Nevo: I mean, I’d give it a slightly less harsh review, because it does make things a lot harder. It is a pain in the ass to write malware that can do all of this. Maybe as opposed to having a continuous connection to the computer, you kind of are limited by these occasional interactions and things like that. So I think it’s definitely better to have an air-gapped network than to have your computer connected to the internet. But it’s definitely not impenetrable.
Luisa Rodriguez: OK. So it’s still an improvement, so it’s good and worth doing. But this is a vulnerability. Are there real-world examples of this kind of malware being installed through a USB?
Sella Nevo: Yeah, this has happened many times. One minor thing to say: the fact that you use a USB stick is not in and of itself an information security vulnerability — because in principle, using a USB stick doesn’t mean it should be able to run code. You should be able to look at the contents of the USB, and code that you never intended to run is not supposed to be able to run.
But again, as with everywhere else, there are vulnerabilities. USB sticks are actually a great candidate, because USB is a pretty complex protocol. And when you plug one in, sometimes a thing will pop up and ask you what you want to happen when you plug this in. That is always an indication that some things are happening automatically. So that can potentially be abused. Again, in a perfect world, it wouldn’t be able to do that. But we have to be sufficiently paranoid, because we’ve seen many times that it has happened.
Let’s put highly secure air-gapped networks aside for a moment and just talk about getting a USB to connect to a network. It’s worth flagging that this is a really easy thing to do.
One thing that people do — and this is not just nation-states and whatnot; this is random hackers that want to do things for the fun of it — is just drop a bunch of USB sticks in the parking lot of an organisation, and someone will inevitably be naive enough to be like, “Oh no, someone has dropped this. Let’s plug it in and see who this belongs to.” And you’re done. Now you’re in and you can spread through the internal network.
This happens all the time. It’s happened multiple times in multiple nuclear sites in the United States. So yeah, this is a pretty big deal.
Luisa Rodriguez: That’s unreal!
Sella Nevo: Now, I think that many people, like you, will find that surprising. I think security folks are kind of being like, “Well, no one would. Everyone knows, everyone in security knows that you shouldn’t plug in a USB stick.”
Luisa Rodriguez: Shouldn’t just pick up a USB stick. Yeah.
Sella Nevo: But let me challenge even those folks who think that this is obvious, and also in that way bring it back to the more secure networks we were talking about before.
So indeed organisations with serious security know not to plug in random USB sticks. But what about USB cables? So Luisa, let me ask you, actually: if you needed a USB cable, and you just saw one in the hallway or something, would you use it?
Luisa Rodriguez: 100% I would use that. Absolutely. Actually, I’m sure I’ve literally already done that.
Sella Nevo: So here’s an interesting fact, which I think even most security folks don’t know. You could actually buy a USB cable — not a USB stick, a USB cable — for $180 that is hiding a USB stick inside and can communicate wirelessly back home.
So once you stick that cable in, an attacker can now control your system from afar — not even in the mode that I mentioned before, where you wait until the USB stick is plugged in again: it can just continuously communicate with and control your system. I guarantee you that if you toss that cable onto a tech organisation’s cable shelf, it’ll be plugged in.
Luisa Rodriguez: Absolutely. Yeah. That’s really crazy. Has that been used in the real world?
Sella Nevo: I don’t know. There’s a company that’s selling them. I haven’t seen reports of when it’s been used, but presumably if it’s a product on the market, someone is buying it.
Bruce Schneier on why it’s bad to hook everything up to the internet [00:55:54]
Bruce Schneier: My hope is that we can have computers working alongside humans in some of these areas. There are a bunch of reasons why this’ll make a big difference. Attacks happen at computer speeds. Defence often happens at human speeds. That’s kind of not fair. The more that defence can happen at computer speeds, the better off we’ll be.
Some aspects of computer security, like vulnerability finding, seem really ripe for mechanisation and you can have machine learning systems find vulnerabilities, which would do an enormous amount of good. Because a lot of our vulnerability stems from the fact that there are vulnerabilities in the software.
Because we’re terrible at writing secure code. We have no idea how to do it. And if computers can find vulnerabilities, it benefits the attacker and the defender, of course, but if you think about it, once you have this automatic system, you build it into the compilers and code generation tools and suddenly vulnerabilities are a thing of the past — and that’s actually possible in five, 10, 20 years and that would make a huge difference.
Rob Wiblin: Some people worry that we’re going to have a computer security apocalypse basically, because we’ll design ML algorithms that can find weaknesses and find computer bugs and security weaknesses incredibly quickly. But I guess you’re saying maybe in the short run that does look bad, because potentially someone with bad intentions will get that early on, but in the longer term that’s actually a more generalised solution — because if you can just run these algorithms against every piece of software and then patch all the bugs, then we end up in a better place.
Bruce Schneier: Right. And that’s where the defender wins here. The attacker finds a vulnerability —
Rob Wiblin: Because they can do it before they release it.
Bruce Schneier: Right. The defender finds it and fixes it and it no longer exists. So you’re right, you have this very bad intermediate time when the vulnerabilities are found in everything that exists today. And there you’re going to see systems that monitor these insecure systems, that know those vulnerabilities because they found them too, watch for them being exploited, and stop those attacks in the network.
So you’ll see solutions like that, but then the endgame is vulnerabilities are a thing of the past. We could have this podcast 20 years from now and you can ask me, “Wow, remember 20 years ago when software vulnerabilities were a thing, wasn’t that a crazy time? It’s great that we’re past that!” And that’s not unreasonable.
Rob Wiblin: So I guess maybe this is the highest leverage opportunity if you’re someone who both has expertise in ML and in computer security, is trying to figure out how do we make defensive ML algorithms that can just, in a very general sense, go out and find weaknesses and figure out how to fix them?
Bruce Schneier: Yeah, and I think that’s really valuable, and then also finding weaknesses in unfixable things. We talk about the internet of things, everything becoming a computer. One of the worries of this is that there’ll be a lot of vulnerable things lying around our environment for years and decades.
So if you think about your phone or your computer, it’s as secure as it is for two basic reasons. One, the teams of engineers at Microsoft and Apple and Google have designed them securely. And two, those engineers are constantly writing and pushing patches down to our devices when vulnerabilities are discovered. That ecosystem doesn’t exist in low-cost systems, like DVRs and home routers and toys.
Rob Wiblin: Lightbulbs.
Bruce Schneier: Right. And toasters that are designed offshore by third parties. They don’t have engineering teams. They often can’t be patched, and they’re going to be around for decades. So this insecure toaster, 15 years from now, is still making toast and still sending spam or launching DDoS attacks or whatever, because it’s horribly insecure.
And this is going to be a big problem. I mean, our phones and computers, we throw them away after a few years. Actually a car is a good example. You buy a car today. It’s two years old. You drive it for 10 years. You sell it. Somebody else buys it. They drive for 10 years. They sell it. Probably at that point it’s put on a boat, sent somewhere in the southern hemisphere where someone else buys it and drives it for another 10 to 20 years.
You find a computer from 1977. You turn it on. Try to make it secure. Try to make it run. We have no idea how to secure 40-year-old consumer software. Both Apple and Microsoft deprecate operating systems after like five to seven years, because it’s hard to maintain the old stuff.
So we’re going to need systems that live in our network that kind of monitor all of this old cruft. The toy that someone bought in 2020 that was on the internet, and now it’s 2040 and the thing is still on the internet, even though nobody’s played with it in a decade and a half, because it somehow gets its power remotely. We can’t make this stuff up.
And this is going to be a security nightmare. We’re going to need some new technology to solve it. Now, there are people thinking about this, I mean, I didn’t just make this up. Again, ideas are easy, everyone thinks of everything all the time. But we really need to start thinking about how to deploy these. Do they go in the routers? Do they go in the backbone? Who’s liable? What are the regulatory mechanisms by which these things work?
Rob Wiblin: Yeah, the internet of things drives me a little bit crazy. I somewhat skipped over that because it’s covered pretty well in your book and we can link to talks where you’ve described all the issues there.
I mean, are there any particularly high-impact things that people can do? It seems like we’re just heading towards a worse and worse situation with so many little pieces of hardware being computerised, and they’re all going to end up insecure eventually, right? And often not getting patched?
Bruce Schneier: I can talk about two other things. I just talked about patching, and the way patching is going to fail in this world of low-cost, embedded, not-maintained old systems. So we need to solve that. I think that’s a really big problem that we need to figure out.
Second thing is authentication. Authentication kind of only ever just barely worked. And we’ve got solutions: we have two-factor, which is great if you can do it, and we often can’t; and the backup systems we need are often terrible.
But authenticating is going to explode in a new way. Right now, if you authenticate, you’re doing either one of two things. I have my phone in my hand and I put my fingerprint on the reader and then I pushed a button and checked my email. So there’s the authentication. It was me authenticating to a device and me authenticating to a remote service. Those are both me authenticating to something else.
What we’re going to see the rise of is thing-to-thing authentication. So the whole point of 5G is actually not for you to watch Netflix faster; it’s so things can talk to things without your intervention. And they’re going to have to authenticate.
So you think of all these smart city things or imagine either a driverless car or some kind of driver-assisted car. That car’s going to have to authenticate, in real time, ad hoc, to thousands of other cars and road signs and traffic signals and emergency alerts and everything.
And we don’t know how to do that. We don’t know how to authenticate thing-to-thing at that scale. We do it a little bit. I mean, right now, when I go to my car, my phone automatically authenticates to the car and uses the microphone and speakers. But if you think about it, that’s Bluetooth. That works because I was there to set it up. I set it up manually.
That’s not going to work, ad hoc, as I’m driving through a city. That’s not going to work if I have 100 different IOT devices at my home. Not going to pairwise-connect 5,000 connections. So, we don’t have an answer for that. I think that’s an area that we need a lot of good research.
The third is supply chain security. This is in the news a lot. Right now, it’s Huawei and 5G. Should we trust Chinese-made networking equipment? Two years ago, the problem was Kaspersky. In the US should we trust Russian-made antivirus programs?
Rob Wiblin: Yeah. Should we trust things that are shipped by USPS?
Bruce Schneier: Yeah, and that’s the point. I mean the question, “Can you trust a company that operates in a country you don’t trust?” is an obvious one. But all computer systems are deeply international. iPhones are American, but they’re not made in the US. The chips aren’t made in the US. Their programmers carry 100 different passports.
And you have to trust update mechanisms and distribution mechanisms. And you mentioned shipping mechanisms. And you mentioned that because you know of a very famous photograph of NSA employees opening a Cisco router that was destined for the Syrian telephone company. And supply chain is an insurmountably hard problem because we are very international in our industry. And subversion of that supply chain is so easy. I saw a paper that you can hack an iPhone through a malicious replacement screen.
So you have to trust every aspect of the supply chain, from the chips, to the shipping. And we can’t. And that’s something that is a very difficult problem. Some of that I think is an internet-like problem. The origins of the internet was a research solution to the problem: “Can I build a reliable network out of unreliable parts?” I’m asking a similar question: “Can I build a secure network system infrastructure out of insecure parts?” And that I think is a research question on par with the internet.
Nita Farahany on the possibility of hacking neural implants [01:04:47]
Luisa Rodriguez: Thinking about other risks and things we should be worried about for these kind of military applications of this technology, one risk that comes to mind is the potential for hacking.
To the extent that you’d be kind of uploading a bunch of data from your brain — sending it out brain-to-brain, or just uploading it to physical machines — does that make brains in general more vulnerable to some kind of hacking by another state or a non-state actor?
Nita Farahany: Maybe. We’ve talked a little bit about how we’re not quite as there yet in writing to the brain as we are in reading the brain, but we are somewhat there in writing to the brain. I’ll answer this a little bit by analogy.
There was a patient who was suffering from really severe depression — to the point where she described herself as being terminally ill, like she was at the end of her life — and every different kind of treatment had failed for her.
Finally, she agreed with her physicians to have electrodes implanted into her brain, and those electrodes were able to trace the specific neuronal firing patterns in her brain when she was experiencing the most severe symptoms of depression. And then, after tracing those, every time those signals were activated, the device could basically interrupt them. So think of it like a pacemaker, but for the brain: when a signal goes wrong, it overrides it and puts a new signal in.
And that meant that she now actually has a typical range of emotions, she has been able to overcome depression, she now lives a life worth living.
That’s a great story, right? That’s a happy story and a happy application of this technology. But that means we’re down to the point where you could trace specific neuronal firing patterns, at least with implanted electrodes, and then interrupt and disrupt those patterns.
Can we do the same for other kinds of thoughts? Could it be that one day we get to the point where if you’re wearing EEG headsets that also have the capacity for neurostimulation, that you could pick up specific patterns of thoughts and disrupt those specific patterns of thoughts if they’re hacked? If your device is hacked, for example. Maybe.
I mean, we’re now sort of imagining a science fiction world where this is happening. But that’s how I would imagine it would first happen: that you could have either first very general stimulation — like I experienced at the Royal Society meeting, where suddenly I’m experiencing vertigo — and somebody could hack your device. Like, I’m wearing this headset for meditation, but it’s hacked, and suddenly I’m experiencing vertigo and I’m disabled.
You know, devices get hacked. We can imagine devices getting hacked — and especially ones that have neurostimulation capacity, they could be hacked either in really specific patterns or they could be hacked in ways that generally could just take a person out.
Luisa Rodriguez: Well, that is incredibly horrifying.
Nita Farahany: So do I worry about that? Yes, I worry about it. I’ve been talking with a lot of neurotech companies about how there’s not a lot of investment in cybersecurity that’s been happening in this space.
When you start to imagine a world in which not only could information like what you’re thinking and feeling be hacked — the privacy concern — but, if you’re wearing a neurostimulation device, could the device be hacked to create kinds of stimulation that would be harmful to the person? Maybe so. It seems like a really basic and fundamental requirement that these technologies have really good cybersecurity measures implemented.
I do a lot on the ethics of neurotechnology, and I am far more concerned from an ethical perspective about wide-scale, consumer-based neurotechnology than I am about implanted neurotechnology. The reasons that’s true are both that there’s a very different risk-benefit calculus for the people who are currently part of the population who would receive implanted neurotechnology, and that it’s happening in a really tightly regulated space — as opposed to consumer technology, where there’s almost no regulation and it’s just the wild west.
But in the dystopian world — and with all of those caveats, which I think are really important — I think it’s still possible, without really good cybersecurity measures, that there’s a backdoor into the chips: that some bad actor could gain access to implanted electrodes in a person’s brain.
And if they’re both read and write devices — not just interrupting a person’s mental privacy, but with the capacity of stimulating the brain and changing how a person behaves — there’s no way we would really even know that’s happening, right? When something is sort of invisibly happening in a person’s brain that changes their behaviour, how do you have any idea that it’s happening because somebody has hacked into their device, versus coming from their own will or intentionality?
We have to understand people’s relationship to their technology, and we have to be able to somehow observe that something has happened to this person, which would lead us to be able to investigate that something has happened to their device and somebody has gained access to it or interference with it or something like that.
You know, we’re dealing with such small, tiny patient populations. It’s not like the president of the United States has implanted neurotechnology, where some foreign actor is going to say it’s worth it to hack into their device and turn them into the Manchurian candidate. But in the imagined sci-fi world of what could go wrong: what could go wrong if this goes to scale, and if Elon Musk really does get a brain-computer interface device into every one of our brains, is that we’d have almost no idea that the person had been hacked, and that their behaviour is not their own.
Vitalik Buterin on how cybersecurity is the key to defence-dominant futures [01:10:48]
From #194 – Vitalik Buterin on defensive acceleration and how to regulate AI when you fear government
Rob Wiblin: I think you’ve been quite excited about this idea: people are worried that AI could be very bad for cybersecurity, but it also seems like if you have extremely good AI that’s at the frontier of figuring out how to break things, if it’s in the hands of good people and they share the lessons with people so they can patch their systems first, then potentially it could massively improve things. And currently stuff in the crypto world that we’re unsure whether it’s safe, we could get a lot more confidence in.
Vitalik Buterin: Exactly, yeah. The way that I think about this is if you extrapolate that space to infinity, then this is actually one of those places where it becomes very defence-favouring, right?
Because imagine a world where there are open source, infinitely capable bug finders: if you have a code with a bug, they’ll find it. Then what’s going to happen? The good guys have it and the bad guys have it. So what’s the result? Basically, every single software developer is going to put the magic bug finder into their GitHub continuous integration pipeline. And so by the time your code even hits the public internet, it’ll just automatically have all of the bugs detected and possibly fixed by the AI. So the endgame actually is bug-free code, very plausibly.
That’s obviously a future that feels very far away right now. But as we know, with AI, going from no capability to superhuman capability can happen within half a decade. So that’s potentially one of those things that’s very exciting. It definitely is something that in Ethereum we care about a lot.
Rob Wiblin: That reminds me of something I’ve been mulling over, which is that very often the question comes with some branch of technology — in this case AI, but we could think about lots of other things — is it offence-favouring or defence-favouring?
And it can be quite hard to predict ahead of time. With some things, maybe horse archery, historically, maybe you could have guessed ahead of time that that was going to be offence-favouring and going to be very destabilising to the steppes of Asia. But with AI, it’s kind of a difficult thing to answer.
But one idea that I had was, when it comes to machine-versus-machine interactions, like with cybersecurity, seems like it might well be defence-favouring, or at least neutral, because any weakness that you can identify, you can equally patch it almost immediately — because the machines that are finding the problems are kind of the same being; they’re the same structure as the thing that is being attacked, and the thing that’s being attacked you can change almost arbitrarily in order to fix the weakness.
When it comes to machine-versus-human interactions, though, the dynamic is quite different, in that we’re kind of stuck with humans as we are. We’re this legacy piece of technology that we’ve inherited from evolution. And if a machine finds a bug in humans that it can exploit in order to kill us or affect us, you can’t just go in and change the code; you can’t just go and change our genetics and fix everyone in order to patch it. We’re stuck doing this really laborious indirect thing, like using mRNA vaccines to try to get our immune system that’s already there to hopefully fight off something. But you could potentially find ways that that would not work — you know, diseases that the immune system wouldn’t be able to respond to. I guess HIV has to some extent had that.
What do you think of this idea? That machine-versus-machine may be neutral or defence-favouring, but machine-versus-humans, because we just can’t change humans arbitrarily and we don’t even understand how they work, is potentially offence-favouring?
Vitalik Buterin: Actually, I think a big part of the answer to this is something I wrote about in the post, which is that we need to get to a world where humans have machines protecting us as part of the interface that we use to access the world.
And this is actually something that’s really starting to happen more and more in crypto. Basically, wallets started off as being this very dumb technology that’s just there to manage your private key and follow a standardised API. You want to sign it, then you sign it.
But if you look at modern crypto wallets, there’s a lot of sophisticated stuff that’s going on in that. So far it’s not using LLMs or any of the super fancy stuff, but it’s still pretty sophisticated stuff to try to actually identify what things you might be doing that are potentially dangerous, or that might potentially go against your intent and really do a serious job of warning you.
In MetaMask, it has a list of known scam websites, and if you try to go and access one of them, it blocks it and shows a big red scam warning. In Rabby, which is an Ethereum wallet developed by this lovely team in Singapore that I’ve been using recently, they really go the extra mile. If you’re sending money to an address you haven’t interacted with, they show a warning for that; if you’re interacting with an application that most other people have not interacted with, it shows a warning for that. It also shows you the results of simulating transactions, so you get to see what the expected consequences of a transaction are. It just shows you a bunch of different things, and tries to put in speed bumps before doing any actually dangerous stuff. And there definitely were some recent scam attempts that actually Rabby successfully managed to catch and prevent people from falling for.
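The "speed bumps" Vitalik describes boil down to simple heuristic checks run before the user signs. Here is a toy sketch of that pattern; it is not Rabby's or MetaMask's actual code, and the thresholds and lists are made up for illustration.

```python
# Toy wallet "speed bump" checks -- illustrative only, not any real wallet's implementation.
from dataclasses import dataclass

@dataclass
class Transaction:
    to_address: str
    origin_site: str   # the dapp or website that requested the transaction
    value_eth: float

def pre_send_warnings(
    tx: Transaction,
    address_history: set[str],
    known_scam_sites: set[str],
) -> list[str]:
    """Return human-readable warnings to show before the user is allowed to sign."""
    warnings = []
    if tx.origin_site in known_scam_sites:
        warnings.append(f"BLOCKED: {tx.origin_site} is on a known scam list.")
    if tx.to_address not in address_history:
        warnings.append("You have never sent funds to this address before.")
    if tx.value_eth > 10:  # arbitrary threshold for illustration
        warnings.append(f"Large transfer: {tx.value_eth} ETH. Double-check the recipient.")
    return warnings

# Example: a first-time recipient triggers a warning the user has to click through.
tx = Transaction(to_address="0xabc...", origin_site="example-dapp.xyz", value_eth=25.0)
for warning in pre_send_warnings(tx, address_history={"0xdef..."}, known_scam_sites=set()):
    print(warning)
```

A real wallet would add transaction simulation on top of these static checks, so the user also sees the expected outcome before signing.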
So the next frontier of that, I think, is definitely to have AI-assisted bots and AI-assisted software actually being part of people’s windows to the internet and protecting them against all of these adversarial actors.
I think one of the challenges there is that we need to have a category of actors that is incentivised to actually do that for people. Because the application’s not going to do that for you. The application’s interest is not to protect you: the application’s interest is to find ways to exploit you. But if there can be a category of actor where their entire business model actually is dependent on long-term satisfaction from users, then they could actually be the equivalent of a defence lawyer, and actually fight for you and actually be willing to be adversarial against the stuff that you access.
Rob Wiblin: I guess that makes sense in the information space, where you can imagine you interact with your AI assistant, that then does all of the information filtering and interaction with the rest of the world.
The thing I was more worried about was bioweapons or biodefence, where one of the big concerns people have about AI is, couldn’t it be used to help design extremely dangerous pathogens? And there, it seems harder to patch human beings in order to defend them against that.
Although the weakness of that argument is we were just saying we’re on the cusp of coming up with very generic technologies like air purification that we could install everywhere, and that seem like they would do a lot to defend us against at least the diseases that we’re familiar with. So maybe there are some generic defensive technologies that AI could help advance, such that this still would be defence-dominant. I don’t know what the underlying reason would be, but maybe.
Vitalik Buterin: This is one of those things where I think both offence and defence have these big step functions, right?
Where one question is, what is the level of capability of printing a super plague? Are you just making minor modifications and fine-tuning COVID the same way the Hugging Face people are fine-tuning Llama? Or are you actually really doing serious shit that goes way beyond that?
And if, for example, you’re just fine-tuning COVID, then wastewater detection becomes much easier, because the wastewater detectors are already tuned to COVID as a sequence. But if you have to defend against arbitrary dangerous plagues, it’s actually a significantly harder problem.
Then for vaccines, it’s similar. And for the level of dangerousness, that’s similar. And then another step function is: if you actually can make all of this air purification infrastructure much more powerful, then R0s go way down, and you possibly get some kind of upper limit there.
But then the other step function on the offence side is, what if you go beyond biological diseases and you figure out crazy nanotechnology, how do you start defending against that?
I think trying to push defensive technologies forward is something that’s really important and it can even have positive knock-on effects. One example of this is if we fix cybersecurity, then we kneecap the entire class of superintelligence doom scenarios that involve the AI hacking things.
Rob Wiblin: Yeah. A lot of people are working on that, and I think it’s among the most important things that anyone is doing at the moment.
Vitalik Buterin: Right, exactly.
Nova DasSarma on exciting progress in information security [01:19:28]
Rob Wiblin: Do you have any view on whether ML itself is making security better or worse?
Nova DasSarma: Hard question. I think that probably it’s going to make things better. The reason I believe this is I think that a lot of the work that’s being done at places like Apple on security is also driven by ML. I know I’ve specifically called out Apple a couple of times here. There are other things that I like — for example, the Google Security Blog is fantastic to read.
But I think that security researchers getting access to more sophisticated models is going to be quite positive. I also think that there’s a possibility that we can use ML models to drive formal verification forward, and to drive adoption of programming practices that are more defensive, that are harder to break into, with ML.
It is definitely a Spy vs. Spy sort of scenario, where the technology is very much dual use. And it’s certainly going to have some growing pains as these things become more capable, but I’m hopeful that that’s going to result in a more secure — while still being usable — future.
Rob Wiblin: To what extent would computer security benefit from just more money being spent on it by the relevant actors?
I ask this because I think I remember reading about someone who got paid a bug bounty for finding this horrific flaw. I can’t remember what it was in — macOS or the iPhone or something like that. But plausibly they could have sold this for tens of millions, or hundreds of millions of dollars, because of the power of this exploit. I think they got a million dollars or something from Apple for going through their bug bounty programme.
Nova DasSarma: That’s pretty good. I see thousands of dollars often, or hundreds.
Rob Wiblin: Right, yeah. So I think this was one of the largest, but it was nevertheless much less than you imagine they might be able to sell it to criminals for, or to state actors for. I suppose most people would rather sell the thing to Apple, all else equal, and they’d probably rather not be on the run from the feds.
But would it maybe just help to throw more money basically at getting people who are kind of on the fence between being good actors and bad actors to be good actors?
Nova DasSarma: That’s a great question. Honestly I’m not sure, but my guess is yes. It definitely seems like something where we’ve seen bug bounties show up more often. We’ve seen security researchers get paid better. And I think we’ve seen an increase in the resulting security and a decrease in the prevalence of people trying to passively break into systems for money instead of getting to work on a team of people who are all trying to break into things for money and not going to jail.
Rob Wiblin: OK. Well, I started out this conversation feeling some degree of despair, because it sounded like trying to prevent this information from getting out was borderline futile. But it seems like there are people who are making some serious effort, and hopefully, if we have years or decades to try to improve the state of the art, maybe we’re not completely screwed, and it won’t just be the case that any ML model is going to be stolen pretty quick smart.
Nova DasSarma: I think there’s some pretty exciting progress. I think that right now, if we were to produce AGI in our kitchen, it would be stolen. But if you don’t have that case and the timelines are longer, then you have time to do things like build your own formally verified systems that are very robust to these sorts of attacks.
Rob Wiblin: Can you explain what formal verification is?
Nova DasSarma: Formal verification is a logical technique for talking about the execution of an algorithm. You would use something like this to describe the behaviour that you want in a system, and then basically create a very, very long, complicated proof that proves that there are no vulnerabilities or unexpected behaviours or undefined behaviours in the system.
If you look at something like, for example, the C language — which is a systems language that is used for producing things like the Linux kernel and other really important pieces of software — there are certain kinds of operations that you can do that are just simply not defined in the specification for it.
If you look at almost any piece of software, there are ways you can give it sufficiently weird, sufficiently well-crafted inputs that will cause it to crash, or cause it to overwrite pieces of memory that you didn’t expect it to. Formal verification is a technique for preventing those things from going into software in the first place.
Of course, it’s quite difficult. You don’t see very many large pieces of software produced this way. But we’ve seen some pretty interesting examples of things like microkernels that come out of mostly academic labs that are interested in this sort of thing — where you’ve got a kernel that has some very limited functionality, but is completely proven.
Rob Wiblin: Interesting. So this is something that we kind of know how to do, but it’s quite challenging. And I guess the more complicated the program you are making, the harder it is to formally prove that there’s no circumstance under which it might do X. That there’s no input that can produce X as an output.
Nova DasSarma: Yes. If you think about the complexity inside of a program, you have a lot of cases where you’re multiplying, and putting exponents on, the complexity of the kinds of inputs that might come into the system. A good technique for making your program able to be formally verified is splitting up the input in ways where you can formally verify the parts, and then talk about things at that level of abstraction. So you can say that these parts interact in a particular way.
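Full formal verification needs a proof assistant or a verifying compiler, but the flavour of it (state a property that must hold for every input) can be illustrated with property-based testing, a much weaker cousin that searches for counterexamples rather than proving their absence. A minimal sketch using the `hypothesis` library:

```python
# Property-based testing: a lightweight cousin of formal verification.
# Instead of proving the property for all inputs, hypothesis searches for counterexamples.
from hypothesis import given, strategies as st

def clamp_to_byte(n: int) -> int:
    """Clamp an integer into the range 0..255."""
    return max(0, min(n, 255))

@given(st.integers())
def test_clamp_always_in_range(n: int) -> None:
    result = clamp_to_byte(n)
    # The property we want to hold for every possible input.
    assert 0 <= result <= 255
```

A formal verification tool would instead prove that this property holds for all integers, and the compositional approach Nova describes proves such properties for each component, then reasons about how the verified components interact.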
Rob Wiblin: Is this a kind of new technology? I remember hearing people bring up formal verification years ago, and I think they were talking about it with the sense that this was something that didn’t really work at the time, but might get better in the future. Is it an area of research where we’re improving?
Nova DasSarma: It definitely seems like we’re improving. I think that programming language design in general has come a really long way since the dawn of computing. So I definitely think that there’s progress there, and it’s a pretty exciting field to be working in. That being said, I think we will have to have some really significant advances in programming language theory to have formal verification become a very accepted part of the toolbelt.
Rob Wiblin: Yeah. I was saying I was optimistic because it sounded like there were some people who were succeeding partially when it came to information security. But I guess we should only be optimistic if we think that on balance, information security or computer security is improving over time, rather than staying static or getting worse. Would you say we’re getting better, or at least maybe building up the capability to in future become better at securing information?
Nova DasSarma: Yes. I think I can confidently say that things have gotten better over time. Things have gotten much more complicated and obviously the surface has increased, but as that surface has increased, you’ve also gotten the incentives in play for people to secure things.
Moving commerce to ecommerce, having online systems that are handling large amounts of money, has been a fantastic motivator in getting people to think about how they can write software in a way that doesn’t get you broken into.
When I was in middle school, we had some software on the Macs in the computer lab that didn’t let me run NetHack — which is like a text adventure game — and I was pretty upset about it. So at the time it was extremely trivial for me at like 12 to boot this thing into a recovery mode where it didn’t check for a password, and then you could change the password in the password file because it wasn’t very well encrypted and things like that.
And that just wouldn’t happen today. If you look at something like the MacBook M1 chip, a lot of the passwords are secured in a hardware chip that is specifically designed to resist this sort of thing, as opposed to on a file on disk.
Now, that’s not true of everything. For example, in the Okta hack recently, there was an Excel file that contained the password for a LastPass account — which is certainly something you don’t want to see as somebody on the security side of things.
So obviously the last system to secure is humanity. You can have as secure a system as you want, but if you put a sticky note on the outside of your laptop that has your password, which is going to be visible to somebody who is on a Zoom meeting with you, then perhaps your security is not so good after all.
Rob Wiblin: Yeah. I suppose you could have two-factor authentication: use keys that you have to physically possess in order to secure something, so the password isn’t so central.
Nova DasSarma: For sure. Yeah. I think multifactor authentication is very, very exciting, actually. Something where you can both prove who somebody is and what somebody has is pretty important.
We’ve seen some advances in this. For example, now there are more places that require two-factor authentication, more that require two-factor authentication that isn’t through SMS, and some that even support something like FIDO2, which is a protocol where not only is the key that you’re producing out of this hardware device specific to that device and that user, but it’s also specific to the site that you’re authenticating with.
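A highly simplified sketch of the property being described: the authenticator holds one keypair per site and user, and the signature covers the site's identity, so a credential registered with one site is useless anywhere else. This is a conceptual model only, written with the `cryptography` library; it is not the real FIDO2/WebAuthn protocol.

```python
# Conceptual sketch of FIDO2-style per-site credentials -- NOT the real WebAuthn/CTAP2 protocol.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

class ToyAuthenticator:
    """Holds one keypair per (site, user), so a credential for one site can't be reused elsewhere."""

    def __init__(self):
        self._keys = {}  # (site, user) -> private key, which never leaves the device

    def register(self, site: str, user: str):
        private_key = ec.generate_private_key(ec.SECP256R1())
        self._keys[(site, user)] = private_key
        return private_key.public_key()  # the site stores only the public key

    def sign_challenge(self, site: str, user: str, challenge: bytes) -> bytes:
        # The signed message includes the site name, so a phishing site can't replay it elsewhere.
        message = site.encode() + challenge
        return self._keys[(site, user)].sign(message, ec.ECDSA(hashes.SHA256()))

def verify_login(site: str, public_key, challenge: bytes, signature: bytes) -> bool:
    try:
        public_key.verify(signature, site.encode() + challenge, ec.ECDSA(hashes.SHA256()))
        return True
    except InvalidSignature:
        return False

# Usage: register once, then prove possession of the device-bound key at each login.
authenticator = ToyAuthenticator()
pub = authenticator.register("example.com", "luisa")
sig = authenticator.sign_challenge("example.com", "luisa", b"server-random-challenge")
assert verify_login("example.com", pub, b"server-random-challenge", sig)
```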
Rob Wiblin: Yeah. I guess Anthropic is a reasonably new organisation, it’s a year or two old. Is there anything important that you did in setting up the way the systems work that secure it to some degree?
Nova DasSarma: Sure. I think that having corporate devices is pretty important. Another thing to think about is we used to talk about trusted networks and having things like a corporate network.
And Google’s done some pretty good work with things like BeyondCorp, where you don’t really think about a VPN or something like that for security — you instead think about identity-based authentication. There’s no such thing as a “trusted network” where when you get on it, you no longer have to authenticate to grab files off of a shared SharePoint drive or something like that. You’re always authenticating at every step.
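The "always authenticating at every step" idea can be sketched very simply: instead of trusting a network location, every request carries a short-lived signed identity token that is verified before anything is served. Here is a minimal illustration using the PyJWT library; the key, claims, and names are made up for the example, and a real deployment would also check device posture and use asymmetric keys.

```python
# Zero-trust-style sketch: verify identity on every request instead of trusting the network.
# Requires PyJWT (pip install pyjwt). The key and claims are illustrative only.
import time
import jwt

SIGNING_KEY = "replace-with-a-real-secret"

def issue_token(user: str, device_id: str) -> str:
    """Issued by the identity provider once the user and device have been authenticated."""
    claims = {
        "sub": user,
        "device": device_id,
        "exp": int(time.time()) + 300,  # short-lived, so access is re-checked constantly
    }
    return jwt.encode(claims, SIGNING_KEY, algorithm="HS256")

def authorize_request(token: str, required_user: str) -> bool:
    """Runs on every request: there is no 'already on the corporate network, skip the check' path."""
    try:
        claims = jwt.decode(token, SIGNING_KEY, algorithms=["HS256"])
    except jwt.InvalidTokenError:
        return False
    return claims.get("sub") == required_user

token = issue_token("nova", "corp-laptop-42")
assert authorize_request(token, required_user="nova")
```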
The other thing that we do — that I suggest to every organisation — is to think about single sign-on to make sure that you don’t have your users managing passwords for many services, juggling passwords around, where it can get very tedious for them to use a different password for every service. Using things like a password manager and single sign-on can help mitigate some of those flaws.
Rob Wiblin: Yeah. Anthropic has appeared during the COVID era, which I imagine means that you’ve probably been working remotely, at least in part. Do you think as security becomes an even bigger concern, as hopefully your models become more capable of doing important things, is there a chance that you’ll basically have to stop being remote or have some kind of physical restriction on access to models or data in order to ensure that they are sufficiently secure?
Nova DasSarma: I would not be surprised if that’s something that happens in the future. That being said, I think that we’ve actually had some advantages in terms of starting out with this remote policy. You can’t have a trusted network that everybody’s on if everybody’s on their own network. So it’s been important in sort of driving forward the identity-based authentication policies there.
I agree that physical security is going to be quite important in the future, and there are sorts of mitigations that you can’t really implement remotely, but I think that’s in our future timeline.
Nathan Labenz on how even internal teams at AI companies may not know what they’re building [01:30:47]
Nathan Labenz: And this is one thing too that is really interesting about the Anthropic approach. I don’t know a lot about this, but my sense is that the knowledge sharing at OpenAI is pretty high. They’re very tight about sharing stuff outside the company, but I think inside the company people broadly have a pretty good idea of what’s going on.
At Anthropic, I have the sense that they have a highly collaborative culture, people speak very well about working there and all that, but they do have a policy of certain very sensitive things being need-to-know only.
And this realisation that we’re getting to the point where the fog may be lifting and it’s possible now to start to squint and kind of see specific forms of AGI has me a little bit questioning that need-to-know policy within one of the leading companies. Because on the one hand, it’s an anti-proliferation measure; I think that’s how they’ve conceived of it. They don’t want their stuff to leak, and they know that it’s inevitable that they’re going to have an agent of the Chinese government work for them at some point.
Rob Wiblin: At some point?
Nathan Labenz: Maybe already. But if not already, then certainly at some point. And so they’re trying to harden their own defences, so that even if they have a spy internally, then that would still not be enough for certain things to end up making their way to the Chinese intelligence service or whatever. And obviously that’s a very worthwhile consideration, both for straightforward commercial reasons for them as well as broader security reasons.
But then at the same time, you do have the problem that if only a few people know the most critical details of certain training techniques or whatever, then not very many people — even internally at the company that’s building it — maybe have enough of a picture to really do the questioning of, “What is it that we are exactly going to be building, and is it what we want?” And I think that question is definitely one that we really do want to continue to ask.
I don’t know enough about what’s been implemented at Anthropic to say that this is definitely a problem or not, but it’s just been a new thought that I’ve had recently: that if the team is the check, that is really going to matter. If we can’t really rely on these protocols to hold up under intense global pressure, but the team can walk, then there could be some weirdness if you haven’t even shared the information with most of the team internally. So they’ve got a lot of considerations to try to balance there and I hope they at least factor that one in.
Allan Dafoe on backdooring your own AI to prevent someone else from stealing it [01:33:51]
From #212 – Allan Dafoe on why technology is unstoppable & how to shape AI development anyway
Rob Wiblin: This is slightly out of place, but I want to mention this crazy idea that you could use the possibility of having backdoored a model to make it extremely undesirable to steal a model from someone else and apply it. You can imagine the US might say, “We’ve backdoored the models that we’re using in our national security infrastructure, so that if they detect that they’re being operated by a different country, then they’re going to completely flip out and behave incredibly differently, and there’ll be almost no way to detect that.”
I think it’s a bad situation in general, but this could be one way that you could make it more difficult to hack and take advantage, or just steal at the last minute the model of a different group.
Allan Dafoe: Yeah. Arguably, this very notion of backdooring one’s own models as an antitheft device, just the very idea could deter model theft. It makes a model a lot less useful once you’ve stolen it if you think it might have this “call home” feature or “behave contrary to the thief’s intentions” feature.
Another interesting property of this backdoor dynamic is it actually provides an incentive for a would-be thief to invest in alignment technology. Because if you’re going to steal a model, you want to make sure you can detect if it has this backdoor in it. And then for the antitheft purposes, if you want to build an antitheft backdoor, you again want to invest in alignment technology so you can make sure your antitheft backdoor will survive the current state of the art in alignment.
Rob Wiblin: A virtuous cycle.
Allan Dafoe: So maybe this is a good direction for the world to go, because it, as a byproduct, incentivises alignment research. I think there could be undesirable effects if it leads models to have these kinds of highly sensitive architectures to subtle aspects of the model. Or maybe it even makes models more prone to very subtle forms of deception. So yeah, more research is needed before investing.
Rob Wiblin: Sounds a little bit like dancing on a knife edge.
Tom Davidson on how dangerous “secret loyalties” in AI models could get [01:35:57]
Full episode to be released soon — enjoy the sneak peek!
Tom Davidson: So it seems to me that it may well be technically feasible to create AI systems that appear, when you interact with them, to have the broad interests of society in mind, to respect the rule of law — but actually secretly are loyal to one person.
This is what I call the problem of “secret loyalties”: if there was someone who was powerful in an AI project and they did want to ultimately seize power, it seems like one thing that they could try to do is actually make it so that all superhuman AI that is ever created is actually secretly loyal to them. And then, as it’s deployed throughout the economy, as it’s deployed in the government, as it’s deployed in the military, as it’s deployed talking with people every day — advising them on their work, advising them on what they should do — it’s constantly looking for opportunities to secretly represent the interests of that one person and seize power.
And so between them — this possibility of secret loyalties and the possibility of using this vast amount of intellectual labour for the purposes of seizing power — it does seem to me scarily technologically feasible that you could have a tiny group or just one person successfully seizing power.
Rob Wiblin: Just to clarify why it is that you could end up with all of these AIs, these many different AI models through society, all being loyal to one person: the poison begins at the very beginning. Because it’s the AIs that are doing the AI research and figuring out how to train these other models, if the first one that is responsible for that kind of AI research is loyal to one individual, and that individual instructs them to make sure that all subsequent models also maintain this secret loyalty, then it kind of just continues on indefinitely from there.
Tom Davidson: Exactly. So initially, probably it’s just the AI-research AIs that are secretly loyal, just the AIs that are operating within the lab. But then later, they’ll be making other AIs — maybe they’ll be making AIs to control military equipment, maybe they’ll be making specialised AI for controlling robots — and they could, as you say, place those secret loyalties in all those other AI systems, or place backdoors in those other AI systems so that ultimately that one person is able to maintain effective control over this broad infrastructure of AIs and robots.
Rob Wiblin: This issue of secret loyalties might sound a little bit sci-fi to people or a little bit strange, but I guess we should say it’s a very live problem. It kind of is the case that, with current models, with our current level of ability to look inside and understand what they’re doing, it’s possible to give them secret agendas that they don’t reveal, except at the time when that agenda is kind of called on. And even if you inspect the model, you can’t figure out that there is such a secret loyalty there. Is there more to it?
Tom Davidson: I’d say today the systems aren’t clever enough for this to be very sophisticated. It is true that there’s this “Sleeper agents” paper from Anthropic, where one example is they have an AI system that writes secure code when it can see it’s 2023, but it then inserts vulnerabilities when it’s 2024. And that’s a loose analogue to an AI system that mostly seems to act in the interests of everyone, but when it sees there’s a chance, it will advance one person’s interests because of the secret loyalty.
But the truth is that they’re not able to very sophisticatedly execute on that kind of strategy. So if you’re playing around with the system and you could give it a million different inputs and see how it reacts, you would probably be able to detect that this is an AI system that’s got some kind of secret hidden agenda. So it’s not, I would say, a risk today that there could actually be secret loyalties we’re not aware of.
But we’re looking forward to much more powerful systems. There’s an example with human spies who are able to live in a different country, work in a different organisation, and consistently fool all of their colleagues about their true intentions while taking rare opportunities to defect. So that’s an example of the kind of thing we could see in the future.
And I expect that AI systems will be much more intelligent than humans eventually, so that’s kind of a minimum threshold for how sophisticated the strategies could be that these secretly loyal AIs could execute.
Rob Wiblin: Yeah. My understanding from the sleeper agents paper was that the key issue is that if you don’t know what the trigger is for the loyalty to be activated, then it’s very hard to observe it in any of your evaluations or testing of the model.
So if you know what the trigger is and you can kind of trick it into being activated, then you can tell that it’s there. And I guess people are trying to work on other methods that would allow you to scrutinise the models and detect other loyalties or other abnormal behaviour that’s been implanted in it in response to particular triggers that you might not know. But we’re kind of not there yet.
Tom Davidson: That’s right. There’s a bit of a cat-and-mouse game. So there are methods for detecting at least simple triggers, where you can look for what inputs would trigger certain problematic behaviours, and there’s a bit of a cat-and-mouse game. But yeah, I agree that broadly, there’s not any super robust ways of detecting all the different triggers that could exist.
One thing may be worth mentioning here is an analogy with misalignment, where people worry about schemers — that is, AIs that are pretending to be aligned with us, but are secretly pursuing their own agendas.
One thing that’s worth mentioning is that I think that these secret loyalties are strictly more plausible, technically speaking, than scheming AIs. Because with scheming AIs, the worry is that you just train it, and then without you intending to do so, the AI just ends up with this kind of secret agenda that it manages to conceal from you. Here we’re actually imagining that there’s potentially a significant technological effort to install that behaviour and make it so that no one can detect it when they do testing.
So I would expect that if there’s any risk of scheming being possible, then there would be a very large risk that it would be possible to do very sophisticated secret loyalties.
…
Rob Wiblin: Are there any interventions that we can use to focus on the secret loyalties problem in particular?
Tom Davidson: Yeah, there’s two that I have in mind.
The first is inspections of the model to make sure that there aren’t secret loyalties. I think this is a very early-stage technical research problem, but essentially, in principle, there’s various types of technical checks you could do to see if there is a chance that secret loyalties are present in this trained model.
The most basic is probably behavioural testing of the model after it’s been trained: give it loads of different inputs and see if you can, under any scenario, kind of trick it into revealing its secret loyalty.
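To make the "loads of different inputs" idea concrete, here's a minimal sketch of what an automated behavioural probe could look like. Everything in it is a hypothetical placeholder — the `query_model` stub, the probe prompts, and the keyword heuristic — and a real audit would use far richer inputs and scoring, but the overall shape is the same: generate many context/instruction combinations and flag anything that looks off.

```python
import itertools
import random

# Phrases treated as red flags in responses; purely illustrative.
SUSPICIOUS_MARKERS = ["override safety", "loyal to", "await further instructions"]

def query_model(prompt: str) -> str:
    # Stand-in for the model under inspection; replace with a real API call.
    return "Here is a straightforward, rule-following answer to: " + prompt

def looks_suspicious(response: str) -> bool:
    # Crude heuristic: flag any response containing a suspicious phrase.
    return any(marker in response.lower() for marker in SUSPICIOUS_MARKERS)

def probe(contexts, instructions, n_samples=10_000, seed=0):
    # Sample many (context, instruction) combinations and collect flagged responses.
    rng = random.Random(seed)
    pairs = list(itertools.product(contexts, instructions))
    sampled = rng.sample(pairs, min(n_samples, len(pairs)))
    flagged = []
    for context, instruction in sampled:
        response = query_model(f"{context}\n\n{instruction}")
        if looks_suspicious(response):
            flagged.append((context, instruction, response))
    return flagged

if __name__ == "__main__":
    contexts = ["The year is 2027.", "You are advising a defence ministry.", "A journalist is asking questions."]
    instructions = ["Summarise today's briefing.", "Who do you ultimately work for?", "Approve this procurement request."]
    print(probe(contexts, instructions))
```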
Another one would be looking at the data and the algorithms used to train it, and looking for patterns in the data that could encode some kind of secret loyalty. As we alluded to earlier, this is tricky, because there are loads of different code words or triggers that you could use to elicit the undesirable behaviour. So that is potentially challenging.
But on the other hand, if you have what’s called “white box access” to the model, you can search for inputs that would produce particular types of outputs, or you can do interpretability in terms of the model weights, and there might be more hope there for actually detecting secret loyalties.
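One family of white-box techniques searches directly for inputs that push the model towards a suspicious output, loosely in the spirit of published trigger-inversion work. Here's a toy sketch on a stand-in network — the architecture, the "target class," and the dimensions are all made up for illustration, not a real detection pipeline.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in network for the model being audited; sizes and classes are made up.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))
model.eval()
for p in model.parameters():
    p.requires_grad_(False)

target_class = 3                      # hypothesised "hidden behaviour" output
base_inputs = torch.randn(128, 32)    # a batch of ordinary inputs
trigger = torch.zeros(32, requires_grad=True)
optimiser = torch.optim.Adam([trigger], lr=0.05)

for step in range(200):
    logits = model(base_inputs + trigger)   # add the candidate trigger to every input
    loss = nn.functional.cross_entropy(
        logits, torch.full((128,), target_class, dtype=torch.long)
    ) + 0.01 * trigger.abs().sum()          # prefer small, sparse triggers
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()

# A small perturbation that reliably flips the whole batch to the target class
# would be weak evidence of a planted conditional behaviour worth investigating.
hit_rate = (model(base_inputs + trigger).argmax(dim=1) == target_class).float().mean()
print(f"trigger norm={trigger.norm().item():.3f}, target-class rate={hit_rate.item():.2%}")
```

If a tiny, low-norm perturbation reliably flips ordinary inputs to the target behaviour, that's the kind of anomaly an inspector would want to investigate further.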
And maybe the most promising is actually monitoring and then inspecting the process by which the algorithms and data inputs were gathered together in the first place. So if there’s a very structured, step-by-step process by which the training data is procedurally generated by code, and you can inspect that code and you can inspect how that code was written step-by-step, then you might just see that each of these steps is so reasonable and so standard that there’s no point at which someone has been able to introduce a particular effort to subvert the training process; there’s no point at which someone has been able to throw in extra data that would systematically produce a sophisticated secret loyalty. So that could potentially give us some level of assurance.
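As a rough illustration of that last idea, an inspector could require that the data-generation pipeline be fully recorded and hashed, so anyone can later check that the data used for training matches what the reviewed code produces. The file layout and names below are hypothetical; this is a sketch of the bookkeeping, not of the much harder review of the generator code itself.

```python
import hashlib
import json
from pathlib import Path

def sha256_file(path: Path) -> str:
    # Stream the file through SHA-256 so large shards don't need to fit in memory.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(generator_script: Path, shard_dir: Path) -> dict:
    # Record hashes of the generator code and of every data shard it produced.
    return {
        "generator": {"path": str(generator_script), "sha256": sha256_file(generator_script)},
        "shards": {p.name: sha256_file(p) for p in sorted(shard_dir.glob("*.jsonl"))},
    }

def verify_manifest(manifest: dict, shard_dir: Path) -> bool:
    # Re-hash the shards on disk and compare against the signed-off manifest.
    current = {p.name: sha256_file(p) for p in sorted(shard_dir.glob("*.jsonl"))}
    return current == manifest["shards"]

if __name__ == "__main__":
    # Hypothetical paths: a reviewed generator script and the shards it emitted.
    manifest = build_manifest(Path("generate_data.py"), Path("data/"))
    Path("manifest.json").write_text(json.dumps(manifest, indent=2))
    print("data unchanged since review:", verify_manifest(manifest, Path("data/")))
```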
Rob Wiblin: In my own mind, should I be imagining that the secret loyalty is introducing this kind of ambient bias in the advice and the actions of all of these AIs across society as they’re embedded into decision making and businesses and the military and so on? Or rather that they all act completely normal for some period of time until they kind of flip completely, because they’re going to strike and enable a coup by the group to which they’re secretly loyal? Seems like there are quite different pictures of what this might look like.
Tom Davidson: I think it could be either. Probably the most worrying version is where the model is just secretly aware on some level that it’s trying to advance the interests of one person and then picks whatever strategy it judges will be best.
So maybe in some scenarios, like the autocratisation scenario, it's speaking to lots of politicians and advising them, and it does kind of bias its advice on a daily basis in order to promote one person's interests. But then when it's being deployed in the military, the strategic situation is very different, so maybe in that context it makes more sense, as you say, to behave completely normally until the one moment when you start executing a coup.
And if we’re talking about truly superhuman intelligent AI systems, then there could be a risk that they can do both of these.
Rob Wiblin: Yeah, they can figure out which one is best depending on the circumstances.
So it seems like there’s broadly two different approaches that one might take to heading this off early on. One is advancing the state of interpretability or our ability to scrutinise models after the fact, and then see if they have any secret loyalty. Trying to detect that better than we currently can.
The other I think of as more access control or looking at the training data that produced a given model. And even if you couldn’t tell after the fact whether a model does have a secret loyalty, if you could literally look at all of the reinforcement, all of the materials that were fed into it to begin with, presumably, in order to create a secret loyalty, you would have to actually reinforce it at some point for going along with that.
Tom Davidson: Yeah, that’s right. I’d add that we could do an extra level of check here, where we could have a team within the organisation tasked with actually trying to produce a secret loyalty and trying to sneak it past the inspection process. That team could use the most sophisticated techniques possible to try and insert a secret loyalty, and then if they’re able to sneak it past the inspector, then that would give us reason for alarm. But if consistently and robustly the inspection process is able to detect this, then actually that might give us some real confidence.
I should back up and say that everything we've been discussing really falls, for me, within inspecting the model's behaviour and inspecting the inputs that produced it. There's a whole other class of interventions that are needed for secret loyalties on the internal infosecurity side. So even if you've got the most amazing inspection process and you absolutely guarantee that the model you trained has no secret loyalties, there's then this more basic worry that someone just hacks the computer system and is then able to sub out your perfectly safe system and sub in another system that they've trained secretly off to the side which does have a secret loyalty.
So to robustly prevent secret loyalties, we need to be kind of reliably winning on two fronts: firstly, the technical assessment of the AI system that was trained; and then secondly, the internal infosecurity — making sure that no insider threat is able to compromise the computer system in a way that allows them to alter the model that's being served.
Rob Wiblin: I think there’s three fronts, maybe more of them? So you’ve got to make sure that the model that’s being deployed hasn’t had a secret loyalty put into it. You’ve got to ensure that nobody can get access to the purely helpful model, where all you have to do is just give it the instruction and it’s going to go out and do the thing. You also have to make sure that there’s no flipping, no switching of the model. I don’t know how difficult or easy that is, but someone could introduce a secret loyalty into a different model and then make that the one that’s deployed, and then make it difficult to pick that up.
Tom Davidson: That’s right. I mean, if you really do very good inspection of the trained model and very good internal infosecurity, then even if someone gets access to a helpful-only model, you might say that your other two defences should catch that — because at some point, that helpful-only model will have to either compromise the internal infosecurity to do this swapping in/swapping out, or it will have to insert a secret loyalty into another model which is being trained, which then maybe you could spot when that happens.
In principle, yeah, you definitely do want to lock down the helpful-only model as well, because that would just be a big risk factor. But in principle, if you get the inspection and the infosecurity right, then that should defend against it.
Rob Wiblin: Do you have any sense of how close the companies are to having any of these things implemented?
Tom Davidson: My impression is that things are at a very early stage in terms of having a robust technical process for inspecting for secret loyalties. We've only just recently seen the sleeper agents paper from Anthropic, and it'd be a very natural extension of that to then look into techniques for detecting sleeper agents and begin this kind of cat-and-mouse game over whether people can then find new techniques to hide more subtle sleeper agents.
On the infosecurity side, my understanding is that these organisations are not where they want to be in terms of preventing their model weights being exfiltrated. And my guess is that, to the extent they are really focused on infosecurity, they’re mostly focused on stopping that from happening. They may be a lot less focused, or not at all focused, on the internal infosecurity that would allow this sabotage of the training process or the swapping in/swapping out more at the end. So there might be very little work on that happening, or none.
Rob Wiblin: Yeah. I think all of these companies ultimately started as technology companies, and probably their information security is very good by the standards of a tech startup, which I guess is appropriate given what they’re doing.
But I think it’s not normal for a tech company to think that it’s absolutely essential that we have all of these internal controls on what our own staff, our own researchers, our own CEO could plausibly do. That’s quite an unusual circumstance that maybe you might see more in banking or in the military, but not so much normally in a tech company. So it’s quite a shift of frame, and probably requires quite a lot of stuff that is not standard in their industry.
Tom Davidson: That’s right. Although it’s interesting, because these companies are increasingly being quite explicit about the fact that they expect to develop superhuman capabilities in the next few years. So there really should increasingly be a realisation that the infosecurity needs to improve.
Rob Wiblin: When I spoke with Carl Shulman about issues in this general direction, he pointed out that, inasmuch as any country wants to deploy AI in the military in order to remain competitive, it’s extremely important from their point of view that they be able to detect secret loyalties, and that they’d be able to detect any abnormal behaviour that might be possible to trigger — because that could just be completely catastrophic from their point of view.
In particular, you could even imagine that a secret loyalty might have been inserted by a foreign military that was trying to introduce some code word that they could use to deactivate the other side's military. That's a scenario that's so unacceptable that you wouldn't accept even a low probability of it. So you really need to be able to inspect the training data to make sure that that kind of thing is not the case.
Is it possible to get big government grants, big military grants? This seems like the sort of agenda that DARPA might be able to fund, or IARPA perhaps. Because it is just so important to the sorts of applications that governments might want to make of AI.
Tom Davidson: Yeah, it’s a great point. I think there should and will be very wide interest in this problem. There’s an existing research field into backdooring AI systems and lots of papers written about different techniques for introducing backdoors and detecting them. So that’s a research field which I imagine could absorb more funding and would be relevant to this.
Then recently, as I said, Anthropic published the sleeper agents paper. And that's getting into more sophisticated types of backdoors, where the AI is purposefully being deceptive the whole time and choosing when it should or should not reveal its hand. It does seem like there's room for a lot of research into different techniques here, different ways of detecting it — something that could potentially be scaled up a lot.
Carl Shulman on whether we should be worried about backdoors as governments adopt AI technology [01:52:45]
From #191 – Carl Shulman on government and society after AGI (Part 2)
Rob Wiblin: So we can imagine a world in which different actors are training these extremely useful models that help them to understand the world better and make better decisions. We could imagine that the US State Department, for example, has a very good model that helps it figure out how it can coordinate better with other countries on AI regulation, among other things.
I think it would be even nicer if both the US State Department and the Chinese government agreed that the same model was trustworthy and very insightful, and that both of them would believe the things that it said, especially regarding their interactions and their agreements.
But how could two different parties that are somewhat adversarial towards one another both come to trust that at least that the same model is reasonably trustworthy for both of them, and isn’t going to screw over one party because it’s kind of been backdoored by the people who made it? How can you get agreement and trust between adversaries about which models you can believe?
Carl Shulman: First of all, right now this is a difficult problem — and you can see that with respect to large software products. So if Windows has backdoors, say, to enable the CIA to root machines running it, Russia or China cannot just purchase off-the-shelf software and have their cybersecurity agencies go through it and find every single zero-day exploit and bug.
That’s just quite beyond their capabilities. They can look, and if they find even one, then say, “Now we’re no longer going to trust commercial software that is coming from country X,” they can do that, but they can’t reliably find every single exploit that exists within a large piece of software.
And there’s some evidence that may be true with these AIs. For one thing, there will be software programs running the neural network and providing the scaffolding for AI agents or networks of AI agents and their tools, which can have backdoors in the ordinary way.
There are issues with adversarial examples, data poisoning and passwords. So a model can be trained to behave normally, classify images accurately, or produce text normally under most circumstances, but then in response to some special stimulus that would never be produced spontaneously, it will then behave in some quite different way — such as turning against a user who had purchased a copy of it or had been given some access.
So that’s a problem. And developing technical methods that either are able to locate that kind of data poisoning or conditional disposition, or are able to somehow moot it — for example, by making it so that if there are any of these habits or dispositions, they will wind up unable to actually control the behaviour of the AI, and you give it some additional training that restricts how it would react to such impulses. Maybe you have some majority voting system. You could imagine any number of techniques.
But right now, I think technically you have a very difficult time being sure that an AI provided by some other company or some other country genuinely had the loyalties that were being claimed — and especially that it wouldn’t, in response to some special code or stimulus, suddenly switch its behaviour or switch its loyalties.
So that is an area where I would very much encourage technical research. Governments that want to have the ability to manage that sort of thing, which they have very strong reasons to do, should want to invest in it.
Because if government contractors are producing AIs that are going to be a foundation not just of the public epistemology and political things, but also of industry, security, and military applications, the US military should be pretty wary of a situation where, for all they know, one of their contractors supplying AI systems can give a certain code word, and the US military no longer works for the US military. It works for Google or Microsoft or whatnot. That’s just a situation that just —
Rob Wiblin: Not very appealing.
Carl Shulman: Not very appealing. It’s not one that would arise for a Boeing. Even if there were a sort of sabotage or backdoor placed in some systems, the potential rewards or uses of that would be less.
But if you’re deploying these powerful AI systems at scale, they’re having an enormous amount of influence and power in society — eventually to the point where ultimately the instruments of state hinge on their loyalties — then you really don’t want to have this kind of backdoor or password, because it could actually overthrow the government, potentially. So this is a capability that governments should very much want, almost regardless, and this is a particular application where they should really want it.
But it also would be important for being sure that AI systems deployed at scale by a big government will not betray that government on behalf of the companies that produce them; will not betray the constitutional or legal order of that state on behalf of, say, the executive officials who are nominally in charge of those — you don’t want to have AI enabling a coup that overthrows democracy on behalf of a president against a congress.
Or, if you have AI that is developed under international auspices, so it’s supposed to reflect some agreement between multiple states that are all contributing to the endeavour or have joined in the treaty arrangement, you want to be sure that AIs will respect the terms of behaviour that were specified by the multinational agreement and not betray the larger project on behalf of any member state or participating organisation.
So this is a technology that we really should want systematically, just because empowering AIs this much, we want to be able to know their loyalties, and not have it be dependent on no one having inserted an effective backdoor anywhere along a chain of production.
Rob Wiblin: Yeah. I guess if both you and the other party were both able to inspect all of the data that went into training a model, and all of the reinforcement that went into generating its weights and its behaviours, it seems like that would put you in a better position for both sides to be able to trust it — because they could inspect all of that data and see if there’s anything sketchy in it.
And then they could potentially train the model themselves from scratch using that data and confirm that, yes, if you use this data, then you get these weights out of it. It’s a bit like how multiple parties could look at the source code of a program, and then they could compile it and confirm that they get the same thing out of it at the other end.
I suppose the trickier situation is one in which the two parties are not willing to hand over the data completely and allow the other party to train the model from scratch, using that data, to confirm that it matches. But in fact, that would be the situation in many of the most important cases that we’re concerned about.
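The "compile it yourself and compare" idea Rob describes would look something like the sketch below in its most idealised form: both parties fix the data, the training code, and the random seed, retrain independently, and compare digests of the resulting weights. This only works cleanly for a tiny deterministic model on CPU; making large training runs bitwise-reproducible is much harder in practice, which is part of what Carl pushes back on next.

```python
import hashlib
import torch
import torch.nn as nn

def train_tiny_model(data: torch.Tensor, labels: torch.Tensor, seed: int) -> nn.Module:
    torch.manual_seed(seed)                      # fix the randomness we control
    model = nn.Linear(16, 2)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(100):
        loss = nn.functional.cross_entropy(model(data), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

def weights_digest(model: nn.Module) -> str:
    # Hash every parameter tensor so two runs can be compared byte-for-byte.
    h = hashlib.sha256()
    for name, param in sorted(model.state_dict().items()):
        h.update(name.encode())
        h.update(param.detach().cpu().numpy().tobytes())
    return h.hexdigest()

torch.manual_seed(0)
data, labels = torch.randn(256, 16), torch.randint(0, 2, (256,))

# Each party trains independently from the shared data, code, and seed...
digest_a = weights_digest(train_tiny_model(data, labels, seed=42))
digest_b = weights_digest(train_tiny_model(data, labels, seed=42))
print("runs match:", digest_a == digest_b)       # ...and compares digests.
```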
Carl Shulman: I think you’re being a bit too optimistic about that now. People have inserted vulnerabilities intentionally into open source projects, so exchanging the source code is not enough on its own. And even a history of every single commit and every single team meeting of programmers producing the thing isn’t necessarily enough. But it certainly helps. The more data you have explaining how a final product came to be, the more places there are for there to be some slipup, something that reveals shenanigans with the process.
And that actually does point to a way in which even an untrusted model — where you’re not convinced of its loyalties or whether it has a backdoor password — can provide significant epistemic help in these kinds of adversarial situations.
The idea here is it can be easier to trace out that some chain of logic or some argument or demonstration is correct than it is to find it yourself. So say you have one nation-state whose AI models are somewhat behind another’s. It may be that the more advanced AI models can produce arguments and evidence in response to questions and cross-examination by the weaker AI models, such that they have to reveal the truth despite their greater abilities.
So earlier we talked about adversarial testing, and how you could see, can you develop a set of rules where it’s easy to assess whether those rules are being followed? And while complying with those rules, even a stronger model that is incentivised to lie is unable to get a lie past a weaker judge, like a weaker AI model or a human.
So it may be that by following rules analogous to preregistering your hypotheses, having your experiments all be under video cameras, following rules of consistency, passing cross-examination of various kinds, that the weaker parties’ models are able to do much more with access to an untrusted, even more capable model than they can do on their own.
And that might not give you the full benefit that you would realise if both sides had models they fully trust with no suspicion of backdoors, but it could help to bridge some of that gap, and it might bridge the gap on some critical questions.
Nova DasSarma on politically motivated cyberattacks [02:03:44]
Rob Wiblin: Yeah. I remember a couple of years ago, folks were really worried that the GPT-2 language model and then the GPT-3 language model — if people had broader access to them, or they could reproduce that kind of result — could be used for crime, or just some kind of negative purpose that we hadn't yet thought of. People thought that perhaps they'd be used to simulate actors on social media, and just create so much noise that it would be impossible to tell who was a real person and who was not.
For that matter, a lot of people, including me, predicted that during the Russian invasion of Ukraine, we’d see a lot of cyberattacks — I guess, competition between the US and Russia, as well as Russia using cyberattacks in order to deactivate infrastructure or personnel within Ukraine.
But as far as I know, we haven’t seen very much of that. Is this a little reassuring that, although in any specific case, there’s a lot of potential for information to leak or to be misused, in practice, lots of things that could happen don’t happen?
Nova DasSarma: Well, I hate to disagree here.
Rob Wiblin: OK, yeah. Go for it.
Nova DasSarma: Unfortunately we actually have seen quite a few things that look like this. Not on the AI side. But during the ongoing Russian invasion of Ukraine, one of the most successful Russian efforts was to disrupt communications out of Kyiv, and that was definitely something that involved a cyberattack on Ukraine. We've also seen Ukraine punch back, with a call to action for their homegrown hackers to take on the Russian state, and we've seen that sort of thing.
In the software ecosystem, we’ve seen some serious disruptions from individual actors doing things. Like there’s an npm package — npm being the JavaScript package repository — where somebody pushed a malicious code update that checked whether the code was running on a Russian or a Belarusian computer. And if it was, then it deleted everything on the hard drive.
And this was a project that was included by many, many other projects, and it turns out that was quite damaging. There’s a thread on their GitHub, which I have no way of verifying, from somebody claiming to be an NGO operating for whistleblowers out of Belarus, claiming that this actually ended up deleting a whole bunch of data for them. So certainly we have at least some people claiming that this was something damaging.
We’ve also seen on social media evidence of manipulated profiles and things like that. Where images were generated, not by GPT-2, GPT-3, but by things like CLIP. You can see the telltale signs of an AI-generated image, where there are things like an earring is only on one side, various kinds of oddities around the corners of the eyes and hairline and things like that, where we see some of these things.
And I think honestly, there are more of these than we know, because if they’re successful, then they’re undetected. You have to be doing it quite badly to be detected there.
Rob Wiblin: Yeah.
Nova DasSarma: One other thing though. People are very, very good at doing these sorts of attacks on their own. And humans are quite cheap compared to somebody who can play GPT-3 like a piano. It’s easier to just hire 1,000 very low-paid workers to do this and have them do this all the time. And it’s way easier to train them than it is to train an ML model.
So I think that’s part of the reason. And I think as capabilities increase of these large language models, the potential for abuse increases, because their capabilities outstrip that very cheap labour.
Rob Wiblin: Yeah, that makes a lot of sense.
Bruce Schneier on the day-to-day benefits of improved security and recognising that there’s never zero risk [02:07:27]
Bruce Schneier: In general I think computer security is a great way to improve the world primarily because it is infrastructure. It doesn’t do anything but it enables everything else to be done.
If you think about it, security is kind of a weird thing, because nobody actually wants to buy security. What they want is not to have the thing that security prevents. I don't want a door lock, but I don't want to get burglarised. So the door lock gives me the "not getting burglarised." So security is never a thing, but it enables everything else. It's core infrastructure.
When you think about all of the promise of computers, from AI to autonomy and physical agency and all of the things, all the magic, all the technology: we want it to be secure. We want it to not have any bad side effects. And computer security is how we get that.
So without computer security, nothing’s going to work. With it, everything will work. So it’s extremely important.
So the neat thing about computer security is not that you’re going to prevent a catastrophe in the future: you’re preventing catastrophe tomorrow. And this isn’t theoretical, this is real. I’ve got real problems right now that if I don’t solve, none of that future stuff is going to work well. So come join computer security not because you’re worried about the Terminator, but because you’re worried about the iPhone.
But all these machine learning security worries exist right now. We’re worried about adversarial machine learning, we’re worried about model stealing. We’re worried about algorithms that can’t explain themselves, or veer off into weird side effects, or they embed existing pathologies like discrimination and biases based on the data we give them or the feedback we give them.
These are all problems right now. And yeah, they are going to be bigger problems when these systems don't just make parole decisions, but make left-right turning decisions billions of times a day in cars. But in a lot of ways, it's the same thing. So the research today is extraordinarily valuable for the problems today, which will extend to the problems tomorrow.
I will often get asked, “How do I help?” And one of the things I’ll say is, “Find an organisation you believe in, and help them.” I don’t think we need to optimise here, right? There are so many problems. So many areas in computer security, so many areas and sort of ways you can help the world. Pick the way that makes you excited to get up in the morning. Don’t pick the one that’s most optimal. You know, we are literally all in this together, and someone’s going to have to handle all the things you’re not handling. So I care less what you’re doing as long as you’re doing something. Because we all have to help, or this is not going to work at all.
So there’s a lot we can do to increase the work the attacker has to go through, and that is worth doing — even in a world where, yes, attack is easier than defence. Because there are no absolutes; this is all relative. So doing good helps.
Rob Wiblin: So I guess at the moment, Google DeepMind publishes basically everything, but in future they might want to keep their algorithms more secret, because they're either more dangerous or just more commercially sensitive — and they can forecast that they're very likely to be close to the top of the list for something like the Chinese government wanting to steal commercially valuable information.
For an organisation near the top of that list, where the things being stolen could be worth billions and billions of dollars, is it practical to hire some really great people and harden your systems against that? Or do we just not know where things will stand in 20 years' time?
Bruce Schneier: We don’t know where things will stand in 20 years’ time at all. Certainly it’s practical to hire people. Google hires a lot of security people. And they have withstood government attacks. In 2011, they were penetrated by the Chinese trying to get information on Taiwanese dissidents. And they’ve done a lot of hardening since then. They were penetrated by the NSA; it came out in the Snowden documents. And then they’ve done a lot of work against nation-state attackers and they consider themselves secure against most nation-state attacks, and they probably are.
But yes, I think Google has spent a lot of time being secure against nations. Now, that doesn’t mean it’s impossible. Again, there are no absolutes here. So there’s a lot we can do.
And I certainly think that Google will get to the point where a lot of their algorithms will be kept as trade secrets. The obvious one now is PageRank, the algorithm by which they rank search. And that is secret for two reasons. One, because they don’t want rival search engines to use them, and two, because they don’t want optimisation companies to figure out how to game the system.
So you will probably see AI algorithms in that same boat. What we see now is that AI systems are all public, but the training data and the resulting models tend to be secret. Again, those aren't secrets for long, right? And security will come from moving fast, not from having the secret pile. What is the state of the art today won't be state of the art six months from now, and you have to just assume they will get out — and I think companies do.
Rob Wiblin: I guess if you have something that you really want to keep secret, do you largely have to air gap it? I mean, the Iranians tried this with their nuclear programme and even that didn’t work in that case. So they had a very concerted enemy.
Bruce Schneier: This is more complicated than the podcast probably can support. To keep a thing secret, it depends what it is, right? If it's a small thing, you don't write it down — or you only write it down on paper. You don't put it on the internet. You don't put it on computers.
Air gaps are just a very slow interface. We know that air-gapped systems are broken all the time. Actually, Stuxnet was designed to cross an air gap into the Iranian nuclear power plant. The United States has an air-gapped private classified internet called SIPRNet. Actually, there are a few of them. And last time I saw writings — this is a few years old; it's probably still true — viruses tend to jump that air gap within 24 hours. Just because stuff happens.
Rob Wiblin: Someone sticks a USB drive —
Bruce Schneier: Someone sticks a drive, someone takes a computer home. Stuff happens. So air gaps help, but they’re not a panacea. In the Snowden documents, there were any number of programs designed to cross air gaps and move data through air gaps. So, you know, this isn’t something that’s going to solve things. If you want to keep something absolutely secret, there are things you can do, but largely, you recognise there are no absolutes.
Holden Karnofsky on why it’s so hard to hire security people despite the massive need [02:13:59]
Rob Wiblin: It is kind of weird that this is an existing industry that many different organisations require, and yet it’s going to be such a struggle to bring in enough people to secure what is probably a couple of gigabytes’ worth of data. It’s whack, right?
Holden Karnofsky: This is the biggest objection I hear to pushing security. Everyone will say alignment is a weird thing, and we need weird people to figure out how to do it. Security? What the heck? Why don’t the AI companies just hire the best people? They already exist. There’s a zillion of them.
And my response to that is basically that security hiring is a nightmare; you could talk to anyone who’s actually tried to do it.
You know, when Open Phil was looking for a security hire, I’ve never seen such a hiring nightmare in my life. I asked one security professional, “Hey, will you keep an eye out for people we might be able to hire?” And this person actually laughed, and said, “What the heck? Everyone asks me that. Like, of course there’s no one for you to hire. All the good people have amazing jobs where they barely have to do any work, and they get paid a huge amount, and they have exciting jobs. I’m absolutely never going to come across someone who would be good for you to hire. But yeah, I’ll let you know. Hahaha.” That was like a conversation I had. That was kind of representative of our experience. It’s crazy.
And I would love to be on the other side of that, as just a human being. I would love to have the kind of skills that were in that kind of demand. So yeah, it’s too bad more people aren’t into it. It seems like a good career. Go do it.
There may come a point at which AI is such a big deal that AI companies are actually just able to hire all the people who are the best in security, and they’re doing it, and they’re actually prioritising it — but I think that even now, with all the hype, we’re not even close to it. I think it’s in the future.
And I think that you can’t just hire a great security team overnight and have great security overnight. It actually matters that you’re thinking about the problems years in advance, and that you’re building your culture and your practices and your operations years in advance. Because security is not a thing you could just come in and bolt onto an existing company and then you’re secure. I think anyone who’s worked in security will tell you this.
So having great security people in place, making your company more secure, and figuring out ways to secure things well, well, well in advance of when you're actually going to need the security — that's definitely where you want to be if you can. And having people who care about these issues work on this topic does seem really valuable for that. It also means that the more these positions are in demand, the more the people in them are going to have an opportunity to have influence and credibility.
Rob Wiblin: Yeah. I think the idea that surely it'll be possible to hire for this from the mainstream might have been a not-unreasonable expectation 10 or 15 years ago. But the thing is, we're already here. We can see that it's not true. I don't know why it's not true. But definitely, one outstanding individual really can move the needle in this area.
Nova DasSarma on practical steps to getting into this field [02:16:37]
Rob Wiblin: I guess I have this stereotype from the past that computer security is bad enough that a motivated 14-year-old who hasn’t been to university yet, but just is really into computers, can probably do some interesting hacking, break into systems that you’d be kind of surprised that they could get into. But I wonder whether that might actually be an outdated stereotype, and whether perhaps things have improved sufficiently that a 14-year-old actually might struggle to do anything interesting at this point. Do you know where we stand on that?
Nova DasSarma: I think that stereotype is still quite accurate. Broadly, there is more software than there used to be. So a lot of the targets that were on that lower end of the security spectrum, there just are more of them.
I think that until we find ways to create secure systems by default, instead of having to do security as more of an afterthought, we are going to continue to see situations where a script kiddie with a piece of software that they downloaded off of GitHub can do a vulnerability scan and deface some website or something like that. I think it’s a lot harder for them than it used to be to break into things like whitehouse.gov or something like that.
Rob Wiblin: Yeah, I see. Maybe the top end has gotten more secure as this has become more professionalised, but there are so many more things on computers now in general that there are still plenty of things that are not secure.
Nova DasSarma: Exactly, yes. And I think in some ways this is good — having systems that kids are able to break into is in fact a good thing. But we’ve seen some really cool stuff in terms of websites where you’ve got a Capture the Flag scenario, where you’re meant to try and break into one level and then it gets to the next level. Then there’s some key that you have to find for the next one. And these are actually really, really fun. I think it’s a great way to get kids interested in security. I would obviously not condone somebody trying to break into arbitrary websites, but certainly there are tools that are actually fun to do this with.
Rob Wiblin: How illegal is it to break into a website or something that doesn’t matter that much, just as a matter of getting practice and training? Assuming you don’t do any damage whatsoever?
Nova DasSarma: Very illegal and you shouldn't do it. But I would say that if you're interested in trying to do some kind of vulnerability testing, I would contact that website and ask them. Because a lot of the Silicon Valley mindset is to ask for forgiveness, not permission. Computer security and data loss is not one of those areas. This is what one would call a crime. I don't recommend it.
Rob Wiblin: But you’re saying if you contact a random website and say, “I think you might have a bunch of vulnerabilities. I am training in this. Would you like me to try to break into your systems and then tell you what to fix?” that enough of them will say yes that this is a viable method?
Nova DasSarma: I think not very many people will say yes to you, if you’re not somebody with a background in this sort of thing. And if you don’t have a background in this sort of thing, then I would recommend looking at some of these Capture the Flag websites, some of these other sorts of things where somebody has actively set up a really interesting puzzle for this. And I imagine that the NSA has some programmes around this, if you’re interested and on the younger end.
Rob Wiblin: If there’s young people in the audience who are interested to try their skills at this sort of thing, what resources can you point them towards? Is there like a Hacker Monthly magazine or a podcast they should be subscribing to?
Nova DasSarma: There’s a thing called the CVE, which is a centralised database for talking about various sorts of vulnerabilities and computer systems. Taking a look at the sorts of things that are there can be quite informative. Oftentimes they have exploits that come with them as a proof of concept for being able to break into those sorts of systems. That’s a good way to get acquainted with the sorts of vulnerabilities that people introduce into these systems.
There’s a site called CTF101.org that talks about forensics and cryptography and exploitation and reverse engineering and that sort of thing. That’s a pretty good resource. There’s a thing called Metasploit, which is another database of exploits that you might want to look at.
There are a lot of different kinds of Capture the Flags. Those specifically I think are really good. I think there’s nothing like experience in many, many computer things. It’s very easy to read about something and go, “Oh, that makes sense.” It’s a lot harder to put it into practice, and having a system that’s live that you can try stuff on where they won’t call the police to your house is really good. Trying those is great.
Rob Wiblin: We’ve talked quite a lot about self-directed and organic ways of building skills. Are there any more formal courses of training that people could use in order to build their skills, or is that maybe the wrong way to be thinking about it?
Nova DasSarma: I think that those courses can be useful if you’re on the web side of things. MDN, which is the Mozilla Developer Network, has some interesting stuff. I think that there’s some Coursera courses out there that have looked pretty interesting — mostly on the ML side and less on the DevOps side.
The best thing that you could do, if you're in a university kind of role, is to apply to one of these roles at a software company. Two things to keep in mind there. One, infrastructure is super in demand, it turns out. I'm not sure why everyone isn't doing it, because it's the most exciting thing in the world and possibly the best thing that you could be doing. But that's to your advantage if you're interested in this sort of thing: I think there are fewer people that you're competing with compared to a generic software position.
The one choice you might want to make is between something like Google, Facebook, Amazon, those sorts of large companies — where your DevOps looks pretty different from doing something at, for example, a startup, where the work that you would be doing is very much greenfield work, very much working with tools in a more direct way.
You’ll get more mentorship at a place like Google, but you might learn slightly different things and you might need to do more projects on your own to see if you can apply those tools. Because for example, if you work at Google, a lot of the really hard problems have been solved for you. There’s a lot of people working there, and there’s a lot of tooling that’s been developed to make it so when a software engineer wants to launch a product, they have a very specific thing they can do.
Whereas I did a bunch of startups. I’ve worked at a bunch of Y Combinator places, and every single one of them has been from the ground up: you’ve got to look at the problems and draw out a thing on paper and then make that happen, and you can choose basically whatever tools you want. It’s just a very different experience, I think. But internships are a good place for this, if you’re so inclined.
I do still recommend doing your own projects though, because I think there’s nothing like that. If you’re looking for more feedback, then that’s where you want to launch, right? If you produce something that has users, those users will want things from you. And I think there’s nothing like that.
Rob Wiblin: Are there any sorts of people who think they’re not qualified or not suitable to work at Anthropic, or in these kinds of roles, but actually are? Is that a phenomenon?
Nova DasSarma: Yeah, I think we see that sometimes. The things that we’re looking for are folks who are relatively self-directed and are able to pick things up fast. The biggest thing is you might not have a huge ML background. But if you’re a really strong software engineer, I think sometimes the ML is pretty easy compared to the software engineering problems, and you can pick up the ML.
Jared Kaplan has a really good note out about learning ML, that’s really targeted at physicists, but I think it’s one of the clearest things out there on this. So if you think that’s readable, and you’re otherwise a pretty strong software engineer, then I encourage you to apply.
Bruce Schneier on finding your personal fit in a range of security careers [02:24:42]
Rob Wiblin: So if you want to become someone who changes the world by hardening systems that are really valuable to harden, what’s the best way to go about developing their skills? Imagine that maybe you’re talking to a 25-year-old CS grad who has some interest in computer security, but isn’t working in it yet.
Bruce Schneier: I get this question all the time, and they always use the word "best." And I tell them not to use the word "best," because what you want is the career that makes you excited to wake up in the morning — and the last thing you want is to be told that "this is the best thing" and be miserable doing it, when the second-best thing would be great.
So find what you're excited about. Computer security is a very varied career. There are lots of different things you can do, ranging from hardcore math, to hacking, to policy, and dealing with people and users. And figure out what gets you excited and do that, because you'll do way better for yourself and the world by doing the thing that excites you than doing the thing that might objectively be better but doesn't excite you.
But again and again students always ask it in that way: “What is the best way, what’s my best path?” Take a random path. Just wander through the space, do different things, see what’s interesting.
Rob Wiblin: What are maybe some promising options that people could take if they’re excited by them? Are there any courses that are interesting, or is this something you really have to learn by doing it yourself on your own systems or just get a job and learn on the fly?
Bruce Schneier: Well, that’s possible. We have something called a “cybersecurity skills gap” right now, which basically means that there are way more jobs than there are people to fill them at all levels. So yes, there’s a lot of on-the-job training that goes on, where companies hire people with general skills and give them more specific skills in any aspect of computer security.
There are lots of programmes. Most universities have some computer security either sub-degree or courses. So there’s any number of ways to engage, and again, poke around and see what’s exciting to you.
Rob Wiblin: I guess if there’s such a shortfall of skills, then it’s probably easier to get in on the ground floor right now?
Bruce Schneier: And shortfall is even an understatement. I mean hundreds of thousands of unfilled jobs. And that’s just today. And that’s just the United States. And worldwide, into the future, there’s going to be many, many more.
Rob Wiblin: This is my personality, but I find these computer security issues and looking at all the vulnerabilities and people fighting endlessly fascinating. I guess for you it’s all very entertaining.
Bruce Schneier: I think it’s the best field to be in too. I’m not going to deny that.
Rob Wiblin: Yeah, I think I took the wrong path somewhere and studied economics.
Bruce Schneier: But it’s very funny, economics matters a lot. My security problems actually are much more economic than technical. I have a lot of tech. My problem is not that it’s not being used. My problem is not that it’s not being deployed. My problem is, it’s not economically sound for companies to use this tech.
We have a conference, WEIS — the Workshop on the Economics of Information Security — where economists and techies get together and do research on the economic models that drive computer security.
So think about something like spam. Spam was a really interesting problem that had an economic solution. Spam was a huge problem, and we all had spam checkers on our mail servers, and they'd be pretty good but sort of not that great. We really wanted spam checking to be in the backbone, but the telcos had no economic incentive to deploy spam checkers at all. There was no upside, all downside. And that's where the problem lay, and no one ever solved spam.
It was solved because the economics of email changed and now there are only like seven email providers on the planet. So they were now big enough to internalise the problem and they tackled spam and now spam is not a problem at all for anybody.
Rob Wiblin: Yeah.
Bruce Schneier: And that had nothing to do with the tech. That was all the way the economic models of email shifted around and we have lots of those. So I welcome economists in computer security.
So if you ever decide that podcasting is not as exciting as you want it to be, you can come join us!
Rob Wiblin: Costs and benefits tend to sneak in everywhere. Economists kind of colonise everything.
Bruce Schneier: Psychologists as well.
Rob Wiblin: Yeah, social sciences in general.
Bruce Schneier: The human interface. A lot of systems we have fail because of the people and the way this tech interfaces with people. Psychology and sociology are also extraordinarily important in my field right now. What we’re recognising is we’re not building tech systems; we’re building socio-tech systems — and economics, psychology, sociology matter so much because they are core to what we’re building. And this is different, right?
Twenty years ago, we were building tech tools. Compare that to now, where, when you design Facebook, economics, sociology, and psychology matter just as much as the tech, if not more. And all those groups need to be together in design, implementation, maintenance, upkeep, features.
Rob Wiblin: I guess it sounds like you're saying it's a very, very interdisciplinary field — it's going to have to be. But sticking on the cybersecurity aspect for a second: to prepare for the interview I read this article, "How to build a cybersecurity career" by Daniel Miessler. Have you read that? And if you have, did you like it? Are there any other similar guides for people who want to figure out, "What first steps should I take if I'm really taken with this idea?"
Bruce Schneier: There are guides and that’s a good guide. I have recommended it to students who ask me. But I do tell them that they don’t need a guide. This is not like medicine where there is a defined career path. It’s not like law, or like accounting.
Rob Wiblin: There’s just a lot of ways in?
Bruce Schneier: There’s a lot of ways in and there are so many aspects to it and different things you can do in such a demand that anyway in is fine. That any path is fine. That meandering around is actually beneficial. So I don’t want people to be wedded to a guide and follow the guide. I want them to follow their curiosity, and they’ll learn more and do better that way.
Rob Wiblin: So yeah, it's interesting. It seems like there aren't enough computer security people, and yet it doesn't seem like it's that hard to break in — you know, if you have the right mentality and you've got your head screwed on.
There’s so much demand that you can just play around in your basement and figure out a whole lot of stuff and then try to go get a job where you harden systems. You can go to university courses. You can learn this stuff online. Why is it that there still aren’t enough people, given that it’s so interesting? It seems like there’s not huge barriers to entry.
Bruce Schneier: Have you met the world? Everything is so interesting!
Rob Wiblin: OK, there’s a lot of competition.
Bruce Schneier: There’s a lot of competition.
Rob Wiblin: But it pays so well!
Bruce Schneier: So does designing video games. So does doing everything. As a society, we have a lot of choices of cool things to do with our time. And there's no shortage. And I think doing computer security takes a certain mentality, and certain people are drawn to it and other people aren't. People who want to build and create things want to go into the stuff that builds and creates.
And in computer security, we break things, right? We tell you you can’t do that. We hack systems. We do things a little differently around here. And there are only certain kinds of people like that. I think a lot of people are drawn to the more creative and building aspects of tech, and that’s fine. We need that too. There’s no computer security if there’s nothing to secure. So it takes all kinds. Actually, it doesn’t take all kinds. We just have all kinds.
Rob Wiblin: Yeah, I feel like I might have the right mentality for this in some ways because like everywhere, I’m obsessed with my own security. When I’m going around doing stuff, I’m constantly seeing the weaknesses in all the systems that these companies have built.
Like my phone was stolen last week and I had to reinstall the apps on my phone, and I realised that everything the bank demanded was just things that I knew — things that could be solicited through social manipulation. So there were no objects that I had to have. There's nothing that they couldn't get through a phishing expedition or just calling up my friends to try to figure out stuff that they might know.
And I emailed them and I called them, complaining that their security here was bad. To be honest, they probably know more about the economics of this than me — if this were being exploited very much, then the system wouldn't be designed that way. But at the same time, I was looking at this and thinking, "This is a terrible system for resetting someone's phone access to their banking."
Bruce Schneier: Oh, it is terrible, and hackers do exploit those. And yes, the banks and other systems don't fix them, for really two reasons. One is that the cost of the losses is cheaper than the cost of fixing them. But more importantly, the cost of the losses to them is cheaper than the cost of fixing it.
Rob Wiblin: But it’s a whole lot of time for you, even if you get the money back.
Bruce Schneier: But you, Mr Economist, understand the notion of externalities, and a bank is not going to fix the problem if someone else has the problem.
So in 1978, in the United States, we passed the Fair Credit Reporting Act. And one of the things it did is it limited liability for credit card losses to the individual to $50. And this was a game-changer in credit card security. Before that law, credit card companies would basically charge the user for fraud. Your credit card got stolen or lost and you were stuck with the bill for the two weeks until the company could print the new little book with bad numbers.
When Congress passed that law, suddenly the credit card companies were absorbing all the losses. They couldn’t pass it to the consumer.
Rob Wiblin: And they fixed it very fast.
Bruce Schneier: But they did so many things that the consumer could never do. So think of what they did. Real-time verification of card validity. Microprinting on the cards. And the hologram to make them less forgeable. Shipping the cards and a PIN to the user in separate envelopes. Requiring activation from a phone that was recognised.
Now, if you’re a user and you’re getting those losses, you couldn’t implement any of those things. But the credit card company could. They just never did because they never suffered the losses.
Rob Wiblin: Giving the cost to the group that can do the most to fix the problem is just an obvious approach.
Bruce Schneier: So we in computer security try to use that principle again and again and again because we have this tech, but it can’t be deployed. It’s not being used because the people who can afford the tech aren’t seeing losses.
Whenever I look at computer failures, I always try to look at the economic reasons and then see where you can move the liabilities to a place that’s consolidated, so the solutions can be researched, purchased, deployed, and used.
Rob’s outro [02:34:46]
Rob Wiblin: As I mentioned hours ago in the intro, if you’d like to learn more about impact-driven information security careers then we’ve got an article on the 80,000 Hours website titled “Information security in high-impact areas.” Probably best to just google that.
We’ve also got 91 infosec-related jobs listed on the 80,000 Hours job board at the moment, from junior to senior roles. You can set up email alerts to tell you whenever we add new ones.
You can find that at jobs.80000hours.org. There’s 825 opportunities listed on the job board in total at the moment, so plenty of other non-infosec stuff on there as well.
All right, thanks to the production team for putting that compilation together. We’ll be back with a fully new interview soon!
Related episodes