#132 – Nova DasSarma on why information security may be critical to the safe development of AI systems

If a business has spent $100 million developing a product, it’s a fair bet that they don’t want it stolen in two seconds and uploaded to the web where anyone can use it for free.

This problem exists in extreme form for AI companies. These days, the electricity and equipment required to train cutting-edge machine learning models that generate uncannily humanlike text and images can cost tens or hundreds of millions of dollars. But once trained, such models may be only a few gigabytes in size and run just fine on ordinary laptops.

Today’s guest, the computer scientist and polymath Nova DasSarma, works on computer and information security at the AI company Anthropic, as part of its security team. One of her jobs is to stop hackers exfiltrating Anthropic’s incredibly expensive intellectual property, as recently happened to Nvidia. As she explains, given models’ small size, the need to store them on internet-connected servers, and the poor state of computer security in general, this is a serious challenge.

The worries aren’t purely commercial though. This problem looms especially large for the growing number of people who expect that in coming decades we’ll develop so-called artificial ‘general’ intelligence systems that can learn and apply a wide range of skills all at once, and thereby have a transformative effect on society.

If aligned with the goals of their owners, such general AI models could operate like a team of super-skilled assistants, going out and doing whatever wonderful (or malicious) things are asked of them. This might represent a huge leap forward for humanity, though the transition to a very different new economy and power structure would have to be handled delicately.

If unaligned with the goals of their owners or humanity as a whole, such broadly capable models might ‘go rogue,’ breaking into additional computer systems to grab more computing power — all the better to pursue their goals and make sure they can’t be shut off.

As Nova explains, in either case, we don’t want such models disseminated all over the world before we’ve confirmed they are deeply safe and law-abiding, and have figured out how to integrate them peacefully into society. In the first scenario, premature mass deployment would be risky and destabilising. In the second scenario, it could be catastrophic — perhaps even leading to human extinction if such general AI systems turn out to be able to self-improve rapidly rather than slowly, something we can only speculate on at this point.

If highly capable general AI systems are coming in the next 10 or 20 years, Nova may be flying below the radar with one of the most important jobs in the world.

We’ll soon need the ability to ‘sandbox’ (i.e. contain) models with a wide range of superhuman capabilities, including the ability to learn new skills, for a period of careful testing and limited deployment — preventing the model from breaking out, and criminals from breaking in. Nova and her colleagues are trying to figure out how to do this, but as this episode reveals, even the state of the art is nowhere near good enough.

In today’s conversation, Rob and Nova cover:

  • How good or bad information security is today
  • The most secure computer systems that exist today
  • How to design an AI training compute centre for maximum efficiency
  • Whether ‘formal verification’ can help us design trustworthy systems
  • How wide the practical gap is between AI capabilities and AI safety
  • How to disincentivise hackers
  • What listeners should do to strengthen their own security practices
  • Jobs at Anthropic
  • And much more

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

Producer: Keiran Harris
Audio mastering: Ben Cordell and Beppe Rådvik
Transcriptions: Katy Moore

Highlights

How organisations can protect against hacks

Nova DasSarma: I think that one of the biggest things you can do as an organization to avoid people getting hacked is to give them hardware that you control, and to have a unified fleet of hardware. For example, if you’ve got a fleet of computers that are all MacBooks, all centrally secured and with encryption on the disk, then you can reduce the damage done by some Windows exploit to essentially nothing — because you’re only running macOS. You’re also limiting things like: I went on AliExpress to buy a BLÅHAJ plushie or something, and it turned out the vendor sent me a Word macro file that broke into my computer. But I did that on my personal device, so now they would need to be able to jump another level into your corporate device. And you can do things like lock down the software that’s on there, lock down the websites people can access.

Nova DasSarma: Please use an ad blocker. Ads are important for the health of many companies, but if they’re a legit company, oftentimes you can support them in other ways. Ad networks are one of the easiest ways for an actor to inject malicious code onto a site that might otherwise be safe. So use an ad blocker.

Rob Wiblin: I guess Anthropic is a reasonably new organization; it’s a year or two old. Is there anything important you did in setting up its systems that secures it to some degree?

Nova DasSarma: Sure. I think that having corporate devices is pretty important. Another thing to think about is we used to talk about trusted networks and having things like a corporate network. And Google’s done some pretty good work with things like BeyondCorp, where you don’t really think about a VPN or something like that for security — you instead think about identity-based authentication. There’s no such thing as a “trusted network” where when you get on it, you no longer have to authenticate to grab files off of a shared SharePoint drive or something like that. You’re always authenticating at every step.
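To make the zero-trust idea concrete, here is a minimal sketch of what “authenticating at every step” can look like in code. This is an illustrative toy, not how BeyondCorp or Anthropic’s systems actually work: the token format, field names, and shared key are all made up for the example.

```python
import hashlib
import hmac
import time

# Hypothetical shared signing key. In a real deployment this would be an
# identity provider's key and a standard token format such as a signed JWT.
SIGNING_KEY = b"placeholder-demo-key"

def verify_request(user: str, expires_at: int, signature: str) -> bool:
    """Authenticate a single request on its own merits.

    No 'trusted network' shortcut: whether the caller is on the office LAN,
    a VPN, or coffee shop wifi, the identity check is identical.
    """
    if time.time() > expires_at:
        return False  # stale assertion: the caller must re-authenticate
    expected = hmac.new(
        SIGNING_KEY, f"{user}|{expires_at}".encode(), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The point of the design is that nothing about the request’s network location is trusted: every call carries its own verifiable identity assertion.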

Nova DasSarma: The other thing that we do — and that I suggest to every organization — is single sign-on, so that your users aren’t managing and juggling passwords for many different services, where it gets very tedious to use a different password for each one. Using a password manager together with single sign-on can help mitigate some of those flaws.

Top recommendations for personal computer security

Nova DasSarma: Number one, I would say: use two-factor authentication everywhere you can. It doesn’t matter if your password gets compromised if you can deny access anyway, because the attacker doesn’t have your hardware key. On that same front, use a password manager. Please use a password manager. You are decreasing the blast radius of any given compromised password by doing this. So do that.

Rob Wiblin: And that’s because if you use similar passwords or the same passwords in lots of different places, then if someone steals one, then they’ve stolen all of them.

Nova DasSarma: Exactly. Yes. That said, even if you make no other changes and don’t actually use a password manager, at the very least have your email password be different… That is the skeleton key for resetting so many other things. Very frustrating sometimes. And if we had to pick a third thing: I mentioned an ad blocker earlier, and I’m going to say it again. Using an ad blocker is really, really important for preventing random malicious code from being injected into your computer.
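For readers curious what that six-digit second factor actually is, here is a minimal sketch of the standard TOTP derivation (RFC 6238), using only Python’s standard library. The base32 secret below is a well-known demo value, not a real credential.

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32: str, digits: int = 6, period: int = 30) -> str:
    """Derive the current time-based one-time password (RFC 6238)."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int(time.time()) // period          # 30-second time step
    msg = struct.pack(">Q", counter)              # 8-byte big-endian counter
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                    # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# Demo secret for illustration only -- never hard-code a real one.
print(totp("JBSWY3DPEHPK3PXP"))
```

Note that a phishing site can still relay one of these codes in real time, which is exactly the weakness hardware keys close, as the next exchange explains.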

—

Rob Wiblin: Yeah. I imagine everyone listening to this show is going to be familiar with two-factor authentication, where you get that six-digit code that you take out of your phone or from SMS and plug it in. The thing that we ideally will be switching over to for almost everything is one of these hardware keys, which is a thing that you plug into your computer and you press a little button on it and it does a cryptographic operation to prove that you had that key on you.

Rob Wiblin: I think there are a bunch of different ways that’s a whole lot better, but one of them is that it’s a lot less vulnerable to phishing attacks. Even if you have two-factor authentication with that six-digit number, if someone sends you an email and directs you to a fake login website, they can immediately take both the password and the six-digit second factor you’ve entered, and just log in as if they were you somewhere else. That is, I think, quite a common way to break into people’s accounts. But it’s basically not possible with these U2F hardware keys that have become reasonably common, and you can now lock up your Google account, your Facebook account, and many other accounts with them.
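A rough sketch of why that works: with U2F/WebAuthn, the browser itself records which origin the login request came from, and the hardware key’s signature covers that record, so a lookalike phishing domain cannot produce a valid assertion. The field names below follow the WebAuthn clientDataJSON structure, but this is a simplified illustration of the server-side check, not a complete implementation.

```python
import json

EXPECTED_ORIGIN = "https://accounts.example.com"  # hypothetical relying party

def check_client_data(client_data_json: bytes, expected_challenge: str) -> None:
    """Reject assertions minted on a lookalike phishing domain.

    The browser, not the page, fills in `origin`, so a fake login site
    cannot forge it, and the hardware key's signature covers this data.
    """
    data = json.loads(client_data_json)
    if data.get("origin") != EXPECTED_ORIGIN:
        raise ValueError(f"origin mismatch: {data.get('origin')!r}")
    if data.get("challenge") != expected_challenge:
        raise ValueError("stale or replayed challenge")
    # In a full WebAuthn flow, signature verification over
    # authenticatorData || SHA-256(clientDataJSON) would follow here.
```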

Nova DasSarma: Yeah. And I really recommend those. I also recommend getting two keys, because I think one of the concerns people often have is that they might lose this key, and that’s a reasonable concern. So you should have two of them. Almost any reputable site will let you register multiple keys. Keep one of them in a secure location and keep the other one on your keychain. And you’ll do a lot better.

Nova DasSarma: I also didn’t mention: if you’re buying technology, try to ensure that you’re buying it from a reputable vendor. It’s very easy to buy things like USB cables from whoever has the lowest dollar amount on Amazon. Keep in mind that if you’re plugging something into your computer, you are giving whoever produced the device hardware access to it. Even something like a USB cable can be compromised. You have no real way of looking inside that cable and checking whether there’s a chip in there that will turn into a keyboard and start typing commands while you’re away from your computer. And that’s not a theoretical attack — we absolutely see these in the wild.

State of the art in information security

Rob Wiblin: My perception, as someone who takes an amateur interest in information security issues, is that the state of the art is very bad — that we don’t really have reliable ways of stopping a really advanced, well-funded adversary from stealing data, if it’s something they’re willing to invest a lot of human capital in. Is that kind of right?

Nova DasSarma: I think that’s kind of right. I’ve got a story here around this. A state that will not be named had an attack in the news recently that was a zero-click vulnerability on iMessage. A “zero-click vulnerability” is one where the user doesn’t have to take any action to be compromised. And this had to do with something called the JBIG2 compression algorithm, which you might have heard of, because back in the day Xerox used it in copiers. It’s a compression algorithm, which means you can copy things faster. But it turns out that if you turn the compression up too high, it starts silently substituting digits — famously turning sixes into eights — which is quite bad for numerics.

Nova DasSarma: That being said, JBIG2 was also the culprit in this case. The compression is dynamic — which means you can specify patterns on the fly — and it turns out that if you construct a JBIG2 file carefully, you can build logic gates out of the decoding operations. Which means that in theory it’s Turing complete — and in practice, it was Turing complete. So to deliver this vulnerability, the attackers built a computer inside the JBIG2 decompression algorithm to deliver the payload to these phones.
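To see how a decompressor’s drawing operations can amount to computation, here is a toy model in Python, not the actual exploit code. JBIG2 lets decoded segments be combined onto a canvas with bitwise operators such as AND, OR, XOR, and XNOR; once you have those, you can compose a NAND gate, and NAND alone suffices to build any circuit.

```python
# Toy model: single bits combined with JBIG2-style segment operators.
# This only illustrates why AND/XNOR canvas operations suffice for logic.

def AND(a: int, b: int) -> int:
    return a & b

def XNOR(a: int, b: int) -> int:
    return (a ^ b) ^ 1

def NAND(a: int, b: int) -> int:
    return XNOR(AND(a, b), 0)   # NOT(a AND b), built only from canvas ops

# NAND is functionally complete: every other gate follows from it.
def NOT(a: int) -> int:
    return NAND(a, a)

def OR(a: int, b: int) -> int:
    return NAND(NOT(a), NOT(b))

pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
assert [NAND(a, b) for a, b in pairs] == [1, 1, 1, 0]
assert [OR(a, b) for a, b in pairs] == [0, 1, 1, 1]
print("gates built from decompression operators check out")
```

The real exploit chained tens of thousands of such segment operations over bitmap regions to assemble a small computer inside the decoder.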

Nova DasSarma: And that’s the sort of thing where you could theoretically have defended against it, but the defence was least access — not being able to access anything on your phone, or not having phones at all. Both of these are really quite difficult to implement in an organization above a certain size that doesn’t have a very, very strong security mindset.

Nova DasSarma: So that’s on the state-actor side. That being said, the thing that works the most is always going to be a social attack: something where you meet someone at a party, and they seem nice, and they become your friend. And then you let them into your building when you maybe shouldn’t have, and they plug a USB device into your system, and you’re done. We often talk about physical access being the end of the line in security.

Motivated 14-year-old hackers

Rob Wiblin: I guess I have this stereotype from the past that computer security is bad enough that a motivated 14-year-old who hasn’t been to university yet, but is just really into computers, can probably do some interesting hacking — break into systems that you’d be kind of surprised they could get into. But I wonder whether that might be an outdated stereotype, and whether things have improved enough that a 14-year-old might actually struggle to do anything interesting at this point. Do you know where we stand on that?

Nova DasSarma: I think that stereotype is still quite accurate. Broadly, there is more software than there used to be, so there are simply more targets at the lower end of the security spectrum. Until we find ways to create systems that are secure by default, instead of doing security as an afterthought, we are going to keep seeing situations where a script kiddie with a piece of software they downloaded off GitHub can run a vulnerability scan and deface some website. I think it’s a lot harder than it used to be for them to break into something like whitehouse.gov, though.

Rob Wiblin: Yeah, I see. Maybe the top end has gotten more secure as the field has professionalized, but there’s so much more running on computers now in general that there are still plenty of things that aren’t secure.

Nova DasSarma: Exactly, yes. And in some ways this is good — having systems that kids are able to break into is in fact a good thing. We’ve seen some really cool stuff with Capture the Flag websites, where you’re meant to break into one level, find a key, and then move on to the next. These are actually really, really fun, and I think a great way to get kids interested in security. I would obviously not condone somebody trying to break into arbitrary websites, but there certainly are tools that make practising this genuinely fun.

Rob Wiblin: How illegal is it to break into a website or something that doesn’t matter that much, just as a way of getting practice and training — assuming you don’t do any damage whatsoever?

Nova DasSarma: Very illegal, and you shouldn’t do it. If you’re interested in doing some kind of vulnerability testing, I would contact the website and ask them. A lot of the Silicon Valley mindset is to ask for forgiveness, not permission — but computer security and data loss are not areas where that applies. This is what one would call a crime. I don’t recommend it.

Is the typical engineer working on non-safety research increasing or decreasing the odds of an artificial intelligence-related catastrophe?

Nova DasSarma: You know what, that’s a really hard question to answer. If you come at things from the perspective that all capabilities work of any kind is negative — that it in itself increases the odds of an AGI-related catastrophe — then that would be an answer to your question. But that’s not my model. In some ways, I think that having capabilities concentrated within organizations that have pledged to act responsibly probably decreases those odds.

Nova DasSarma: Things like the OpenAI APIs for accessing models mean that more people have access to language models in a way where there is a gatekeeper — a layer of safety, and a way of imposing values on how the model is used — which is fundamentally not true if you have a large number of independent actors. So it’s very hard to say. There are certainly safety researchers at OpenAI, and certainly safety researchers at DeepMind, and I think those organizations are thinking very thoughtfully about these things. I’m hopeful that they are decreasing the odds of an AGI-related catastrophe. If you made me answer the question, I think that would be my answer.

Rob Wiblin: Yeah, it’s certainly the case that it’s not only possible for the safety orientation or the thoughtfulness or the niceness of the actors working on AGI to improve. It could also get worse over time, or people can just become more scattered and more arms race-y over time, which is definitely a factor that complicates this question.

Nova DasSarma: And in some ways there’s an evaporative cooling effect if you say you’ll never work at a place that does any capabilities work, because then all you have working there are people who are purely interested in, “Let’s crank it to 11. Let’s drive this forward.” I think having safety researchers there is important. And having this sort of collegiality with other organizations, having standards, and being able to talk to each other about these things is important. So I think the typical engineer there is probably decreasing these odds, simply by consolidating intellectual capital and capabilities within a smaller number of folks who can then cooperate.

Interesting design choices with big ML models

Rob Wiblin: Something we haven’t talked about yet that I’m really excited to learn more about is what interesting design choices there are when you’re putting together a cluster, to try to get as much useful compute as possible for training big ML models. What are the interesting optimizations you can make?

Nova DasSarma: I love this. This is one of the best parts of my job. So for building a cluster for ML models, there are two basic components: you’ve got units that can do floating-point operations and matrix multiplication, and you’ve got cables connecting them together at a certain speed. And you’ve got some memory, that sort of thing. The design choices you’re making are mostly tradeoffs between price and how much bandwidth you have between these units. The other design choice is on the software end: are you writing your software in a way where it can easily be distributed over multiple GPUs and things like that? So that’s also a design choice, though mostly on the software side.

Nova DasSarma: And some logistical choices. You want to try to run your stuff where the power is cheap and the real estate is cheap. It turns out that rent for a large number of servers is actually a big cost. You also want to think about how easy it’s going to be to expand. Hofvarpnir, my organization, made a pretty big mistake starting out. As a bit of background, almost all of the systems I run for myself are on hardware I own, in people’s basements and the like. It’s a global network of lots of computers that are all talking to each other, but they’re all on residential connections, with interruptions and so on. This turns out to be really cheap when you’re a college student. It’s a great approach for anyone trying to do something scrappy — I recommend doing that first.

Nova DasSarma: But it turns out that time is the limiting factor in some ways. So when we first got our EA Infrastructure money, we were like, “OK, we’re going to buy all these components ourselves and assemble.” And then we had some technical difficulties, and we had some difficulties with the provider not having the right width for the racks and stuff like that, and all sorts of really weird integration issues. And it turns out that it was better in the end to go with an integrator for future expansions, because it means that we can ask them, “We’re going to use these standardized units, and I just want four more of them. We’re just going to put them right next to each other,” and that sort of thing.

Nova DasSarma: Going back to the design thing though, the bandwidth does become a concern there, because you have a limit to how much bandwidth a certain switch can carry, and you might want to get a more expensive switch if you think that your cluster’s going to expand more. So that’s one of those things that’s a more direct design choice.
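A back-of-envelope way to weigh that switch decision is to compare the time a training step spends computing against the time it spends synchronising gradients over the network. Every number below is an illustrative assumption, not a real hardware spec.

```python
# Is a data-parallel cluster compute-bound or communication-bound?
# All figures here are made-up, round-number assumptions.

params = 10e9               # model size: 10 billion parameters (assumed)
tokens_per_gpu = 2048       # tokens each GPU processes per step (assumed)
bytes_per_grad = 2          # fp16 gradients

# ~6 FLOPs per parameter per token for a forward+backward pass (rule of thumb)
flops_per_step = 6 * params * tokens_per_gpu
gpu_flops = 150e12          # sustained FLOP/s per accelerator (assumed)
link_bandwidth = 100e9 / 8  # a 100 Gb/s link, in bytes per second

compute_time = flops_per_step / gpu_flops
# A ring all-reduce moves roughly 2x the gradient payload over each link.
comm_time = 2 * params * bytes_per_grad / link_bandwidth

print(f"compute per step: {compute_time:.2f} s")
print(f"gradient sync:    {comm_time:.2f} s")
print("communication-bound" if comm_time > compute_time else "compute-bound")
```

With these made-up numbers the cluster comes out communication-bound, which is exactly the situation where paying for a faster switch or interconnect directly buys training throughput.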

Side-channel attacks

Rob Wiblin: A vulnerability I’ve always worried about is that you’re entering passwords — including, often, the password to your main password manager — into a browser. And all of the extensions you have in that browser can, I think, kind of see those passwords, or at least the keystrokes you’re entering into the website. And Chrome and Firefox extensions have a record of being regularly compromised, or sold to bad actors who then potentially use them to steal passwords. Are there a lot of these gaping holes in the way ordinary people use computers that make them much more vulnerable than they really ought to be?

Nova DasSarma: Yeah. In security, we talk about these as side-channel attacks — where the primary channel would be breaking into your bank directly, and the side channel is putting a keylogger on somebody’s desktop so you can grab their password instead. Certainly extensions are a big concern here. I use quite a few extensions myself; they’re definitely useful. But I’m also in a role where I’m paid to be professionally paranoid, so I read the code that’s being added to them. Limiting the number you use, and keeping an eye on what’s happening with them, is important.

Nova DasSarma: I would say that browsers are more secure than I certainly thought they were back in the day. Chrome especially has had a lot of work done by a lot of people working full time to sandbox execution of arbitrary code. When you think about programs, oftentimes they are something that’s written by somebody else that’s running on your computer, where your data is.

Nova DasSarma: And the web is increasingly like this. We’re recording right now on Riverside FM. It’s got video, it’s got audio streaming, it’s uploading files, it’s downloading files. It’s able to do all sorts of really, really exciting things. And this is inside the browser. If it was something that you had asked me to download, I would’ve been a lot more concerned. I think that the JavaScript sandboxing ecosystem has gotten very, very advanced. People have put a lot of thought into how to do smart things with it.

Nova DasSarma: I think that browsers in particular are oftentimes more secure than things you’re running unaudited on your laptop. This is actually an area where desktop operating systems have taken a page out of mobile’s book: sandboxing by default is now true of many apps. Permission dialogs for requesting access to information weren’t really a thing on desktops, because we didn’t start out thinking about that.

Nova DasSarma: The permission scheme for files on a Linux or Unix operating system has a set of permissions for read, write, and execute for user, group, and everybody. And for the longest time, “everybody” could access everything. This was expected: you were inside of a university, everybody was trusted. And moving to this model where things aren’t trusted by default has been very painful, but I think that browsers have been leading the way on that. So that’s pretty exciting.
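A minimal demonstration of the user/group/everybody triads Nova describes, using Python’s standard library; the 0o644 mode is the classic “world-readable” default she is alluding to.

```python
import os
import stat
import tempfile

# Create a throwaway file and give it the classic "world-readable" mode:
# owner read/write, group read, everybody read (octal 644).
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
os.chmod(path, 0o644)

mode = os.stat(path).st_mode
print(stat.filemode(mode))        # -rw-r--r--
print(bool(mode & stat.S_IROTH))  # True: "everybody" can read

# Moving away from trusted-by-default means tightening that last triad:
os.chmod(path, 0o600)             # owner-only read/write
print(stat.filemode(os.stat(path).st_mode))  # -rw-------
os.unlink(path)
```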

Rob Wiblin: As far as I understand what you’re saying: in the bad old days, if you loaded up a website, Internet Explorer or whatever browser you were using was not sufficiently good at sandboxing — which I guess means constraining the code running within that webpage to just interact inside that webpage, inside the browser. Instead, websites could frequently find ways to get their tentacles into other parts of the disk and run code you wouldn’t expect them to be able to run.

Rob Wiblin: But these days we’ve gotten better with Chrome and Firefox — and, I guess, with computer security in general — at figuring out how to make sure that this tab we’re using right now to record this conversation can only do the things we’d expect it to be able to do inside this browser: not access files it shouldn’t, and not do broader things on our MacBooks beyond the permissions Chrome has given this particular tab. Is that right?

Nova DasSarma: Yeah. I think that’s a really good gloss of this.

About the show

The 80,000 Hours Podcast features unusually in-depth conversations about the world's most pressing problems and how you can use your career to solve them. We invite guests pursuing a wide range of career paths — from academics and activists to entrepreneurs and policymakers — to analyse the case for and against working on different issues and which approaches are best for solving them.

The 80,000 Hours Podcast is produced and edited by Keiran Harris. Get in touch with feedback or guest suggestions by emailing [email protected].
