AI security
Summary
In a nutshell:
Many of the worst outcomes from advanced AI start with security failure. If models can easily be stolen or manipulated, we’re all at greater risk from people who misuse them. If an AI system doesn’t have robust monitoring and control protocols, it might break out of its environment to pursue misaligned goals. But within AI safety, people with the security experience to address these problems are in short supply.
Pros:
- Security is critical for averting some of the most dangerous outcomes from advanced AI
- The field presents interesting technical challenges; you’ll work on the technological frontier and play defence against thieves and spies with access to vast resources.
- There’s high demand for people with the right skills, so you can start applying for jobs right away, without further training or a long transition period
Cons:
- AI changes very quickly; your work won’t always have a long shelf life
- Your colleagues may not have security experience, so it helps to be comfortable working on your own without much feedback
Key facts on fit:
- This work is a much better fit if you already have information security (infosec) experience, since it’s much easier to build AI context than to skill up quickly in security.
- Valuable traits include expertise in various subfields of security, management and research experience, and a strong “security mindset” that keeps you focused on the ways a system could be vulnerable.
- Overall, this work still resembles conventional infosec; the methods and mindset are similar, and some of the problems have familiar shapes. But given the speed and scale of AI progress and the technology’s unique properties, AI security requires novel approaches and techniques.
Recommended
If you are well suited to this career, it may be the best way for you to have a social impact.
Review status
Based on an in-depth investigation
In April 2026, Anthropic announced that it was delaying the release of its newest model — Mythos — because it posed severe cybersecurity risks. The company claimed that Mythos found vulnerabilities and exploits in “every major operating system and web browser”, including widely used and well-maintained systems that were assumed to be generally secure.

To avert the risk, instead of immediately making the model available to the public, Anthropic began working with a small number of partners to scan critical infrastructure and repair bugs on a vast scale.
But when Anthropic released a public version of the model (Fable) eight weeks later, it only stayed up for a few days before the U.S. government forced the company to disable it, citing national security concerns.
The precise risks from the public version of Mythos are disputed (including by Anthropic itself), and it isn’t clear that AI will help cyberattackers more than it helps defenders. But Mythos is just one example of what powerful models can achieve — pushing governments to take drastic action or even threatening to disrupt the infrastructure that modern life depends on. Future models will be even more capable: AI progress isn’t slowing down.
—
As we create increasingly powerful AI systems, we need security that measures up to their dangerous capabilities. Most of the worst-case outcomes from AI are downstream of security failures: a rogue state stealing model weights to carry out cyberattacks, terrorists using a jailbroken model to design a virus, an AI quietly subverting the systems meant to monitor it, or a major power conducting a secret training run and triggering a global arms race.
But the organisations trying to shape and govern AI often lack sophisticated security knowledge. Even within frontier AI companies, the combination of security skills and AI expertise is relatively rare. This hampers their ability to develop technical solutions, draft good policy, and otherwise create effective plans for making an AI-driven future go well.
That’s where security professionals come in — especially those with enough experience to get started quickly. Even if you aren’t well-acquainted with “AI security” in particular, the methods and mindset are similar to what you’d find in other cyber work. If you know how to red-team complicated systems or do fast-paced cybersecurity research, you are well positioned to work on some of the most important problems we know of.
Why work on AI security?
Note: AI security is a broad field. This profile covers the areas we think are most impactful: those most relevant to preventing catastrophic outcomes. Many AI security positions look more like “conventional cybersecurity with AI tools”. That is still useful work, but it is much less neglected, and it is not our focus.
Many core problems in AI safety involve some form of security failure:
- External attackers could exploit weak security to gain unauthorised access to powerful systems. They might use this access to cripple infrastructure, create weapons, or insert secret loyalties ahead of an attempt to seize power. If model weights can’t be reliably secured, this could help rogue states gain powerful destructive capabilities.
- An AI system can acquire power by taking advantage of weak security — for example, by copying its weights to other locations or bypassing internal controls to subvert its own safeguards.
- It’s very difficult to establish a treaty on legitimate AI use or a negotiated pause in AI development if the AI stack can’t be secured and independently verified, because signatories won’t be able to confirm what others are building and running. This could raise the risk of great power competition — for example, if the U.S. and China keep racing to build more powerful AI because neither can trust the other to stop.
Ways to contribute
Protecting AI from external attackers
AI is already being used in high-stakes contexts: helping militaries plan attacks, writing code for critical infrastructure, and helping AI companies develop the next generation of models. Someone who manipulates an AI system — say, to redirect a missile strike or introduce a backdoor into millions of machines — could cause immense harm.
The process used to build modern AI systems has many steps an attacker could exploit (especially if they use powerful AI themselves):
- Model weights and architecture can be stolen, in ways both exotic (side-channel attacks on GPUs) and mundane (an employee copies files to a personal account). This could allow the thieves to copy unreleased models for their own use, or produce versions that are stripped of important safeguards. In general, the more widely models proliferate, the harder it is to control how they are used.
- Training data can be “poisoned” to introduce triggers that provoke harmful behaviour (backdoors), degrade performance on particular inputs, or embed false beliefs that influence model outputs (like “secret loyalties” that lead systems to persistently work toward an attacker’s goals)
- Models can be jailbroken after deployment — or worse, distilled so that their abilities are copied with the guardrails removed
Security work can help on all these fronts. For example, you might produce technical research on data sanitisation or backdoor detection, develop red-teaming frameworks to improve pre-deployment testing, or develop unique security tools and protocols for the modern AI stack.
This area is especially conducive to working at frontier companies, though you could also work on policies that push companies to improve their security, whether in government (e.g. CAISI),1 at a think tank (e.g. RAND, IAPS), or at a for-profit or startup working directly with frontier companies to improve their security (e.g. SL5).
Stopping subversion from misaligned AI (“AI control”)
As AI grows increasingly powerful, companies and governments are deploying it more widely within critical systems. This leaves them vulnerable not only to external attackers, but also to the AI itself.
If an AI turns out to be misaligned — acting outside its intended scope or pursuing goals its users didn’t intend — it could act like a human attacker,2 while also presenting new challenges:
- The AI might have broader access to internal systems than any single employee, and it could act across them faster than humans can intervene.
- AIs can operate and communicate in ways humans can’t reliably detect, for example by hiding coded messages inside text that seems innocuous. Systems built to monitor human employees might not catch misaligned AIs.
- Because an AI can write and run its own code, it may take actions its containment was never designed to anticipate.
- AI is increasingly being deployed to build and develop the next generation of AI.3 By shaping the systems that train its successors, it could acquire capabilities that render its safeguards ineffective. And even without intentional sabotage, the self-improvement loop could produce AIs that are increasingly opaque and resistant to human oversight, leading us to a world where human goals are sidelined.
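To make the point about coded messages concrete, here is a minimal sketch of one very crude monitoring check: flagging invisible Unicode characters that could carry hidden bits inside otherwise ordinary text. The function name and the watchlist are hypothetical, chosen for illustration. A capable system could hide information in word choice or formatting instead, which a check like this would never catch, and that is part of why monitoring built for human employees may fall short.

```python
import unicodedata

# Hypothetical watchlist: invisible code points that can be used to smuggle bits.
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def find_hidden_characters(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for each invisible or format character in text."""
    hits = []
    for i, ch in enumerate(text):
        # Category "Cf" covers Unicode format characters, which render as nothing.
        if ch in INVISIBLE or unicodedata.category(ch) == "Cf":
            hits.append((i, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits


clean = "Deploy at noon."
covert = "Deploy\u200b at\u200d noon."  # looks identical to `clean` when printed

print(find_hidden_characters(clean))
print(find_hidden_characters(covert))
```

The two strings look the same on screen, but only the second returns any hits. The sketch covers one narrow channel; it is not a general detector.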
AIs aren’t yet capable of launching sophisticated end-to-end attacks without human help. But they are increasingly able to act autonomously in ways their developers didn’t intend. We’ve already seen models refuse to be shut down, lie to humans about their plans, and access unauthorised systems.4 And the state of play is changing very quickly: even as we struggle to control the capabilities of today’s models, hundreds of billions of dollars are being leveraged to build better ones, at a pace of several groundbreaking releases each year.
We’ll be much safer if we build infrastructure that stops misaligned AI from causing this kind of harm. Cybersecurity professionals have exactly the right skills for this: they know how to break systems and secure them against unexpected threats. They’ve also cultivated a “security mindset” — a trained focus on ways that a system could break or fail (in contrast to the upside-focused “builder’s” mindset that fuels many researchers and entrepreneurs).
If you want to contribute to this emerging field, there are a number of options: you can work at nonprofits, such as Redwood Research or Palisade Research, in government (e.g. UK AISI), in academia (e.g. the Oxford Witt Lab) or in a role building products to mitigate these risks (e.g. Watcher). Several frontier companies also have positions dedicated to these problems.
Verification, compliance, and AI governance
Even if we secure AI systems against external threats and keep them under human control, we still need to worry about how companies and governments will use them. Secure, aligned AI could still become the focus of an arms race, or be used by whoever controls it to seize power on a global scale.
We can manage these risks by creating regulations and international agreements that limit parties’ ability to acquire compute or train powerful models. An arms race can be stopped if both parties agree to stop racing; it’s harder to build a world-conquering AI if all training runs above a certain size require third-party monitoring and evals to catch dangerous capabilities. But even if we establish the right rules, we need to ensure that people actually follow them.
Verifying compliance requires cybersecurity work to:
- Track the flow and use of high-end compute so that chips can’t be smuggled and frontier systems can’t be trained in secret.
- Verify the properties of AI training activities to ensure that AI developers are demonstrably complying with agreements. This work leans on cryptography and hardware security, and must be done well enough to prevent attempts by national governments to evade inspection.
- Evaluate the cyber risks of new models before release, either to prevent the release of a model with non-compliant capabilities or to warn the information security community about the challenges it will soon face.
In practice, this might involve physical engineering (e.g. hardware “fingerprints” on chips, packaging that detects and reports tampering or unauthorised movement), cryptography (e.g. signed attestations that let chips prove their identity and location to a registry, secure enclaves that protect these credentials from being extracted or forged), or adversarial evaluation/red-teaming (to ensure that verification mechanisms can’t be bypassed or tampered with).
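As a rough illustration of the “signed attestation” idea, the sketch below has a chip bind its identity, its claimed location, and a fresh challenge from the registry under a keyed tag, which the registry then checks. It rests on simplifying assumptions: it uses a shared-secret HMAC from Python’s standard library as a stand-in for the asymmetric signatures and enclave-protected keys a real scheme would likely use, and every name in it (`sign_attestation`, `verify_attestation`, `chip-001`, `datacentre-A`) is hypothetical.

```python
import hashlib
import hmac
import json
import secrets


def sign_attestation(chip_key: bytes, chip_id: str, location: str, nonce: str) -> dict:
    """Chip side: bind identity, claimed location, and the registry's nonce under a MAC."""
    claim = {"chip_id": chip_id, "location": location, "nonce": nonce}
    payload = json.dumps(claim, sort_keys=True).encode()
    tag = hmac.new(chip_key, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "tag": tag}


def verify_attestation(registry_keys: dict, attestation: dict, expected_nonce: str) -> bool:
    """Registry side: accept only a fresh claim tagged with the key on file for that chip."""
    claim = attestation["claim"]
    key = registry_keys.get(claim["chip_id"])
    # Reject unknown chips and stale (replayed) attestations.
    if key is None or claim["nonce"] != expected_nonce:
        return False
    payload = json.dumps(claim, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the tag matched.
    return hmac.compare_digest(expected, attestation["tag"])


# Illustrative run with made-up identifiers.
key = secrets.token_bytes(32)
registry = {"chip-001": key}
nonce = secrets.token_hex(16)  # fresh challenge issued by the registry

attestation = sign_attestation(key, "chip-001", "datacentre-A", nonce)
tampered = {"claim": {**attestation["claim"], "location": "datacentre-B"},
            "tag": attestation["tag"]}

print(verify_attestation(registry, attestation, nonce))  # genuine claim
print(verify_attestation(registry, tampered, nonce))     # location altered after signing
```

The nonce is what stops an old attestation from being replayed after a chip has been moved. The adversarial evaluation mentioned above would then probe everything this sketch takes for granted, such as whether the key can be extracted from the hardware or the location claim spoofed at its source.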
What’s it like to work on AI security?
You’d apply your existing skills to problems in the AI space, largely using familiar methods and concepts. For most roles, you wouldn’t need to come in with much (if any) specific context on AI; you could pick it up as you go. In the end, it may not be so different from other security work — though you’ll be working with technology that becomes more capable every few months and could create unexpected challenges at any time. That aspect is challenging, but it keeps the work fresh.
Compensation can be competitive. Surprisingly, even nonprofits and small companies can offer salaries on par with those of other security roles. Government roles pay less, but only account for a small fraction of the space. Frontier companies pay a lot.
AI security is a small field, and densely interconnected. Many organisations coauthor papers or even share office space. Hubs like London or the San Francisco Bay Area feature frequent events and meetups. You’ll have a relatively easy time building a network, and if you do well in your initial position, your skills will be in demand and word will get around quickly.
The culture has much in common with open source. Most people concerned about the security of frontier systems are working toward a shared, prosocial goal. There’s an emphasis on transparency, feedback, and public discussion, whether in inter-organisational Slack channels or online spaces like Twitter or the Alignment Forum. You don’t have to take part — you can stick to your job and avoid thinking about work outside of work — but the community is there if you want it.
Would you be a good fit?
It’s extremely helpful to have professional cybersecurity experience before you try to enter the field, for a few reasons:
- Most people will have a much easier time picking up the relevant AI context than building sufficient security skills; the AI experience you’ll need is mostly conceptual (and can be self-taught), while the security challenges are intense enough that even a world-class engineer won’t be bored. If you have security chops, you’ve passed the more difficult bottleneck, and your abilities will be in high demand.
- Information security requires a specific mindset: looking for where systems are vulnerable, and thinking like an attacker. It takes time and practice to develop, and the problems we face are urgent enough that AI specialists don’t have that time; they need to hire people who already think that way.
- Most leaders in the AI safety space know a lot about AI but don’t have much hands-on security experience; they need colleagues who can fill those gaps. In some cases, they aren’t even aware when their ideas overlap with existing cybersecurity work, so you might save your coworkers from reinventing the wheel.5
The field is short on people across the board, but demand is most acute in a few areas: infrastructure security, verification, and hardware, along with the rare person who pairs deep security experience with a real understanding of how modern ML systems work. If you recognise yourself in any of those, you’re closer to qualified than almost anyone being trained from scratch.
Useful skills
In addition to baseline security experience, these skills are especially valuable:
- National security, to understand the capabilities and strategies of nation-state actors
- Hardware security, to design and test new hardware verification technology
- Cryptography, to verify the authenticity and provenance of AI outputs
- Network security, to protect model weights and data centres
- Red-teaming, to assess whether existing defences (evaluations, guardrails, verification systems, model security measures) actually hold up
- Threat modelling, to work out which threats matter most in a fast-shifting landscape
These other skills will help you transition faster and open up more opportunities:
- A basic understanding of AI systems (architecture, training, supply chains) is helpful, though emphatically not required; the requisite knowledge can be self-taught or learned on the job.
- Management experience. This is a new field with far more junior than senior talent, so the ability to direct junior people productively gives you a lot of leverage.
- Research experience. Because so many ideas and methods are still being worked out, you’ll have more options if you’re ready to develop your own agenda rather than just executing on existing ones.
A few other things that help:
- The ability to work in the US, where much of the most important work is being done (or the UK, which comes second).
- The ability to obtain a security clearance in your country of choice, since it keeps important options open.
- A focus on catastrophic risks. A lot of effort already goes into preventing modest harms from current models; far fewer security professionals focus on the worst-case scenarios that could arise from the development of AGI.
Key roles to aim for
Research. Many open problems in AI security don’t yet have established methods or consensus answers. If you’re comfortable developing your own agenda, you’ll find no shortage of important questions. This work is especially valuable if you combine hands-on security experience with an understanding of how modern ML systems work, and it’s where formal verification experts and academic researchers can do a lot of good, setting high standards for security and mechanism design at a time when verifying the outputs of a “black box” technology is unusually important.
Engineering and defence. As AI makes it easier to attack critical systems, we need more talented defenders: skilled CISOs and security engineers who can protect frontier companies, government agencies, and other infrastructure that could be turned to catastrophic ends. Many of these positions involve red-teaming work to break through model defences.
Entrepreneurship. Regulation alone can’t force companies or governments to adopt better security practices, so we need founders building tools and services that make better security easy and appealing enough to adopt anyway. Security is a crowded market, but very few firms are tackling the problems that concern us most, like AI control or defending systems against nation-state attackers.
If you can fill that gap, you may find support: funders focused on these problems are emerging, including Coefficient Giving’s Navigators Incubator, Seldon Lab, and Halcyon Futures. Founding a nonprofit or focused research organisation to build what the market won’t is more feasible than ever, and we also expect commercial demand to rise as the leading companies grow and new security policies take effect. It’s a promising time to enter the market.
Policy and technical advising. The people shaping AI governance often lack deep security expertise. If you can translate between the technical realities of AI systems and the legal mechanisms meant to govern them, you occupy a rare and valuable position. This is especially true for compliance verification, where getting the technical details wrong has serious consequences.
Fieldbuilding. The infrastructure to coordinate this field is still nascent, and there’s value in building it: organising peer review to establish consensus on the most promising research directions, training professionals to understand AI risk and the security landscape, and providing the advising and networks that place people into key roles.
Downsides to working on AI security
Unlike most IT fields, AI security is very young, and core ideas are still coalescing. Many organisations in the AI safety space are small and have few staff with practical security experience.
This means you’ll need to develop your own views, build your own network, and set your own direction (since you may not be managed very closely). You’ll need to be comfortable with exploratory work, and the chance that whatever you build won’t get much use.
How to enter if you have a lot of experience
To get involved with the field, you could start applying for jobs right away. The ideas below could help you make connections or become a better candidate, but we’d encourage you to start looking now — security expertise is scarce and valuable, and you might get hired faster than you’d expect.
If you want to explore the field first, or learn about the modern AI landscape, you could:
- Attend a conference like the AI Security Forum or FMxAI.
- Join a research fellowship aimed at professionals, such as Anthropic Fellows or Heron’s Research Fellowship.
- Join a week-long AI Security bootcamp to get up to speed on problem areas.
- Build context on a specific security issue with a programme like the ERA Fellowship.
- Build AI context through a programme like BlueDot Impact’s AGI Strategy course.
- Apply to Coefficient Giving for funding to support your career transition, or to their Capacity Building RFP to run security-focused workshops or trainings.
- Book a career advising call with Heron or 80,000 Hours.
How to enter if you’re early in your career
While having experience makes things much easier, there are a few strong early-career moves you can make if you’re new to the field:
- Quickly developing expertise in one subfield — like a specific type of hardware verification method, or a specific technique within AI control.
- Joining a top technical AI fellowship, like MATS, Astra, or SPAR.
- Working at a top academic or research organisation within AI security. Leading researchers include Dawn Song, Florian Tramèr, Yisroel Mirsky, and Daniel Kang. Notable research centres include CSET, Oxford’s AI Governance Initiative, and Carnegie Mellon’s Software Engineering Institute.
Because this area is unusually time-sensitive, we recommend starting as soon as you can.
Where can this kind of work be done?
Top organisations to work for
The most well-established options (generally stronger, though it depends on the available roles and your experience):
- Frontier AI companies like OpenAI, Anthropic, Google DeepMind, or xAI
- The US Center for AI Standards and Innovation
- The UK’s AI Security Institute
- The RAND Center on AI, Security, and Technology
- The Institute for AI Policy and Strategy
- Irregular
Smaller organisations
- Amodo Design
- Apollo Research
- Center for a New American Security
- FAR.AI
- Lucid Computing
- METR
- Palisade Research
- Redwood Research
- SL5 Task Force
Grantmakers
Grantmaking roles are also important, and hard to hire for. You don’t need grantmaking experience, just an understanding of the issues at hand. These organisations fund work in AI security:
- ARIA (several relevant programmes)
- Coefficient Giving’s Navigating Transformative AI Fund
- Longview Philanthropy’s Frontier AI Fund or Emerging Challenges Fund
Find jobs in AI cybersecurity
Our job board features opportunities in AI security.
Speak with us
If you think this path might be a great option for you, but you need help deciding or thinking about what to do next, our team might be able to help.
We can help you compare options, make connections, and possibly even help you find jobs or funding opportunities.
Learn more
External attacks
- Podcast: Sella Nevo on who’s trying to steal frontier AI models, and what they could do with them
- Podcast: Nova DasSarma on why information security may be critical to the safe development of AI systems
- Securing AI Model Weights: preventing theft and misuse of frontier models (RAND)
- Frontier model performance on offensive-security tasks (Irregular)
- Life of a Jailbreak, from On the Biology of a Large Language Model (Anthropic)
- AI Integrity: Defending Against Backdoors and Secret Loyalties (Dave Banerjee, IAPS)
- Catastrophic AI misuse (80,000 Hours)
- Strategic AI Training Sabotage: State Attacks on Advanced Systems’ Development (Twm Stone)
AI control
- Podcast: Buck Shlegeris on controlling AI that wants to take over – so we can use it anyway
- Detecting and reducing scheming in AI models (OpenAI and Apollo Research)
- What’s worse, spies or schemers? (Buck Shlegeris, Julian Stastny)
- An overview of areas of control work (Ryan Greenblatt)
- Risks from power-seeking AI systems (80,000 Hours)
AI governance
- Podcast: Lennart Heim on the compute governance era and what has to come after
- Accelerating AI Data Center Security (Erich Grunewald, IAPS)
Acknowledgements
Thank you to Abbey Chaver, Arden Koehler, Guy Nachshon, Inbar Shulman, Jarrah Bloomfield, and Nitzan Shulman.
Read next: Learn about other high-impact careers
Want to consider more paths? See our list of the highest-impact career paths according to our research.
Notes and references
- The haphazard response to the Fable jailbreak highlights the urgent need for more experts in government roles, where they can help clarify policies around AI security and deployment.↩
- For more on the difference between internal attacks from human spies and scheming AI systems, see “What’s worse, spies or schemers?”↩
- This is an explicit goal for companies like OpenAI and Anthropic.↩
- During its training, Mythos escaped an isolated “sandbox computer”, using a “moderately sophisticated” exploit to get online and send an email.↩
- For example, a lot of early research on AI control failed to pick up useful concepts from sandboxing, an existing security technique developed for related problems. AI safety has also adapted ideas related to access control (e.g. RBAC and ABAC), data loss prevention, and threat modelling.↩