Can cybersecurity red teams keep AI safe?

AI safety and security remain a defining issue in artificial intelligence. Learn how AI experts have imported the concept of "red teaming" into their security efforts.

By: James M. Tobin, Edited by: Mitch Jacobson

Published: September 25, 2025

"Red teams" have played a crucial role in the development of cybersecurity strategy for decades. The concept involves assigning a dedicated group known as a "red team" to stage targeted cyberattacks on networks and systems. Their efforts provide a controlled yet effective method for identifying and addressing security risks.

AI red teaming applies similar concepts to artificial intelligence technologies. Explore this fascinating area of AI security research, and discover ways to make AI safer for individual users, businesses, governments, and society.

Take a cybersecurity or AI course on edX today

Key takeaways

AI red teams work to identify safety, security, and privacy risks in predictive and generative artificial intelligence systems.
The concept of red teaming originated in cybersecurity and was subsequently applied to AI security in the 2010s.
Red teams employ the same techniques that real-life hackers and threat actors use to stage attacks, adapting them to the unique safety risks associated with AI.
Human-led red teams remain a common standard, but recent research shows AI-powered red teams can outperform them in certain scenarios.

Cybersecurity red teams: What do they do?

The concept of "red teaming" first emerged in military contexts. During the Cold War, U.S. military units would conduct safety exercises with a "blue team" representing the United States and a "red team" simulating the Soviet Union.

As cybersecurity developed alongside increasingly advanced computer technologies, it imported the notion of "red teaming" into security testing frameworks. Cybersecurity red teams include ethical hackers, who carry out mock intrusions using real-world attack vectors to identify and close previously undetected vulnerabilities before a real attack can happen.

The following table highlights some of the major objectives and strategies used in cybersecurity red teaming:

Major objectives and strategies used in cybersecurity red teaming
Red teaming technique	Security objective
Penetration testing	Probe web applications for coding errors or oversights that create security weaknesses
Social engineering	Target unsuspecting human users in a bid to gain login credentials or other forms of unauthorized access
Network sniffing	Surveil network activity and identify opportunities to stage attacks
Brute-force attacks	Use automated or manual methods of systematically guessing access credentials
Continuous automated red teaming	Automate red teaming efforts and conduct testing over an extended period of time

Are there security concerns unique to AI?

Yes, AI poses multiple security risks that are especially relevant or unique to its associated technologies. While these risks continue to evolve as AI becomes increasingly integrated into business, known security concerns can be grouped into two broad categories:

Technology-based risks

These risks relate to the underlying technologies that power AI and machine learning (ML) systems. They broadly include:

Data poisoning: This describes deliberate efforts to corrupt training data to create inaccurate, inappropriate, or malicious AI outputs.
Prompt injection attacks: Attackers can engineer AI prompts to "trick" the system into exhibiting unintended behavior or revealing sensitive information.
System prompt leaks: The system prompts that guide AI outputs may themselves contain sensitive information, which attackers can target and exploit.
Unchecked autonomy: AI systems may lack effective oversight, allowing them to abuse their agency and carry out harmful activities.

Technology-based risks

These risks relate to the underlying technologies that power AI and machine learning (ML) systems. They broadly include:

Data poisoning: This describes deliberate efforts to corrupt training data to create inaccurate, inappropriate, or malicious AI outputs.
Prompt injection attacks: Attackers can engineer AI prompts to "trick" the system into exhibiting unintended behavior or revealing sensitive information.
System prompt leaks: The system prompts that guide AI outputs may themselves contain sensitive information, which attackers can target and exploit.
Unchecked autonomy: AI systems may lack effective oversight, allowing them to abuse their agency and carry out harmful activities.

Emerging and future AI risks

Popular culture has explored concerns over the evolution of artificial general intelligence, which would theoretically surpass even the highest levels of human intelligence and potentially place AI beyond human control. However, many other, more realistic risks are already emerging, and these could become very real and concerning AI safety and security issues in the near future.

Examples include:

Mass-scale job losses as human workers become redundant, potentially contributing to volatile socioeconomic conditions.
AI's weaponization in surveillance, profit-driven exploitation, and warfare.
Serious environmental harm arising from the enormous quantities of electricity required to power AI data centers.

AI safety and security experts also widely believe that malicious actors could use artificial intelligence to develop highly advanced hacking and cyberattacking capabilities. In a worst-case scenario, these could have global impacts.

Red teaming AI: Opportunities and limitations

Current AI red teaming models favor human personnel, led by accomplished interdisciplinary experts and composed of prompt engineers and ethical hackers. While major technology companies have used AI red teams since the late 2010s, the practice remains a rising but underutilized concept.

On the plus side, AI red teaming offers these advantages:

It makes AI more resilient against common attack vectors, such as data poisoning.
AI red teaming can reduce bias, enhance system performance, and support compliance.
Red teaming can help AI developers quantify and measure risks rather than rely on theoretical guesswork.

However, it also has limitations:

AI red teaming can be subjective, and it remains heavily dependent on the insights and abilities of its human designers.
It can struggle to replicate an authentic attack environment, especially when limited to a single model.
Businesses have a general lack of incentive to invest in AI red teams, given the lack of standardized regulatory and compliance frameworks.

Going forward, automated AI red teaming — powered by artificial intelligence itself — can dramatically improve the AI safety and security landscape. Some organizations already use automated approaches, especially for systematic tasks involving pattern matching, but future efficacy will strongly depend on the reliability and resilience of the AI/ML systems that control them.

Data spotlight

In a Cornell University study, automated AI red teaming had a 69.5% success rate vs. 47.6% for human red teams.

Learn more about cybersecurity and AI on edX

No results.

Frequently asked questions

What is the role of red teaming in defending AI systems?

AI red teams adopt adversarial personas, probing and testing AI systems to find security vulnerabilities, biases, compliance issues, and other exploitable weaknesses before malicious actors do. They primarily work by feeding in engineered AI prompts and checking to see if they can elicit problematic responses.

Will AI eventually take over the work of cybersecurity red teams?

Experts believe AI will likely play an increasingly significant role in the future of red teaming. However, at present, it appears unlikely that AI will fully replace human cybersecurity professionals. While the trajectory of AI technology is difficult to predict, it is reasonable to expect that humans will continue to occupy important oversight and guidance roles, even if AI largely takes over task execution.

How can we ensure that AI is safe?

Experts recommend combining a transparent, safety-focused regulatory and compliance framework with best practices at the end-user level. The current lack of a standardized AI governance framework represents a major safety risk, and the topic remains a point of debate among technology experts, industry leaders, and government officials.

Can cybersecurity red teams keep AI safe?

Key takeaways

Cybersecurity red teams: What do they do?

Are there security concerns unique to AI?

Technology-based risks

Technology-based risks

Emerging and future AI risks

Red teaming AI: Opportunities and limitations

Data spotlight

Learn more about cybersecurity and AI on edX

Frequently asked questions

Share this article