Topic

AI Red Teaming

Methods, case studies, and tooling for red teaming AI systems end to end.

ai red teamingllm red teamingjailbreakadversarial testingpyritpromptfoo
Evergreen Overview

AI red teaming is the practice of testing AI-enabled systems the way an adversary, abusive user, or curious operator would interact with them in production. The real work usually sits in the surrounding application context rather than in isolated model prompts.

What AI red teaming includes
  • Prompt abuse, indirect injection, and trust-boundary failures
  • Tool misuse, privilege expansion, and unsafe action chains
  • System-level evaluation of how the model, workflow, and controls behave together
What teams usually need to answer
  • What an attacker can influence, read, or trigger through the model
  • Where approvals, isolation, monitoring, or policy controls are missing
  • Which failures are model problems versus product and architecture problems
Who this page is for
  • People studying AI evaluation and red-team programs
  • Product and platform teams launching copilots or agents
  • Leaders who need concrete examples of AI risk in operational systems
References

Current notes, events, and source material

These items are included because they add useful evidence, framing, implementation detail, or upcoming context for teams working in this area.

Microsoft Security Blog May 30, 2026 news

Malicious npm packages abuse dependency confusion to profile developer environments

A dependency confusion campaign leveraged 33 malicious npm packages to collect reconnaissance data from developer and build environments. This report details the attack chain, observed tradecraft, and detection opportunities to help organizations identify and disrupt related activity. The post Malicious npm packages ab

Microsoft Security Blog May 28, 2026 news

The Gentlemen ransomware: Dissecting a self-propagating Go encryptor

Microsoft Threat Intelligence presents a comprehensive analysis of The Gentlemen, a Go-based ransomware deployed by affiliates of Storm-2697 that combines per-file ephemeral key encryption with an aggressive self-propagation module to deploy itself across an entire network using series of simultaneous lateral movement

Microsoft Security Blog May 26, 2026 news

From poisoned search results to GPU mining: A cryptojacking campaign abusing ScreenConnect and Microsoft .NET utilities

Microsoft exposes a cryptojacking campaign using SEO poisoning and ScreenConnect to target high-performance PCs, with malicious sites also surfaced through AI chatbots. The post From poisoned search results to GPU mining: A cryptojacking campaign abusing ScreenConnect and Microsoft .NET utilities appeared first on Micr

Microsoft Security Blog May 22, 2026 news

Microsoft recognized as a Leader in The Forrester Wave™ for Workforce Identity Security Platforms

Microsoft has been recognized as a Leader in The Forrester Wave™: Workforce Identity Security Platforms, Q2 2026, receiving the highest scores in both the current offering and strategy categories. The post Microsoft recognized as a Leader in The Forrester Wave™ for Workforce Identity Security Platforms appeared first o

Microsoft Security Blog May 22, 2026 news

From edge appliance to enterprise compromise: Multi-stage Linux intrusion via F5 and Confluence

A multi-stage attack on Linux devices began with an exposed F5 BIG-IP edge appliance and pivoted to an internal Confluence server for credential theft and identity compromise. Learn how the threat actor attempted Kerberos relay and lateral movement, and how Microsoft Defender detected, blocked, and unraveled the attack

Microsoft Security Blog May 22, 2026 news

Microsoft Security success stories: How St. Luke’s and ManpowerGroup are securing AI foundations

How Frontier firms secure AI at scale: read how Microsoft customers embed governance, identity, and cloud security to make protection an enabler of AI growth. The post Microsoft Security success stories: How St. Luke’s and ManpowerGroup are securing AI foundations appeared first on Microsoft Security Blog .

Microsoft Security Blog May 20, 2026 news

Mini Shai Hulud: Compromised @antv npm packages enable CI/CD credential theft

Compromised @antv npm packages deploy the Mini Shai-Hulud payload to steal CI/CD secrets from Linux-based automation environments. The malware executes during npm install and targets credentials across GitHub, AWS, Kubernetes, Vault, npm, and 1Password platforms. The post Mini Shai Hulud: Compromised @antv npm packages

Microsoft Security Blog May 20, 2026 news

Introducing RAMPART and Clarity: Open source tools to bring safety into Agent development workflow

The AI systems shipping inside enterprises today are fundamentally different from the ones we were building even two years ago, because they have moved well past answering questions and into accessing your email, retrieving records from your CRM, writing and executing code, and taking actions on your behalf across doze

Microsoft Security Blog May 18, 2026 news

How Storm-2949 turned a compromised identity into a cloud-wide breach

Storm-2949 turned stolen credentials into a cloud-wide breach, moving from identity compromise to large-scale data theft without using malware. This incident shows how threat actors can exploit trusted systems to operate undetected. The post How Storm-2949 turned a compromised identity into a cloud-wide breach appeared

Microsoft Security Blog May 14, 2026 news

When configuration becomes a vulnerability: Exploitable misconfigurations in AI apps

Exposed UIs, weak authentication, and risky defaults could turn cloud-native AI apps on Kubernetes into potential targets by threat actors. Learn how exploitable misconfigurations lead to RCE and data leaks. The post When configuration becomes a vulnerability: Exploitable misconfigurations in AI apps appeared first on

Krebs on Security May 12, 2026 news

Patch Tuesday, May 2026 Edition

Artificial intelligence platforms may be just as susceptible to social engineering as human beings, but they are proving remarkably good at finding security vulnerabilities in human-made computer code. That reality is on full display this month with some of the more widely-used software makers -- including Apple, Googl

Anthropic Frontier Red Team April 7, 2026 news

Assessing Claude Mythos Preview’s cybersecurity capabilities

Claude Mythos Preview is a new general-purpose language model that is strikingly capable at computer security tasks. This post provides technical details for researchers and practitioners who want to understand exactly how we have been testing this model, and what we have found over the past month. We hope this will sh

Krebs on Security March 8, 2026 news

How AI Assistants are Moving the Security Goalposts

AI-based assistants or "agents" -- autonomous programs that have access to the user's computer, files, online services and can automate virtually any task -- are growing in popularity with developers and IT workers. But as so many eyebrow-raising headlines over the past few weeks have shown, these powerful and assertiv