Independent notes on AI red teaming, prompt injection, and agent security

Learn AI Red Teaming Through Practice

AI red teaming in practice: prompt injection, tool misuse, sensitive data exposure, unsafe autonomy, and the design failures that appear when models are connected to real systems.

Starts With
System-level risk
Style
Practical + research-led
Includes
Guides + notes
Start Here

What AI red teaming means in practice

For AI systems, red teaming is not only about eliciting bad model outputs. It is about understanding how prompts, retrieved content, tools, memory, and automation can be manipulated or misused together.

What AI Red Teaming Means

AI red teaming is system-level adversarial review. It tests prompts, retrieved content, tools, memory, permissions, and model behavior together rather than treating the model in isolation.

Why AI Systems Fail Differently

Many failures are not classic vulnerabilities in the old sense. They emerge from instruction conflicts, hidden context, unsafe tool access, misplaced trust in retrieved data, and poorly bounded autonomy.

What Good Review Looks Like

Useful AI security work combines adversarial thinking with architecture review, trust-boundary analysis, and practical defensive design so teams know what to fix and why it matters.

Learning Paths

Core areas to explore next

Use these pages as structured introductions to the parts of AI security and red teaming that show up repeatedly in real systems.

/ai-application-security

AI Application Security

How LLM features change application threat models once prompts, retrieval, tools, memory, and downstream systems are tied together.

A clearer system-level threat model for AI featuresA better sense of where to add approvals, isolation, and monitoring
Open guide
/llm-red-teaming

LLM Red Teaming

How adversarial testing is applied to LLM-backed products, including harmful outputs, prompt breakouts, and misuse paths.

Better visibility into failure modes that matter in productionFaster break-fix loops between testing and engineering
Open guide
/prompt-injection-testing

Prompt Injection

The core attack pattern in modern AI applications: malicious instructions arriving through users, retrieved content, tools, or hidden context.

A practical mental model for prompt injection beyond slogansBetter design instincts around content trust boundaries
Open guide
/prompt-engineering-review

Prompt Engineering

Instruction design and prompt structure as part of the security boundary, not just a usability exercise.

Prompts that are easier to reason aboutLower variance when inputs become messy or adversarial
Open guide
/agent-security-review

Agent Security

Security basics for systems that can plan, use tools, persist state, and take actions across multiple steps.

A more grounded model for agent-specific riskBetter boundaries around tools and action execution
Open guide
/adversarial-ml-and-model-risk

Adversarial ML and Model Risk

A compact guide to adversarial ML concepts and how they connect to modern AI product security.

Cleaner distinctions between model risk and system riskBetter alignment between AI security and traditional security controls
Open guide
Topic Coverage

Prompt engineering, prompt injection, agent security, and more

These topic hubs connect introductory guidance with current research, incident patterns, and product-facing security lessons from the broader AI ecosystem.

AI Red Teaming

Methods, case studies, and tooling for red teaming AI systems end to end.

Open topic
Prompt Engineering

Prompt design patterns, instruction hierarchy, and defensive prompt construction.

Open topic
Prompt Injection

Prompt injection attacks, mitigations, detection, and design patterns for safer AI applications.

Open topic
Agent Security

Controls and attack paths for browsing, tool use, memory, identity, and action-taking agents.

Open topic
Model Evaluation

Safety evaluations, system cards, preparedness, and security measurement for frontier models.

Open topic
Adversarial ML

Adversarial machine learning attacks, taxonomies, and mitigations across the ML lifecycle.

Open topic
Featured Reading

Current material worth reading

Editorial Approach

How these notes are put together

  • I favor practical application risk over abstract model capability debates
  • I link to primary sources and add short notes on why they matter
  • I treat prompts, tools, memory, identity, and action boundaries as one attack surface
How to use the site: start with the learning pages, use the topics to branch into specific areas, and use the research archive when you want original sources and short context notes.
Recent Additions

Recent notes and references across AI security

Profile

Profile and contact

Focused on AI red teaming, prompt injection risk, agent security, and application-layer failures in LLM and agent systems.