
How to Monitor AI Agents in Your AWS Environment

February 20, 2026 · 9 min read

AI agents are making AWS API calls across your environment right now. If you use Claude Code, GitHub Copilot Workspace, LangChain, AutoGPT, or Amazon Bedrock agents, these systems are authenticating with IAM credentials and executing actions on your behalf.

Most security teams have zero visibility into this activity. The agents use existing IAM roles, blend in with normal service traffic, and operate at speeds that make manual review impractical.

This guide covers how to detect AI agents in your AWS environment, what to monitor, and how to build a response framework that scales.

The problem: invisible autonomous actors

Traditional security monitoring assumes that API calls come from either humans (via console or CLI) or known services (via application code). AI agents break this assumption. They operate under human user sessions or service roles, making API calls that look structurally identical to legitimate traffic.

A developer using Claude Code might trigger dozens of AWS API calls per minute: creating resources, reading configurations, modifying IAM policies. These calls are authenticated under the developer's credentials, but the developer is not making the decisions about which calls to execute.

  • The agent decides which APIs to call based on its reasoning, not explicit human instruction for each call.
  • Permissions are typically inherited from the human user or a shared service role, often broader than the agent needs.
  • Error handling and retry logic in agents can generate bursts of API calls that look like brute-force attempts to monitoring systems.
  • Multi-step operations (create role, attach policy, create function, configure trigger) happen in rapid succession without human review of each step.

Detecting AI agents via CloudTrail user-agent patterns

The most reliable method for identifying AI agent activity in AWS is analyzing the user-agent string in CloudTrail events. While not every agent sets a distinctive user-agent, many do, and this field is consistently logged.

CloudTrail records the user-agent for every API call. Human console activity uses the AWS Console user-agent. CLI activity uses the aws-cli user-agent. SDKs include their language and version.

AI agents often include identifiable strings in their user-agent. For example, Bedrock agents include "bedrock" in the user-agent. LangChain-based tools may include "langchain" or the specific framework name. Claude Code uses identifiable patterns when making AWS calls through its tool-use interface.

  • Filter CloudTrail events by userAgent field to identify known AI tool patterns.
  • Build a registry of expected user-agent strings for your environment and alert on new, unrecognized patterns.
  • Correlate user-agent strings with the IAM principal to identify which roles are being used by AI agents.
  • Monitor for user-agent strings that change mid-session, which can indicate an AI agent taking over a human session.
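The registry-and-alert approach above can be sketched as a small classifier. The pattern strings here are illustrative placeholders, not a verified list of what each tool actually sends; check them against the user-agent values in your own CloudTrail logs before relying on them.

```python
# Minimal sketch: classify CloudTrail events by userAgent against a registry
# of known AI-agent patterns. Pattern strings are assumptions; verify them
# against your own logs.
KNOWN_AGENT_PATTERNS = {
    "bedrock": "Amazon Bedrock agent",
    "langchain": "LangChain-based tool",
}

def classify_user_agent(event: dict) -> tuple[str, str]:
    """Return (category, label) for a CloudTrail event dict."""
    ua = event.get("userAgent", "").lower()
    if ua.startswith("aws-cli"):
        return ("human-cli", ua)
    if "console.amazonaws.com" in ua:
        return ("human-console", ua)
    for pattern, label in KNOWN_AGENT_PATTERNS.items():
        if pattern in ua:
            return ("ai-agent", label)
    return ("unrecognized", ua)  # candidate for a "new pattern" alert

events = [
    {"userAgent": "aws-cli/2.15.0 Python/3.11"},
    {"userAgent": "langchain-aws/0.2.1"},
    {"userAgent": "totally-new-tool/1.0"},
]
labels = [classify_user_agent(e)[0] for e in events]
```

In a real pipeline you would feed this from CloudTrail (LookupEvents, EventBridge, or Athena over the trail's S3 bucket) and route the "unrecognized" bucket to your alerting system.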

What to monitor: the five dimensions of agent behavior

Detecting that an AI agent exists is only the first step. You need to understand what it is doing and whether that behavior is expected. Monitor these five dimensions.

Agent Monitoring Dimensions

  1. API Call Patterns: which APIs, how often, in what sequence
  2. Resource Access: which resources, first-time vs. repeated
  3. Role Assumptions: which roles, cross-account activity
  4. MCP Tool Calls: external tool invocations, data flow
  5. Temporal Patterns: time of day, session duration, bursts

API call patterns reveal what the agent is trying to accomplish. An agent that normally reads CloudFormation stacks and writes deployment configs is operating within its expected scope. The same agent suddenly calling iam:CreateUser or sts:AssumeRole for unfamiliar accounts is a signal worth investigating.

Resource access tracking shows which specific resources each agent interacts with. First-time access to a resource (especially sensitive ones like secrets, databases, or IAM configurations) deserves higher scrutiny than repeated access to known resources.

Role assumptions are critical to monitor because they represent privilege transitions. An agent assuming a role in another account, or assuming a role with higher privileges than its starting context, is a potential escalation path.
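A cross-account role assumption can be detected directly from the CloudTrail event record. The sketch below follows CloudTrail's JSON field names (`eventName`, `requestParameters.roleArn`); the account IDs are made up for illustration.

```python
# Sketch: flag sts:AssumeRole calls into a different AWS account.
def is_cross_account_assume_role(event: dict, home_account: str) -> bool:
    """True if this CloudTrail event assumes a role outside home_account."""
    if event.get("eventName") != "AssumeRole":
        return False
    role_arn = (event.get("requestParameters") or {}).get("roleArn", "")
    # ARN format: arn:aws:iam::<account-id>:role/<name>
    parts = role_arn.split(":")
    target_account = parts[4] if len(parts) > 5 else ""
    return bool(target_account) and target_account != home_account

event = {
    "eventName": "AssumeRole",
    "requestParameters": {"roleArn": "arn:aws:iam::222222222222:role/deploy"},
}
flag = is_cross_account_assume_role(event, home_account="111111111111")
```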

Building behavioral baselines for AI agents

Static rules ("alert if more than 100 API calls per minute") produce too many false positives when applied to AI agents. Agents are inherently bursty. They might make 5 API calls in one session and 500 in the next, depending on the task.

Behavioral baselines solve this by learning what each specific agent (or agent-role combination) normally does and flagging deviations from that pattern.

  • Track the set of API actions each agent typically calls. New API actions outside this set get flagged.
  • Track resource access patterns. An agent that normally operates in us-east-1 suddenly accessing resources in eu-west-1 is anomalous.
  • Track timing patterns. Agents tied to CI/CD pipelines should be active during deployment windows, not at 3 AM on a Sunday.
  • Track error rates. A sudden spike in AccessDenied errors can indicate an agent probing for permissions it does not have.
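The first two tracking ideas above can be sketched as a per-agent baseline object. This is a minimal in-memory stand-in; a production version would persist the learned sets and add decay, timing, and error-rate tracking.

```python
# Sketch of a per-agent behavioral baseline: learn the API actions and
# regions each agent uses, and flag anything previously unseen.
from collections import defaultdict

class AgentBaseline:
    def __init__(self):
        self.actions = defaultdict(set)  # agent key -> seen API actions
        self.regions = defaultdict(set)  # agent key -> seen regions

    def observe(self, agent: str, action: str, region: str) -> list[str]:
        """Record one event; return deviation flags for anything new."""
        flags = []
        # An empty set means we are still in the baseline-building phase,
        # so the very first observations are not flagged.
        if self.actions[agent] and action not in self.actions[agent]:
            flags.append(f"new-action:{action}")
        if self.regions[agent] and region not in self.regions[agent]:
            flags.append(f"new-region:{region}")
        self.actions[agent].add(action)
        self.regions[agent].add(region)
        return flags

baseline = AgentBaseline()
baseline.observe("ci-agent", "cloudformation:DescribeStacks", "us-east-1")  # first sighting
baseline.observe("ci-agent", "cloudformation:DescribeStacks", "us-east-1")  # within baseline
flags = baseline.observe("ci-agent", "iam:CreateUser", "eu-west-1")
```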

Risk scoring methodology

Not every anomaly is a threat. Risk scoring helps prioritize which deviations require human attention and which can be handled automatically.

A useful scoring model combines three factors: the sensitivity of the resource being accessed, the severity of the deviation from baseline, and the privilege level of the identity involved.

  • Low risk: an agent accesses a new non-sensitive resource during business hours. Log it, update the baseline.
  • Medium risk: an agent calls an API it has never used before, but the API is read-only and the resource is not classified as sensitive. Alert the team.
  • High risk: an agent assumes a cross-account role, accesses a production database, or modifies IAM policies. This requires immediate human review.
  • Critical risk: an agent with elevated privileges exhibits multiple high-risk signals simultaneously. Consider automated containment.
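The three-factor model can be expressed as a small scoring function. The weights and thresholds below are purely illustrative assumptions to show the shape of the calculation, not recommended values.

```python
# Sketch of the three-factor risk score: resource sensitivity, severity of
# the baseline deviation, and privilege level. Weights are illustrative.
SENSITIVITY = {"non-sensitive": 1, "internal": 2, "sensitive": 3}
DEVIATION = {"within-baseline": 0, "minor": 1, "major": 2}
PRIVILEGE = {"read-only": 1, "write": 2, "admin": 3}

def risk_level(sensitivity: str, deviation: str, privilege: str) -> str:
    score = SENSITIVITY[sensitivity] * (1 + DEVIATION[deviation]) * PRIVILEGE[privilege]
    if score >= 18:
        return "critical"
    if score >= 9:
        return "high"
    if score >= 4:
        return "medium"
    return "low"
```

For example, a read-only agent touching a new non-sensitive resource scores low, while an admin-privileged agent making a major deviation against a sensitive resource scores critical.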

Progressive response: when to alert vs. when to block

Blocking AI agents at the first sign of anomalous behavior will cripple developer productivity. Ignoring anomalies until a breach occurs defeats the purpose of monitoring. The right approach is progressive response.

Progressive response matches the intensity of the response to the confidence and severity of the detection. Low-confidence, low-severity signals get logged. High-confidence, high-severity signals trigger automated containment.

  • Monitor: log all agent activity with enriched context. No alerts, no disruption. This is your baseline-building phase.
  • Notify: send alerts to the security team for medium-risk anomalies. Include context: what the agent did, what its baseline looks like, and why this deviates.
  • Recommend: suggest specific actions (revoke role, restrict permissions, quarantine resource) with one-click execution.
  • Confirm: for high-risk detections, prepare the response action and wait for human approval before executing.
  • Autonomous: for critical, high-confidence detections (credential exfiltration, active privilege escalation), execute containment automatically and notify after the fact.
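The five tiers above reduce to a lookup keyed on risk level and detection confidence. The confidence thresholds here are hypothetical; tune them against your own false-positive tolerance.

```python
# Sketch: map (risk level, detection confidence) to a response tier.
# Thresholds are illustrative assumptions.
def response_tier(risk: str, confidence: float) -> str:
    if risk == "critical" and confidence >= 0.9:
        return "autonomous"   # contain immediately, notify after the fact
    if risk in ("critical", "high"):
        return "confirm"      # prepare the action, wait for human approval
    if risk == "medium":
        return "recommend" if confidence >= 0.8 else "notify"
    return "monitor"          # log with context; baseline-building phase
```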

Getting started

You do not need to build all of this at once. Start by enabling CloudTrail logging across all accounts and regions if you have not already. Then build a query that identifies AI agent user-agent strings in your environment.
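A first-pass inventory query can be as simple as grouping events by user-agent and listing which IAM principals each one appears under. In production the events would come from CloudTrail (the LookupEvents API via boto3, or an Athena query over the trail's S3 bucket); a hard-coded list stands in here so the logic is visible.

```python
# Sketch: summarize which user agents appear under which IAM principals,
# the starting point for an agent inventory. Event dicts follow CloudTrail's
# JSON schema; the ARNs are made up.
from collections import defaultdict

def user_agents_by_principal(events: list[dict]) -> dict[str, set[str]]:
    summary = defaultdict(set)
    for e in events:
        ua = e.get("userAgent", "unknown")
        principal = (e.get("userIdentity") or {}).get("arn", "unknown")
        summary[ua].add(principal)
    return dict(summary)

events = [
    {"userAgent": "langchain-aws/0.2.1",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:role/dev"}},
    {"userAgent": "aws-cli/2.15.0",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:user/alice"}},
]
summary = user_agents_by_principal(events)
```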

Once you know which agents are active and which roles they use, you can begin building baselines. Focus on the highest-privilege roles first. An AI agent with administrative access is a higher priority than one with read-only access to a staging environment.

The goal is visibility first, then detection, then response. Each phase builds on the previous one, and each phase delivers value independently.
