It is 2:47am when the PagerDuty alert fires. Your security analyst sees the notification: unusual API activity in the production AWS account. By the time she opens the CloudTrail console, the attacker has already assumed roles in 11 different accounts. Every API call used valid credentials. Every role assumption passed AWS's authentication checks. Your network security tools saw encrypted HTTPS traffic to legitimate AWS endpoints and flagged nothing. The attack continued for another 90 minutes before anyone realized what was happening.
This is how modern AWS compromises work. Attackers don't break in anymore. They walk through the front door using cross-account role assumptions, moving laterally across your environment using the same trust relationships your applications rely on every day. Network perimeter defenses see legitimate API calls. Traditional SIEM tools correlate events hours after the damage is done. Meanwhile, attackers hop from development accounts to production, from one business unit to another, using nothing but IAM permissions and trust policies.
The hard truth: if you're only monitoring network traffic and VPC flow logs, you're blind to the most dangerous attack vector in AWS environments. Cross-account lateral movement happens at the identity layer, where most security tools don't look.
How Cross-Account Role Assumptions Actually Work
AWS cross-account access relies on a two-part permission model that most security teams misunderstand. The first part is the trust policy (who can assume the role). The second part is the permissions policy (what the assumed role can do). An attacker who controls an identity that is allowed to call sts:AssumeRole only needs to satisfy the trust policy to move laterally. Once they've assumed a role, they inherit whatever permissions that role has.
Here's what a typical cross-account role assumption looks like in code:
```python
import boto3

# Legitimate pattern: application assuming role in different account
sts_client = boto3.client('sts')
assumed_role = sts_client.assume_role(
    RoleArn='arn:aws:iam::123456789012:role/ProductionDataReader',
    RoleSessionName='app-backend-prod-1',
    DurationSeconds=3600
)
credentials = assumed_role['Credentials']
```
Now compare that to an attacker's lateral movement:
```python
# Attacker pattern: rapid role assumptions across multiple accounts
import boto3
from botocore.exceptions import ClientError

accounts = ['123456789012', '234567890123', '345678901234']
target_role = 'ProductionDataReader'
sts_client = boto3.client('sts')

for account_id in accounts:
    role_arn = f'arn:aws:iam::{account_id}:role/{target_role}'
    try:
        assumed = sts_client.assume_role(
            RoleArn=role_arn,
            RoleSessionName='maintenance-script',
            DurationSeconds=3600
        )
        # Attacker now has credentials in this account
        # Repeat for next account in the list
    except ClientError:
        continue  # Skip accounts where trust policy blocks assumption
```
The API calls are identical. The CloudTrail events look the same. The difference is context: frequency, temporal pattern, cross-account access graph, and deviation from established behavioral baselines. According to the 2025 CrowdStrike Global Threat Report, 67% of cloud intrusions involved valid credentials and legitimate API calls rather than exploit-based attacks [1]. The network signature of an attack is indistinguishable from normal operations.
- 67%: share of cloud intrusions that use valid credentials rather than exploits, making network-level detection ineffective
- 4-6 hours: average time to detect cross-account lateral movement using CloudTrail log analysis alone
- 23 accounts: median number of AWS accounts an attacker can reach from a single compromised role in a typical enterprise
- 3 components: required for successful role assumption (valid credentials, trust policy permission, and knowledge of the target role ARN)
Here's how legitimate cross-account access compares to lateral movement indicators:
| Access Pattern | Legitimate Use | Lateral Movement Indicator |
|---|---|---|
| Assumption frequency | 2-50 per hour from known services | 200+ rapid assumptions in 10-minute window |
| Source identity | Same service account or role consistently | Multiple different source identities trying same target role |
| Target accounts | 1-3 related accounts (dev, staging, prod) | 10+ accounts across unrelated business units |
| Time of day | During business hours or scheduled maintenance windows | 2am-5am or outside established patterns |
| Session duration | Matches application lifecycle (hours to days) | Minimum duration (900 seconds) repeated rapidly |
| Role chaining depth | 1-2 hops maximum | 4+ role assumptions in sequence |
The key insight: sts:AssumeRole is the highest-value API call in an AWS compromise. It grants access without requiring password authentication, MFA, or credential exfiltration. Once an attacker has compromised a single identity with role assumption permissions, they can potentially reach every account that trusts that identity.
Three Attack Chains Security Teams Miss
In January 2025, a SaaS company discovered their development account compromise had cascaded into 47 customer accounts over a three-day period [2]. The attack used three distinct chains, each exploiting different trust relationships. Here's what actually happened.
Chain 1: Developer Account to Production Data
A compromised developer laptop gave the attacker access to a role used by the CI/CD pipeline. That build pipeline role had trust relationships with production accounts to deploy application updates. The attacker used those same trust relationships to assume production data access roles, exfiltrating customer records for 11 hours before detection.
Chain 2: Vendor IAM User to Multi-Tenant Data
A third-party security scanning tool had an IAM user in the company's management account. The trust policy allowed that user to assume a CustomerSupportRole in all tenant accounts for troubleshooting. The vendor's credentials were compromised in a separate breach. The attacker used the vendor IAM user to systematically assume the support role in 43 customer accounts, accessing S3 buckets containing PII.
Chain 3: Lambda to Management Account Escalation
A Lambda function in a sandbox account had permissions to publish events to a cross-account EventBridge bus in the management account. The attacker compromised the Lambda execution role, then used EventBridge rules to invoke a privileged Lambda in the management account. That second Lambda had iam:* permissions for automated account creation. The attacker created new IAM users with admin privileges, establishing persistence that survived the initial role credential rotation.
| Attack Chain | Entry Point | Pivot Method | Final Target | CloudTrail Artifacts | Time to Detection |
|---|---|---|---|---|---|
| Dev → Prod | Compromised laptop credentials | CI/CD pipeline role assumption | Production RDS and S3 | AssumeRole from unusual source IP, followed by rds:DescribeDBInstances | 11 hours |
| Vendor → Tenants | Vendor IAM user compromise | CustomerSupportRole across 43 accounts | PII in tenant S3 buckets | Rapid AssumeRole calls targeting multiple accounts, s3:GetObject on sensitive buckets | 67 hours |
| Lambda → Management | Sandbox Lambda role compromise | EventBridge cross-account invocation | Management account IAM | PutEvents to cross-account bus, lambda:Invoke in management account, iam:CreateUser | 31 hours |
Each attack used AWS's own trust mechanisms to gain access. The developer pipeline needed production access for deployments. The vendor needed support access for troubleshooting. The EventBridge integration was designed to centralize operational events. All of these trust relationships were legitimate by design. The attack pattern was the deviation: frequency, scope, and behavioral context that fell outside normal operational baselines.
According to MITRE ATT&CK for Cloud, the technique T1550.001 (Use Alternate Authentication Material: Application Access Token) combined with T1078.004 (Valid Accounts: Cloud Accounts) represents the most common lateral movement pattern in cloud environments [3]. Network security tools miss these attacks because the traffic looks identical to legitimate service-to-service communication.
The Confused Deputy Problem Nobody Fixes
The confused deputy attack exploits a fundamental question in cross-account trust: who initiated this action? AWS services frequently act on behalf of users, assuming roles to perform operations. If the trust policy doesn't verify the original caller's context, an attacker can trick a service into performing unauthorized actions.
Here's a real example from a 2025 security audit. A company used Lambda functions to process uploaded files from S3. The Lambda execution role had permissions to assume a role in the data warehouse account to write processed results. The trust policy looked like this:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:role/FileProcessorLambdaRole"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
The problem: this trust policy allows the Lambda role to assume the data warehouse role regardless of what triggered the Lambda. An attacker who gained access to the S3 bucket could upload a malicious file that, when processed, caused the Lambda to assume the data warehouse role and exfiltrate sensitive data. The Lambda acted as a confused deputy, performing actions on the attacker's behalf without verifying the request origin.
The fix requires adding condition keys that verify the calling context:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:role/FileProcessorLambdaRole"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "111111111111"
        },
        "ArnLike": {
          "aws:SourceArn": "arn:aws:lambda:us-east-1:111111111111:function:FileProcessor*"
        }
      }
    }
  ]
}
```
Now the trust policy verifies that the role assumption originated from the expected Lambda function in the expected account. An attacker can still trigger the Lambda by uploading files, but they cannot manipulate the Lambda into assuming roles for unintended purposes outside the designed workflow. One caveat: AWS populates aws:SourceAccount and aws:SourceArn only when an AWS service principal makes the request on behalf of a resource, so test this policy in a non-production account before enforcing it for a role principal.
68% of Cross-Account Trust Policies Lack Proper Constraints
A 2025 analysis of 3,400 AWS accounts found that 68% of cross-account trust policies failed to implement aws:SourceAccount or aws:SourceArn condition keys [4]. This leaves them vulnerable to confused deputy attacks where an attacker tricks a trusted service into performing unauthorized actions. The fix takes 5 minutes per trust policy but blocks an entire attack class.
The confused deputy problem extends beyond Lambda. Any AWS service that assumes roles on your behalf (S3 bucket replication, CloudFormation StackSets, EventBridge cross-account targets, SNS topic subscriptions) can be exploited if the trust policy doesn't verify the calling context. According to AWS Security Best Practices, implementing both aws:SourceAccount and aws:SourceArn conditions reduces confused deputy risk by 85% [5].
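To find trust policies that lack these conditions, a small audit function can help. The following is a minimal sketch, not a complete policy analyzer: the function name `missing_condition_keys` and the `REQUIRED_KEYS` tuple are hypothetical, and it assumes you have already loaded each role's trust policy as a Python dict (for example from the `AssumeRolePolicyDocument` field that IAM returns).

```python
from typing import Any

# Keys this sketch looks for; adjust to your own policy standard.
REQUIRED_KEYS = ("aws:SourceAccount", "aws:SourceArn")


def _as_list(value: Any) -> list:
    """IAM accepts either a single value or a list for Statement and Action."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def missing_condition_keys(trust_policy: dict) -> list[dict]:
    """Return one finding per Allow statement that grants sts:AssumeRole
    without every key in REQUIRED_KEYS."""
    findings = []
    for index, stmt in enumerate(_as_list(trust_policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue
        actions = [a.lower() for a in _as_list(stmt.get("Action"))]
        if not any(a in ("sts:assumerole", "sts:*", "*") for a in actions):
            continue
        # Condition is shaped {operator: {key: value}}; gather keys across operators.
        present = {
            key.lower()
            for operator_block in stmt.get("Condition", {}).values()
            for key in operator_block
        }
        missing = [k for k in REQUIRED_KEYS if k.lower() not in present]
        if missing:
            findings.append({"statement": index, "missing": missing})
    return findings
```

Running this over every role in an account gives a worklist of statements to review. It only checks that the keys are present, not that their values are sensible, so treat a clean result as a starting point rather than proof of a safe policy.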
Why Network-Level Tools Can't See This
Your VPC Flow Logs show the Lambda function making an HTTPS connection to sts.amazonaws.com on port 443. The packet payload is TLS-encrypted. The flow log entry looks like this:
```
2 123456789012 eni-1a2b3c4d 10.0.1.50 54.239.28.85 49321 443 6 8 4096 1614012345 1614012405 ACCEPT OK
```
What can you learn from this? The Lambda connected to an AWS service endpoint. That's it. You cannot see which API call was made, which role was assumed, or which account was targeted. Every legitimate sts:AssumeRole call looks identical to every malicious one at the network layer.
GuardDuty improves on this by analyzing CloudTrail logs for suspicious patterns. It detects unusual API call sources, credential exfiltration attempts, and known malicious IPs. But GuardDuty's cross-account role assumption detection focuses on calls from external accounts, not lateral movement within your AWS Organization. According to AWS documentation, GuardDuty's UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration.OutsideAWS finding detects credential use from unexpected locations, but it requires the attacker to use credentials from outside AWS infrastructure [6]. An attacker operating from compromised Lambda functions or EC2 instances inside your environment may not trigger that finding at all.
Traditional SIEM platforms face a different problem: volume and correlation latency. A typical enterprise AWS environment generates 500,000 to 2 million CloudTrail events per day. Correlating role assumptions across multiple accounts requires joining events from different log streams, matching session names to subsequent API calls, and building a temporal graph of access patterns. Most SIEM platforms batch-process CloudTrail logs every 15-30 minutes, then run correlation rules on an hourly schedule. By the time the SIEM identifies a suspicious role assumption chain, the attacker has been active for 4-6 hours.
Here's how different detection approaches compare:
| Detection Method | What It Sees | What It Misses | Time to Alert | False Positive Rate |
|---|---|---|---|---|
| VPC Flow Logs | Network connections to AWS endpoints | API call details, role assumptions, cross-account activity | No alert capability | N/A |
| GuardDuty | Unusual API sources, external credential use, known threats | Within-org lateral movement, behavioral deviations, trust policy abuse | 5-10 minutes | Low (2-5%) |
| SIEM (batch) | All CloudTrail events, multi-account correlation | Real-time patterns, temporal anomalies requiring sub-minute windows | 4-6 hours | Medium (15-25%) |
| IAM Access Analyzer | Excessive permissions, external access | Role assumption patterns, runtime behavior, temporal anomalies | No alerting (audit tool only) | N/A |
| Identity behavioral baseline | Role assumption frequency, cross-account graphs, temporal patterns, statistical anomalies | Network-level threats, OS-level activity | 2-8 minutes | Medium to Low (10-20% initially, drops with tuning) |
The 4-6 hour detection window is where attackers do the most damage. They discover what data exists, where it's stored, and how to exfiltrate it. They establish persistence by creating backdoor IAM users or planting Lambda functions. They move from reconnaissance to impact before any alert fires. Network tools see encrypted traffic. Log analysis tools see legitimate API calls. Only identity-focused behavioral analysis catches the pattern: this role normally assumes 3 other roles per day, and it just assumed 47 in the past hour.
Detection Patterns That Catch Lateral Movement
Effective detection requires building behavioral baselines for every identity in your AWS environment, then alerting on statistical and structural anomalies. Here's what that looks like in practice.
Behavioral Baseline Components
For each IAM role, IAM user, and federated identity, track:
- Role assumption frequency per hour and per day
- Target roles and accounts typically accessed
- Source IP addresses and geographic locations
- Time-of-day patterns (weekday vs. weekend, business hours vs. off-hours)
- API call patterns immediately following role assumption
- Role chaining depth (how many hops from the original identity)
A baseline takes 14-30 days to stabilize, depending on how frequently the identity is used. After that, you can detect deviations with reasonable confidence.
Statistical Anomalies
These are deviations from the established frequency and volume patterns:
- An identity that normally assumes 2 roles per hour suddenly assumes 23 in 10 minutes
- A role that typically gets assumed 50 times per day sees 400 assumptions in 2 hours
- A Lambda execution role that runs during business hours executes at 3am
- An identity that always operates from us-east-1 suddenly makes API calls from eu-west-1
Statistical anomalies require threshold tuning. Too sensitive and you'll drown in false positives from legitimate changes in application behavior. Too loose and you'll miss attacks. Start with 3-sigma thresholds (which cover about 99.7% of historical observations when the metric is roughly normally distributed), then adjust based on your team's alert capacity.
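A 3-sigma check on role assumption counts can be sketched in a few lines. This is an illustrative example only: the function name `is_spike` is hypothetical, the history is assumed to be a list of per-window counts for one identity, and the check is one-sided because only unusually high counts matter here.

```python
from statistics import mean, pstdev


def is_spike(current: float, history: list[float], sigmas: float = 3.0) -> bool:
    """Return True when `current` exceeds mean + sigmas * stdev of `history`."""
    if len(history) < 2:
        return False  # Not enough data to form a baseline yet.
    mu = mean(history)
    sd = pstdev(history)
    if sd == 0:
        # A perfectly flat history: any increase is a deviation.
        return current > mu
    return current > mu + sigmas * sd
```

For an identity whose hourly counts hover around 2, a window with 23 assumptions is flagged while a window with 3 is not. Count data is often skewed rather than normal, so the `sigmas` value is a tuning knob, not a guarantee about false positive rates.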
Structural Anomalies
These are violations of expected trust boundaries and access patterns:
- A development account identity assuming roles in production accounts
- A contractor federated identity accessing accounts outside their assigned business unit
- Role chaining depth exceeding 2 hops (identity → role A → role B → role C)
- Cross-region role assumptions where no cross-region trust relationships are documented
Structural anomalies require a map of your intended trust relationships. Which accounts should trust which other accounts? Which roles should assume which other roles? Document this as policy, then alert on violations. This is where identity threat detection becomes similar to network segmentation, but at the IAM layer instead of the network layer.
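Once the intended trust relationships are written down, checking an observed assumption chain against them is mechanical. The sketch below assumes a hypothetical `ALLOWED_TRUST` map from source account ID to permitted target account IDs and a chain expressed as an ordered list of ARNs (original identity first, then each assumed role); both are illustrative choices, not an AWS data format.

```python
# Hypothetical documented trust map: source account -> accounts it may reach.
ALLOWED_TRUST = {
    "111111111111": {"222222222222"},
}
MAX_CHAIN_DEPTH = 2  # Hops allowed before the chain itself is an anomaly.


def account_of(arn: str) -> str:
    """Extract the account ID, the fifth colon-separated field of an ARN."""
    return arn.split(":")[4]


def structural_violations(chain: list[str]) -> list[str]:
    """Return human-readable violations for one role assumption chain."""
    violations = []
    hops = len(chain) - 1
    if hops > MAX_CHAIN_DEPTH:
        violations.append(f"chain depth {hops} exceeds {MAX_CHAIN_DEPTH}")
    for source, target in zip(chain, chain[1:]):
        src, dst = account_of(source), account_of(target)
        if src != dst and dst not in ALLOWED_TRUST.get(src, set()):
            violations.append(f"undocumented trust {src} -> {dst}")
    return violations
```

An empty result means the chain stayed inside documented boundaries. Keeping the map at account granularity is a simplification; a stricter version would key on role ARNs.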
Here are five detection rules with realistic tuning:
| Detection Rule | Trigger Condition | Expected False Positive Rate | Response Recommendation |
|---|---|---|---|
| Rapid role assumption spike | Identity assumes >10x its 30-day average roles in a 15-minute window | 5-8% (application deployments, infrastructure changes) | Automated alert to security team, enrich with recent code deployments and change tickets |
| Cross-boundary assumption | Identity in Account A assumes role in Account B where no documented trust relationship exists | 2-4% (new integrations, manual troubleshooting) | Require approval from cloud security architect, temporary exception process for legitimate needs |
| Off-hours cross-account activity | Role assumption outside business hours (10pm-6am) from identity with no historical off-hours pattern | 12-18% (on-call engineers, global teams) | Alert for manual review, correlate with PagerDuty/on-call schedule |
| Role chaining >2 hops | Identity performs >2 sequential role assumptions reaching final target | 1-3% (complex service integrations) | Automated investigation: capture full call chain and API activity at each hop |
| Behavioral deviation (composite) | Multiple signals fire simultaneously: unusual time, unusual frequency, unusual target account | <2% (high confidence) | Automated temporary credential suspension pending investigation |
The key to making identity detection work is feeding it into your existing incident response workflows without creating alert fatigue. Start with high-confidence rules (structural anomalies, composite signals), tune them for 30 days, then gradually add statistical anomaly detection.
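The composite rule in the table can be expressed as a simple count of independent signals. The sketch below is one possible mapping, not a recommended policy: the `Signals` class, the `triage` function, and the response labels are hypothetical, and the cut-offs for two signals and one signal are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    unusual_time: bool
    unusual_frequency: bool
    unusual_target_account: bool


def triage(signals: Signals) -> str:
    """Map the number of signals that fired to a response tier."""
    fired = sum(
        [signals.unusual_time, signals.unusual_frequency, signals.unusual_target_account]
    )
    if fired == 3:
        return "suspend-credentials"  # Composite rule: highest confidence.
    if fired == 2:
        return "page-security"
    if fired == 1:
        return "log-for-review"
    return "ignore"
```

Requiring all three signals before suspending credentials keeps the automated response tied to the high-confidence case, while weaker combinations still reach a human.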
According to the 2025 SANS Cloud Security Survey, organizations that implemented identity behavioral baselining detected lateral movement 73% faster than those relying solely on CloudTrail log analysis [7]. The median time to detection dropped from 6.2 hours to 1.7 hours.
Hardening Cross-Account Trust Policies
Prevention is cheaper than detection. Here's what every production trust policy should enforce:
The 8-Point Trust Policy Checklist
- Specify exact principal ARNs instead of using account-wide trust (arn:aws:iam::123456789012:role/SpecificRole, not arn:aws:iam::123456789012:root)
- Require ExternalId for all third-party integrations (prevents confused deputy attacks)
- Enforce aws:SourceAccount condition to verify the calling account
- Enforce aws:SourceArn condition to verify the specific resource making the call
- Limit trust to aws:PrincipalOrgID to prevent access from accounts outside your AWS Organization
- Require MFA for sensitive roles using aws:MultiFactorAuthPresent condition
- Restrict source IP ranges using aws:SourceIp for roles accessed from known networks
- Set maximum session duration to 1 hour for high-privilege roles
Here's what a defense-in-depth trust policy looks like:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:role/DataProcessingRole"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "unique-external-id-12345",
          "aws:SourceAccount": "111111111111",
          "aws:PrincipalOrgID": "o-abc1234567"
        },
        "ArnLike": {
          "aws:SourceArn": "arn:aws:lambda:us-east-1:111111111111:function:DataProcessor"
        },
        "IpAddress": {
          "aws:SourceIp": [
            "10.0.0.0/8",
            "192.168.0.0/16"
          ]
        },
        "NumericLessThan": {
          "aws:MultiFactorAuthAge": "3600"
        }
      }
    }
  ]
}
```
This trust policy requires the caller to:
- Be the specific DataProcessingRole (not just any role in the account)
- Provide the correct ExternalId
- Originate from the expected source account
- Be part of your AWS Organization
- Call from the specific Lambda function ARN
- Use a source IP within your network ranges
- Have authenticated with MFA in the past hour
Trust Policy Configuration by Risk Tier
| Risk Tier | Use Case | Required Conditions | Session Duration | Monitoring Frequency |
|---|---|---|---|---|
| Critical | Production data access, IAM admin, cross-account billing | ExternalId + SourceAccount + SourceArn + PrincipalOrgID + MFA + SourceIp | 1 hour max | Real-time with immediate alerting |
| High | Application deployment, infrastructure modification | SourceAccount + SourceArn + PrincipalOrgID + SourceIp | 2 hours max | Every 15 minutes |
| Medium | Read-only production access, logging aggregation | SourceAccount + PrincipalOrgID | 4 hours max | Hourly |
| Low | Development/staging cross-account, CI/CD test accounts | PrincipalOrgID | 12 hours max | Daily |
The hardest part of trust policy hardening isn't writing the JSON. It's discovering what trust relationships actually exist and which conditions you can safely enforce without breaking applications. Use IAM Access Analyzer to identify overly permissive trust policies, then incrementally tighten them with test conditions in non-production accounts first. According to AWS, implementing aws:PrincipalOrgID conditions alone prevents 43% of cross-account access attacks that originate from compromised third-party accounts [8].
Building an Identity-Level Detection Stack
CloudTrail gives you the raw data, but you need to build the analysis pipeline. Here's the architecture that works.
Why CloudTrail Alone Isn't Enough
CloudTrail records every API call, but it doesn't analyze patterns. A single sts:AssumeRole event tells you nothing about whether it's suspicious. You need:
- Historical context (is this role assumption normal for this identity?)
- Cross-account correlation (is this part of a lateral movement chain?)
- Temporal analysis (is this happening at an unusual time?)
- Velocity tracking (how many assumptions in the past N minutes?)
All of this requires stateful analysis that maintains baselines, tracks sequences of events, and detects statistical deviations. CloudTrail is a log, not an analysis engine.
Architecture: Streaming Pipeline
The detection stack looks like this:
- CloudTrail → EventBridge: Route CloudTrail events to EventBridge in real time using CloudTrail's integration. This gives you event delivery in 1-3 minutes instead of the 5-15 minute delay of S3 log delivery.
- EventBridge → Lambda or Kinesis: Filter for identity-related events (sts:AssumeRole, sts:GetFederationToken, sts:GetSessionToken, iam:*) and send them to a processing pipeline. Lambda works for <10,000 events/hour. Kinesis is required for higher volumes.
- Processing Pipeline → Behavioral Analysis: Maintain per-identity baselines in DynamoDB or another fast key-value store. For each incoming event, compare against baseline and calculate anomaly scores.
- Analysis → Alerting: High-confidence anomalies go directly to PagerDuty or Slack. Medium-confidence anomalies get enriched with additional context (recent code deployments, change tickets, on-call schedule) before alerting. Low-confidence anomalies get logged for later review.
- Alerting → SIEM Integration: Send all alerts and enriched events to your SIEM for correlation with network, endpoint, and application security events. The SIEM provides long-term storage and compliance reporting. The identity pipeline provides real-time detection.
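The filtering step in this pipeline can be sketched as a plain function that a Lambda handler or Kinesis consumer would call. The function name `extract_identity_event` and the output field names are hypothetical; the input is assumed to be an EventBridge event carrying a CloudTrail record under `detail`, using standard CloudTrail record fields such as `eventSource`, `eventName`, `userIdentity`, and `requestParameters`.

```python
from typing import Optional

STS_EVENTS = {"AssumeRole", "GetFederationToken", "GetSessionToken"}


def extract_identity_event(event: dict) -> Optional[dict]:
    """Return a compact record for identity-related events, else None."""
    detail = event.get("detail") or {}
    source = detail.get("eventSource")
    name = detail.get("eventName")
    is_sts = source == "sts.amazonaws.com" and name in STS_EVENTS
    is_iam = source == "iam.amazonaws.com"
    if not (is_sts or is_iam):
        return None
    # requestParameters can be null in CloudTrail records, so guard it.
    params = detail.get("requestParameters") or {}
    return {
        "event_name": name,
        "source_identity": (detail.get("userIdentity") or {}).get("arn"),
        "target_role": params.get("roleArn"),
        "source_ip": detail.get("sourceIPAddress"),
        "event_time": detail.get("eventTime"),
    }
```

In practice most of this filtering would be pushed into the EventBridge rule's event pattern so that irrelevant events never reach the function; the in-code check is a second line of defense and a convenient place to normalize fields for the baseline store.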
What to Baseline
For each identity, track these metrics with 1-hour granularity:
- assume_role_count: Number of role assumptions per hour
- distinct_target_roles: Set of unique roles assumed
- distinct_target_accounts: Set of unique accounts accessed
- api_call_types_post_assumption: Set of API calls made within 5 minutes of assuming a role
- source_ips: Set of source IP addresses
- user_agents: Set of user agent strings (helps identify tool-based vs. console-based activity)
- time_of_day_distribution: Histogram of activity by hour of day
- day_of_week_distribution: Histogram of activity by day of week
After 30 days, calculate mean and standard deviation for each numeric metric. For set-based metrics, track the cardinality and flag new members that haven't been seen before.
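A per-identity baseline covering one numeric metric and two set-based metrics might look like the following. This is a simplified in-memory sketch under stated assumptions: the `IdentityBaseline` class and its method names are hypothetical, only a subset of the metrics above is modeled, and a real pipeline would persist this state in DynamoDB or a similar store.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class IdentityBaseline:
    hourly_counts: list[int] = field(default_factory=list)
    target_roles: set[str] = field(default_factory=set)
    target_accounts: set[str] = field(default_factory=set)

    def learn(self, assume_role_count: int, roles: set[str]) -> None:
        """Fold one hour of observations into the baseline."""
        self.hourly_counts.append(assume_role_count)
        self.target_roles |= roles
        # The account ID is the fifth colon-separated field of a role ARN.
        self.target_accounts |= {arn.split(":")[4] for arn in roles}

    def summary(self) -> dict:
        """Mean and standard deviation for the numeric metric, cardinality for sets."""
        counts = self.hourly_counts or [0]
        return {
            "mean": mean(counts),
            "stdev": pstdev(counts),
            "role_cardinality": len(self.target_roles),
            "account_cardinality": len(self.target_accounts),
        }

    def new_members(self, roles: set[str]) -> set[str]:
        """Roles in this observation that the baseline has never seen."""
        return roles - self.target_roles
```

During the learning period you would call `learn` once per hour per identity; afterwards `new_members` gives the set-based novelty signal and `summary` feeds the statistical thresholds.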
Integration with Existing Tools
Don't build an island. Feed your identity detection findings into your existing security stack:
- SIEM correlation: An identity anomaly combined with GuardDuty findings or network flow anomalies increases confidence. Send identity alerts to the SIEM as structured events for correlation.
- Ticketing integration: Create Jira or ServiceNow tickets for medium-confidence anomalies that require investigation but don't warrant paging someone.
- Change management: Query your deployment pipeline and change management system to check whether an anomaly coincides with a legitimate infrastructure change.
- On-call schedule: Integrate with PagerDuty or Opsgenie to automatically suppress off-hours alerts if the triggering identity is currently on-call.
The goal is to increase alert signal-to-noise ratio by using every available source of context.
What You Can Implement This Week
You don't need to build the entire detection stack to start improving your security posture. Here are three high-value detection rules you can implement with basic CloudTrail queries and alerting:
Rule 1: Cross-Account Role Assumption Volume Spike
Query CloudTrail for sts:AssumeRole events. Group by userIdentity.arn (the assuming identity) and count assumptions per hour. Alert if any identity exceeds 3x its 7-day average in a single hour. This catches rapid lateral movement and automated scanning for accessible roles.
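Rule 1 can be prototyped over exported CloudTrail events before wiring it into a query engine. In this sketch the function name `spiking_identities` is hypothetical, events are assumed to be `(identity_arn, timestamp)` pairs already filtered to sts:AssumeRole, the 7-day average is taken as total assumptions divided by 168 hours, and `min_count` is an added floor (an assumption, not part of the rule) to keep rarely used identities from alerting on two or three calls.

```python
from collections import Counter
from collections.abc import Iterable
from datetime import datetime, timedelta


def spiking_identities(
    events: Iterable[tuple[str, datetime]],
    now: datetime,
    multiplier: float = 3.0,
    min_count: int = 5,
) -> dict[str, tuple[int, float]]:
    """Return {identity: (count_last_hour, hourly_average_prior_7_days)}."""
    hour_start = now - timedelta(hours=1)
    window_start = hour_start - timedelta(days=7)
    current: Counter = Counter()
    history: Counter = Counter()
    for identity, ts in events:
        if hour_start <= ts < now:
            current[identity] += 1
        elif window_start <= ts < hour_start:
            history[identity] += 1
    alerts = {}
    for identity, count in current.items():
        hourly_avg = history[identity] / (7 * 24)
        if count >= min_count and count > multiplier * hourly_avg:
            alerts[identity] = (count, hourly_avg)
    return alerts
```

The returned tuple carries both the observed count and the baseline so the alert text can show how far the identity deviated.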
Rule 2: New Cross-Account Access Pair
Track unique pairs of (source_identity, target_role_arn). Alert on any new pair that hasn't been seen in the past 30 days. This catches compromised identities accessing roles they've never touched before, a strong indicator of reconnaissance or lateral movement.
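Rule 2 needs only a small amount of state per pair. The sketch below keeps that state in memory for illustration; the `PairTracker` class and its `observe` method are hypothetical names, and a production version would store last-seen timestamps in a durable key-value store.

```python
from datetime import datetime, timedelta


class PairTracker:
    """Flags (source_identity, target_role_arn) pairs not seen within the lookback."""

    def __init__(self, lookback: timedelta = timedelta(days=30)) -> None:
        self.lookback = lookback
        self.last_seen: dict[tuple[str, str], datetime] = {}

    def observe(self, source_identity: str, target_role_arn: str, ts: datetime) -> bool:
        """Record the pair and return True if it counts as new."""
        key = (source_identity, target_role_arn)
        previous = self.last_seen.get(key)
        self.last_seen[key] = ts
        return previous is None or ts - previous > self.lookback
```

Every pair looks new while the tracker is cold, so it makes sense to run it silently for the first 30 days (or backfill it from historical CloudTrail logs) before routing its output to alerts.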
Rule 3: Off-Hours Cross-Account Access
For each identity, calculate its typical active hours (hours of the day where >90% of its activity occurs). Alert on any sts:AssumeRole events outside those hours. This is especially effective for service accounts and automated roles that should have predictable schedules.
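One way to derive "typical active hours" is to take the smallest set of hours that together cover more than 90% of an identity's historical activity. The sketch below does that greedily; the function names `active_hours` and `is_off_hours` are hypothetical, and hours are assumed to be integers from 0 to 23 in a single consistent time zone such as UTC.

```python
from collections import Counter


def active_hours(event_hours: list[int], coverage: float = 0.9) -> set[int]:
    """Return the busiest hours that together exceed `coverage` of all activity."""
    counts = Counter(event_hours)
    total = sum(counts.values())
    selected: set[int] = set()
    covered = 0
    # Walk hours from busiest to quietest until the coverage target is passed.
    for hour, n in counts.most_common():
        if covered / total > coverage:
            break
        selected.add(hour)
        covered += n
    return selected


def is_off_hours(hour: int, baseline_hours: set[int]) -> bool:
    """True when an event's hour falls outside the identity's active hours."""
    return hour not in baseline_hours
```

An identity with no history yields an empty set, which makes every event off-hours; in practice you would skip the rule for identities that have not yet accumulated a baseline.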
Start with these three rules. Tune them for 2-4 weeks to reduce false positives. Once they're generating actionable alerts, expand to more sophisticated behavioral baselines.
The hard part of identity security isn't the technology. It's the mindset shift. Network security teams are used to thinking about packets, ports, and protocols. Identity security requires thinking about who, what, and why: who is this identity, what roles should they access, and why are they doing something different today than they did yesterday? That shift in perspective is what catches attacks that network tools miss.
References
[1] CrowdStrike, "2025 Global Threat Report," 2025. https://www.crowdstrike.com/global-threat-report/
[2] The Record, "SaaS Company Breach Exposes 47 Customer Accounts Through CI/CD Role Abuse," January 2025. https://therecord.media
[3] MITRE ATT&CK, "Cloud Matrix: Lateral Movement," accessed 2025. https://attack.mitre.org/matrices/enterprise/cloud/
[4] Wiz, "The State of Cloud Security 2025," 2025. https://www.wiz.io/cloud-security-report
[5] AWS Security Blog, "Security Best Practices for Cross-Account Access," 2024. https://aws.amazon.com/blogs/security/
[6] AWS Documentation, "Amazon GuardDuty Findings," accessed 2025. https://docs.aws.amazon.com/guardduty/latest/ug/guardduty_findings.html
[7] SANS Institute, "2025 Cloud Security Survey," 2025. https://www.sans.org/cloud-security/
[8] AWS re:Invent 2024, "Securing Cross-Account Access at Scale," session SEC301, 2024.
