
Progressive Trust: A Better Model for Cloud Security Automation

February 5, 2026 · 9 min read

Security automation has a trust problem. Teams either automate everything and hope nothing goes wrong, or they keep humans in the loop for every decision and drown in alert fatigue. Neither extreme works.

Fully manual security operations cannot keep pace with cloud-scale environments. An organization processing thousands of CloudTrail events per minute cannot have a human review each one. But fully autonomous security automation terrifies teams who have seen a misconfigured rule block production traffic.

Progressive trust offers a middle path. It defines five levels of automation autonomy and provides a framework for promoting actions through those levels as confidence grows.

The problem with binary automation

Most security automation falls into two categories: detection-only tools that generate alerts for humans to triage, and response-automation tools that execute predefined actions automatically. Both have serious limitations.

Detection-only tools push the entire decision-making burden onto human analysts. In a cloud environment with hundreds of identities generating millions of events, the signal-to-noise ratio makes effective triage nearly impossible. Alert fatigue sets in quickly.

  • Teams start ignoring medium-priority alerts because they do not have bandwidth to investigate them all.
  • Response times for real incidents increase because the genuine signals are buried in noise.
  • Institutional knowledge about which alerts matter lives in the heads of senior analysts, not in the system.
  • Turnover creates gaps. When the analyst who tuned the alert rules leaves, the team loses context about why those rules exist.

Fully autonomous response tools have the opposite problem. They act fast but cannot account for context that a human would immediately recognize. A rule that says "block any IAM role that accesses more than 50 unique resources in an hour" might block a legitimate data migration. A rule that revokes credentials on suspicious activity might disrupt a critical batch processing job.

  • One bad automated action can cause a production outage that costs more than the threat it was designed to prevent.
  • Teams lose trust in automation after a single high-impact false positive and revert to manual processes.
  • There is no mechanism for the system to learn from its mistakes or for the team to gradually build confidence.

The five trust levels

Progressive trust replaces the binary choice with a spectrum. Each security action (revoking a session, restricting a role, quarantining a resource) operates at one of five trust levels. Actions can be promoted or demoted between levels based on performance.

Progressive Trust Levels

  • Level 1 (Monitor): Log activity with enriched context. No alerts, no actions.
  • Level 2 (Notify): Alert the security team with context and recommended investigation steps.
  • Level 3 (Recommend): Suggest a specific response action with one-click execution.
  • Level 4 (Confirm): Prepare the action and wait for human approval before executing.
  • Level 5 (Autonomous): Execute the action automatically and notify after the fact.

Actions move up as confidence grows, and down after false positives.
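As a minimal sketch (not any particular product's API), the five levels can be encoded as an ordered enum, so that "promote" and "demote" become simple comparisons and increments:

```python
from enum import IntEnum

class TrustLevel(IntEnum):
    """The five progressive trust levels for a response action."""
    MONITOR = 1      # log and enrich only; no alerts, no actions
    NOTIFY = 2       # alert the team with full context
    RECOMMEND = 3    # suggest a response with one-click execution
    CONFIRM = 4      # stage the action, wait for human approval
    AUTONOMOUS = 5   # execute automatically, notify after the fact

# IntEnum gives us the ordering the spectrum implies:
assert TrustLevel.MONITOR < TrustLevel.NOTIFY < TrustLevel.AUTONOMOUS
```

Using an ordered type rather than strings makes promotion logic trivial to express (`level + 1`) and impossible to get out of sequence.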

Level 1: Monitor

Every action starts at Monitor. The system observes, logs, and enriches events, but takes no action and generates no alerts. This is the baseline-building phase where the system learns what normal behavior looks like for each identity.

Monitor mode is not passive. The system is actively building behavioral profiles, identifying patterns, and preparing detection logic. It is also where you validate that your data pipeline is working correctly and that CloudTrail events are flowing as expected.

  • Example: a new Lambda function is deployed. The system tracks every API call it makes for the first 14 days, building a behavioral baseline without generating any alerts.
  • Value: establishes the "normal" against which all future anomalies will be measured.
  • Duration: typically 1-4 weeks depending on the identity's activity frequency.
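A simple way to picture Monitor mode is a per-identity profile that records every observed API call and refuses to flag anomalies until its learning period has elapsed. This is a hypothetical sketch; the class and identity names are illustrative, and a real system would track far richer features than call names:

```python
from collections import defaultdict
from datetime import datetime, timedelta

class BehavioralBaseline:
    """Builds a per-identity profile of observed API calls (Level 1: Monitor)."""

    def __init__(self, learning_period: timedelta = timedelta(days=14)):
        self.learning_period = learning_period
        self.first_seen: dict[str, datetime] = {}
        self.observed_calls: dict[str, set[str]] = defaultdict(set)

    def record(self, identity: str, api_call: str, timestamp: datetime) -> None:
        # In Monitor mode every event only updates the profile; nothing alerts.
        self.first_seen.setdefault(identity, timestamp)
        self.observed_calls[identity].add(api_call)

    def is_mature(self, identity: str, now: datetime) -> bool:
        # The baseline is usable once the learning period has elapsed.
        start = self.first_seen.get(identity)
        return start is not None and now - start >= self.learning_period

    def is_anomalous(self, identity: str, api_call: str) -> bool:
        return api_call not in self.observed_calls[identity]

# Hypothetical usage mirroring the Lambda example above:
baseline = BehavioralBaseline()
deployed = datetime(2026, 1, 1)
baseline.record("order-processor", "dynamodb:PutItem", deployed)
baseline.record("order-processor", "sqs:SendMessage", deployed + timedelta(days=1))
```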

Level 2: Notify

Once a baseline is established, anomaly detection activates at the Notify level. The system sends alerts to the security team when it detects behavior that deviates from the baseline, but it does not suggest or take any action.

The key at this level is context. A notification that says "unusual API call detected" is useless. A notification that says "Lambda function order-processor called iam:CreateRole for the first time, normally it only calls dynamodb:PutItem and sqs:SendMessage" gives the analyst everything they need.

  • Example: a CI/CD role that normally deploys to us-east-1 makes API calls in ap-southeast-1. The team is notified with the full event context.
  • Value: the team starts building intuition about which anomalies are real threats and which are benign.
  • Promotion criteria: after 30+ notifications where the team consistently identifies the correct response, the action can move to Recommend.
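The difference between a useless alert and an actionable one is mechanical: include the deviation and the baseline in the same message. A hedged sketch of that formatting step (the function name is invented for illustration):

```python
def build_notification(identity: str, new_call: str, baseline_calls: set[str]) -> str:
    """Render a Notify-level alert with enough context to act on (Level 2)."""
    usual = ", ".join(sorted(baseline_calls)) or "no prior API calls"
    return (
        f"{identity} called {new_call} for the first time; "
        f"its baseline is: {usual}"
    )

message = build_notification(
    "order-processor",
    "iam:CreateRole",
    {"dynamodb:PutItem", "sqs:SendMessage"},
)
```

The output reads like the good example above: the analyst sees both what happened and what normally happens, in one line.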

Level 3: Recommend

At the Recommend level, the system not only detects anomalies but suggests a specific response action. "We detected X. Based on similar incidents, we recommend Y. Click here to execute."

This level reduces response time dramatically. Instead of investigating from scratch, the analyst reviews a pre-built recommendation and decides whether to execute it. The system learns from which recommendations are accepted and which are rejected.

  • Example: a service account starts accessing a database it has never touched. The system recommends restricting the account's permissions to its baseline resource set.
  • Value: response time drops from hours (investigation) to minutes (review and approve).
  • Promotion criteria: when recommendations are accepted without modification more than 90% of the time.
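The 90% promotion criterion is easy to make concrete: track how often recommendations are accepted unmodified, and require both a minimum sample size and an acceptance rate above the threshold. A minimal sketch, with invented names and thresholds taken from the text:

```python
class RecommendationTracker:
    """Decides when a Recommend-level action is ready for Confirm (Level 3 -> 4)."""

    def __init__(self, min_samples: int = 30, threshold: float = 0.90):
        self.min_samples = min_samples
        self.threshold = threshold
        self.accepted = 0
        self.total = 0

    def record(self, accepted_unmodified: bool) -> None:
        self.total += 1
        if accepted_unmodified:
            self.accepted += 1

    def eligible_for_promotion(self) -> bool:
        # Require enough samples AND an acceptance rate above the threshold,
        # so a lucky early streak cannot trigger promotion.
        return (
            self.total >= self.min_samples
            and self.accepted / self.total > self.threshold
        )

tracker = RecommendationTracker()
for i in range(30):
    tracker.record(accepted_unmodified=(i % 15 != 0))  # 28 of 30 accepted
```

The minimum-sample guard matters: without it, two accepted recommendations would read as a 100% acceptance rate.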

Level 4: Confirm

At the Confirm level, the system prepares the response action and queues it for execution, requiring only a single human approval to proceed. The action is fully staged. The human is reviewing and approving, not investigating.

This is the appropriate level for high-impact actions like revoking active sessions, modifying IAM policies, or quarantining compute resources. The automation handles the complexity of preparing the action. The human provides the final judgment call.

  • Example: an AI agent with elevated privileges starts assuming cross-account roles it has never used. The system prepares a session revocation and sends a confirmation request to the on-call security engineer.
  • Value: response time drops to the time it takes a human to review and click "approve."
  • Promotion criteria: when the human approves without modification more than 95% of the time, and the action has never caused a production impact.
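One way to model the Confirm level is a staged action object: the response is fully prepared up front, and the human's approval is the only remaining step. A hypothetical sketch (the role name and return value are illustrative, not a real revocation API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class StagedAction:
    """A fully prepared response waiting on a single approval (Level 4: Confirm)."""
    description: str
    run: Callable[[], str]
    approved: bool = False

    def approve(self) -> str:
        # The human reviews and clicks "approve"; everything else is staged.
        self.approved = True
        return self.run()

# Hypothetical staged revocation for the cross-account example above:
revocation = StagedAction(
    description="Revoke active sessions for role agent-runner",
    run=lambda: "sessions revoked",
)
result = revocation.approve()
```

Separating preparation from execution is what makes the human's latency cheap: the review takes seconds because the investigation is already done.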

Level 5: Autonomous

Autonomous is the highest trust level. The system detects the threat and executes the response automatically, notifying the team after the fact. Only actions that have been thoroughly validated at lower levels should reach this stage.

This level is reserved for high-confidence, time-sensitive scenarios where human latency would allow the attacker to cause significant damage. Credential exfiltration, active privilege escalation, and data exfiltration are candidates for autonomous response.

  • Example: an IAM access key that has been flagged as compromised (detected in a public repository) is automatically disabled. The team is notified after the key is disabled.
  • Value: response time drops to seconds, preventing damage that would occur during the minutes or hours of human response.
  • Demotion criteria: any false positive that impacts production immediately demotes the action back to Confirm or Recommend.

Building confidence over time

The power of progressive trust is in the promotion mechanism. Actions earn higher trust levels through consistent, accurate performance at lower levels. They lose trust immediately when they cause problems.

This asymmetry is intentional. Trust is earned slowly and lost quickly. A response action that works correctly 100 times earns promotion to the next level. A single false positive that impacts production sends it back down.

Over time, the system naturally converges on the right level of automation for each type of response action. Simple, low-risk actions (like logging additional context) reach Autonomous quickly. Complex, high-impact actions (like revoking production credentials) may stay at Confirm indefinitely, and that is fine.
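The asymmetry described above can be sketched as a small state machine: a long streak of successes is required to climb one level, while a single production-impacting false positive demotes immediately and resets the streak. The streak length and class name are illustrative assumptions:

```python
class TrustLadder:
    """Asymmetric promotion: trust is earned slowly and lost quickly."""

    def __init__(self, level: int = 1, promotion_streak: int = 100):
        self.level = level                      # current trust level, 1-5
        self.promotion_streak = promotion_streak
        self.successes = 0

    def record_success(self) -> None:
        self.successes += 1
        if self.successes >= self.promotion_streak and self.level < 5:
            self.level += 1
            self.successes = 0                  # streak restarts at each level

    def record_false_positive(self, production_impact: bool) -> None:
        self.successes = 0
        if production_impact and self.level > 1:
            self.level -= 1                     # immediate demotion

ladder = TrustLadder()
for _ in range(100):
    ladder.record_success()                     # 100 clean runs earn Level 2
promoted_level = ladder.level
ladder.record_false_positive(production_impact=True)  # one bad call undoes it
```

Because the streak resets on every promotion and every false positive, an action that keeps misfiring simply never accumulates the history needed to climb, which is the convergence behavior the text describes.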

Why this matters for AI agent oversight

Progressive trust is especially relevant for monitoring AI agents. AI agent behavior is less predictable than traditional service account behavior, which means detection systems will have higher false positive rates initially.

Starting every AI-agent-related response action at Monitor gives the system time to learn what normal AI agent behavior looks like. As the baselines mature and the detection logic improves, actions can be promoted through the trust levels.

This approach lets security teams adopt AI agent monitoring without the risk of automated responses disrupting developer workflows. It also creates an audit trail that demonstrates due diligence: every promotion decision is logged with the evidence that supported it.
