You are three weeks into evaluating Identity Threat Detection and Response platforms. You have sat through eight vendor demos. Every one showed a dashboard detecting a suspicious admin login from a new geography. Every one flagged a user with too many permissions. Every one claimed "comprehensive identity coverage" and "AI-powered detection." Then you ask to see how the platform handles your 4,000 AWS Lambda execution roles, your GitHub service accounts, and your CI/CD pipeline credentials. The room goes quiet. The sales engineer pivots to "roadmap items" and "professional services engagements."
This is where most ITDR evaluations fail. Security teams treat identity threat platforms like they treated SIEM a decade ago: focus on alert volume, dashboard aesthetics, and integration count. But ITDR is not SIEM with an identity filter. It is a fundamentally different security control that must answer a question SIEM never tried to solve: which of your 50,000 identities (human and machine) poses the highest breach risk right now, and what automated action can contain that risk without breaking production? If your evaluation criteria do not force vendors to answer that question with specificity, you will buy a platform that detects compromises but cannot respond to them.
Here is the practitioner's framework for evaluating ITDR platforms, built from evaluations at companies running 5,000 to 50,000 cloud identities. This is not a feature checklist. It is the set of questions that separate platforms built for real cloud environments from those built for demo environments.
Identity Coverage: The First Filter That Eliminates 60% of Vendors
Most ITDR platforms excel at monitoring human identities. SSO integration is straightforward. MFA bypass detection is table stakes. Privilege escalation tracking for admin accounts is a solved problem. Every vendor can show you a dashboard of user logins, session durations, and permission changes.
Then you ask about non-human identities. The conversation changes fast. Can the platform monitor AWS IAM roles? Lambda execution roles? ECS task roles? Kubernetes service accounts? GitHub personal access tokens? CI/CD system credentials? Most vendors claim coverage, but when you ask for a demo using your actual environment, the coverage gaps become obvious. They can list service accounts, but they cannot tell you which team owns them, which application uses them, or what the risk is if one gets compromised.
The ownership problem is where platforms fail hardest. A list of 4,000 service accounts is not identity coverage. It is inventory. Identity coverage means mapping every non-human identity to a team, an application, a repository, or a workflow. It means answering questions like "which service accounts can this developer provision?" and "what production data can this CI/CD pipeline access?" If the platform cannot build that context graph, it will generate alerts your team cannot triage because they lack the business context to prioritize response.
AI agent identity monitoring is the newest gap. LangChain agents request AWS credentials. Bedrock workflows access S3 buckets. AI runners execute code with inherited permissions. If your platform cannot detect when an AI agent exceeds its intended scope or requests credentials it has never used before, you have a blind spot that attackers will exploit [1]. The platforms leading this space treat AI agents as a distinct identity class with unique behavioral patterns and risk profiles.
The Coverage Test Question
Ask the vendor: "Show me every identity that can assume the production database role, including transitive paths through role chaining." If they cannot visualize the full privilege path within 60 seconds, their coverage is incomplete.
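To make the test concrete, the sketch below shows the kind of computation the vendor's visualization should rest on: a breadth-first search backwards from the target role over "can assume" edges. It assumes you have already flattened trust policies and `sts:AssumeRole` grants into a hypothetical list of `(principal, role)` pairs; a real platform must also evaluate conditions, permission boundaries, and SCPs, which this sketch ignores.

```python
from collections import deque

# Hypothetical input: (principal, role it may assume) pairs, already derived
# from role trust policies and sts:AssumeRole grants in identity policies.
Edge = tuple[str, str]


def paths_to_role(edges: list[Edge], target: str) -> dict[str, list[str]]:
    """Return every principal that can reach `target`, with one shortest path each."""
    # Reverse adjacency: role -> principals allowed to assume it.
    reverse: dict[str, list[str]] = {}
    for src, dst in edges:
        reverse.setdefault(dst, []).append(src)

    paths: dict[str, list[str]] = {}
    seen = {target}
    queue = deque([(target, [target])])
    while queue:
        node, path = queue.popleft()
        for src in reverse.get(node, []):
            if src in seen:
                continue  # already reached by an equal or shorter path
            seen.add(src)
            paths[src] = [src] + path
            queue.append((src, paths[src]))
    return paths


edges = [
    ("user/alice", "role/ci"),
    ("role/ci", "role/deploy"),
    ("role/deploy", "role/prod-db"),
    ("lambda/etl", "role/prod-db"),
    ("role/prod-db", "role/ci"),  # a trust cycle, which the search tolerates
]
for principal, path in paths_to_role(edges, "role/prod-db").items():
    print(" -> ".join(path))
```

Searching backwards from the target visits each principal once, so cyclic trust relationships cannot make the search loop, and the transitive path for `user/alice` (three hops through `role/ci` and `role/deploy`) falls out naturally.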
Human identity baseline requirements include SSO integration with Okta, Microsoft Entra ID (formerly Azure AD), or Google Workspace, MFA bypass detection, privilege escalation tracking (both permanent and temporary), session anomaly detection (location, device, time), and access certification workflows for quarterly reviews. Every vendor will check these boxes. Do not spend evaluation time here.
Non-human identity coverage gaps are where you differentiate platforms. Test coverage for AWS IAM roles (including cross-account roles), Lambda execution roles, ECS task roles, EKS service accounts, RDS and Redshift credentials, Secrets Manager and Parameter Store secrets, GitHub personal access tokens and deploy keys, GitLab runner tokens, Jenkins service accounts, and Terraform/CDK automation credentials. Ask to see the platform ingest your actual CloudTrail logs and build the identity context graph. If they cannot demo this with your data, they cannot do it in production.
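One way to check whether a vendor's graph matches reality is to build a crude version yourself from the same logs. This minimal sketch reads a CloudTrail log file in its `{"Records": [...]}` JSON format, collapses assumed-role sessions onto the IAM role that issued them, records which service actions each principal actually used, and lists active principals missing from an ownership map. The `owners` dictionary is a hypothetical stand-in for whatever tag or CMDB lookup you use.

```python
import json
from collections import defaultdict


def principal_of(record: dict) -> str:
    """Collapse assumed-role sessions onto the IAM role that issued them."""
    identity = record.get("userIdentity", {})
    issuer = identity.get("sessionContext", {}).get("sessionIssuer", {})
    return issuer.get("arn") or identity.get("arn", "unknown")


def build_usage_graph(log_text: str) -> dict[str, set[str]]:
    """Map each principal to the service:action pairs it actually used."""
    usage: dict[str, set[str]] = defaultdict(set)
    for record in json.loads(log_text).get("Records", []):
        service = record.get("eventSource", "unknown").split(".")[0]
        usage[principal_of(record)].add(f"{service}:{record.get('eventName', 'unknown')}")
    return dict(usage)


def unowned(usage: dict[str, set[str]], owners: dict[str, str]) -> list[str]:
    """Principals active in the logs with no owning team on record."""
    return sorted(p for p in usage if p not in owners)
```

The gap between "appears in `usage`" and "appears in `owners`" is exactly the inventory-versus-coverage distinction described earlier: the first is easy to produce, the second is what makes alerts triageable.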
23% of cloud breaches start with compromised service accounts, not human credentials (CrowdStrike 2025 Threat Report) [2]
4,200 is the median number of non-human identities per 1,000 employees in AWS environments (Detectory customer data, 2025)
87% of organizations cannot identify which team owns more than half of their service accounts (Cloud Security Alliance survey, 2025) [3]
Detection Methodology: Rules, ML, or Hybrid (And Why It Matters)
Detection methodology determines whether your platform catches real threats or drowns you in false positives. Most vendors pitch "AI-powered detection" without explaining what that means. Press them on the architecture. You will find three approaches: rule-based detection, pure machine learning, and hybrid detection with per-identity behavioral baselines.
Rule-based detection is fast and explainable. When an IAM role is used from a new IP, a rule fires. When a user escalates to admin, a rule fires. When a service account accesses 50 S3 buckets in 10 minutes, a rule fires. Rules catch known attack patterns with very few false positives once they are well tuned. The problem is maintenance debt. Your environment changes constantly. New services launch. New identities are provisioned. New automation workflows start using credentials in patterns your rules never anticipated. Within six months, you have 200 rules, 40 exceptions, and a backlog of tuning requests.
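The three rules in the paragraph above are short to write, which is exactly why they accumulate. This sketch encodes them over a simplified, hypothetical `Event` record rather than raw CloudTrail JSON. The admin-escalation rule only checks for the AWS managed `AdministratorAccess` policy, a narrowness that illustrates how rule sets drift into exceptions as environments change.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

ADMIN_POLICY = "arn:aws:iam::aws:policy/AdministratorAccess"


@dataclass
class Event:
    """Hypothetical, simplified view of one CloudTrail record."""
    time: datetime
    principal: str
    source_ip: str
    action: str         # "service:EventName", e.g. "s3:GetObject"
    resource: str = ""  # bucket name or policy ARN, depending on the action


def rule_new_ip(e: Event, known_ips: dict[str, set[str]]) -> bool:
    """Fires when a principal is used from an IP it has never been seen on."""
    return e.source_ip not in known_ips.get(e.principal, set())


def rule_admin_escalation(e: Event) -> bool:
    """Fires when the AWS managed admin policy is attached to a user or role."""
    return (e.action in {"iam:AttachUserPolicy", "iam:AttachRolePolicy"}
            and e.resource == ADMIN_POLICY)


def rule_bucket_sweep(events: list[Event], principal: str,
                      window: timedelta = timedelta(minutes=10),
                      limit: int = 50) -> bool:
    """Fires when `principal` touches `limit` or more distinct buckets within `window`."""
    s3 = sorted((e for e in events
                 if e.principal == principal and e.action.startswith("s3:")),
                key=lambda e: e.time)
    start = 0
    for end in range(len(s3)):
        # Slide the window start forward until it spans at most `window`.
        while s3[end].time - s3[start].time > window:
            start += 1
        if len({e.resource for e in s3[start:end + 1]}) >= limit:
            return True
    return False
```

Every threshold here (`limit`, `window`, the exact policy ARN) is a constant someone will eventually need to override for a legitimate workload, which is where the exception backlog comes from.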
Pure machine learning approaches adapt to new patterns without rule updates. The platform baselines normal behavior, then flags statistical anomalies. This works well for detecting novel attacks. The problem is false positive rates. ML models flag legitimate but unusual activity: the service account that only runs monthly, the developer who works night shifts, the automation job that scales up during product launches. Without human-in-the-loop feedback, these models generate alert fatigue. Your analysts spend more time dismissing false positives than investigating real threats [4].
Hybrid detection with per-identity behavioral baselines combines the strengths of both approaches. The platform learns normal behavior for each identity individually (not just aggregate patterns), uses rules for known high-confidence threats, and applies ML to detect deviations from the per-identity baseline. When a service account that normally accesses three S3 buckets suddenly accesses 30, the platform flags it. When a Lambda role that usually runs for 2 minutes runs for 45 minutes, the platform flags it. The key is individualized baselines, not one-size-fits-all thresholds.
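A per-identity baseline can be sketched with nothing more than a history per identity and a z-score. This is a simplified stand-in for the statistical models vendors use, assuming one numeric feature per identity (distinct buckets per hour, run duration in minutes); the `min_samples`, `z_threshold`, and `floor` values are illustrative, not recommendations.

```python
from statistics import mean, pstdev
from typing import Optional


class IdentityBaseline:
    """Per-identity history of one numeric feature, e.g. distinct S3 buckets per hour."""

    def __init__(self, min_samples: int = 5, z_threshold: float = 3.0, floor: float = 1.0):
        self.history: dict[str, list[float]] = {}
        self.min_samples = min_samples
        self.z_threshold = z_threshold
        # Minimum standard deviation, so a perfectly steady identity does not
        # alert on a change of a single unit.
        self.floor = floor

    def observe(self, identity: str, value: float) -> None:
        self.history.setdefault(identity, []).append(value)

    def is_anomalous(self, identity: str, value: float) -> Optional[bool]:
        """None while the identity is still learning; otherwise True for an upward outlier."""
        samples = self.history.get(identity, [])
        if len(samples) < self.min_samples:
            return None
        sigma = max(pstdev(samples), self.floor)
        return (value - mean(samples)) / sigma > self.z_threshold


def hybrid_verdict(rule_hit: bool, baseline: IdentityBaseline, identity: str, value: float) -> str:
    """Rules decide known high-confidence patterns; the baseline handles everything else."""
    if rule_hit:
        return "alert"
    anomalous = baseline.is_anomalous(identity, value)
    if anomalous is None:
        return "learning"
    return "alert" if anomalous else "ok"
```

With this design, 30 buckets is an alert for an identity that usually touches three but not for one that routinely touches forty; that per-identity judgment is what a single global threshold cannot express.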
The baseline quality question determines how fast the platform becomes useful. How long does it take to establish a reliable baseline for a new identity? What happens with identities that only authenticate quarterly? How does the platform handle seasonal patterns (month-end batch jobs, annual compliance scans)? The best platforms start generating reliable signals within 7 to 14 days and explicitly handle low-frequency identities by extending the baseline window.
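Extending the window for low-frequency identities can be as simple as sizing it from the identity's own cadence. The sketch below assumes you want roughly `min_events` typical authentications inside the window, with a 14-day floor (the upper end of the range above) and an arbitrary 400-day cap; both bounds are assumptions to tune.

```python
from datetime import datetime, timedelta
from statistics import median


def baseline_window(auth_times: list[datetime], min_events: int = 5,
                    floor: timedelta = timedelta(days=14),
                    cap: timedelta = timedelta(days=400)) -> timedelta:
    """Pick a lookback window long enough to hold `min_events` typical authentications."""
    if len(auth_times) < 2:
        return cap  # too little history to estimate a cadence: use the longest window
    times = sorted(auth_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    typical_gap = median(gaps)  # median resists a single long outage skewing the cadence
    return min(max(typical_gap * min_events, floor), cap)
```

Under these assumptions a daily job gets the 14-day floor, a monthly batch identity gets about five months, and a quarterly identity hits the cap, which is a signal to pair it with explicit scheduling context rather than pure statistics.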
Detection latency requirements vary by threat type. Credential exfiltration requires real-time detection (under 60 seconds). Privilege escalation can tolerate near-real-time detection (under 5 minutes). Anomalous access patterns can use batch analysis (every 15 minutes). Ask the vendor to define their latency guarantees for each threat category. If they claim "real-time for everything," they are either lying or burning unnecessary compute costs.
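Turning the vendor's answer into a checklist keeps the conversation specific. The budgets below come straight from the paragraph above; the category keys and the shape of `vendor_claims` are hypothetical.

```python
from datetime import timedelta

# Budgets from the paragraph above; the category keys are illustrative names.
REQUIRED_LATENCY = {
    "credential_exfiltration": timedelta(seconds=60),
    "privilege_escalation": timedelta(minutes=5),
    "anomalous_access": timedelta(minutes=15),
}


def latency_gaps(vendor_claims: dict[str, timedelta]) -> dict[str, str]:
    """Categories where the vendor's stated worst-case latency misses the budget."""
    gaps = {}
    for category, budget in REQUIRED_LATENCY.items():
        claimed = vendor_claims.get(category)
        if claimed is None:
            gaps[category] = "no stated guarantee"
        elif claimed > budget:
            gaps[category] = f"claims {claimed}, needs {budget} or less"
    return gaps


print(latency_gaps({
    "credential_exfiltration": timedelta(seconds=30),
    "privilege_escalation": timedelta(minutes=10),
}))
```

A missing category counts as a gap on purpose: a vendor that will not state a latency for a threat type has not committed to one.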
| Detection Approach | Strengths | Weaknesses | Best For |
|---|---|---|---|
| Rule-Based | Fast, explainable, very few false positives when well tuned, audit-friendly | High maintenance debt, blind to novel attacks, environment-specific tuning required | Known attack patterns, compliance-driven detection, low tolerance for false positives |
| Pure ML | Adaptive to new patterns, detects novel attacks, minimal rule maintenance | High false positive rate, lacks explainability, requires significant training data | Threat hunting, research environments, teams with capacity to triage ambiguous alerts |
| Hybrid (Per-Identity Baselines) | Low false positive rate, detects novel attacks, explainable deviations, scales with identity count | Requires baseline establishment period, more complex architecture, harder to validate | Production environments with high identity counts, teams needing automated response |
Response Automation Depth: The 5-Level Maturity Model
Detection without response is not ITDR. It is alerting. The depth of automation a platform provides determines whether it multiplies your security team's capacity or just adds another alert feed. Most vendors demo Level 2 or 3 automation and imply they offer Level 5. Force them to define their automation maturity with specificity.
Level 1 (Alert Only) means the platform detects threats and sends notifications to Slack, PagerDuty, or a SIEM. All response actions are manual. An analyst sees the alert, opens a runbook, manually suspends the session or revokes the credentials, then documents the incident. This is not ITDR. This is detection without response. If a platform cannot automatically respond to at least high-confidence threats, it does not solve the problem ITDR exists to solve.
Level 2 (Guided Response) means the platform suggests actions and provides runbooks, but the analyst executes every step. The platform might show "Recommended Action: Revoke temporary credentials for role X" with a button to copy the AWS CLI command. The analyst still has to run the command, verify the result, and check for side effects. This is better than pure alerting, but it does not scale. At roughly 24 minutes of hands-on work per incident, 50 identity incidents per week cost your analysts 20 hours per week in manual response steps.
Level 3 (Semi-Automated Response) means the platform can execute response actions on analyst approval. The platform detects a compromised session, presents the evidence, and offers a "Revoke Session" button. The analyst reviews the context and clicks. The platform executes the revocation API call, logs the action, and confirms success. This reduces response time from 15 minutes to 2 minutes, but still requires analyst intervention for every incident.
Level 4 (Policy-Driven Automation) means the platform automatically responds to defined threat scenarios based on risk score and policy rules. You configure policies like "If a service account accesses more than 20 S3 buckets in 10 minutes AND it has never accessed more than 5 buckets before, automatically revoke its temporary credentials and notify the owning team." The platform executes the response, logs the action, and escalates to an analyst only if the response fails. This is where ITDR becomes force-multiplying. A team of three analysts can handle 500 identity incidents per week because 80% resolve automatically.
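The example policy translates almost directly into code. This sketch assumes a hypothetical `Incident` record that already carries the owning team and the identity's historical bucket maximum (the context-graph work discussed earlier), and it passes response actions, notification, and escalation in as plain callables so nothing here is tied to a specific vendor or AWS API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Incident:
    """Hypothetical incident record, already enriched with ownership and history."""
    identity: str
    owner_team: str
    buckets_last_10m: int
    max_buckets_historical: int


@dataclass
class Policy:
    name: str
    matches: Callable[[Incident], bool]
    action: str  # key into the `actions` table passed to run_policies


def run_policies(incident: Incident,
                 policies: list[Policy],
                 actions: dict[str, Callable[[str], None]],
                 notify: Callable[[str, str], None],
                 escalate: Callable[[Incident, Exception], None]) -> list[str]:
    """Apply every matching policy; escalate to a human only when a response fails."""
    log = []
    for policy in policies:
        if not policy.matches(incident):
            continue
        try:
            actions[policy.action](incident.identity)
        except Exception as exc:  # any failure, including an unknown action, goes to a human
            log.append(f"{policy.name}: {policy.action} failed ({exc})")
            escalate(incident, exc)
            continue
        log.append(f"{policy.name}: {policy.action} ok")
        notify(incident.owner_team, f"{policy.action} applied to {incident.identity}")
    return log


# The example policy from the text: more than 20 buckets in 10 minutes from an
# identity that has never touched more than 5 before.
S3_SWEEP = Policy(
    name="s3-sweep",
    matches=lambda i: i.buckets_last_10m > 20 and i.max_buckets_historical <= 5,
    action="revoke_temp_credentials",
)
```

Keeping the action table separate from the policy definitions means the same policy can be exercised in a dry-run mode (actions that only log) before it is allowed to touch production credentials.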
Level 5 (Adaptive Automation) means the platform learns from analyst decisions and adjusts response policies over time. When an analyst overrides an automatic revocation, the platform learns that context and adjusts the risk score or policy threshold. When an analyst approves a suggested action repeatedly, the platform automates it going forward. This requires sophisticated feedback loops and policy recommendation engines. Few platforms offer this today, but it is the long-term target for mature ITDR programs.
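The feedback loop can be illustrated with a deliberately simple heuristic rather than a learned model: every analyst override raises the policy's threshold, and enough approvals of a suggested action promote it to automatic. The `step` and `promote_after` values are hypothetical, and a production system would also decay old feedback and bound the threshold.

```python
from dataclasses import dataclass


@dataclass
class AdaptivePolicy:
    threshold: float          # minimum risk score that triggers a response
    auto: bool = True         # False = suggest only, True = act automatically
    overrides: int = 0
    approvals: int = 0
    step: float = 5.0         # hypothetical threshold increase per override
    promote_after: int = 10   # hypothetical approvals needed to automate

    def record_override(self) -> None:
        """Analyst reversed an automatic action: require more evidence next time."""
        self.overrides += 1
        self.threshold += self.step

    def record_approval(self) -> None:
        """Analyst accepted a suggested action: after enough approvals, automate it."""
        self.approvals += 1
        if not self.auto and self.approvals >= self.promote_after:
            self.auto = True

    def decide(self, risk_score: float) -> str:
        if risk_score < self.threshold:
            return "monitor"
        return "auto_respond" if self.auto else "suggest"
```

Even this toy version shows why the feature needs audit trails: the policy that fires next month is not the policy you configured, so every threshold change should be logged alongside the analyst decision that caused it.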
The rollback question is critical. If the platform auto-revokes credentials for a false positive, how fast can you restore access? If the answer is "open a ticket with our support team," the platform is not production-ready. The best platforms provide immediate rollback with full audit trails, so analysts can restore access with one click if they determine the action was incorrect.
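For AWS IAM roles, one concrete mechanism that makes fast rollback possible is the one the IAM console uses for "Revoke active sessions": an inline role policy named `AWSRevokeOlderSessions` that denies all actions for credentials issued before a cutoff (`aws:TokenIssueTime`). The sketch assumes `iam` is a boto3 IAM client (`boto3.client("iam")`) or any object exposing the same `put_role_policy` and `delete_role_policy` methods, and uses a plain list as a stand-in audit trail.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Inline policy name used by the IAM console's "Revoke active sessions" action.
POLICY_NAME = "AWSRevokeOlderSessions"


def revoke_sessions(iam, role_name: str, audit: list, now: Optional[datetime] = None) -> str:
    """Deny every action for sessions of `role_name` issued before `now` (UTC)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    document = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": ["*"],
            "Resource": ["*"],
            "Condition": {"DateLessThan": {"aws:TokenIssueTime": cutoff}},
        }],
    }
    iam.put_role_policy(RoleName=role_name, PolicyName=POLICY_NAME,
                        PolicyDocument=json.dumps(document))
    audit.append({"action": "revoke", "role": role_name, "cutoff": cutoff})
    return cutoff


def rollback(iam, role_name: str, audit: list, reason: str) -> None:
    """Remove the deny policy so sessions issued before the cutoff work again."""
    iam.delete_role_policy(RoleName=role_name, PolicyName=POLICY_NAME)
    audit.append({"action": "rollback", "role": role_name, "reason": reason})
```

Two caveats apply to this approach: `put_role_policy` overwrites any existing policy with the same name, so a rollback also removes an earlier revocation on that role, and rollback only helps sessions whose credentials have not already expired. Sessions issued after the cutoff are never affected.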
Cloud-Native Integration: Beyond API Connections
Integration depth determines whether the platform understands your cloud environment or just ingests logs. Every vendor claims "cloud-native integration," but when you ask how they parse CloudTrail events, the answers vary wildly. AWS CloudTrail emits 400+ event types. Most platforms parse the common 20 (IAM actions, S3 access, EC2 changes) and ignore the rest. That means they miss events like Secrets Manager GetSecretValue, KMS Decrypt, STS AssumeRoleWithWebIdentity, and Lambda InvokeFunction with cross-account roles.
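You can measure this gap on your own logs before the demo. The sketch counts `eventSource:eventName` pairs across CloudTrail log files and ranks the ones a parser does not handle; the `parsed` set is a hypothetical stand-in for the list of event types the vendor says it understands.

```python
import json
from collections import Counter


def coverage_gaps(log_files: list[str], parsed: set[str]) -> list[tuple[str, int]]:
    """Rank event types seen in your CloudTrail logs that the parser does not handle."""
    seen: Counter = Counter()
    for text in log_files:
        for record in json.loads(text).get("Records", []):
            # Qualify by service so identical event names in different services stay distinct.
            seen[f"{record['eventSource']}:{record['eventName']}"] += 1
    return [(name, count) for name, count in seen.most_common() if name not in parsed]
```

Ranking by volume shows which blind spots matter most in your environment, rather than which ones the vendor's generic coverage list happens to mention.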
AWS service coverage depth varies dramatically across platforms. Ask which services have first-class monitoring (meaning the platform understands the service's identity model, permission model, and common attack patterns). Baseline coverage includes IAM, STS, Organizations, IAM Identity Center (formerly AWS SSO), Secrets Manager, KMS, Lambda, ECS, and EKS. Advanced coverage includes RDS IAM authentication, Redshift IAM roles, Glue job roles, Step Functions state machine roles, and API Gateway IAM authorizers. If your environment uses a service and the platform does not monitor it, you have a blind spot.
Cross-account monitoring architecture separates platforms built for single-account demos from those built for enterprise environments. Can the platform track AssumeRole chains across 50+ AWS accounts without deploying infrastructure in every account? Does it require a hub-and-spoke model, or can it ingest CloudTrail from a centralized logging account? How does it handle AWS Organizations with hundreds of accounts? The best platforms use centralized CloudTrail ingestion with cross-account role analysis, so you deploy once and monitor everywhere.
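Centralized ingestion works because CloudTrail already records the links needed to follow a chain: an `AssumeRole*` event's `responseElements.credentials.accessKeyId` is the temporary key it issued, and later calls made with that session carry the same key in `userIdentity.accessKeyId`. The minimal sketch below indexes those events and walks a session back to the identity that started the chain. It assumes records from all accounts are already merged into one list, and it ignores edge cases such as federated (web identity or SAML) origins beyond stopping the walk there.

```python
from typing import Optional


def index_assume_role(records: list[dict]) -> dict[str, dict]:
    """Map each temporary access key issued by STS to the call that minted it."""
    issued = {}
    for r in records:
        if r.get("eventSource") != "sts.amazonaws.com":
            continue
        if not r.get("eventName", "").startswith("AssumeRole"):
            continue
        resp = r.get("responseElements") or {}
        key = resp.get("credentials", {}).get("accessKeyId")
        if not key:
            continue  # failed calls carry no credentials in the response
        caller = r.get("userIdentity", {})
        issued[key] = {
            "role": resp.get("assumedRoleUser", {}).get("arn"),
            "caller_key": caller.get("accessKeyId"),
            "caller_arn": caller.get("arn"),
        }
    return issued


def trace_chain(access_key: str, issued: dict[str, dict], max_hops: int = 20) -> list[str]:
    """Return [origin, role1, role2, ...] for the session that used `access_key`."""
    hops: list[str] = []
    origin: Optional[str] = None
    key = access_key
    while key in issued and len(hops) < max_hops:
        hop = issued[key]
        hops.append(hop["role"])
        origin = hop["caller_arn"]
        key = hop["caller_key"]
    return ([origin] if origin else []) + hops[::-1]


def accounts_crossed(chain: list[str]) -> list[str]:
    """Distinct account IDs along the chain, in order (the fifth ARN field)."""
    return list(dict.fromkeys(arn.split(":")[4] for arn in chain))
```

Because the join key is the access key rather than an account ID, the same code works whether the chain stays in one account or crosses three, which is the property to look for in a vendor's architecture.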
Multi-cloud reality is that most organizations run AWS plus at least one other cloud. If you run AWS and GCP, or AWS and Azure, does the platform provide unified identity context or siloed views? Can it detect when a developer uses the same compromised credential in AWS and GCP? Can it correlate an identity across clouds (same email, different IAM principal)? Most platforms claim multi-cloud support but deliver separate dashboards with no cross-cloud correlation. That is not multi-cloud. That is multiple single-cloud tools under one license.
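The simplest form of cross-cloud correlation is joining principals on a normalized email address. This sketch assumes a hypothetical flat list of `(cloud, principal_id, email)` tuples exported from each provider's directory. It only covers human identities, since machine identities rarely carry an email, which is why the harder correlation work has to come from the platform.

```python
from collections import defaultdict
from typing import Optional


def correlate_by_email(principals: list[tuple[str, str, Optional[str]]]) -> dict[str, list[tuple[str, str]]]:
    """Group principals by normalized email; keep people present in more than one cloud."""
    by_person: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for cloud, principal, email in principals:
        if email:  # machine identities without an email cannot be joined this way
            by_person[email.strip().lower()].add((cloud, principal))
    return {
        email: sorted(ids)
        for email, ids in by_person.items()
        if len({cloud for cloud, _ in ids}) > 1
    }
```

A platform that only shows siloed per-cloud dashboards is effectively skipping this join, so a compromise of `alice` in one cloud never raises her risk score in the other.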
The Terraform and CDK integration test reveals whether the platform understands your infrastructure-as-code workflows. Can it ingest your IAM-as-code repositories to understand intended permissions versus actual permissions? Can it detect when a developer manually creates a role that contradicts the Terraform state? Can it flag drift between your IaC definitions and your runtime environment? This level of integration is rare, but it is the future of ITDR for teams that manage identity as code [5].
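A basic drift check needs only the Terraform state file and a list of role names from the account. This sketch reads the JSON state format (version 4, `resources[].instances[].attributes`) and compares managed `aws_iam_role` names with a runtime set you would gather separately, for example from the IAM `ListRoles` API. It skips service-linked roles (names starting with `AWSServiceRoleFor`), which AWS creates outside Terraform, and compares only role existence, not attached policies.

```python
import json

SERVICE_LINKED_PREFIX = "AWSServiceRoleFor"  # created by AWS services, not by Terraform


def roles_in_state(state_text: str) -> set[str]:
    """Names of IAM roles managed by Terraform, read from a JSON state file."""
    names = set()
    for res in json.loads(state_text).get("resources", []):
        # "data" sources only read existing roles, so only "managed" ones count.
        if res.get("mode") == "managed" and res.get("type") == "aws_iam_role":
            for inst in res.get("instances", []):
                names.add(inst["attributes"]["name"])
    return names


def role_drift(state_text: str, runtime_roles: set[str]) -> dict[str, list[str]]:
    declared = roles_in_state(state_text)
    runtime = {r for r in runtime_roles if not r.startswith(SERVICE_LINKED_PREFIX)}
    return {
        "unmanaged": sorted(runtime - declared),  # exists in AWS, absent from code
        "missing": sorted(declared - runtime),    # declared in code, absent from AWS
    }
```

In an environment split across many state files, "unmanaged" only becomes meaningful after merging every state that targets the account; otherwise roles owned by another team's stack show up as false drift.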
The 12 Questions Vendors Don't Want to Answer
These questions force vendors to reveal platform limitations they hide during demos. Do not accept vague answers. Demand specifics, references to customer environments, or live demonstrations with your data.
1. How many service accounts does your largest customer monitor, and what is their alert-to-incident ratio? This reveals whether the platform scales to real enterprise environments. If the vendor cannot answer, they either do not have large customers or their customers are drowning in alerts.
2. Show me a real-world lateral movement detection that involved role chaining through three AWS accounts. Most vendors demo simple scenarios. Real attacks use complex privilege paths. If they cannot show this, their detection depth is shallow.
3. What percentage of your detections are behavioral versus rule-based, and how do you validate ML model accuracy? This forces transparency about detection methodology. If they cannot quantify it, they do not measure it.
4. If I have 10,000 Lambda functions with unique execution roles, how does your pricing model work? Some vendors charge per identity. That pricing model breaks at scale. Force them to define the cost for your actual environment.
5. How long does it take to baseline a new identity, and what is the false positive rate in the first 30 days? This reveals whether the platform is useful from day one or requires months of tuning.
6. Can you detect when a compromised CI/CD pipeline provisions a new IAM role with admin permissions? This is a common attack vector. If the platform cannot detect it, you have a critical gap.
7. What is your MTTR for a credential compromise scenario from detection to credential revocation? This measures response speed. If the answer is "it depends on how fast your analysts respond," the platform lacks automation.
8. How do you handle identities that only authenticate during monthly batch jobs? Low-frequency identities break baseline models. If the vendor has no answer, their ML approach is immature.
9. Show me your incident investigation workflow for a service account that suddenly accessed 50 S3 buckets. This reveals whether the platform provides investigation context or just raw alerts.
10. What happens when your platform goes down? Do I lose response capability or just detection? Single points of failure are unacceptable. The platform must degrade gracefully.
11. How do you differentiate between legitimate automation and malicious automation using the same service account? This is the hardest problem in ITDR. If they cannot answer, they generate false positives on automation.
12. What compliance frameworks do you map to, and do you provide audit-ready evidence or just logs? This determines whether the platform reduces audit burden or creates it. Logs are not evidence. Evidence is correlated, timestamped, and mapped to control requirements.
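Questions 1 and 7 are easiest to hold a vendor to if you compute the numbers yourself during a pilot. This sketch assumes a hypothetical export in which each alert has a `confirmed` flag (analyst-confirmed incident), a `detected_at` time, and, for incidents that were contained, a `revoked_at` time.

```python
from datetime import timedelta
from typing import Optional


def pilot_metrics(alerts: list[dict]) -> dict:
    """Alert-to-incident ratio and mean time from detection to credential revocation."""
    incidents = [a for a in alerts if a["confirmed"]]
    ratio = len(alerts) / len(incidents) if incidents else float("inf")
    durations = [a["revoked_at"] - a["detected_at"] for a in incidents if a.get("revoked_at")]
    mttr: Optional[timedelta] = sum(durations, timedelta()) / len(durations) if durations else None
    return {
        "alerts_per_incident": ratio,
        "mttr": mttr,
        # Confirmed incidents with no recorded revocation at all.
        "unrevoked": len(incidents) - len(durations),
    }
```

Reporting `unrevoked` separately keeps a low MTTR from hiding incidents the platform never contained.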
Compliance Mapping and Audit Readiness
Compliance requirements drive ITDR adoption faster than pure threat detection. If your platform cannot generate audit-ready evidence, it does not reduce compliance burden. It adds another data source your auditors will question.
SOC 2 Type II requirements include privileged access monitoring (every use of admin credentials must be logged and reviewed), session recording (some interpretations require video, most accept detailed logs), and access certification workflows (quarterly reviews of who has access to what). Ask the vendor how their platform generates evidence for each control. If the answer is "export CloudTrail logs and build reports manually," the platform does not support compliance. It supports detection.
PCI-DSS identity controls include MFA enforcement for all access to cardholder data environments, quarterly access reviews, and least privilege validation. PCI auditors want to see evidence that you review and remediate excessive permissions. Ask how the platform generates that evidence. The best platforms produce audit reports that map directly to PCI requirements with evidence timestamps, reviewer signatures, and remediation tracking.
ISO 27001 evidence collection is broader and more flexible than PCI, but auditors still want structured evidence. Does the platform generate audit reports, or do you need to export raw logs and build them manually? Can you filter evidence by control domain (for example A.9 Access Control and A.12 Operations Security in the 2013 Annex A; ISO 27001:2022 renumbers these controls, so confirm which edition the platform maps to)? Can you show year-over-year improvement in identity hygiene metrics?
GDPR identity data handling affects where the platform stores identity metadata. If you operate in the EU, where does the vendor store your data? Do they offer EU-only data residency? How do you handle data subject access requests (show me every log entry for user X)? How do you handle data deletion requests? Most platforms are not designed for GDPR compliance because they focus on US customers. If you have EU operations, this is a deal-breaker question.
The audit question is the simplest test. When your auditor asks "show me every time this admin role was used in Q4," can you answer in 5 minutes or 5 hours? If the answer is 5 hours, the platform creates audit burden instead of reducing it. The best platforms have audit report templates for every major framework, so you generate evidence with a few clicks instead of building spreadsheets manually.
| Compliance Framework | Key Identity Controls | Evidence Requirements | Platform Capability to Evaluate |
|---|---|---|---|
| SOC 2 Type II | Privileged access monitoring, session logging, access reviews | Timestamped logs, review signatures, remediation tracking | Pre-built audit reports, control mapping, evidence export |
| PCI-DSS | MFA enforcement, quarterly access reviews, least privilege validation | Evidence of MFA use, access review reports, permission change logs | PCI control templates, automated access reviews, risk scoring |
| ISO 27001 | Access control policies, operations security, user access management | Control-mapped evidence, year-over-year metrics, policy documentation | ISO domain filtering, trend reports, policy versioning |
| GDPR | Data subject rights, data residency, access logging | Subject access reports, deletion logs, EU data residency options | EU-only deployment option, subject access workflows, retention policies |
Summary: The Three Non-Negotiables
After evaluating ITDR platforms for 18 months across three organizations, I distilled the checklist to three non-negotiables. If a platform fails any of these, it is not ready for production environments with real identity complexity.
First: Identity coverage is all or nothing. If the platform monitors human SSO logins but treats service accounts as second-class citizens, it will miss the attacks that matter. According to CrowdStrike, 23% of cloud breaches start with compromised service accounts, not human credentials [2]. Your platform must monitor every identity type with equal rigor: AWS IAM roles, Lambda execution roles, Kubernetes service accounts, GitHub tokens, CI/CD credentials, and AI agent identities. Coverage means more than inventory. It means ownership mapping, risk scoring, and behavioral baselining per identity.
Second: Detection methodology determines whether you catch novel attacks or drown in false positives. Rule-based engines catch known patterns but miss new techniques. Pure ML generates alert fatigue. Hybrid approaches with per-identity behavioral baselines separate platforms built for real environments from those built for demos. Force the vendor to define their detection architecture, baseline establishment period, and false positive rate with specificity. If they cannot quantify it, they do not measure it.
Third: Response automation depth determines whether the platform multiplies your team's capacity or just adds another alert feed. Platforms that only alert (Level 1) force your team to scale linearly with identity count. Policy-driven automation (Level 4) lets three analysts handle 500 incidents per week because 80% resolve automatically. Ask the vendor to demonstrate automated response for a high-confidence threat. If they pivot to "guided workflows" or "suggested actions," they do not offer real automation.
The best ITDR platforms do not just detect identity threats. They reduce the operational burden on security teams, provide audit-ready compliance evidence, and scale to tens of thousands of identities without requiring proportional analyst headcount. Start your evaluation with these criteria. You will eliminate 60% of vendors in the first meeting and spend your time evaluating the platforms that can actually solve the problem.
References
[1] OWASP, "AI Security and Privacy Guide: LLM Top 10," 2025. https://owasp.org/www-project-ai-security-and-privacy-guide/
[2] CrowdStrike, "2025 Global Threat Report," February 2025. https://www.crowdstrike.com/global-threat-report/
[3] Cloud Security Alliance, "Cloud Identity and Access Management Survey," 2025. https://cloudsecurityalliance.org/research/working-groups/identity-and-access-management/
[4] Gartner, "Market Guide for Identity Threat Detection and Response," December 2024. https://www.gartner.com/en/documents/identity-threat-detection-response
[5] HashiCorp, "State of Cloud Strategy Survey," 2025. https://www.hashicorp.com/state-of-the-cloud
