AI Remediation Guide
AI Remediation is the Fix rung of the verb ladder: turning waste findings into action. Every plan gets remediation on the Remediation page, where each finding shows its recommendation plus the exact Console and CLI steps to fix it yourself. On the Agentic plan, the same findings also get one-click execution: you review the agent's Bedrock-generated plan, approve it, and CloudWise runs the fix in your AWS account with full rollback capability.
Manual vs. automatic remediation
Remediation is one experience that expands with your plan. Open the Remediation page and you'll see your waste findings with how to fix each one:
| Plan | What you get |
|---|---|
| Free / Shield | Every finding with step-by-step Console + CLI instructions you run yourself, filterable by confidence, risk, category, and account |
| Agentic | The same findings, plus a one-click Approve & fix that executes the plan in your account (with pre-checks and rollback) |
| Compliance | Propose-only by design (air-gapped): guided instructions, no execution |
The fix instructions and the risk/confidence model (below) apply to everyone. The rest of this guide covers the automatic (Agentic) execute flow: planning, approval, execution, and rollback.
Guided fix instructions appear on the Remediation page for every plan. Only the Agentic plan ($49/mo or $399/year) adds one-click execution inside your account. Free and Shield are propose-only (you run the steps); Compliance is propose-only by design (air-gapped).
Architecture Overview
The diagram below shows the end-to-end flow of execute-mode remediation, from approving an action on the Remediation page through IAM role chaining, pre-checks, execution, and rollback.

Key components:
- Remediation page: Review findings, approve remediation actions, and monitor execution status (the same page shows guided fix instructions on every plan)
- IAM Hop 1 (CloudWise Service Role): CloudWise assumes its own service role to initiate cross-account access
- IAM Hop 2 (Customer Remediation Role): A scoped role in your AWS account with only cost-saving permissions
- Pre-Check Phase: Read-only checks (e.g., `describe_instances`, `list_volumes`) verify the resource state and resolve placeholders (e.g., `VOLUME_ID`, `INSTANCE_ID`) before any writes occur
- Execution Phase: Performs the approved actions (e.g., create snapshot, delete volume) with full CloudWatch logging and metrics
- Rollback Path: If anything fails, CloudWise automatically restores from snapshots or recreates resources
- CloudWatch Monitoring: All actions, logs, and metrics are recorded for audit
How It Works
CloudWise Remediation follows a strict human-in-the-loop workflow. No action is ever executed without your explicit approval.
Waste Detection Scan
↓
Confidence Filtering (your threshold)
↓
AI Remediation Planner (Bedrock, Anthropic Claude)
↓
Deterministic Risk Classification
↓
Policy Filtering (your rules)
↓
Plan Validation (allow-list enforcement)
↓
Action Created → Pending Approval
↓
You Review & Approve (or Reject)
↓
CloudWise Executes via Cross-Account Role
↓
Verification & Rollback Available
Pipeline Steps
- Waste Detection: CloudWise scans 17 AWS regions for idle, unused, and over-provisioned resources across 191 waste types
- Confidence Filtering: Each finding has a confidence level (High, Medium, or Low). Only findings at or above your configured Minimum Confidence Level proceed to planning. Default: Medium (skips Low confidence findings)
- Remediation Planning: Eligible findings are sent to Amazon Bedrock (Anthropic Claude), which generates a specific execution plan with pre-checks, API calls, and rollback steps
- Risk Classification: Every action is deterministically classified as Low, Medium, or High risk based on the waste type (no LLM involvement in risk scoring)
- Policy Filtering: Your remediation policy controls which actions are proposed: risk threshold, confidence threshold, excluded tags, resource IDs, and waste types
- Allow-List Validation: Every API call in the plan is validated against a strict allow-list of 41 AWS services. Any plan containing disallowed actions is rejected automatically
- Approval Queue: Valid actions appear on the Remediation page with status `Pending Approval` and a 72-hour expiry window
- Execution: Approved actions execute via a scoped IAM role in your AWS account with pre-flight checks
- Audit Trail: Every action is logged with timestamps, results, rollback information, and Bedrock model/token usage
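The 72-hour expiry window on the Approval Queue step can be illustrated with a short sketch. This is not CloudWise's implementation: the function name `is_expired` is hypothetical, and treating the boundary as inclusive (expired at exactly 72 hours) is an assumption.

```python
from datetime import datetime, timedelta, timezone

# 72-hour expiry window, taken from the Approval Queue step above.
APPROVAL_WINDOW = timedelta(hours=72)


def is_expired(created_at: datetime, now: datetime) -> bool:
    """Return True once a pending action has outlived the approval window.

    Assumption: an action expires at exactly 72 hours (inclusive boundary).
    """
    return now - created_at >= APPROVAL_WINDOW


created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_expired(created, created + timedelta(hours=71)))  # False
print(is_expired(created, created + timedelta(hours=72)))  # True
```

Using timezone-aware timestamps avoids ambiguity when the approval and the check happen in different regions.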
Getting Started
Step 1: Deploy the Remediation Role
AI Remediation requires a write-access IAM role in each AWS account you want to remediate. This is separate from the read-only role used for cost monitoring.
- Go to Settings → Remediation Policy
- Click Deploy in AWS Console
- Upload the downloaded template when the CloudFormation console opens
- Accept the default parameters and deploy
The settings page downloads the template and opens the CloudFormation console automatically. Default parameters work for most setups; just click through the wizard.
The CloudWiseRemediationRole is scoped to cost-saving actions only:
- Stop/terminate EC2 instances
- Delete unattached EBS volumes and old snapshots
- Release unused Elastic IPs
- Delete idle NAT Gateways and Load Balancers
- Modify over-provisioned resources (Lambda, DynamoDB, EBS)
- Clean up unused secrets, KMS keys, API Gateways
- No IAM, VPC, or security group access
- No data access (S3 objects, databases)
- 15-minute session cap (cannot maintain persistent access)
Step 2: Configure Your Policy
Navigate to Settings → Remediation Policy to customize:
- Confidence threshold: How certain a finding must be before generating a plan
- Risk threshold: Which risk levels to include
- Safety caps: Maximum daily actions and savings impact
- Exclusions: Tags, resource IDs, and waste types to skip
- Notifications: Email and Slack alerts
Step 3: Review & Approve Actions
After the next processing cycle, remediation actions will appear on the Remediation page. Review each action's plan, then approve or reject.
Configuration Reference
Remediation Engine Toggle
| Setting | Default | Description |
|---|---|---|
| Enable Remediation | On | Master switch. When disabled, no new remediation actions are generated. Existing pending actions remain but no new ones are created. |
Risk & Confidence Controls
Confidence and risk are independent filters that work at different stages of the pipeline:
- Confidence = "How certain are we this resource is waste?" Filters findings before plan generation. Based on usage signals (e.g., an EBS volume unattached for 90 days is high confidence; a log group with no retention policy set is low confidence).
- Risk = "How impactful would this remediation be?" Filters actions during plan generation. Based on the type of remediation (e.g., deleting an unattached volume is low risk; stopping a running instance is high risk).
A finding can be high-confidence + high-risk (e.g., an idle EC2 instance clearly unused for 30 days) or low-confidence + low-risk (e.g., a log group flagged for missing retention policy). The two dimensions are orthogonal.
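As a minimal sketch of how the two dimensions act independently, the snippet below applies a confidence floor and a risk ceiling to a few findings. The `Finding` class, the `is_proposed` function, and the default thresholds are illustrative assumptions rather than CloudWise's actual code, and the sketch collapses the two pipeline stages into a single check for brevity.

```python
from dataclasses import dataclass

# Ordered scale shared by both dimensions.
LEVELS = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Finding:
    waste_type: str
    confidence: str  # how certain the resource is waste
    risk: str        # how impactful the remediation would be


def is_proposed(finding: Finding, min_confidence: str = "medium", max_risk: str = "medium") -> bool:
    """Apply the confidence floor and the risk ceiling independently."""
    confident_enough = LEVELS[finding.confidence] >= LEVELS[min_confidence]
    within_risk = LEVELS[finding.risk] <= LEVELS[max_risk]
    return confident_enough and within_risk


idle_instance = Finding("idle_ec2", confidence="high", risk="high")
log_group = Finding("no_retention_log_group", confidence="low", risk="low")
volume = Finding("unattached_ebs", confidence="high", risk="low")

print(is_proposed(idle_instance))  # False: confident, but above the risk ceiling
print(is_proposed(log_group))      # False: low risk, but below the confidence floor
print(is_proposed(volume))         # True
```

Raising `max_risk` to `"high"` admits the idle instance, and lowering `min_confidence` to `"low"` admits the log group; neither change affects the other dimension.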
Minimum Confidence Level
Controls which waste findings are eligible for remediation planning:
| Level | Setting | What Gets Considered |
|---|---|---|
| High (Most certain) | Only highest-confidence findings | Resources with strong inactivity signals (zero usage for 30+ days, zero API calls, unattached 90+ days) |
| Medium (Recommended) | High + Medium confidence | Above + resources with moderate signals (low utilization, no recent changes, outdated configurations) |
| Low (All findings) | All confidence levels | Above + resources flagged by heuristics (missing best-practice settings, potential over-provisioning) |
If you see waste findings in your dashboard that don't have remediation actions, they may be filtered out by the confidence threshold. Lowering it to Low generates plans for all findings, but some of those plans target resources that are less certain to be actual waste. Review these plans more carefully.
Maximum Risk Level
Controls which risk levels are proposed for remediation:
| Level | Setting | What Gets Proposed |
|---|---|---|
| Low (Safest) | Only lowest-risk actions | Unattached volumes, old snapshots, unused Elastic IPs, lifecycle policies, Lambda timeout/runtime/ARM64 recommendations, excessive log retention, empty log groups, Glue job timeouts, failed Glue job retries, EMR auto-termination, previous-gen instances, Spot opportunities |
| Medium (Recommended) | Low + Medium risk | Above + stopped instances, GP2→GP3 migrations, idle NAT Gateways, idle load balancers, unused secrets and API Gateways, idle Lambda Provisioned Concurrency, over-provisioned EMR clusters, WorkSpaces AutoStop/pool optimization |
| High (All) | All risk levels | Above + idle running instances, idle databases, unused Lambda functions, idle services |
Risk Classification
Every waste type is deterministically mapped to a risk level; no AI/LLM is involved in risk scoring. The complete classification across all 191 classified waste types:
Low Risk (safe to act on immediately, 80 waste types):
| Waste Type | Why It's Low Risk |
|---|---|
| `unattached_ebs` | Not connected to any instance |
| `old_ebs_snapshot` | Aged out, point-in-time copy exists |
| `orphaned_ebs_snapshot` | Source volume deleted; snapshot serves no recovery purpose |
| `ami_orphaned_snapshot` | AMI deregistered; backing snapshot no longer needed |
| `unattached_eip` | AWS charges $0.005/hr for unused EIPs |
| `eip_on_stopped_instance` | Paying for idle Elastic IP |
| `incomplete_multipart` | Incomplete uploads consuming storage |
| `old_ecr_images` | Container images past retention |
| `untagged_ecr_images` | Untagged images consuming registry space |
| `old_log_group` | Aged-out log data |
| `no_retention_log_group` | Infinite retention accumulates cost |
| `excessive_retention_log_group` | Retention reduction; old data ages out naturally |
| `empty_log_group` | Delete empty group; zero data loss risk |
| `unused_dashboard` | No monitoring value |
| `orphaned_dns_record` | DNS record pointing to deleted resource |
| `cloudtrail_s3_no_lifecycle` | CloudTrail S3 bucket without lifecycle rules |
| `ecr_no_lifecycle_policy` | ECR repo without image cleanup |
| `no_lifecycle_policy` / `no_lifecycle_efs` | Missing lifecycle configuration |
| `stopped_sagemaker_notebook_storage` | Stopped notebook with EBS storage charges |
| `previous_gen_sagemaker_instance` | Old-gen instance type; recommend upgrade |
| `high_lcu_cost_alb` | ALB with high LCU costs; advise optimization |
| `classic_lb_migration` | Classic LB migration; recommend upgrade to ALB/NLB |
| `lambda_excessive_timeout` | Timeout reduction; no cost impact, improves hygiene |
| `lambda_arm64_migration` | ARM64 migration opportunity; architecture recommendation |
| `lambda_old_runtime` | Deprecated runtime; housekeeping flag |
| `s3_rapid_growth` | Recommendation-only: advises lifecycle rules for fast-growing buckets |
| `s3_wrong_storage_class` | Recommendation-only: advises Intelligent-Tiering migration |
| `s3_empty_bucket` | Empty bucket cleanup; zero data loss risk |
| `s3_high_request_and_transfer_cost` | Recommendation-only: advises request pattern optimization |
| `redshift_spectrum_heavy` | Recommendation-only: advises Athena evaluation for Spectrum workloads |
| `redshift_legacy_dc2` | Recommendation-only: advises RA3 migration planning |
| `redshift_concurrency_scaling_waste` | WLM tuning recommendation; no service impact |
| `oversized_glue_job` | Advisory: DPU right-sizing recommendation |
| `glue_job_missing_timeout` | Timeout reduction; jobs still run, just fail faster if stuck |
| `failed_glue_job_retry` | Disable retries; jobs still run once, just don't retry on failure |
| `glue_dev_endpoint_migration` | Advisory: migration to Interactive Sessions recommendation |
| `glue_catalog_bloat` | Advisory: table version cleanup recommendation |
| `ecs_container_insights_waste` | Monitoring change; basic ECS metrics remain free |
| `backup_no_lifecycle_tiering` | Advisory: lifecycle tiering recommendation |
| `stale_backup_plan_assignment` | Advisory: backup plan selection cleanup |
| `backup_copy_policy_overreach` | Advisory: cross-region copy policy review |
| `s3_no_default_encryption` | Advisory: recommends customer-managed KMS encryption |
| `aurora_io_optimization_opportunity` | Billing config change; switch to I/O-Optimized, no downtime |
| `aurora_extended_support_cost` | Advisory: recommends engine version upgrade to avoid surcharge |
| `rds_extended_support_cost` | Advisory: recommends RDS engine upgrade to avoid Extended Support surcharge |
| `elasticache_extended_support_cost` | Advisory: recommends ElastiCache engine upgrade to avoid Extended Support surcharge |
| `elasticache_engine_migration` | Advisory: recommends Valkey migration for 20% savings, Redis 7.x API compatible |
| `elasticache_serverless_optimization` | Advisory: recommends Serverless for spiky workloads, manual migration required |
| `elasticache_data_tiering_opportunity` | Advisory: recommends R6gd data tiering for large datasets, manual migration required |
| `eks_extended_support_cost` | Advisory: recommends Kubernetes version upgrade to avoid EKS Extended Support charges |
| `opensearch_extended_support_cost` | Advisory: recommends domain version upgrade to avoid legacy support surcharge |
| `documentdb_extended_support_cost` | Advisory: recommends DocumentDB engine upgrade to avoid Extended Support surcharge |
| `aurora_serverless_opportunity` | Advisory: recommends Serverless v2 migration for low-utilization clusters |
| `neptune_serverless_opportunity` | Recommendation only: Serverless migration requires new cluster |
| `oversized_mq_broker` | Recommendation only: MQ doesn't support in-place instance changes |
| `lightsail_unattached_static_ip` | Unattached static IP; no instance dependency, $3.65/mo flat charge |
| `appsync_idle_subscriptions` | AppSync idle subscriptions; advisory (architecture review) |
| `ri_opportunity_opensearch` | RI commitment purchase; advisory, non-refundable |
| `emr_missing_auto_termination` | Advisory: auto-termination policy is a configuration setting, no workload impact |
| `emr_previous_gen_instances` | Advisory: recommends upgrading to current-gen instance types |
| `emr_spot_opportunity` | Advisory: recommends Spot for task nodes, no impact on running workloads |
| `beanstalk_unnecessary_alb` | Advisory: switch from LoadBalanced to SingleInstance when min=max=1 |
| `beanstalk_previous_gen_instances` | Advisory: recommends upgrading to current-gen instance types |
| `workspaces_windows_license_optimization` | Advisory: BYOL licensing review, no infrastructure changes |
| `oversized_ec2_optimizer` | Compute Optimizer recommendation; advisory rightsizing suggestion |
| `oversized_ebs_optimizer` | Compute Optimizer recommendation; advisory rightsizing suggestion |
| `oversized_lambda_optimizer` | Compute Optimizer recommendation; advisory rightsizing suggestion |
| `oversized_rds_optimizer` | Compute Optimizer recommendation; advisory rightsizing suggestion |
| `ri_opportunity_ec2` | RI purchase recommendation; advisory, informational only |
| `ri_opportunity_rds` | RI purchase recommendation; advisory, informational only |
| `ri_opportunity_elasticache` | RI purchase recommendation; advisory, informational only |
| `ri_opportunity_redshift` | RI purchase recommendation; advisory, informational only |
| `sp_opportunity_compute` | SP purchase recommendation; advisory, informational only |
| `sp_opportunity_ec2` | SP purchase recommendation; advisory, informational only |
| `sp_opportunity_sagemaker` | SP purchase recommendation; advisory, informational only |
| `savings_plan_coverage_gap` | SP coverage gap; advisory, informational only |
| `convertible_ri_exchange_opportunity` | Convertible RI exchange; advisory, informational only |
| `kinesis_on_demand_downgrade` | Billing mode switch; reversible, no data impact |
| `kinesis_extended_retention_waste` | Retention decrease to default 24h; no data loss for active consumers |
Medium Risk (review before acting, 75 waste types):
| Waste Type | Why It's Medium Risk |
|---|---|
| `stopped_ec2_with_ebs` | EBS volumes still incurring charges |
| `gp2_migration` | Performance characteristics change (GP2 → GP3) |
| `over_provisioned_iops` | IOPS reduction |
| `over_provisioned_lambda` | Memory/timeout reduction |
| `lambda_provisioned_concurrency_idle` | Deleting/reducing PC configs affects cold start behavior |
| `over_provisioned_dynamodb` | Capacity mode change |
| `dynamodb_no_autoscaling` | Switch to on-demand billing |
| `idle_nat_gateway` | May have intermittent traffic |
| `multiple_eips_per_instance` | EIP consolidation |
| `old_rds_snapshot` | Aged RDS snapshot |
| `old_documentdb_snapshot` | Aged DocumentDB snapshot; irreversible deletion |
| `old_fsx_backup` | Aged FSx backup; irreversible deletion |
| `old_backup` | Past retention window; verify backup policy |
| `idle_load_balancer` | Load balancer with no healthy targets |
| `low_traffic_alb` | Load balancer with near-zero traffic |
| `unused_vpc_endpoint` | VPC endpoint with no traffic |
| `oversized_elasticache` | Cache cluster right-sizing |
| `elasticache_replication_waste` | Non-production replica removal; reversible via `increase_replica_count` |
| `oversized_ecs_task` | Task definition right-sizing |
| `ecs_no_autoscaling` | Auto-scaling configuration requires workload analysis |
| `oversized_ecs_memory` | Memory right-sizing; may cause OOM if usage spikes |
| `redshift_no_pause` | Creates scheduled pause action; reversible |
| `idle_kinesis_stream` | Stream with no throughput |
| `over_provisioned_kinesis` | Shard count reduction |
| `kinesis_enhanced_fan_out_waste` | Deregister idle enhanced fan-out consumer |
| `kinesis_firehose_idle` | Delete idle Firehose delivery stream |
| `idle_glue_dev_endpoint` | Unused Glue development endpoint |
| `old_glue_job` | Stale Glue job definition |
| `idle_glue_crawler` | Crawler with no recent runs |
| `idle_state_machine` | Step Function with no executions |
| `step_functions_retry_storm` | State machine with excessive retry ratio and failure rate |
| `step_functions_high_transition_density` | Workflow with excessive transitions per success |
| `step_functions_express_duration_waste` | Express workflow with high p95 duration |
| `redundant_backup` | Duplicate backup |
| `unused_secret` | Secrets Manager secret unused for 90+ days |
| `idle_sagemaker_notebook` | Notebook instance running idle |
| `oversized_sagemaker_endpoint` | Endpoint with low CPU/memory utilization |
| `oversized_msk_cluster` | Downsizing requires workload analysis |
| `overprovisioned_documentdb` | Instance right-sizing requires workload analysis |
| `oversized_neptune` | Instance class change; brief downtime |
| `neptune_old_snapshot` | Snapshot deletion; irreversible |
| `oversized_fsx` | Storage right-sizing; advisory only in v1 |
| `fsx_throughput_overprovisioned` | Throughput right-sizing; advisory only in v1 |
| `underutilized_redshift` | Low CPU but active connections; right-sizing |
| `oversized_opensearch` | Node rightsizing requires workload analysis |
| `opensearch_ebs_overprovisioned` | EBS resizing requires planned reconfiguration |
| `redshift_wlm_over_provisioned` | WLM concurrency right-sizing requires workload analysis |
| `unused_api_gateway` | API Gateway with zero calls in 30 days |
| `unused_appsync` | AppSync API with no queries |
| `appsync_idle_cache` | AppSync idle cache; no hits in 14 days |
| `unused_distribution` | Unused CloudFront distribution |
| `unused_hosted_zone` | Unused Route 53 hosted zone |
| `unused_kms_key` | Unused KMS key |
| `unencrypted_ebs_volume` | Advisory: encryption-at-rest gap, requires snapshot copy to encrypt |
| `unencrypted_rds_instance` | Advisory: encryption-at-rest gap, requires read replica promotion |
| `unencrypted_efs_filesystem` | Advisory: encryption cannot be changed after creation |
| `opensearch_no_encryption_at_rest` | Advisory: encryption-at-rest gap, requires domain reconfiguration |
| `unencrypted_documentdb_cluster` | Advisory: encryption cannot be changed after creation |
| `rds_no_deletion_protection` | Advisory: recommends enabling DeletionProtection |
| `dynamodb_no_deletion_protection` | Advisory: recommends enabling deletion protection |
| `resource_without_backup_coverage` | Advisory: resource not covered by any AWS Backup selection |
| `lightsail_unattached_disk` | Block storage disk deletion; irreversible without snapshot |
| `lightsail_old_snapshot` | Snapshot deletion; irreversible, verify no recovery need |
| `lightsail_idle_load_balancer` | Load balancer with zero healthy instances; may have intermittent traffic |
| `unused_transfer_protocol` | Transfer server with unused protocols; review protocol configuration |
| `aurora_to_rds_downgrade_opportunity` | Aurora to RDS downgrade; requires migration planning |
| `emr_over_provisioned` | Over-provisioned EMR instance groups; right-sizing requires workload analysis |
| `beanstalk_over_provisioned` | Over-provisioned Beanstalk environment (CPU <25% over 14 days); right-sizing requires workload analysis |
| `workspaces_autostop_opportunity` | Billing mode switch: AlwaysOn to AutoStop changes billing model. Guarded auto-remediation: executor validates AlwaysOn state, checks deny-tag (`cloudwise:autostop-deny=true`), and verifies modifiable state before execution |
| `workspaces_pool_overprovisioned_capacity` | Pool capacity reduction; may affect user availability during peaks |
| `oversized_workspace` | WorkSpace bundle right-sizing requires usage analysis |
| `oversized_redshift` | Redshift cluster right-sizing requires workload analysis |
| `expiring_reserved_instance` | RI approaching expiry; advisory, renewal planning |
| `expiring_savings_plan` | SP approaching expiry; advisory, renewal planning |
| `disabled_global_accelerator` | Disabled accelerator still incurring charges; review before deleting |
High Risk (significant resource changes, 36 waste types):
| Waste Type | Why It's High Risk |
|---|---|
| `idle_ec2` | Running instance; may have workloads |
| `idle_rds` | Running database; may have connections |
| `idle_elasticache` | Running cache; may serve traffic |
| `idle_redshift` | Running data warehouse |
| `idle_opensearch` | Running search cluster |
| `idle_emr_cluster` | Running EMR cluster |
| `idle_sagemaker_endpoint` | Running inference endpoint |
| `idle_msk_cluster` | Running Kafka cluster |
| `idle_mq_broker` | Running message broker |
| `idle_neptune` | Running graph database |
| `idle_documentdb` | Running document database |
| `unused_lambda` | May be invoked by other services |
| `idle_ecs_service` | Running containers |
| `idle_dynamodb` | DynamoDB table with provisioned capacity |
| `idle_fsx` | Running file system |
| `idle_efs` | Elastic File System with no mounts |
| `idle_beanstalk` | Running Elastic Beanstalk environment |
| `beanstalk_idle_traffic` | EB environment with zero traffic for 14 days |
| `beanstalk_orphaned_rds` | Orphaned RDS left behind by terminated EB environment |
| `idle_lightsail` | Running Lightsail instance |
| `lightsail_idle_database` | Managed database with zero connections |
| `idle_workspace` | Running WorkSpace |
| `idle_transfer_server` | Running Transfer Family server |
| `idle_transfer_no_activity` | Transfer server with no file activity |
| `idle_transfer_web_app` | Idle Transfer Family web app |
| `idle_qldb` | Running QLDB ledger |
| `idle_timestream` | Timestream database |
| `unused_accelerator` | Global Accelerator |
| `idle_global_accelerator` | Global Accelerator with zero traffic for 30 days |
| `long_running_emr` | Long-running EMR cluster |
| `duplicate_cloudtrail` | Redundant trail; verify compliance requirements before deleting |
| `rds_publicly_accessible` | RDS instance with public access enabled; high security risk |
| `cur_savings_plan_waste` | CUR-detected unused Savings Plan coverage |
| `cur_unused_reservation` | CUR-detected underutilized Reserved Instance |
| `unused_reserved_instance` | Reserved Instance with low utilization |
| `unused_savings_plan` | Savings Plan with low utilization |
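Because the classification is deterministic, it can be pictured as a plain lookup table. The sketch below uses a small excerpt of the tables above; the name `classify_risk` and the choice to raise an error for an unclassified waste type are assumptions made for illustration, not documented CloudWise behavior.

```python
# Excerpt of the waste-type tables above; the full mapping has 191 entries.
RISK_BY_WASTE_TYPE = {
    "unattached_ebs": "low",
    "old_ebs_snapshot": "low",
    "gp2_migration": "medium",
    "idle_nat_gateway": "medium",
    "idle_ec2": "high",
    "idle_rds": "high",
}


def classify_risk(waste_type: str) -> str:
    """Look up the risk level for a waste type; no model call is involved."""
    try:
        return RISK_BY_WASTE_TYPE[waste_type]
    except KeyError:
        # Assumption: unknown types are rejected rather than given a default.
        raise ValueError(f"unclassified waste type: {waste_type}") from None


print(classify_risk("unattached_ebs"))  # low
print(classify_risk("gp2_migration"))   # medium
print(classify_risk("idle_ec2"))        # high
```

A static table like this gives the same answer on every run, which is the property the guide contrasts with LLM-based scoring.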
Require MFA Above ($)
| Setting | Default | Description |
|---|---|---|
| MFA Threshold | $500 | Actions with estimated monthly savings above this amount require MFA verification before execution. Prevents accidental approval of high-impact changes. |
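A minimal sketch of the threshold check, assuming "above this amount" means strictly greater than the threshold. The function name `requires_mfa` is hypothetical.

```python
def requires_mfa(estimated_monthly_savings: float, threshold: float = 500.0) -> bool:
    """True when estimated monthly savings are strictly above the MFA threshold."""
    return estimated_monthly_savings > threshold


print(requires_mfa(499.99))  # False
print(requires_mfa(500.00))  # False
print(requires_mfa(500.01))  # True
```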
Max Daily Actions
| Setting | Default | Range | Description |
|---|---|---|---|
| Max Daily Actions | 50 | 1-200 | Maximum number of remediation actions that can be executed in a single day. Prevents runaway automation. |
Max Daily Savings Impact
| Setting | Default | Description |
|---|---|---|
| Max Daily Savings Impact | $10,000 | Safety cap on the total dollar value of actions executed per day. Even if you approve more, execution pauses when this limit is reached. |
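The two daily caps can be thought of as a shared budget that each execution draws from. The sketch below is illustrative only: the `DailyBudget` class is hypothetical, and it assumes an action that would push the day's total past either cap is held back instead of executed.

```python
from dataclasses import dataclass


@dataclass
class DailyBudget:
    max_actions: int = 50                 # Max Daily Actions default
    max_savings_impact: float = 10_000.0  # Max Daily Savings Impact default
    actions_run: int = 0
    savings_run: float = 0.0

    def try_consume(self, estimated_savings: float) -> bool:
        """Record an execution if both caps allow it; otherwise leave state unchanged."""
        if self.actions_run + 1 > self.max_actions:
            return False
        if self.savings_run + estimated_savings > self.max_savings_impact:
            return False
        self.actions_run += 1
        self.savings_run += estimated_savings
        return True


budget = DailyBudget(max_actions=2, max_savings_impact=1000.0)
print(budget.try_consume(600.0))  # True
print(budget.try_consume(600.0))  # False: would exceed the savings cap
print(budget.try_consume(300.0))  # True
print(budget.try_consume(10.0))   # False: action count cap reached
```

In this sketch the counters would be reset at the start of each day; how CloudWise defines the day boundary is not stated in this guide.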
Notification Channels
| Channel | Default | Description |
|---|---|---|
| Email | On | Approval requests, execution results, daily digest |
| Slack | Off | Real-time notifications to a dedicated channel (see Slack Integration Guide) |
Remediation Slack messages include:
- Per-action: Title, description, API calls, rollback steps, confidence level
- Batch digest: Summary with counts by status and top savings
- Direct links to the specific action on the Remediation page
Exclusions
Excluded Resource Tags
Resources with matching tags are never proposed for remediation. Use Key=Value format.
Common exclusions:
- `Environment=Production`: Skip all production resources
- `Team=Platform`: Skip platform team resources
- `DoNotDelete=true`: Explicit protection tag
- `CostCenter=Shared`: Skip shared-cost resources
Excluded Resource IDs
Specific AWS resources that should never be remediated, by resource ID.
Examples:
- `i-0abc123def456`: A specific EC2 instance
- `vol-0abc123def456`: A specific EBS volume
- `snap-0abc123def456`: A specific snapshot
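The tag and resource-ID exclusions above could be evaluated along these lines. This is a sketch under assumptions: the helper names are hypothetical, and tag matching is taken to be exact and case-sensitive, which the guide does not specify.

```python
def parse_tag_rules(rules):
    """Turn 'Key=Value' strings into a set of (key, value) pairs."""
    parsed = set()
    for rule in rules:
        key, sep, value = rule.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"expected Key=Value, got {rule!r}")
        parsed.add((key, value))
    return parsed


def is_excluded(resource_id, tags, tag_rules, excluded_ids):
    """A resource is skipped if its ID is listed or any of its tags matches a rule."""
    if resource_id in excluded_ids:
        return True
    return any((key, value) in tag_rules for key, value in tags.items())


tag_rules = parse_tag_rules(["Environment=Production", "DoNotDelete=true"])
excluded_ids = {"i-0abc123def456"}

print(is_excluded("vol-0abc123def456", {"Environment": "Production"}, tag_rules, excluded_ids))  # True
print(is_excluded("i-0abc123def456", {}, tag_rules, excluded_ids))                               # True
print(is_excluded("vol-0abc123def456", {"Environment": "Staging"}, tag_rules, excluded_ids))     # False
```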
Excluded Waste Types
Deselect specific waste detection categories from remediation. The full list of 191 classified waste types:
| Category | Waste Types |
|---|---|
| EC2 | idle_ec2, stopped_ec2_with_ebs |
| EBS | unattached_ebs, old_ebs_snapshot, orphaned_ebs_snapshot, ami_orphaned_snapshot, gp2_migration, over_provisioned_iops |
| RDS | idle_rds, old_rds_snapshot, aurora_io_optimization_opportunity, aurora_extended_support_cost, rds_extended_support_cost, aurora_serverless_opportunity, aurora_to_rds_downgrade_opportunity |
| Networking | unattached_eip, eip_on_stopped_instance, multiple_eips_per_instance, idle_nat_gateway, idle_load_balancer, low_traffic_alb, high_lcu_cost_alb, classic_lb_migration, unused_vpc_endpoint, orphaned_dns_record |
| Serverless | unused_lambda, over_provisioned_lambda, unused_api_gateway, lambda_provisioned_concurrency_idle, lambda_excessive_timeout, lambda_arm64_migration, lambda_old_runtime |
| DynamoDB | idle_dynamodb, over_provisioned_dynamodb, dynamodb_no_autoscaling |
| Containers | idle_ecs_service, oversized_ecs_task, ecs_no_autoscaling, ecs_container_insights_waste, oversized_ecs_memory, idle_elasticache, oversized_elasticache, eks_extended_support_cost, elasticache_extended_support_cost, elasticache_replication_waste, elasticache_engine_migration, elasticache_serverless_optimization, elasticache_data_tiering_opportunity |
| Data | idle_kinesis_stream, over_provisioned_kinesis, kinesis_on_demand_downgrade, kinesis_extended_retention_waste, kinesis_enhanced_fan_out_waste, kinesis_firehose_idle, idle_redshift, underutilized_redshift, oversized_redshift, redshift_no_pause, redshift_spectrum_heavy, redshift_legacy_dc2, redshift_wlm_over_provisioned, redshift_concurrency_scaling_waste, idle_opensearch, oversized_opensearch, opensearch_ebs_overprovisioned, ri_opportunity_opensearch, opensearch_extended_support_cost |
| ML | idle_sagemaker_notebook, idle_sagemaker_endpoint, oversized_sagemaker_endpoint, stopped_sagemaker_notebook_storage, previous_gen_sagemaker_instance |
| DevOps | idle_glue_dev_endpoint, old_glue_job, idle_glue_crawler, oversized_glue_job, glue_job_missing_timeout, failed_glue_job_retry, glue_dev_endpoint_migration, glue_catalog_bloat, idle_state_machine, step_functions_retry_storm, step_functions_high_transition_density, step_functions_express_duration_waste, idle_emr_cluster, long_running_emr, emr_over_provisioned, emr_missing_auto_termination, emr_previous_gen_instances, emr_spot_opportunity |
| Storage | old_backup, redundant_backup, backup_no_lifecycle_tiering, stale_backup_plan_assignment, backup_copy_policy_overreach, unused_secret, idle_fsx, idle_efs, no_lifecycle_efs, oversized_fsx, fsx_throughput_overprovisioned, old_fsx_backup |
| Monitoring | unused_dashboard, old_log_group, no_retention_log_group, excessive_retention_log_group, empty_log_group |
| Security Posture | unencrypted_ebs_volume, unencrypted_rds_instance, rds_no_deletion_protection, rds_publicly_accessible, unencrypted_efs_filesystem, s3_no_default_encryption, dynamodb_no_deletion_protection, opensearch_no_encryption_at_rest, unencrypted_documentdb_cluster, resource_without_backup_coverage |
| ECR | old_ecr_images, untagged_ecr_images, ecr_no_lifecycle_policy, no_lifecycle_policy |
| CloudTrail | duplicate_cloudtrail, cloudtrail_s3_no_lifecycle |
| S3 | incomplete_multipart, s3_rapid_growth, s3_wrong_storage_class, s3_empty_bucket, s3_high_request_and_transfer_cost |
| Messaging | idle_msk_cluster, oversized_msk_cluster, idle_mq_broker, oversized_mq_broker |
| Databases | idle_neptune, neptune_serverless_opportunity, oversized_neptune, neptune_old_snapshot, idle_documentdb, overprovisioned_documentdb, old_documentdb_snapshot, documentdb_extended_support_cost, idle_qldb, idle_timestream |
| Niche | idle_beanstalk, beanstalk_idle_traffic, beanstalk_unnecessary_alb, beanstalk_previous_gen_instances, beanstalk_over_provisioned, beanstalk_orphaned_rds, idle_lightsail, lightsail_unattached_static_ip, lightsail_unattached_disk, lightsail_old_snapshot, lightsail_idle_load_balancer, lightsail_idle_database, idle_workspace, oversized_workspace, workspaces_autostop_opportunity, workspaces_pool_overprovisioned_capacity, workspaces_windows_license_optimization, idle_transfer_server, idle_transfer_no_activity, unused_transfer_protocol, idle_transfer_web_app, unused_accelerator, idle_global_accelerator, disabled_global_accelerator, unused_appsync, appsync_idle_cache, appsync_idle_subscriptions, unused_distribution, unused_hosted_zone, unused_kms_key |
| RI/SP & Optimizer | oversized_ec2_optimizer, oversized_ebs_optimizer, oversized_lambda_optimizer, oversized_rds_optimizer, ri_opportunity_ec2, ri_opportunity_rds, ri_opportunity_elasticache, ri_opportunity_redshift, sp_opportunity_compute, sp_opportunity_ec2, sp_opportunity_sagemaker, savings_plan_coverage_gap, convertible_ri_exchange_opportunity, expiring_reserved_instance, expiring_savings_plan, unused_reserved_instance, unused_savings_plan, cur_unused_reservation, cur_savings_plan_waste |
Security Architecture
Allow-List Enforcement
Every API call in a remediation plan is validated against a strict allow-list before the action is created. If any call falls outside the allow-list, the entire plan is rejected. The allow-list covers 41 AWS services:
Allowed AWS API calls by service:
| Service | Allowed Actions |
|---|---|
| EC2 | stop_instances, start_instances, terminate_instances, release_address, allocate_address, disassociate_address, associate_address, delete_volume, create_volume, delete_snapshot, create_snapshot, modify_volume + describe operations |
| EC2 (NAT/VPC) | delete_nat_gateway, delete_vpc_endpoints + describe operations |
| ELBv2 | delete_load_balancer, delete_target_group, describe_load_balancers, describe_target_health, describe_target_groups |
| Classic ELB | delete_load_balancer, describe_load_balancers |
| RDS | stop_db_instance, modify_db_instance, modify_db_cluster, create_db_snapshot, describe_db_clusters + describe |
| S3 | abort_multipart_upload, put_bucket_lifecycle_configuration, delete_bucket_lifecycle, put_bucket_intelligent_tiering_configuration, delete_bucket, list_objects_v2, head_bucket + list/get |
| Lambda | update_function_configuration, delete_function, delete_provisioned_concurrency_config, put_provisioned_concurrency_config, list_provisioned_concurrency_configs + get |
| CloudWatch Logs | put_retention_policy, delete_log_group + describe |
| CloudWatch | delete_dashboards + list |
| ECR | batch_delete_image, put_lifecycle_policy + describe/get |
| EFS | put_lifecycle_configuration + describe |
| DynamoDB | update_table + describe |
| ElastiCache | delete_cache_cluster, modify_cache_cluster, create_snapshot, describe_replication_groups, decrease_replica_count, increase_replica_count, list_tags_for_resource, modify_replication_group + describe |
| Kinesis | delete_stream, update_shard_count, decrease_stream_retention_period, update_stream_mode, deregister_stream_consumer + describe |
| Firehose | delete_delivery_stream, describe_delivery_stream |
| Glue | delete_dev_endpoint, delete_job, delete_crawler, update_job, get_job_runs, get_databases, get_tables + get |
| Step Functions | delete_state_machine, list_state_machines, list_executions, describe_execution + describe |
| Backup | delete_recovery_point, update_recovery_point_lifecycle, update_backup_plan, delete_backup_selection, list_backup_plans, list_backup_selections, list_recovery_points_by_backup_vault, get_backup_plan + describe |
| Secrets Manager | delete_secret + describe |
| SageMaker | stop_notebook_instance, start_notebook_instance, delete_notebook_instance, delete_endpoint, update_endpoint + describe (describe_notebook_instance, describe_endpoint, describe_endpoint_config) |
| API Gateway | delete_rest_api, get_rest_api, get_rest_apis |
| CloudTrail | delete_trail, describe_trails, get_trail_status |
| Elastic Beanstalk | terminate_environment, update_environment, describe_configuration_settings + describe |
| Lightsail | stop_instance, start_instance, delete_instance, release_static_ip, get_disk, create_disk_snapshot, delete_disk, create_disk_from_snapshot, get_instance_snapshot, delete_instance_snapshot, get_load_balancer, create_load_balancer, delete_load_balancer, get_relational_database, create_relational_database_snapshot, stop_relational_database, start_relational_database, delete_relational_database + get |
| WorkSpaces | terminate_workspaces, stop_workspaces, start_workspaces, modify_workspace_properties + describe (describe_workspaces, describe_workspace_bundles, describe_workspaces_pools, describe_tags) |
| Transfer Family | stop_server, start_server, delete_server, update_server, delete_web_app + describe/list |
| QLDB | delete_ledger + describe |
| Timestream | delete_database, delete_table + describe |
| Global Accelerator | delete_accelerator, update_accelerator, list_listeners, list_endpoint_groups + describe |
| AppSync | delete_graphql_api, delete_api_cache, update_api_cache, get_api_cache + get |
| CloudFront | delete_distribution, update_distribution + get |
| Route 53 | delete_hosted_zone + get + list_resource_record_sets |
| KMS | schedule_key_deletion, disable_key + describe |
| EMR | terminate_job_flows, modify_instance_groups, put_auto_termination_policy, remove_auto_termination_policy, add_instance_groups + describe/list |
| Redshift | delete_cluster, pause_cluster, resume_cluster, resize_cluster, create_cluster_snapshot, modify_cluster_parameter_group, create_scheduled_action, delete_scheduled_action + describe |
| MSK (Kafka) | delete_cluster, update_broker_type + describe |
| MQ | delete_broker, create_broker + describe |
| Neptune | delete_db_instance, create_db_cluster_snapshot, delete_db_cluster_snapshot, modify_db_instance, modify_db_cluster + describe |
| DocumentDB | delete_db_instance, delete_db_cluster_snapshot, modify_db_instance, create_db_cluster_snapshot, describe_db_cluster_snapshots + describe |
| FSx | delete_file_system, create_backup, delete_backup, update_file_system + describe |
| ECS | update_service + describe |
| OpenSearch | delete_domain, update_domain_config + describe |
The remediation system cannot:
- Create or modify IAM roles, policies, or users
- Modify VPCs, security groups, or network ACLs
- Access S3 object data
- Access database contents
- Create new resources (except snapshots for rollback)
- Modify DNS records
- Change encryption settings
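The all-or-nothing allow-list check described in this section can be sketched roughly as follows. This is an illustrative sketch, not CloudWise's implementation: the `ALLOW_LIST` excerpt, the boto3-style service keys, and the `ApiCall`, `is_allowed`, and `validate_plan` names are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical excerpt of the allow-list: per service, a set of mutating
# actions plus the read-only prefixes implied by "+ describe" in the table.
# Service keys use boto3-style client names, which is an assumption.
ALLOW_LIST: dict[str, tuple[frozenset[str], tuple[str, ...]]] = {
    "ec2": (frozenset({"stop_instances", "delete_volume", "create_snapshot"}), ("describe_",)),
    "logs": (frozenset({"put_retention_policy", "delete_log_group"}), ("describe_",)),
    "secretsmanager": (frozenset({"delete_secret"}), ("describe_",)),
}


@dataclass(frozen=True)
class ApiCall:
    service: str
    action: str


def is_allowed(call: ApiCall) -> bool:
    """True if the call is explicitly listed or matches a read-only prefix."""
    entry = ALLOW_LIST.get(call.service)
    if entry is None:
        return False  # unknown service: never allowed
    actions, read_prefixes = entry
    return call.action in actions or call.action.startswith(read_prefixes)


def validate_plan(calls: list[ApiCall]) -> list[ApiCall]:
    """Return the calls outside the allow-list; any hit rejects the whole plan."""
    return [call for call in calls if not is_allowed(call)]
```

A plan would only become an action when `validate_plan` returns an empty list; a single disallowed call (for example an IAM call) rejects the entire plan rather than being dropped from it.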
IAM Role Design
The CloudWiseRemediationRole uses:
- Scoped permissions: only the specific API calls listed above
- 15-minute session cap: STS sessions expire after 15 minutes
- External ID: prevents confused deputy attacks
- Condition keys: region-locked where applicable
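To make the session cap and external ID concrete, here is a minimal sketch of how a caller might request short-lived credentials for the role. It assumes a boto3 STS client is passed in; the helper names and the `cloudwise-remediation` session name are hypothetical, and the role name is simply the one used in the text above.

```python
SESSION_SECONDS = 15 * 60  # 15-minute cap; 900 s is also the shortest duration STS accepts


def assume_role_params(account_id: str, external_id: str,
                       role_name: str = "CloudWiseRemediationRole") -> dict:
    """Build the AssumeRole arguments for the remediation role."""
    return {
        "RoleArn": f"arn:aws:iam::{account_id}:role/{role_name}",
        "RoleSessionName": "cloudwise-remediation",  # hypothetical session name
        "ExternalId": external_id,  # must match the value in the role's trust policy
        "DurationSeconds": SESSION_SECONDS,
    }


def assume_remediation_role(sts_client, account_id: str, external_id: str) -> dict:
    """Return temporary credentials; sts_client is expected to be boto3.client("sts")."""
    response = sts_client.assume_role(**assume_role_params(account_id, external_id))
    return response["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken, Expiration
```

Passing the client in as a parameter keeps the sketch free of a hard boto3 dependency and makes the request easy to inspect in a test double.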
Audit Trail
Every remediation action records:
- Who approved it (Cognito user ID)
- When it was approved, executed, and completed
- The exact API calls made
- Pre-check results
- Snapshot IDs created for rollback
- Any errors encountered
- Bedrock model and token usage (model ID, input/output tokens, latency)
Action Lifecycle
Each remediation action progresses through these statuses:
pending_approval → approved → executing → completed
pending_approval → rejected | expired | blocked_by_policy
executing → failed
completed → rolled_back
| Status | Description |
|---|---|
| Pending Approval | Action proposed, waiting for your review. Expires after 72 hours. |
| Approved | You approved the action. Queued for execution. |
| Rejected | You rejected the action with a reason. |
| Expired | Action was not reviewed within 72 hours. |
| Executing | Action is currently running in your AWS account. |
| Completed | Action executed successfully. Savings realized. |
| Failed | Execution encountered an error. Pre-checks prevent partial execution: if a pre-check fails, no changes are made. |
| Rolled Back | Action was executed but then rolled back (e.g., snapshot restored). |
| Blocked by Policy | Action was blocked by a policy change after initial proposal. |
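The lifecycle above can be modeled as a small state machine. The sketch below is one reading of the status table, not CloudWise's code: the exact set of legal transitions (in particular that rollback only follows `completed`, and that policy blocks only apply while pending) is an assumption, as are the `Status`, `advance`, and `is_expired` names.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from enum import Enum


class Status(Enum):
    PENDING_APPROVAL = "pending_approval"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXPIRED = "expired"
    BLOCKED_BY_POLICY = "blocked_by_policy"
    EXECUTING = "executing"
    COMPLETED = "completed"
    FAILED = "failed"
    ROLLED_BACK = "rolled_back"


# Assumed transition table, inferred from the status descriptions.
TRANSITIONS: dict[Status, frozenset[Status]] = {
    Status.PENDING_APPROVAL: frozenset(
        {Status.APPROVED, Status.REJECTED, Status.EXPIRED, Status.BLOCKED_BY_POLICY}
    ),
    Status.APPROVED: frozenset({Status.EXECUTING}),
    Status.EXECUTING: frozenset({Status.COMPLETED, Status.FAILED}),
    Status.COMPLETED: frozenset({Status.ROLLED_BACK}),
}

APPROVAL_WINDOW = timedelta(hours=72)  # pending actions expire after 72 hours


def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not in the table."""
    if target not in TRANSITIONS.get(current, frozenset()):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def is_expired(proposed_at: datetime, now: datetime) -> bool:
    """True once a pending action has gone unreviewed for the full window."""
    return now - proposed_at >= APPROVAL_WINDOW
```

Statuses with no entry in `TRANSITIONS` (rejected, expired, failed, rolled back, blocked by policy) are terminal in this model.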
Bulk Actions
You can approve multiple pending actions at once using the Approve All Low-Risk button in the remediation queue. This is useful for batch-approving low-risk actions after review.
Rollback
Completed actions that created a snapshot (e.g., before deleting an EBS volume) can be rolled back. The rollback restores the resource from the snapshot. For actions using soft-delete mechanisms (e.g., Secrets Manager's delete_secret with a recovery window, or KMS schedule_key_deletion with a waiting period), you can cancel the deletion during the recovery/waiting period.
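For the soft-delete cases, cancelling a pending deletion maps onto two AWS API calls: `restore_secret` for Secrets Manager and `cancel_key_deletion` for KMS. The sketch below assumes you run it with your own credentials and pass in the matching boto3 client (neither call appears in the allow-list above); the `cancel_soft_delete` helper and its `re_enable` flag are hypothetical.

```python
def cancel_soft_delete(client, service: str, resource_id: str, re_enable: bool = True) -> None:
    """Cancel a pending deletion while the recovery or waiting period is still open.

    client is expected to be the boto3 client for the given service,
    for example boto3.client("secretsmanager") or boto3.client("kms").
    """
    if service == "secretsmanager":
        # Removes the scheduled deletion so the secret is usable again.
        client.restore_secret(SecretId=resource_id)
    elif service == "kms":
        client.cancel_key_deletion(KeyId=resource_id)
        if re_enable:
            # A key whose deletion was cancelled is left in the Disabled state.
            client.enable_key(KeyId=resource_id)
    else:
        raise ValueError(f"no soft-delete recovery known for service: {service}")
```

Set `re_enable=False` if the key should stay disabled after the deletion is cancelled.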
Savings Tracking
The remediation queue includes a Savings Widget that shows:
- Realized savings: monthly savings from completed actions
- Projected savings: estimated savings from pending actions
- Savings by waste type: breakdown of savings by category
- Savings by account: breakdown across AWS accounts
- Trend chart: savings over time
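As a rough illustration of how the widget's numbers relate to action statuses, the sketch below sums savings per status and groups realized savings by waste type and account. The `Action` record, its field names, and the choice to base the breakdowns on completed actions only are assumptions for the example.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:  # hypothetical record shape
    status: str
    waste_type: str
    account_id: str
    monthly_savings: float


def summarize(actions: list[Action]) -> dict:
    """Aggregate realized and projected monthly savings from a list of actions."""
    realized = 0.0
    projected = 0.0
    by_waste_type: dict[str, float] = defaultdict(float)
    by_account: dict[str, float] = defaultdict(float)
    for action in actions:
        if action.status == "completed":
            realized += action.monthly_savings
            by_waste_type[action.waste_type] += action.monthly_savings
            by_account[action.account_id] += action.monthly_savings
        elif action.status == "pending_approval":
            projected += action.monthly_savings
    return {
        "realized": realized,
        "projected": projected,
        "by_waste_type": dict(by_waste_type),
        "by_account": dict(by_account),
    }
```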
Troubleshooting
"No approvable remediation actions appear"
This covers the one-click, approvable actions (Agentic). Guided fix instructions for each finding appear on the Remediation page on every plan; if those are missing, it's a waste-detection issue (see "Check waste detection" below), not a tier issue.
- Check your tier: executing fixes requires the Agentic plan. On Free/Shield the Remediation page shows the same findings with Console/CLI steps to run yourself, but no Approve & fix button
- Check the remediation toggle: go to Settings → Remediation Policy and ensure the engine is enabled
- Check confidence threshold: your minimum confidence level might be filtering out findings. Try lowering it to Medium or Low in Settings → Remediation Policy
- Check risk threshold: your maximum risk level might be too restrictive. A setting of Low only proposes the safest actions. Try Medium for a balance of safety and coverage
- Check waste detection: remediation plans are generated from waste findings. If no waste is detected, no remediation actions are created
- Check exclusions: you may have excluded the waste types or tagged resources being detected
- Wait for the next scan: remediation plans are generated during the daily processing cycle (or on-demand when you trigger a scan). New actions appear after the scan completes
"Action failed to execute"
- Check the CloudFormation stack: ensure CloudWise-Remediation-Role is deployed in the target AWS account
- Check the error message: click the failed action to see the specific error
- Check resource state: the resource may have been modified or deleted since the scan. Pre-checks detect this and fail gracefully
- Check IAM permissions: the remediation role may be missing the required permissions for that specific action
"Action was blocked by policy"
This means your remediation policy was updated after the action was proposed, and the new policy excludes it. Update your policy settings if this was unintentional.
CloudFormation deployment issues
| Error | Solution |
|---|---|
| CloudWise-Remediation-Role already exists | The stack already exists. Go to CloudFormation → Stacks and update or delete the existing stack. |
| MaxSessionDuration must be >= 3600 | You're using an old template. Download the latest from Settings → Remediation Policy. |
| Invalid principal in policy | Ensure you're deploying in the correct AWS account. The template trusts the CloudWise service account. |
FAQ
Can I fix waste without the Agentic plan?
Yes. On every plan, the Remediation page lists your waste findings with each one's recommendation and the exact Console and CLI steps to fix it; you run them yourself in your own AWS account. The Agentic plan adds one-click Approve & fix, which executes the same plan for you (with pre-checks and rollback). Same findings, same risk/confidence model; the difference is whether CloudWise runs the steps or you do.
Is any action ever executed automatically?
No. Every executed action requires explicit approval. There is no "auto-approve" mode. Even the Approve All Low-Risk bulk button requires you to click it. (On Free/Shield nothing is executed at all; you run the provided steps yourself.)
What's the difference between confidence and risk?
Confidence is about the finding: how sure CloudWise is that a resource is actually waste. A volume unattached for 90 days is high-confidence waste. A log group missing a retention policy is low confidence, because the missing policy might be intentional.
Risk is about the action: how impactful the remediation would be. Deleting an unattached EBS volume is low risk. Stopping a running EC2 instance is high risk, even if we're very confident it's idle.
You can control each independently via Settings → Remediation Policy.
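Because the two settings are independent, a finding has to clear both thresholds before an action is proposed. A minimal sketch of that filter, assuming three ordered levels (low, medium, high) and a hypothetical `passes_policy` helper:

```python
# Assumed ordering of the levels used by both settings.
LEVELS = {"low": 0, "medium": 1, "high": 2}


def passes_policy(confidence: str, risk: str, min_confidence: str, max_risk: str) -> bool:
    """True if the finding is confident enough and the action is safe enough."""
    confident_enough = LEVELS[confidence] >= LEVELS[min_confidence]
    safe_enough = LEVELS[risk] <= LEVELS[max_risk]
    return confident_enough and safe_enough
```

With a minimum confidence of High and a maximum risk of Low, deleting a long-unattached EBS volume (high confidence, low risk) would pass, while stopping an idle EC2 instance (high confidence, high risk) would be filtered out until the risk threshold is raised to High.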
What happens if I approve an action for a resource that no longer exists?
The pre-check step will detect that the resource is missing and the action will fail gracefully with an appropriate error message. No partial changes are made.
Can I undo a completed action?
If the action created a snapshot before execution (e.g., before deleting an EBS volume), you can roll it back from the action detail view. For actions without snapshots (e.g., stopping an instance), you can restart the resource manually from the AWS Console. Some services support soft-delete with recovery windows (e.g., Secrets Manager, KMS).
How much does it cost in Bedrock usage?
CloudWise uses Amazon Bedrock (Anthropic Claude) to generate plans. The cost is included in your Agentic subscription; you are not charged separately for Bedrock usage.
Does CloudWise store my AWS credentials?
No. CloudWise uses IAM cross-account roles with STS temporary credentials. No long-term credentials are stored. Sessions expire after 15 minutes.
How often are new remediation actions generated?
Remediation plans are generated during each processing cycle (runs daily, or on-demand when you trigger a scan). Actions for findings that already have a pending action are skipped to avoid duplicates.
Can I use remediation with Air-Gapped Mode?
Not for execution. One-click remediation requires a cross-account IAM role to execute actions in your AWS account, and Air-Gapped Mode is read-only by design. Guided fix instructions remain available there on a propose-only basis.
How many waste types does remediation support?
CloudWise can generate remediation plans for all 191 active waste types across 45 AWS services. Every plan is validated against a strict API allow-list before being proposed.