AI Remediation Guide

AI Remediation is the Fix rung of the verb ladder: it turns waste findings into action. Every plan includes the Remediation page, where each finding shows its recommendation plus the exact Console and CLI steps to fix it yourself. On the Agentic plan, the same findings also get one-click execution: you review the agent's Bedrock-generated plan, approve it, and CloudWise runs the fix in your AWS account with full rollback capability.

Manual vs. automatic remediation​

Remediation is one experience that expands with your plan. Open the Remediation page and you'll see your waste findings with how to fix each one:

| Plan | What you get |
| --- | --- |
| Free / Shield | Every finding with step-by-step Console + CLI instructions you run yourself, filterable by confidence, risk, category, and account |
| Agentic | The same findings, plus a one-click Approve & fix that executes the plan in your account (with pre-checks and rollback) |
| Compliance | Propose-only by design (air-gapped): guided instructions, no execution |

The fix instructions and the risk/confidence model (below) apply to everyone. The rest of this guide covers the automatic (Agentic) execute flow — planning, approval, execution, and rollback.

Agentic plan required to execute

Guided fix instructions appear on the Remediation page for every plan. Only the Agentic plan ($49/mo or $399/year) adds one-click execution inside your account. Free and Shield are propose-only (you run the steps); Compliance is propose-only by design (air-gapped).


Architecture Overview​

The diagram below shows the end-to-end flow of execute-mode remediation — from approving an action on the Remediation page through IAM role chaining, pre-checks, execution, and rollback.

[Figure: CloudWise execute-mode remediation architecture]

Key components:

  • Remediation page — Review findings, approve remediation actions, and monitor execution status (the same page shows guided fix instructions on every plan)
  • IAM Hop 1 (CloudWise Service Role) — CloudWise assumes its own service role to initiate cross-account access
  • IAM Hop 2 (Customer Remediation Role) — A scoped role in your AWS account with only cost-saving permissions
  • Pre-Check Phase — Read-only checks (e.g., describe_instances, list_volumes) verify the resource state and resolve placeholders (e.g., VOLUME_ID, INSTANCE_ID) before any writes occur
  • Execution Phase — Performs the approved actions (e.g., create snapshot, delete volume) with full CloudWatch logging and metrics
  • Rollback Path — If anything fails, CloudWise automatically restores from snapshots or recreates resources
  • CloudWatch Monitoring — All actions, logs, and metrics are recorded for audit
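
The pre-check phase described above resolves placeholders such as VOLUME_ID and INSTANCE_ID before any write happens. The sketch below shows one way that substitution could work. It is an illustration only: the function name `resolve_placeholders`, the placeholder pattern, and the choice to fail when a placeholder is unresolved are all assumptions, not CloudWise's actual implementation.

```python
import re

# Assumption: placeholders are upper-case tokens ending in _ID (e.g. VOLUME_ID).
PLACEHOLDER = re.compile(r"[A-Z][A-Z0-9_]*_ID")


def resolve_placeholders(step_params, resolved):
    """Return a copy of step_params with placeholder values substituted.

    `resolved` maps placeholder names to the real IDs found by the
    read-only pre-checks. An unresolved placeholder raises KeyError so
    that no write step runs with a dummy value.
    """
    out = {}
    for key, value in step_params.items():
        if isinstance(value, str) and PLACEHOLDER.fullmatch(value):
            if value not in resolved:
                raise KeyError(f"pre-check did not resolve placeholder {value}")
            out[key] = resolved[value]
        else:
            out[key] = value
    return out
```

For example, a delete-volume step written as `{"VolumeId": "VOLUME_ID"}` would only be executed after the pre-check supplied a concrete volume ID for `VOLUME_ID`.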

🎯 How It Works

CloudWise Remediation follows a strict human-in-the-loop workflow. No action is ever executed without your explicit approval.

Waste Detection Scan
↓
Confidence Filtering (your threshold)
↓
AI Remediation Planner (Bedrock — Anthropic Claude)
↓
Deterministic Risk Classification
↓
Policy Filtering (your rules)
↓
Plan Validation (allow-list enforcement)
↓
Action Created → Pending Approval
↓
You Review & Approve (or Reject)
↓
CloudWise Executes via Cross-Account Role
↓
Verification & Rollback Available

Pipeline Steps​

  1. Waste Detection — CloudWise scans 17 AWS regions for idle, unused, and over-provisioned resources across 191 waste types
  2. Confidence Filtering — Each finding has a confidence level (High, Medium, or Low). Only findings at or above your configured Minimum Confidence Level proceed to planning. Default: Medium (skips Low confidence findings)
  3. Remediation Planning — Eligible findings are sent to Amazon Bedrock (Anthropic Claude), which generates a specific execution plan with pre-checks, API calls, and rollback steps
  4. Risk Classification — Every action is deterministically classified as Low, Medium, or High risk based on the waste type (no LLM involvement in risk scoring)
  5. Policy Filtering — Your remediation policy controls which actions are proposed: risk threshold, confidence threshold, excluded tags, resource IDs, and waste types
  6. Allow-List Validation — Every API call in the plan is validated against a strict allow-list of 41 AWS services. Any plan containing disallowed actions is rejected automatically
  7. Approval Queue — Valid actions appear on the Remediation page with status Pending Approval and a 72-hour expiry window
  8. Execution — Approved actions execute via a scoped IAM role in your AWS account with pre-flight checks
  9. Audit Trail — Every action is logged with timestamps, results, rollback information, and Bedrock model/token usage
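
Step 7 gives every pending action a 72-hour expiry window, and nothing runs without approval. A minimal sketch of that gate follows. The class name, status strings, and the decision to treat an approval at exactly 72 hours as expired are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

APPROVAL_WINDOW = timedelta(hours=72)  # expiry window from the pipeline steps


@dataclass
class PendingAction:
    action_id: str
    created_at: datetime
    status: str = "pending_approval"  # hypothetical status label

    def expires_at(self):
        return self.created_at + APPROVAL_WINDOW


def approve(action, now):
    """Approve an action if it is still pending and inside its window.

    Returns True when the action was approved, False when it had
    already expired (its status is then set to "expired").
    """
    if action.status != "pending_approval":
        raise ValueError(f"action {action.action_id} is {action.status}")
    if now >= action.expires_at():
        action.status = "expired"
        return False
    action.status = "approved"
    return True
```

Passing `now` explicitly, rather than reading the clock inside `approve`, keeps the check easy to test.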

🚀 Getting Started

Step 1: Deploy the Remediation Role​

AI Remediation requires a write-access IAM role in each AWS account you want to remediate. This is separate from the read-only role used for cost monitoring.

  1. Go to Settings → Remediation Policy
  2. Click Deploy in AWS Console
  3. Upload the downloaded template when the CloudFormation console opens
  4. Accept the default parameters and deploy

One-Click Deploy

The settings page downloads the template and opens the CloudFormation console automatically. Default parameters work for most setups — just click through the wizard.

The CloudWiseRemediationRole is scoped to cost-saving actions only:

  • ✅ Stop/terminate EC2 instances
  • ✅ Delete unattached EBS volumes and old snapshots
  • ✅ Release unused Elastic IPs
  • ✅ Delete idle NAT Gateways and Load Balancers
  • ✅ Modify over-provisioned resources (Lambda, DynamoDB, EBS)
  • ✅ Clean up unused secrets, KMS keys, API Gateways
  • ❌ No IAM, VPC, or security group access
  • ❌ No data access (S3 objects, databases)
  • ❌ No persistent access (sessions are capped at 15 minutes)
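
To make the 15-minute session cap concrete, the sketch below requests temporary credentials for CloudWiseRemediationRole with a 900-second duration, which is the shortest session STS allows. How CloudWise itself enforces the cap is not described here, so treat this as an assumption-laden illustration: the function name, the session name, and the path-less role ARN are hypothetical. The `sts_client` argument is expected to behave like a boto3 STS client.

```python
SESSION_SECONDS = 15 * 60  # 900 seconds, matching the 15-minute cap


def assume_remediation_role(sts_client, account_id,
                            session_name="cloudwise-remediation"):
    """Request short-lived credentials for the remediation role.

    `sts_client` is any object exposing the STS `assume_role` call,
    for example `boto3.client("sts")`. The role ARN assumes the role
    was deployed without an IAM path prefix.
    """
    role_arn = f"arn:aws:iam::{account_id}:role/CloudWiseRemediationRole"
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=SESSION_SECONDS,
    )
    return response["Credentials"]
```

With boto3 installed, a call would look like `assume_remediation_role(boto3.client("sts"), "123456789012")`, where the account ID is a placeholder.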

Step 2: Configure Your Policy​

Navigate to Settings → Remediation Policy to customize:

  • Confidence threshold — How certain a finding must be before generating a plan
  • Risk threshold — Which risk levels to include
  • Safety caps — Maximum daily actions and savings impact
  • Exclusions — Tags, resource IDs, and waste types to skip
  • Notifications — Email and Slack alerts

Step 3: Review & Approve Actions​

After the next processing cycle, remediation actions will appear on the Remediation page. Review each action's plan, then approve or reject.


⚙️ Configuration Reference

Remediation Engine Toggle

| Setting | Default | Description |
| --- | --- | --- |
| Enable Remediation | On | Master switch. When disabled, no new remediation actions are generated. Existing pending actions remain but no new ones are created. |

Risk & Confidence Controls​

Confidence and risk are independent filters that work at different stages of the pipeline:

  • Confidence = "How certain are we this resource is waste?" — Filters findings before plan generation. Based on usage signals (e.g., an EBS volume unattached for 90 days is high confidence; a log group with no retention policy set is low confidence).
  • Risk = "How impactful would this remediation be?" — Filters actions during plan generation. Based on the type of remediation (e.g., deleting an unattached volume is low risk; stopping a running instance is high risk).

A finding can be high-confidence + high-risk (e.g., an idle EC2 instance clearly unused for 30 days) or low-confidence + low-risk (e.g., a log group flagged for missing retention policy). The two dimensions are orthogonal.
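
Because the two dimensions are independent, they can be modelled as two separate comparisons on the same Low/Medium/High scale: confidence must be at or above a minimum, risk must be at or below a maximum. The sketch below is an assumption-based illustration; the function names are hypothetical, and the Medium default for the risk ceiling reflects the "Recommended" label rather than a documented default.

```python
LEVELS = {"low": 0, "medium": 1, "high": 2}


def passes_confidence(finding_confidence, minimum="medium"):
    """Applied before planning: keep findings at or above the minimum."""
    return LEVELS[finding_confidence] >= LEVELS[minimum]


def passes_risk(action_risk, maximum="medium"):
    """Applied to planned actions: keep actions at or below the maximum."""
    return LEVELS[action_risk] <= LEVELS[maximum]


def is_proposed(confidence, risk, min_confidence="medium", max_risk="medium"):
    """A finding becomes a proposed action only if it clears both filters."""
    return passes_confidence(confidence, min_confidence) and passes_risk(risk, max_risk)
```

Under these settings a high-confidence, high-risk idle EC2 instance is filtered out by the risk ceiling, while a low-confidence, low-risk log group is filtered out by the confidence floor.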

Minimum Confidence Level​

Controls which waste findings are eligible for remediation planning:

| Level | Setting | What Gets Considered |
| --- | --- | --- |
| High (Most certain) | Only highest-confidence findings | Resources with strong inactivity signals (zero usage for 30+ days, zero API calls, unattached 90+ days) |
| Medium (Recommended) | High + Medium confidence | Above + resources with moderate signals (low utilization, no recent changes, outdated configurations) |
| Low (All findings) | All confidence levels | Above + resources flagged by heuristics (missing best-practice settings, potential over-provisioning) |

When to lower confidence

If you see waste findings in your dashboard that don't have remediation actions, they may be filtered out by the confidence threshold. Lowering it to Low generates plans for all findings, but some of those plans will target resources that are less certain to be actual waste. Review these plans more carefully.

Maximum Risk Level​

Controls which risk levels are proposed for remediation:

| Level | Setting | What Gets Proposed |
| --- | --- | --- |
| Low (Safest) | Only lowest-risk actions | Unattached volumes, old snapshots, unused Elastic IPs, lifecycle policies, Lambda timeout/runtime/ARM64 recommendations, excessive log retention, empty log groups, Glue job timeouts, failed Glue job retries, EMR auto-termination, previous-gen instances, Spot opportunities |
| Medium (Recommended) | Low + Medium risk | Above + stopped instances, GP2→GP3 migrations, idle NAT Gateways, idle load balancers, unused secrets and API Gateways, idle Lambda Provisioned Concurrency, over-provisioned EMR clusters, WorkSpaces AutoStop/pool optimization |
| High (All) | All risk levels | Above + idle running instances, idle databases, unused Lambda functions, idle services |

Risk Classification​

Every waste type is deterministically mapped to a risk level; no AI/LLM is involved in risk scoring. The complete classification of all 191 waste types:

Low Risk ✅ (safe to act on immediately, 80 waste types):

| Waste Type | Why It's Low Risk |
| --- | --- |
| unattached_ebs | Not connected to any instance |
| old_ebs_snapshot | Aged out, point-in-time copy exists |
| orphaned_ebs_snapshot | Source volume deleted; snapshot serves no recovery purpose |
| ami_orphaned_snapshot | AMI deregistered; backing snapshot no longer needed |
| unattached_eip | AWS charges $0.005/hr for unused EIPs |
| eip_on_stopped_instance | Paying for idle elastic IP |
| incomplete_multipart | Incomplete uploads consuming storage |
| old_ecr_images | Container images past retention |
| untagged_ecr_images | Untagged images consuming registry space |
| old_log_group | Aged-out log data |
| no_retention_log_group | Infinite retention accumulates cost |
| excessive_retention_log_group | Retention reduction; old data ages out naturally |
| empty_log_group | Delete empty group; zero data loss risk |
| unused_dashboard | No monitoring value |
| orphaned_dns_record | DNS record pointing to deleted resource |
| cloudtrail_s3_no_lifecycle | CloudTrail S3 bucket without lifecycle rules |
| ecr_no_lifecycle_policy | ECR repo without image cleanup |
| no_lifecycle_policy / no_lifecycle_efs | Missing lifecycle configuration |
| stopped_sagemaker_notebook_storage | Stopped notebook with EBS storage charges |
| previous_gen_sagemaker_instance | Old-gen instance type; recommend upgrade |
| high_lcu_cost_alb | ALB with high LCU costs; advise optimization |
| classic_lb_migration | Classic LB migration; recommend upgrade to ALB/NLB |
| lambda_excessive_timeout | Timeout reduction; no cost impact, improves hygiene |
| lambda_arm64_migration | ARM64 migration opportunity; architecture recommendation |
| lambda_old_runtime | Deprecated runtime; housekeeping flag |
| s3_rapid_growth | Recommendation-only: advises lifecycle rules for fast-growing buckets |
| s3_wrong_storage_class | Recommendation-only: advises Intelligent-Tiering migration |
| s3_empty_bucket | Empty bucket cleanup; zero data loss risk |
| s3_high_request_and_transfer_cost | Recommendation-only: advises request pattern optimization |
| redshift_spectrum_heavy | Recommendation-only: advises Athena evaluation for Spectrum workloads |
| redshift_legacy_dc2 | Recommendation-only: advises RA3 migration planning |
| redshift_concurrency_scaling_waste | WLM tuning recommendation; no service impact |
| oversized_glue_job | Advisory: DPU right-sizing recommendation |
| glue_job_missing_timeout | Timeout reduction; jobs still run, just fail faster if stuck |
| failed_glue_job_retry | Disable retries; jobs still run once, just don't retry on failure |
| glue_dev_endpoint_migration | Advisory: migration to Interactive Sessions recommendation |
| glue_catalog_bloat | Advisory: table version cleanup recommendation |
| ecs_container_insights_waste | Monitoring change; basic ECS metrics remain free |
| backup_no_lifecycle_tiering | Advisory: lifecycle tiering recommendation |
| stale_backup_plan_assignment | Advisory: backup plan selection cleanup |
| backup_copy_policy_overreach | Advisory: cross-region copy policy review |
| s3_no_default_encryption | Advisory: recommends customer-managed KMS encryption |
| aurora_io_optimization_opportunity | Billing config change; switch to I/O-Optimized, no downtime |
| aurora_extended_support_cost | Advisory: recommends engine version upgrade to avoid surcharge |
| rds_extended_support_cost | Advisory: recommends RDS engine upgrade to avoid Extended Support surcharge |
| elasticache_extended_support_cost | Advisory: recommends ElastiCache engine upgrade to avoid Extended Support surcharge |
| elasticache_engine_migration | Advisory: recommends Valkey migration for 20% savings, Redis 7.x API compatible |
| elasticache_serverless_optimization | Advisory: recommends Serverless for spiky workloads, manual migration required |
| elasticache_data_tiering_opportunity | Advisory: recommends R6gd data tiering for large datasets, manual migration required |
| eks_extended_support_cost | Advisory: recommends Kubernetes version upgrade to avoid EKS Extended Support charges |
| opensearch_extended_support_cost | Advisory: recommends domain version upgrade to avoid legacy support surcharge |
| documentdb_extended_support_cost | Advisory: recommends DocumentDB engine upgrade to avoid Extended Support surcharge |
| aurora_serverless_opportunity | Advisory: recommends Serverless v2 migration for low-utilization clusters |
| neptune_serverless_opportunity | Recommendation only: Serverless migration requires new cluster |
| oversized_mq_broker | Recommendation only: MQ doesn't support in-place instance changes |
| lightsail_unattached_static_ip | Unattached static IP; no instance dependency, $3.65/mo flat charge |
| appsync_idle_subscriptions | AppSync idle subscriptions; advisory (architecture review) |
| ri_opportunity_opensearch | RI commitment purchase; advisory, non-refundable |
| emr_missing_auto_termination | Advisory: auto-termination policy is a configuration setting, no workload impact |
| emr_previous_gen_instances | Advisory: recommends upgrading to current-gen instance types |
| emr_spot_opportunity | Advisory: recommends Spot for task nodes, no impact on running workloads |
| beanstalk_unnecessary_alb | Advisory: switch from LoadBalanced to SingleInstance when min=max=1 |
| beanstalk_previous_gen_instances | Advisory: recommends upgrading to current-gen instance types |
| workspaces_windows_license_optimization | Advisory: BYOL licensing review, no infrastructure changes |
| oversized_ec2_optimizer | Compute Optimizer recommendation; advisory rightsizing suggestion |
| oversized_ebs_optimizer | Compute Optimizer recommendation; advisory rightsizing suggestion |
| oversized_lambda_optimizer | Compute Optimizer recommendation; advisory rightsizing suggestion |
| oversized_rds_optimizer | Compute Optimizer recommendation; advisory rightsizing suggestion |
| ri_opportunity_ec2 | RI purchase recommendation; advisory, informational only |
| ri_opportunity_rds | RI purchase recommendation; advisory, informational only |
| ri_opportunity_elasticache | RI purchase recommendation; advisory, informational only |
| ri_opportunity_redshift | RI purchase recommendation; advisory, informational only |
| sp_opportunity_compute | SP purchase recommendation; advisory, informational only |
| sp_opportunity_ec2 | SP purchase recommendation; advisory, informational only |
| sp_opportunity_sagemaker | SP purchase recommendation; advisory, informational only |
| savings_plan_coverage_gap | SP coverage gap; advisory, informational only |
| convertible_ri_exchange_opportunity | Convertible RI exchange; advisory, informational only |
| kinesis_on_demand_downgrade | Billing mode switch; reversible, no data impact |
| kinesis_extended_retention_waste | Retention decrease to default 24h; no data loss for active consumers |

Medium Risk ⚠️ (review before acting, 75 waste types):

| Waste Type | Why It's Medium Risk |
| --- | --- |
| stopped_ec2_with_ebs | EBS volumes still incurring charges |
| gp2_migration | Performance characteristics change (GP2 → GP3) |
| over_provisioned_iops | IOPS reduction |
| over_provisioned_lambda | Memory/timeout reduction |
| lambda_provisioned_concurrency_idle | Deleting/reducing PC configs affects cold start behavior |
| over_provisioned_dynamodb | Capacity mode change |
| dynamodb_no_autoscaling | Switch to on-demand billing |
| idle_nat_gateway | May have intermittent traffic |
| multiple_eips_per_instance | EIP consolidation |
| old_rds_snapshot | Aged RDS snapshot |
| old_documentdb_snapshot | Aged DocumentDB snapshot; irreversible deletion |
| old_fsx_backup | Aged FSx backup; irreversible deletion |
| old_backup | Past retention window; verify backup policy |
| idle_load_balancer | Load balancer with no healthy targets |
| low_traffic_alb | Load balancer with near-zero traffic |
| unused_vpc_endpoint | VPC endpoint with no traffic |
| oversized_elasticache | Cache cluster right-sizing |
| elasticache_replication_waste | Non-production replica removal; reversible via increase_replica_count |
| oversized_ecs_task | Task definition right-sizing |
| ecs_no_autoscaling | Auto-scaling configuration requires workload analysis |
| oversized_ecs_memory | Memory right-sizing; may cause OOM if usage spikes |
| redshift_no_pause | Creates scheduled pause action; reversible |
| idle_kinesis_stream | Stream with no throughput |
| over_provisioned_kinesis | Shard count reduction |
| kinesis_enhanced_fan_out_waste | Deregister idle enhanced fan-out consumer |
| kinesis_firehose_idle | Delete idle Firehose delivery stream |
| idle_glue_dev_endpoint | Unused Glue development endpoint |
| old_glue_job | Stale Glue job definition |
| idle_glue_crawler | Crawler with no recent runs |
| idle_state_machine | Step Function with no executions |
| step_functions_retry_storm | State machine with excessive retry ratio and failure rate |
| step_functions_high_transition_density | Workflow with excessive transitions per success |
| step_functions_express_duration_waste | Express workflow with high p95 duration |
| redundant_backup | Duplicate backup |
| unused_secret | Secrets Manager secret unused for 90+ days |
| idle_sagemaker_notebook | Notebook instance running idle |
| oversized_sagemaker_endpoint | Endpoint with low CPU/memory utilization |
| oversized_msk_cluster | Downsizing requires workload analysis |
| overprovisioned_documentdb | Instance right-sizing requires workload analysis |
| oversized_neptune | Instance class change; brief downtime |
| neptune_old_snapshot | Snapshot deletion; irreversible |
| oversized_fsx | Storage right-sizing; advisory only in v1 |
| fsx_throughput_overprovisioned | Throughput right-sizing; advisory only in v1 |
| underutilized_redshift | Low CPU but active connections; right-sizing |
| oversized_opensearch | Node rightsizing requires workload analysis |
| opensearch_ebs_overprovisioned | EBS resizing requires planned reconfiguration |
| redshift_wlm_over_provisioned | WLM concurrency right-sizing requires workload analysis |
| unused_api_gateway | API Gateway with zero calls in 30 days |
| unused_appsync | AppSync API with no queries |
| appsync_idle_cache | AppSync idle cache; no hits in 14 days |
| unused_distribution | Unused CloudFront distribution |
| unused_hosted_zone | Unused Route 53 hosted zone |
| unused_kms_key | Unused KMS key |
| unencrypted_ebs_volume | Advisory: encryption-at-rest gap, requires snapshot copy to encrypt |
| unencrypted_rds_instance | Advisory: encryption-at-rest gap, requires read replica promotion |
| unencrypted_efs_filesystem | Advisory: encryption cannot be changed after creation |
| opensearch_no_encryption_at_rest | Advisory: encryption-at-rest gap, requires domain reconfiguration |
| unencrypted_documentdb_cluster | Advisory: encryption cannot be changed after creation |
| rds_no_deletion_protection | Advisory: recommends enabling DeletionProtection |
| dynamodb_no_deletion_protection | Advisory: recommends enabling deletion protection |
| resource_without_backup_coverage | Advisory: resource not covered by any AWS Backup selection |
| lightsail_unattached_disk | Block storage disk deletion; irreversible without snapshot |
| lightsail_old_snapshot | Snapshot deletion; irreversible, verify no recovery need |
| lightsail_idle_load_balancer | Load balancer with zero healthy instances; may have intermittent traffic |
| unused_transfer_protocol | Transfer server with unused protocols; review protocol configuration |
| aurora_to_rds_downgrade_opportunity | Aurora to RDS downgrade; requires migration planning |
| emr_over_provisioned | Over-provisioned EMR instance groups; right-sizing requires workload analysis |
| beanstalk_over_provisioned | Over-provisioned Beanstalk environment (CPU <25% over 14 days); right-sizing requires workload analysis |
| workspaces_autostop_opportunity | Billing mode switch: AlwaysOn to AutoStop changes billing model. Guarded auto-remediation: executor validates AlwaysOn state, checks deny-tag (cloudwise:autostop-deny=true), and verifies modifiable state before execution |
| workspaces_pool_overprovisioned_capacity | Pool capacity reduction; may affect user availability during peaks |
| oversized_workspace | WorkSpace bundle right-sizing requires usage analysis |
| oversized_redshift | Redshift cluster right-sizing requires workload analysis |
| expiring_reserved_instance | RI approaching expiry; advisory, renewal planning |
| expiring_savings_plan | SP approaching expiry; advisory, renewal planning |
| disabled_global_accelerator | Disabled accelerator still incurring charges; review before deleting |

High Risk 🔴 (significant resource changes, 36 waste types):

| Waste Type | Why It's High Risk |
| --- | --- |
| idle_ec2 | Running instance; may have workloads |
| idle_rds | Running database; may have connections |
| idle_elasticache | Running cache; may serve traffic |
| idle_redshift | Running data warehouse |
| idle_opensearch | Running search cluster |
| idle_emr_cluster | Running EMR cluster |
| idle_sagemaker_endpoint | Running inference endpoint |
| idle_msk_cluster | Running Kafka cluster |
| idle_mq_broker | Running message broker |
| idle_neptune | Running graph database |
| idle_documentdb | Running document database |
| unused_lambda | May be invoked by other services |
| idle_ecs_service | Running containers |
| idle_dynamodb | DynamoDB table with provisioned capacity |
| idle_fsx | Running file system |
| idle_efs | Elastic File System with no mounts |
| idle_beanstalk | Running Elastic Beanstalk environment |
| beanstalk_idle_traffic | EB environment with zero traffic for 14 days |
| beanstalk_orphaned_rds | Orphaned RDS left behind by terminated EB environment |
| idle_lightsail | Running Lightsail instance |
| lightsail_idle_database | Managed database with zero connections |
| idle_workspace | Running WorkSpace |
| idle_transfer_server | Running Transfer Family server |
| idle_transfer_no_activity | Transfer server with no file activity |
| idle_transfer_web_app | Idle Transfer Family web app |
| idle_qldb | Running QLDB ledger |
| idle_timestream | Timestream database |
| unused_accelerator | Global Accelerator |
| idle_global_accelerator | Global Accelerator with zero traffic for 30 days |
| long_running_emr | Long-running EMR cluster |
| duplicate_cloudtrail | Redundant trail; verify compliance requirements before deleting |
| rds_publicly_accessible | RDS instance with public access enabled; high security risk |
| cur_savings_plan_waste | CUR-detected unused Savings Plan coverage |
| cur_unused_reservation | CUR-detected underutilized Reserved Instance |
| unused_reserved_instance | Reserved Instance with low utilization |
| unused_savings_plan | Savings Plan with low utilization |
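
Since risk scoring is a deterministic mapping from waste type to level, it can be pictured as a plain lookup table. The sketch below uses a handful of entries taken from the tables above; the function name and the choice to raise an error for an unknown waste type (rather than defaulting to some level) are assumptions for illustration.

```python
# A small excerpt of the waste-type-to-risk mapping listed above.
RISK_BY_WASTE_TYPE = {
    "unattached_ebs": "low",
    "old_ebs_snapshot": "low",
    "unattached_eip": "low",
    "gp2_migration": "medium",
    "idle_nat_gateway": "medium",
    "unused_secret": "medium",
    "idle_ec2": "high",
    "idle_rds": "high",
    "unused_lambda": "high",
}


def classify_risk(waste_type):
    """Return the fixed risk level for a waste type.

    No model is consulted: the same input always gives the same output.
    Unknown waste types raise ValueError instead of guessing a level.
    """
    try:
        return RISK_BY_WASTE_TYPE[waste_type]
    except KeyError:
        raise ValueError(f"unclassified waste type: {waste_type}") from None
```

A lookup like this is trivially auditable, which is the practical benefit of keeping the LLM out of risk scoring.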

Require MFA Above ($)

| Setting | Default | Description |
| --- | --- | --- |
| MFA Threshold | $500 | Actions with estimated monthly savings above this amount require MFA verification before execution. Prevents accidental approval of high-impact changes. |

Max Daily Actions

| Setting | Default | Range | Description |
| --- | --- | --- | --- |
| Max Daily Actions | 50 | 1–200 | Maximum number of remediation actions that can be executed in a single day. Prevents runaway automation. |

Max Daily Savings Impact

| Setting | Default | Description |
| --- | --- | --- |
| Max Daily Savings Impact | $10,000 | Safety cap on the total dollar value of actions executed per day. Even if you approve more, execution pauses when this limit is reached. |
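
The MFA threshold and the two daily caps can be read as three independent guards around execution. The following sketch shows how they might combine, using the documented defaults ($500, 50 actions, $10,000). The class and function names are hypothetical, and two details are assumptions: that the savings cap is measured in estimated monthly savings, and that an action which would push the total past a cap is held back rather than executed.

```python
from dataclasses import dataclass


def requires_mfa(estimated_monthly_savings, threshold=500.0):
    """MFA is needed only when savings are strictly above the threshold."""
    return estimated_monthly_savings > threshold


@dataclass
class DailyBudget:
    max_actions: int = 50          # Max Daily Actions default
    max_savings: float = 10_000.0  # Max Daily Savings Impact default
    actions_today: int = 0
    savings_today: float = 0.0

    def try_execute(self, estimated_monthly_savings):
        """Record one execution if both caps allow it; otherwise pause."""
        if self.actions_today + 1 > self.max_actions:
            return False
        if self.savings_today + estimated_monthly_savings > self.max_savings:
            return False
        self.actions_today += 1
        self.savings_today += estimated_monthly_savings
        return True
```

The counters would be reset at the start of each day; that reset is left out to keep the sketch short.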

Notification Channels

| Channel | Default | Description |
| --- | --- | --- |
| Email | On | Approval requests, execution results, daily digest |
| Slack | Off | Real-time notifications to a dedicated channel (see Slack Integration Guide) |

Remediation Slack messages include:

  • Per-action: Title, description, API calls, rollback steps, confidence level
  • Batch digest: Summary with counts by status and top savings
  • Direct links to the specific action on the Remediation page

Exclusions​

Excluded Resource Tags​

Resources with matching tags are never proposed for remediation. Use Key=Value format.

Common exclusions:

  • Environment=Production — Skip all production resources
  • Team=Platform — Skip platform team resources
  • DoNotDelete=true — Explicit protection tag
  • CostCenter=Shared — Skip shared-cost resources

Excluded Resource IDs​

Specific AWS resources that should never be remediated, by resource ID.

Examples:

  • i-0abc123def456 — A specific EC2 instance
  • vol-0abc123def456 — A specific EBS volume
  • snap-0abc123def456 — A specific snapshot
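
The tag and resource-ID exclusions above amount to a simple membership test applied before a resource is proposed. A minimal sketch follows, assuming exact, case-sensitive matching on both key and value; the function names are hypothetical and CloudWise's actual matching rules may differ.

```python
def parse_tag_rule(rule):
    """Split a Key=Value exclusion rule into (key, value)."""
    key, sep, value = rule.partition("=")
    if not sep or not key:
        raise ValueError(f"expected Key=Value, got {rule!r}")
    return key, value


def is_excluded(resource_id, tags, excluded_tags, excluded_ids):
    """Return True if the resource must never be proposed for remediation.

    `tags` is the resource's tag dict; `excluded_tags` is a list of
    Key=Value strings; `excluded_ids` is a collection of resource IDs.
    """
    if resource_id in excluded_ids:
        return True
    rules = [parse_tag_rule(rule) for rule in excluded_tags]
    return any(tags.get(key) == value for key, value in rules)
```

For instance, with the rule `Environment=Production`, a volume tagged `Environment=Production` is skipped, while one tagged `Environment=Staging` is still eligible.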

Excluded Waste Types​

Deselect specific waste detection categories from remediation. The full list of 191 classified waste types:

| Category | Waste Types |
| --- | --- |
| EC2 | idle_ec2, stopped_ec2_with_ebs |
| EBS | unattached_ebs, old_ebs_snapshot, orphaned_ebs_snapshot, ami_orphaned_snapshot, gp2_migration, over_provisioned_iops |
| RDS | idle_rds, old_rds_snapshot, aurora_io_optimization_opportunity, aurora_extended_support_cost, rds_extended_support_cost, aurora_serverless_opportunity, aurora_to_rds_downgrade_opportunity |
| Networking | unattached_eip, eip_on_stopped_instance, multiple_eips_per_instance, idle_nat_gateway, idle_load_balancer, low_traffic_alb, high_lcu_cost_alb, classic_lb_migration, unused_vpc_endpoint, orphaned_dns_record |
| Serverless | unused_lambda, over_provisioned_lambda, unused_api_gateway, lambda_provisioned_concurrency_idle, lambda_excessive_timeout, lambda_arm64_migration, lambda_old_runtime |
| DynamoDB | idle_dynamodb, over_provisioned_dynamodb, dynamodb_no_autoscaling |
| Containers | idle_ecs_service, oversized_ecs_task, ecs_no_autoscaling, ecs_container_insights_waste, oversized_ecs_memory, idle_elasticache, oversized_elasticache, eks_extended_support_cost, elasticache_extended_support_cost, elasticache_replication_waste, elasticache_engine_migration, elasticache_serverless_optimization, elasticache_data_tiering_opportunity |
| Data | idle_kinesis_stream, over_provisioned_kinesis, kinesis_on_demand_downgrade, kinesis_extended_retention_waste, kinesis_enhanced_fan_out_waste, kinesis_firehose_idle, idle_redshift, underutilized_redshift, oversized_redshift, redshift_no_pause, redshift_spectrum_heavy, redshift_legacy_dc2, redshift_wlm_over_provisioned, redshift_concurrency_scaling_waste, idle_opensearch, oversized_opensearch, opensearch_ebs_overprovisioned, ri_opportunity_opensearch, opensearch_extended_support_cost |
| ML | idle_sagemaker_notebook, idle_sagemaker_endpoint, oversized_sagemaker_endpoint, stopped_sagemaker_notebook_storage, previous_gen_sagemaker_instance |
| DevOps | idle_glue_dev_endpoint, old_glue_job, idle_glue_crawler, oversized_glue_job, glue_job_missing_timeout, failed_glue_job_retry, glue_dev_endpoint_migration, glue_catalog_bloat, idle_state_machine, step_functions_retry_storm, step_functions_high_transition_density, step_functions_express_duration_waste, idle_emr_cluster, long_running_emr, emr_over_provisioned, emr_missing_auto_termination, emr_previous_gen_instances, emr_spot_opportunity |
| Storage | old_backup, redundant_backup, backup_no_lifecycle_tiering, stale_backup_plan_assignment, backup_copy_policy_overreach, unused_secret, idle_fsx, idle_efs, no_lifecycle_efs, oversized_fsx, fsx_throughput_overprovisioned, old_fsx_backup |
| Monitoring | unused_dashboard, old_log_group, no_retention_log_group, excessive_retention_log_group, empty_log_group |
| Security Posture | unencrypted_ebs_volume, unencrypted_rds_instance, rds_no_deletion_protection, rds_publicly_accessible, unencrypted_efs_filesystem, s3_no_default_encryption, dynamodb_no_deletion_protection, opensearch_no_encryption_at_rest, unencrypted_documentdb_cluster, resource_without_backup_coverage |
| ECR | old_ecr_images, untagged_ecr_images, ecr_no_lifecycle_policy, no_lifecycle_policy |
| CloudTrail | duplicate_cloudtrail, cloudtrail_s3_no_lifecycle |
| S3 | incomplete_multipart, s3_rapid_growth, s3_wrong_storage_class, s3_empty_bucket, s3_high_request_and_transfer_cost |
| Messaging | idle_msk_cluster, oversized_msk_cluster, idle_mq_broker, oversized_mq_broker |
| Databases | idle_neptune, neptune_serverless_opportunity, oversized_neptune, neptune_old_snapshot, idle_documentdb, overprovisioned_documentdb, old_documentdb_snapshot, documentdb_extended_support_cost, idle_qldb, idle_timestream |
| Niche | idle_beanstalk, beanstalk_idle_traffic, beanstalk_unnecessary_alb, beanstalk_previous_gen_instances, beanstalk_over_provisioned, beanstalk_orphaned_rds, idle_lightsail, lightsail_unattached_static_ip, lightsail_unattached_disk, lightsail_old_snapshot, lightsail_idle_load_balancer, lightsail_idle_database, idle_workspace, oversized_workspace, workspaces_autostop_opportunity, workspaces_pool_overprovisioned_capacity, workspaces_windows_license_optimization, idle_transfer_server, idle_transfer_no_activity, unused_transfer_protocol, idle_transfer_web_app, unused_accelerator, idle_global_accelerator, disabled_global_accelerator, unused_appsync, appsync_idle_cache, appsync_idle_subscriptions, unused_distribution, unused_hosted_zone, unused_kms_key |
| RI/SP & Optimizer | oversized_ec2_optimizer, oversized_ebs_optimizer, oversized_lambda_optimizer, oversized_rds_optimizer, ri_opportunity_ec2, ri_opportunity_rds, ri_opportunity_elasticache, ri_opportunity_redshift, sp_opportunity_compute, sp_opportunity_ec2, sp_opportunity_sagemaker, savings_plan_coverage_gap, convertible_ri_exchange_opportunity, expiring_reserved_instance, expiring_savings_plan, unused_reserved_instance, unused_savings_plan, cur_unused_reservation, cur_savings_plan_waste |

🔒 Security Architecture

Allow-List Enforcement​

Every API call in a remediation plan is validated against a strict allow-list before the action is created. If any call falls outside the allow-list, the entire plan is rejected. The allow-list covers 41 AWS services:

Allowed AWS API calls by service:

| Service | Allowed Actions |
| --- | --- |
| EC2 | stop_instances, start_instances, terminate_instances, release_address, allocate_address, disassociate_address, associate_address, delete_volume, create_volume, delete_snapshot, create_snapshot, modify_volume + describe operations |
| EC2 (NAT/VPC) | delete_nat_gateway, delete_vpc_endpoints + describe operations |
| ELBv2 | delete_load_balancer, delete_target_group, describe_load_balancers, describe_target_health, describe_target_groups |
| Classic ELB | delete_load_balancer, describe_load_balancers |
| RDS | stop_db_instance, modify_db_instance, modify_db_cluster, create_db_snapshot, describe_db_clusters + describe |
| S3 | abort_multipart_upload, put_bucket_lifecycle_configuration, delete_bucket_lifecycle, put_bucket_intelligent_tiering_configuration, delete_bucket, list_objects_v2, head_bucket + list/get |
| Lambda | update_function_configuration, delete_function, delete_provisioned_concurrency_config, put_provisioned_concurrency_config, list_provisioned_concurrency_configs + get |
| CloudWatch Logs | put_retention_policy, delete_log_group + describe |
| CloudWatch | delete_dashboards + list |
| ECR | batch_delete_image, put_lifecycle_policy + describe/get |
| EFS | put_lifecycle_configuration + describe |
| DynamoDB | update_table + describe |
| ElastiCache | delete_cache_cluster, modify_cache_cluster, create_snapshot, describe_replication_groups, decrease_replica_count, increase_replica_count, list_tags_for_resource, modify_replication_group + describe |
| Kinesis | delete_stream, update_shard_count, decrease_stream_retention_period, update_stream_mode, deregister_stream_consumer + describe |
| Firehose | delete_delivery_stream, describe_delivery_stream |
| Glue | delete_dev_endpoint, delete_job, delete_crawler, update_job, get_job_runs, get_databases, get_tables + get |
| Step Functions | delete_state_machine, list_state_machines, list_executions, describe_execution + describe |
| Backup | delete_recovery_point, update_recovery_point_lifecycle, update_backup_plan, delete_backup_selection, list_backup_plans, list_backup_selections, list_recovery_points_by_backup_vault, get_backup_plan + describe |
| Secrets Manager | delete_secret + describe |
| SageMaker | stop_notebook_instance, start_notebook_instance, delete_notebook_instance, delete_endpoint, update_endpoint + describe (describe_notebook_instance, describe_endpoint, describe_endpoint_config) |
| API Gateway | delete_rest_api, get_rest_api, get_rest_apis |
| CloudTrail | delete_trail, describe_trails, get_trail_status |
| Elastic Beanstalk | terminate_environment, update_environment, describe_configuration_settings + describe |
| Lightsail | stop_instance, start_instance, delete_instance, release_static_ip, get_disk, create_disk_snapshot, delete_disk, create_disk_from_snapshot, get_instance_snapshot, delete_instance_snapshot, get_load_balancer, create_load_balancer, delete_load_balancer, get_relational_database, create_relational_database_snapshot, stop_relational_database, start_relational_database, delete_relational_database + get |
| WorkSpaces | terminate_workspaces, stop_workspaces, start_workspaces, modify_workspace_properties + describe (describe_workspaces, describe_workspace_bundles, describe_workspaces_pools, describe_tags) |
| Transfer Family | stop_server, start_server, delete_server, update_server, delete_web_app + describe/list |
| QLDB | delete_ledger + describe |
| Timestream | delete_database, delete_table + describe |
| Global Accelerator | delete_accelerator, update_accelerator, list_listeners, list_endpoint_groups + describe |
| AppSync | delete_graphql_api, delete_api_cache, update_api_cache, get_api_cache + get |
| CloudFront | delete_distribution, update_distribution + get |
| Route 53 | delete_hosted_zone + get + list_resource_record_sets |
| KMS | schedule_key_deletion, disable_key + describe |
| EMR | terminate_job_flows, modify_instance_groups, put_auto_termination_policy, remove_auto_termination_policy, add_instance_groups + describe/list |
| Redshift | delete_cluster, pause_cluster, resume_cluster, resize_cluster, create_cluster_snapshot, modify_cluster_parameter_group, create_scheduled_action, delete_scheduled_action + describe |
| MSK (Kafka) | delete_cluster, update_broker_type + describe |
| MQ | delete_broker, create_broker + describe |
| Neptune | delete_db_instance, create_db_cluster_snapshot, delete_db_cluster_snapshot, modify_db_instance, modify_db_cluster + describe |
| DocumentDB | delete_db_instance, delete_db_cluster_snapshot, modify_db_instance, create_db_cluster_snapshot, describe_db_cluster_snapshots + describe |
FSxdelete_file_system, create_backup, delete_backup, update_file_system + describe
ECSupdate_service + describe
OpenSearchdelete_domain, update_domain_config + describe
What's NOT Allowed

The remediation system cannot:

  • Create or modify IAM roles, policies, or users
  • Modify VPCs, security groups, or network ACLs
  • Access S3 object data
  • Access database contents
  • Create new resources (except snapshots for rollback)
  • Modify DNS records
  • Change encryption settings
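
As a rough illustration of how a plan could be checked against an allow-list like the one above, here is a minimal sketch. The `ALLOWED_OPERATIONS` dictionary reproduces only three rows of the table, and the names `is_allowed`, `validate_plan`, and `READ_ONLY_PREFIXES` are hypothetical. Treating every `describe_`, `get_`, and `list_` call as permitted is an assumption standing in for the "+ describe" style entries, not CloudWise's actual rule.

```python
# Hypothetical excerpt of the allow-list: service -> permitted mutating operations.
# Only three rows from the table are reproduced here.
ALLOWED_OPERATIONS = {
    "ecs": {"update_service"},
    "opensearch": {"delete_domain", "update_domain_config"},
    "kms": {"schedule_key_deletion", "disable_key"},
}

# Assumption: read-only calls are recognized by prefix ("+ describe", "+ get", "+ list").
READ_ONLY_PREFIXES = ("describe_", "get_", "list_")


def is_allowed(service: str, operation: str) -> bool:
    """Return True if the call is permitted for a service on the allow-list."""
    allowed = ALLOWED_OPERATIONS.get(service)
    if allowed is None:
        return False  # services that are not listed (for example iam) are rejected
    return operation in allowed or operation.startswith(READ_ONLY_PREFIXES)


def validate_plan(steps):
    """Return the (service, operation) steps that violate the allow-list."""
    return [step for step in steps if not is_allowed(*step)]


violations = validate_plan([
    ("ecs", "describe_services"),
    ("ecs", "update_service"),
    ("iam", "create_role"),
])
```

In this sketch a plan is proposed only when `validate_plan` returns an empty list; here `violations` contains the single IAM step.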

IAM Role Design

The CloudWiseRemediationRole uses:

  • Scoped permissions — Only the specific API calls listed above
  • 15-minute session cap — STS sessions expire after 15 minutes
  • External ID — Prevents confused deputy attacks
  • Condition keys — Region-locked where applicable
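
To make the session cap and External ID concrete, the sketch below builds the parameters for an STS `AssumeRole` request. The helper name `build_assume_role_params`, the session name, and the assumption that the role lives at the default path under the name `CloudWiseRemediationRole` are illustrative, not taken from CloudWise's implementation.

```python
def build_assume_role_params(account_id: str, external_id: str) -> dict:
    """Build AssumeRole parameters with a 15-minute session and an External ID."""
    return {
        # Assumption: role name as written above, at the default IAM path.
        "RoleArn": f"arn:aws:iam::{account_id}:role/CloudWiseRemediationRole",
        "RoleSessionName": "cloudwise-remediation",  # hypothetical session name
        "ExternalId": external_id,  # guards against confused deputy attacks
        "DurationSeconds": 900,  # 15 minutes, the shortest duration STS accepts
    }


params = build_assume_role_params("123456789012", "example-external-id")
```

With boto3 installed, these parameters could be passed as `boto3.client("sts").assume_role(**params)`, which returns temporary credentials that expire when the session ends.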

Audit Trail

Every remediation action records:

  • Who approved it (Cognito user ID)
  • When it was approved, executed, and completed
  • The exact API calls made
  • Pre-check results
  • Snapshot IDs created for rollback
  • Any errors encountered
  • Bedrock model and token usage (model ID, input/output tokens, latency)
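
One way to picture an audit entry carrying the fields listed above is a small record type. This is only a sketch: every field name here (`approved_by`, `precheck_results`, and so on) is hypothetical, and the real storage schema is not documented in this guide.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class AuditRecord:
    """Hypothetical shape of one audit entry; field names are illustrative."""
    action_id: str
    approved_by: str  # Cognito user ID of the approver
    approved_at: str  # ISO 8601 timestamps
    executed_at: Optional[str] = None
    completed_at: Optional[str] = None
    api_calls: List[str] = field(default_factory=list)
    precheck_results: Dict[str, bool] = field(default_factory=dict)
    snapshot_ids: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    bedrock_model_id: Optional[str] = None
    input_tokens: int = 0
    output_tokens: int = 0
    latency_ms: int = 0


def to_json(record: AuditRecord) -> str:
    """Serialize the record so it can be stored or exported."""
    return json.dumps(asdict(record), sort_keys=True)


record = AuditRecord(
    action_id="action-1",
    approved_by="cognito-user-1",
    approved_at="2024-01-01T00:00:00+00:00",
    api_calls=["ec2.describe_volumes", "ec2.create_snapshot"],
    snapshot_ids=["snap-0example"],
)
```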

📊 Action Lifecycle

Each remediation action progresses through these statuses:

pending_approval → approved → executing → completed
      ↘ rejected                ↘ failed
      ↘ expired                 ↘ rolled_back
      ↘ blocked_by_policy
Status | Description
Pending Approval | Action proposed, waiting for your review. Expires after 72 hours.
Approved | You approved the action. Queued for execution.
Rejected | You rejected the action with a reason.
Expired | Action was not reviewed within 72 hours.
Executing | Action is currently running in your AWS account.
Completed | Action executed successfully. Savings realized.
Failed | Execution encountered an error. Pre-checks prevent partial execution: if a pre-check fails, no changes are made.
Rolled Back | Action was executed but then rolled back (e.g., snapshot restored).
Blocked by Policy | Action was blocked by a policy change after initial proposal.
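
The lifecycle above can be read as a small state machine. The sketch below encodes one plausible reading of it. The exact edges are assumptions: `rolled_back` is placed after `completed` because only completed actions can be rolled back, and `blocked_by_policy` is assumed to follow `pending_approval`. The names `TRANSITIONS`, `can_transition`, and `is_expired` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed transition table derived from the lifecycle diagram and status table.
TRANSITIONS = {
    "pending_approval": {"approved", "rejected", "expired", "blocked_by_policy"},
    "approved": {"executing"},
    "executing": {"completed", "failed"},
    "completed": {"rolled_back"},
}

APPROVAL_WINDOW = timedelta(hours=72)  # pending actions expire after 72 hours


def can_transition(current: str, new: str) -> bool:
    """Return True if moving from `current` to `new` is a legal step."""
    return new in TRANSITIONS.get(current, set())


def is_expired(proposed_at: datetime, now: datetime) -> bool:
    """Return True once a pending action has outlived the approval window."""
    return now - proposed_at >= APPROVAL_WINDOW


proposed = datetime(2024, 1, 1, tzinfo=timezone.utc)
```

Terminal statuses such as `rejected` or `failed` have no entry in `TRANSITIONS`, so `can_transition` returns `False` for any move out of them.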

Bulk Actions

You can approve multiple pending actions at once using the Approve All Low-Risk button in the remediation queue. This is useful for batch-approving low-risk actions after review.

Rollback

Completed actions that created a snapshot (e.g., before deleting an EBS volume) can be rolled back. The rollback restores the resource from the snapshot. For actions using soft-delete mechanisms (e.g., Secrets Manager's delete_secret with a recovery window, or KMS schedule_key_deletion with a waiting period), you can cancel the deletion during the recovery/waiting period.
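
For the soft-delete cases, the AWS operations that cancel a pending deletion are `restore_secret` (Secrets Manager) and `cancel_key_deletion` (KMS). Neither appears in the allow-list above, so this sketch assumes you run them with your own credentials. The helper `cancellation_call` and the shape of the `action` dictionary are hypothetical.

```python
def cancellation_call(action: dict) -> tuple:
    """Map a soft-delete action to the (service, operation, params) that cancels it."""
    operation = action["operation"]
    resource_id = action["resource_id"]
    if operation == "delete_secret":
        # Valid only while the secret is still inside its recovery window.
        return ("secretsmanager", "restore_secret", {"SecretId": resource_id})
    if operation == "schedule_key_deletion":
        # Valid only while the key is still inside its waiting period.
        return ("kms", "cancel_key_deletion", {"KeyId": resource_id})
    raise ValueError(f"no soft-delete cancellation known for {operation!r}")


service, op, params = cancellation_call(
    {"operation": "delete_secret", "resource_id": "example/secret"}
)
```

With boto3, the result could be applied as `getattr(boto3.client(service), op)(**params)`. After `cancel_key_deletion`, the KMS key is left in the Disabled state and must be re-enabled before use.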


💰 Savings Tracking

The remediation queue includes a Savings Widget that shows:

  • Realized savings — Monthly savings from completed actions
  • Projected savings — Estimated savings from pending actions
  • Savings by waste type — Breakdown of savings by category
  • Savings by account — Breakdown across AWS accounts
  • Trend chart — Savings over time
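
The widget's totals can be thought of as simple aggregations over the action list. The following sketch assumes each action is a dictionary with hypothetical keys `status`, `monthly_savings`, `waste_type`, and `account_id`, and that the two breakdowns count realized savings only. Neither assumption is confirmed by this guide.

```python
from collections import defaultdict


def summarize_savings(actions):
    """Aggregate realized and projected monthly savings from a list of actions."""
    realized = 0
    projected = 0
    by_waste_type = defaultdict(int)
    by_account = defaultdict(int)
    for action in actions:
        amount = action["monthly_savings"]
        if action["status"] == "completed":
            realized += amount
            by_waste_type[action["waste_type"]] += amount
            by_account[action["account_id"]] += amount
        elif action["status"] == "pending_approval":
            projected += amount
    return {
        "realized": realized,
        "projected": projected,
        "by_waste_type": dict(by_waste_type),
        "by_account": dict(by_account),
    }


summary = summarize_savings([
    {"status": "completed", "monthly_savings": 40, "waste_type": "ebs", "account_id": "111"},
    {"status": "completed", "monthly_savings": 10, "waste_type": "logs", "account_id": "222"},
    {"status": "pending_approval", "monthly_savings": 25, "waste_type": "ebs", "account_id": "111"},
    {"status": "failed", "monthly_savings": 99, "waste_type": "ebs", "account_id": "111"},
])
```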

🔧 Troubleshooting

"No approvable remediation actions appear"

This covers the one-click, approvable actions (Agentic). Guided fix instructions for each finding appear on the Remediation page on every plan. If those are missing, it's a waste-detection issue (see step 5), not a tier issue.

  1. Check your tier: Executing fixes requires the Agentic plan. On Free/Shield the Remediation page shows the same findings with Console/CLI steps to run yourself, but no Approve & fix button
  2. Check the remediation toggle: Go to Settings → Remediation Policy and ensure the engine is enabled
  3. Check confidence threshold: Your minimum confidence level might be filtering out findings. Try lowering it to Medium or Low in Settings → Remediation Policy
  4. Check risk threshold: Your maximum risk level might be too restrictive. A setting of Low only proposes the safest actions. Try Medium for a balance of safety and coverage
  5. Check waste detection: Remediation plans are generated from waste findings. If no waste is detected, no remediation actions are created
  6. Check exclusions: You may have excluded the waste types or tagged resources being detected
  7. Wait for the next scan: Remediation plans are generated during the daily processing cycle (or on-demand when you trigger a scan). New actions appear after the scan completes
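
Steps 3, 4, and 6 all describe filters applied before an action is proposed. A minimal sketch of how such a policy check might combine them is shown below. The function `passes_policy`, the `LEVELS` ordering, and the finding keys are assumptions made for illustration.

```python
# Assumed ordering of the Low / Medium / High settings.
LEVELS = {"low": 0, "medium": 1, "high": 2}


def passes_policy(finding, min_confidence, max_risk, excluded_types=frozenset()):
    """Return True if a finding survives the confidence, risk, and exclusion filters."""
    if finding["waste_type"] in excluded_types:
        return False
    if LEVELS[finding["confidence"]] < LEVELS[min_confidence]:
        return False  # not confident enough that the resource is waste
    if LEVELS[finding["risk"]] > LEVELS[max_risk]:
        return False  # the fix is riskier than the policy allows
    return True


finding = {"waste_type": "idle_instance", "confidence": "medium", "risk": "high"}
strict = passes_policy(finding, min_confidence="high", max_risk="low")
relaxed = passes_policy(finding, min_confidence="medium", max_risk="high")
```

Under the strict policy the finding is filtered out, which is the situation steps 3 and 4 help diagnose. Relaxing both thresholds lets it through.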

"Action failed to execute"

  1. Check the CloudFormation stack — Ensure CloudWise-Remediation-Role is deployed in the target AWS account
  2. Check the error message — Click on the failed action to see the specific error
  3. Check resource state — The resource may have been modified or deleted since the scan. Pre-checks detect this and fail gracefully
  4. Check IAM permissions — The remediation role may be missing the required permissions for that specific action

"Action was blocked by policy"

This means your remediation policy was updated after the action was proposed, and the new policy excludes it. Update your policy settings if this was unintentional.

CloudFormation deployment issues

Error | Solution
CloudWise-Remediation-Role already exists | The stack already exists. Go to CloudFormation → Stacks and update or delete the existing stack.
MaxSessionDuration must be >= 3600 | You're using an old template. Download the latest from Settings → Remediation Policy.
Invalid principal in policy | Ensure you're deploying in the correct AWS account. The template trusts the CloudWise service account.

❓ FAQ

Can I fix waste without the Agentic plan?

Yes. On every plan, the Remediation page lists your waste findings with each one's recommendation and the exact Console and CLI steps to fix it — you run them yourself in your own AWS account. The Agentic plan adds one-click Approve & fix, which executes the same plan for you (with pre-checks and rollback). Same findings, same risk/confidence model; the difference is whether CloudWise runs the steps or you do.

Is any action ever executed automatically?

No. Every executed action requires explicit approval. There is no "auto-approve" mode. Even bulk approval through the Approve All Low-Risk button requires you to click it. (On Free/Shield nothing is executed at all; you run the provided steps yourself.)

What's the difference between confidence and risk?

Confidence is about the finding: how sure CloudWise is that a resource is actually waste. A volume unattached for 90 days is high-confidence waste. A log group missing a retention policy is low-confidence waste, because the missing policy might be intentional.

Risk is about the action: how impactful the remediation would be. Deleting an unattached EBS volume is low risk. Stopping a running EC2 instance is high risk, even if we're very confident it's idle.

You can control each independently via Settings → Remediation Policy.

What happens if I approve an action for a resource that no longer exists?

The pre-check step will detect that the resource is missing and the action will fail gracefully with an appropriate error message. No partial changes are made.

Can I undo a completed action?

If the action created a snapshot before execution (e.g., before deleting an EBS volume), you can roll it back from the action detail view. For actions without snapshots (e.g., stopping an instance), you can restart the resource manually from the AWS Console. Some services support soft-delete with recovery windows (e.g., Secrets Manager, KMS).

How much does it cost in Bedrock usage?

CloudWise uses Amazon Bedrock (Anthropic Claude) to generate plans. The cost is included in your Agentic subscription — you are not charged separately for Bedrock usage.

Does CloudWise store my AWS credentials?

No. CloudWise uses IAM cross-account roles with STS temporary credentials. No long-term credentials are stored. Sessions expire after 15 minutes.

How often are new remediation actions generated?

Remediation plans are generated during each processing cycle (runs daily, or on-demand when you trigger a scan). Actions for findings that already have a pending action are skipped to avoid duplicates.
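
The duplicate-skipping rule can be sketched as a filter over finding IDs. The function name `findings_needing_action` and the `finding_id` and `status` keys are hypothetical.

```python
def findings_needing_action(finding_ids, existing_actions):
    """Return the findings that do not already have a pending action."""
    pending = {
        action["finding_id"]
        for action in existing_actions
        if action["status"] == "pending_approval"
    }
    return [fid for fid in finding_ids if fid not in pending]


todo = findings_needing_action(
    ["f-1", "f-2", "f-3"],
    [
        {"finding_id": "f-1", "status": "pending_approval"},
        {"finding_id": "f-2", "status": "rejected"},
    ],
)
```

In this sketch only `f-1` is skipped. Whether findings with rejected or expired actions are proposed again is an assumption here.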

Can I use remediation with Air-Gapped Mode?

Not for execution. One-click execution requires a cross-account IAM role that can act in your AWS account, and Air-Gapped Mode is propose-only by design. You still get the guided fix instructions to run yourself.

How many waste types does remediation support?

CloudWise can generate remediation plans for all 191 active waste types across 45 AWS services. Every plan is validated against a strict API allow-list before being proposed.