
Air-Gapped Mode: Client-Side Anonymization

For organizations with strict Data Loss Prevention (DLP) policies, CloudWise provides client-side anonymization scripts that replace sensitive AWS identifiers with deterministic hashes before data leaves your network.

Data Retention by Tier
| Tier | Retention | Persistent Salt |
|------|-----------|-----------------|
| Free / Shield | 7 days | ❌ No (regenerated each upload) |
| Compliance | 365 days | ✅ Yes (consistent across uploads) |

For quarterly trend analysis and audit reports, consider upgrading to the Compliance tier.

Overview

Client-side anonymization ensures that:

  • AWS Account IDs (12-digit numbers) are replaced with identifiers in the acct_xxxxxxxxxxxx format
  • Resource IDs (EC2 instances, EBS volumes, etc.) are replaced with identifiers in the res_xxxxxxxxxxxx format
  • Custom tag keys (such as Project-Manhattan) are replaced with identifiers in the tag_xxxxxxxxxxxx format to prevent DLP leaks
  • ARNs keep their structure, but their identifying components are hashed
  • Cost analysis remains accurate because hashing is deterministic
  • No sensitive identifiers leave your network

Quick Start

Option 1: Use the Export Script with --anonymize

The CloudWise export script supports anonymization directly:

# Download the script and verify its checksum
curl -sLO https://cloudcostwise.io/scripts/cloudwise-export.sh
curl -sL https://cloudcostwise.io/scripts/SHA256SUMS | grep cloudwise-export.sh | (sha256sum -c 2>/dev/null || shasum -a 256 -c)

# Run with anonymization enabled
bash cloudwise-export.sh --anonymize

# Or with a custom salt for reproducible hashes
bash cloudwise-export.sh --anonymize --salt "your-secret-salt"

Option 2: Anonymize an Existing CUR File

If you already have a Cost and Usage Report (CUR) CSV file, use the standalone anonymization script:

# Download the anonymization script and verify its checksum
curl -sLO https://cloudcostwise.io/scripts/cloudwise-anonymize-cur.py
curl -sL https://cloudcostwise.io/scripts/SHA256SUMS | grep cloudwise-anonymize-cur.py | (sha256sum -c 2>/dev/null || shasum -a 256 -c)

# Anonymize your CUR file (requires Python 3)
python3 cloudwise-anonymize-cur.py your-cur-file.csv -o anonymized-cur.csv

# (Optional) Generate a mapping file for internal reference
python3 cloudwise-anonymize-cur.py your-cur-file.csv -o anonymized-cur.csv --output-mapping mapping.csv

Anonymization Details

What Gets Anonymized

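| Original Format | Anonymized Format | Example (original → anonymized) |
|-----------------|-------------------|---------------------------------|
| AWS Account ID (12-digit) | acct_[12-char-hash] | 123456789012 → acct_a1b2c3d4e5f6 |
| EC2 Instance ID | res_[12-char-hash] | i-1234567890abcdef0 → res_e5f6g7h8i9j0 |
| EBS Volume ID | res_[12-char-hash] | vol-0abc123def456789 → res_f9e8d7c6b5a4 |
| Other Resource IDs | res_[12-char-hash] | sg-0123456789abcdef0 → res_b2c3d4e5f6g7 |
| Tag Keys (custom) | tag_[12-char-hash] | Project-Manhattan → tag_h8i9j0k1l2m3 |
| ARN Account Component | acct_[12-char-hash] | arn:aws:ec2:us-east-1:123456789012:... → arn:aws:ec2:us-east-1:acct_a1b2c3d4e5f6:... |

Hash values in the examples are illustrative placeholders; actual values depend on your salt.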
Hash Length Change (v1.7.0)

As of version 1.7.0, hashes are 12 characters (up from 8) to reduce collision risk at enterprise scale (50k+ resources).
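To make the ARN row concrete, here is a minimal Python sketch that hashes only the account field of an ARN and leaves every other field intact. It assumes the SHA256(salt + value) scheme described under How Hashing Works, with UTF-8 encoding and a lowercase hex digest. The function names are hypothetical, and the real script may also hash other ARN components.

```python
import hashlib
import re

# Assumed scheme: prefix + first 12 hex chars of SHA256(salt + value), UTF-8 encoded.
def anonymize(value: str, salt: str, prefix: str) -> str:
    return prefix + hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]

ACCOUNT_ID = re.compile(r"[0-9]{12}")

def anonymize_arn(arn: str, salt: str) -> str:
    """Hash the account-id field of an ARN; leave all other fields untouched."""
    # ARN layout: arn:partition:service:region:account-id:resource
    # maxsplit=5 keeps any colons inside the resource part together.
    parts = arn.split(":", 5)
    if len(parts) == 6 and parts[0] == "arn" and ACCOUNT_ID.fullmatch(parts[4]):
        parts[4] = anonymize(parts[4], salt, "acct_")
    return ":".join(parts)

print(anonymize_arn("arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0", "your-secret-salt"))
```

ARNs without an account field, such as S3 bucket ARNs (arn:aws:s3:::my-bucket), pass through this sketch unchanged.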

How Hashing Works

Anonymization uses SHA256 hashing with a user-provided (or auto-generated) salt:

hash = SHA256(salt + original_value)
anonymized_id = prefix + first_12_characters_of_hash
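The formula above maps onto a few lines of Python. This is a minimal sketch, not the CloudWise script itself: it assumes the salt and value are concatenated as strings, encoded as UTF-8, and that "first 12 characters" means the first 12 characters of the lowercase hex digest.

```python
import hashlib

def anonymize(value: str, salt: str, prefix: str) -> str:
    """Return prefix + the first 12 hex characters of SHA256(salt + value).

    Assumes UTF-8 encoding and a lowercase hex digest.
    """
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return prefix + digest[:12]

if __name__ == "__main__":
    salt = "your-secret-salt"
    print(anonymize("123456789012", salt, "acct_"))        # account ID
    print(anonymize("i-1234567890abcdef0", salt, "res_"))  # EC2 instance ID
```

Calling anonymize twice with the same value and salt always returns the same string, which is the property the rest of this page relies on.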

Because hashing is deterministic:

  • The same account ID always produces the same hash (with the same salt)
  • Cross-referencing between uploads works correctly
  • Cost aggregation by account remains accurate
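A quick way to check that aggregation survives anonymization is to group the same line items by raw and by anonymized account ID and compare the totals. The sample rows and the cost_by_account helper are hypothetical, and the hashing follows the assumed scheme (UTF-8, first 12 characters of the lowercase hex digest).

```python
import hashlib
from collections import defaultdict

def anonymize(value: str, salt: str, prefix: str = "acct_") -> str:
    # Assumed scheme: prefix + first 12 hex chars of SHA256(salt + value), UTF-8 encoded.
    return prefix + hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]

def cost_by_account(rows):
    """Sum cost per account key."""
    totals = defaultdict(float)
    for account, cost in rows:
        totals[account] += cost
    return dict(totals)

# Hypothetical line items: (account ID, cost in USD).
rows = [("123456789012", 10.0), ("987654321098", 4.5), ("123456789012", 2.5)]
salt = "your-secret-salt"

raw_totals = cost_by_account(rows)
anon_totals = cost_by_account([(anonymize(acct, salt), cost) for acct, cost in rows])

# Every raw per-account total reappears unchanged under its anonymized key.
for acct, total in raw_totals.items():
    assert anon_totals[anonymize(acct, salt)] == total

print(sorted(anon_totals.values()))  # [4.5, 12.5]
```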

Salt Management

The salt is a secret value that makes your hashes unique:

  • Auto-generated salt: If you don't provide a salt, one is generated automatically
  • Custom salt: Use --salt "your-secret" for reproducible results across multiple exports
  • Keep your salt secure: Anyone with your salt and a hash can brute-force the original value; a 12-digit account ID, for example, has only 10^12 possible values
caution

If you lose your salt, you won't be able to correlate new uploads with previous anonymized data.
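If you supply your own salt with --salt, generate it from a cryptographically secure source and store it in your password or secrets manager. A minimal Python sketch using the standard secrets module; the 32-byte length and the generate_salt name are our assumptions, not a documented CloudWise requirement.

```python
import secrets

def generate_salt(num_bytes: int = 32) -> str:
    """Return num_bytes of OS-provided randomness, hex-encoded (2 chars per byte)."""
    return secrets.token_hex(num_bytes)

if __name__ == "__main__":
    # Print once, store it securely, and reuse the same value for every export.
    print(generate_salt())
```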

Upload Validation

When you upload files to CloudWise, the upload wizard automatically detects whether your data is anonymized:

Anonymization Status Badges

| Badge | Meaning |
|-------|---------|
| 🛡️ Anonymized Data (green) | All identifiers are anonymized. Safe for DLP compliance. |
| ⚠️ Raw Data Detected (amber) | File contains raw AWS identifiers. Consider anonymizing first. |
| 🔶 Mixed Data (orange) | File contains both raw and anonymized identifiers. |
| Unknown (gray) | Could not determine anonymization status. |
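To illustrate how the four statuses relate, here is a hypothetical classifier over a list of identifiers. The regular expressions are our own assumptions (12-digit account IDs, a handful of common resource-ID prefixes, and acct_/res_/tag_ followed by 12 hex characters); the upload wizard's actual detection rules are not documented here.

```python
import re

# Hypothetical heuristics, not the upload wizard's real rules.
RAW_PATTERNS = [
    re.compile(r"[0-9]{12}"),                                       # AWS account ID
    re.compile(r"(i|vol|sg|snap|eni|subnet|vpc)-[0-9a-f]{8,17}"),   # common resource IDs
]
ANON_PATTERN = re.compile(r"(acct|res|tag)_[0-9a-f]{12}")

def classify(identifiers) -> str:
    """Return 'anonymized', 'raw', 'mixed', or 'unknown' for a list of identifiers."""
    raw = anon = 0
    for ident in identifiers:
        if ANON_PATTERN.fullmatch(ident):
            anon += 1
        elif any(p.fullmatch(ident) for p in RAW_PATTERNS):
            raw += 1
    if raw and anon:
        return "mixed"
    if raw:
        return "raw"
    if anon:
        return "anonymized"
    return "unknown"

print(classify(["123456789012", "res_0123456789ab"]))  # mixed
```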

Validation Step

Before uploading, you'll see a validation step that shows:

  1. Detected mode (raw, anonymized, mixed)
  2. Sample identifiers found in your files
  3. Warning if raw data is detected
  4. Option to proceed or go back and anonymize

Mapping Files

When anonymizing data, you can optionally generate a mapping file:

python3 cloudwise-anonymize-cur.py input.csv -o output.csv --output-mapping mapping.csv

The mapping file contains:

original_account_id,anonymized_account_id
123456789012,acct_a1b2c3d4e5f6
987654321098,acct_f9e8d7c6a5b4
warning

Never upload the mapping file to CloudWise. Keep it secure in your internal systems for reference only.
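When you need to translate an anonymized ID from a CloudWise report back to the original AWS identifier, you can load the mapping file into a reverse lookup. This minimal sketch assumes the two column names shown in the mapping file example; load_reverse_mapping is a hypothetical helper, and the demo uses an in-memory sample instead of your real file.

```python
import csv
import io

def load_reverse_mapping(f) -> dict:
    """Build {anonymized_account_id: original_account_id} from a mapping CSV."""
    reader = csv.DictReader(f)
    return {row["anonymized_account_id"]: row["original_account_id"] for row in reader}

# For a real file: with open("mapping.csv", newline="") as f: lookup = load_reverse_mapping(f)
sample = io.StringIO(
    "original_account_id,anonymized_account_id\n"
    "123456789012,acct_a1b2c3d4e5f6\n"
)
lookup = load_reverse_mapping(sample)
print(lookup["acct_a1b2c3d4e5f6"])  # 123456789012
```

Run this only on the internal system where the mapping file is stored, in line with the warning above.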

Best Practices

For DLP Compliance

  1. Always use --anonymize when exporting data for CloudWise
  2. Store your salt in a secure location (password manager, secrets manager)
  3. Verify the output before uploading by checking that identifiers carry the acct_, res_, and tag_ prefixes
  4. Delete original files after anonymizing if required by your DLP policy
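For step 3, a small script can flag cells that still contain a raw account ID before you upload. This is a minimal sketch that checks only account IDs and assumes a raw one is any standalone run of exactly 12 digits; find_raw_account_ids is a hypothetical name, and longer digit runs (such as cost values) are deliberately not flagged.

```python
import csv
import io
import re

# Assumption: a raw account ID is exactly 12 digits not adjacent to other digits.
RAW_ACCOUNT = re.compile(r"(?<![0-9])[0-9]{12}(?![0-9])")

def find_raw_account_ids(f):
    """Yield (record_number, column_name, value) for cells containing a raw account ID."""
    reader = csv.reader(f)
    header = next(reader, [])
    for record_num, row in enumerate(reader, start=2):  # record 1 is the header
        for col_idx, value in enumerate(row):
            if RAW_ACCOUNT.search(value):
                name = header[col_idx] if col_idx < len(header) else f"column {col_idx + 1}"
                yield record_num, name, value

sample = io.StringIO(
    "lineItem/UsageAccountId,lineItem/UnblendedCost\n"
    "acct_a1b2c3d4e5f6,1.25\n"
    "123456789012,0.40\n"
)
print(list(find_raw_account_ids(sample)))  # [(3, 'lineItem/UsageAccountId', '123456789012')]
```

An empty result is a good sign, but it does not cover resource IDs or tag keys, so treat it as one check among several.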

For Consistent Analysis

  1. Use the same salt across all exports to maintain identifier consistency
  2. Keep a local mapping file for internal incident investigation
  3. Document your anonymization process for audit purposes

Troubleshooting

"Raw Data Detected" Warning

If you see this warning during upload:

  1. Go back to the upload step
  2. Remove the current files
  3. Re-export using the --anonymize flag, or run the anonymization script on your existing CUR file
  4. Upload the anonymized files

Hash Inconsistency Between Uploads

If the same account shows different hashes:

  1. Verify you're using the same salt value
  2. Check that the salt wasn't accidentally modified
  3. If the original salt can't be recovered, re-anonymize all historical data with a single new salt and use it consistently from then on
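For steps 1 and 2, you can test a candidate salt against one known pair from a previous upload (for example, a row of your local mapping file). This minimal sketch assumes the SHA256(salt + value) scheme with UTF-8 encoding and a lowercase hex digest; salt_matches is a hypothetical helper.

```python
import hashlib

def anonymize(value: str, salt: str, prefix: str) -> str:
    # Assumed scheme: prefix + first 12 hex chars of SHA256(salt + value), UTF-8 encoded.
    return prefix + hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]

def salt_matches(salt: str, original: str, expected: str) -> bool:
    """True if `salt` reproduces an anonymized ID recorded from an earlier upload."""
    prefix = expected.split("_", 1)[0] + "_"  # e.g. "acct_"
    return anonymize(original, salt, prefix) == expected

# Even a stray trailing space changes every hash.
known = anonymize("123456789012", "old-salt", "acct_")
print(salt_matches("old-salt", "123456789012", known))   # True
print(salt_matches("old-salt ", "123456789012", known))  # False
```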

Performance Issues with Large Files

For very large CUR files (over 1 GB):

# The script processes files in streaming mode, but you can still optimize:
# 1. Split your CUR file by month, if possible
# 2. Run on a machine with more memory
# 3. Run a downloaded copy of the script directly (not piped from curl)
python3 cloudwise-anonymize-cur.py large-file.csv -o output.csv
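Streaming mode means each row is read, rewritten, and written before the next one is read, so memory use does not grow with file size. The sketch below illustrates that pattern under stated assumptions: the column names are standard legacy CUR CSV headers, but which columns the CloudWise script actually rewrites is our guess, and hashing the whole lineItem/ResourceId value (which can be an ARN) is a simplification.

```python
import csv
import hashlib
import io

# Hypothetical selection of columns to rewrite, keyed by legacy CUR header.
COLUMN_PREFIXES = {
    "bill/PayerAccountId": "acct_",
    "lineItem/UsageAccountId": "acct_",
    "lineItem/ResourceId": "res_",
}

def anonymize(value: str, salt: str, prefix: str) -> str:
    # Assumed scheme: prefix + first 12 hex chars of SHA256(salt + value).
    return prefix + hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]

def anonymize_stream(src, dst, salt: str) -> None:
    """Read, rewrite, and write one row at a time so memory use stays flat."""
    reader = csv.DictReader(src)
    if reader.fieldnames is None:  # empty input: nothing to write
        return
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for column, prefix in COLUMN_PREFIXES.items():
            if row.get(column):  # skip missing columns and empty cells
                row[column] = anonymize(row[column], salt, prefix)
        writer.writerow(row)

# Demo on an in-memory file; for real files open both with newline="".
src = io.StringIO("lineItem/UsageAccountId,lineItem/UnblendedCost\n123456789012,1.25\n")
dst = io.StringIO()
anonymize_stream(src, dst, "your-secret-salt")
print(dst.getvalue())
```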

FAQ

Q: Why does my data expire after 7 days?

On the Free and Shield tiers, Air-Gapped Mode data is kept for 7 days. Air-Gapped Mode is designed for evaluation purposes, and this short retention helps us keep infrastructure costs manageable. For 365-day retention, upgrade to the Compliance tier; for permanent data retention with automated monitoring, connect your AWS account to unlock Connected Mode.

Q: Can I extend the 7-day retention period?

Not on the Free or Shield tiers; the Compliance tier extends Air-Gapped Mode retention to 365 days. If you need persistent data storage and historical tracking beyond that, please connect your AWS account. Connected Mode provides unlimited data retention based on your subscription tier.

Q: Can CloudWise reverse the anonymization?

No. SHA256 is a one-way hash. Without your salt (or your local mapping file), the anonymized values cannot be reversed.

Q: Do I lose any analysis capabilities with anonymized data?

No. All cost analysis, waste detection, and optimization recommendations work the same way. Only the displayed identifiers are different.

Q: What if I need to investigate a specific resource?

Use your local mapping file to translate the anonymized ID back to the original AWS resource ID.

Q: Is the salt stored by CloudWise?

No. The salt never leaves your system. CloudWise only sees the already-anonymized data.

Q: Are tag keys also anonymized?

Yes, as of version 1.7.0. Custom tag keys (like Project-Manhattan or Client-Acme) are now anonymized to tag_xxxxxxxxxxxx format to prevent DLP leaks. Standard tag keys like Name, Environment, Owner, Team, and CostCenter are preserved for usability.
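The rule described here (preserve a short list of standard keys, hash everything else) can be sketched as follows. The preserved set is taken from this answer; exact, case-sensitive matching, the hashing details, and the handling of legacy CUR tag columns (resourceTags/user:<key>) are our assumptions, and both function names are hypothetical.

```python
import hashlib

# Standard keys listed in this FAQ; exact, case-sensitive matching is assumed.
PRESERVED_TAG_KEYS = {"Name", "Environment", "Owner", "Team", "CostCenter"}
USER_TAG_PREFIX = "resourceTags/user:"  # legacy CUR column prefix for user tags

def anonymize_tag_key(key: str, salt: str) -> str:
    """Keep standard tag keys; replace custom ones with tag_ + 12 hex chars."""
    if key in PRESERVED_TAG_KEYS:
        return key
    return "tag_" + hashlib.sha256((salt + key).encode("utf-8")).hexdigest()[:12]

def anonymize_tag_column(column: str, salt: str) -> str:
    """Rewrite a CUR user-tag column header; leave all other headers untouched."""
    if column.startswith(USER_TAG_PREFIX):
        return USER_TAG_PREFIX + anonymize_tag_key(column[len(USER_TAG_PREFIX):], salt)
    return column

print(anonymize_tag_column("resourceTags/user:Environment", "your-secret-salt"))
print(anonymize_tag_column("resourceTags/user:Project-Manhattan", "your-secret-salt"))
```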


See Also