Even the most sophisticated infrastructure architecture cannot prevent every error. That is why it is essential to monitor Terraform operations proactively - especially those with potentially destructive impact. The goal is to detect critical changes early and trigger automated alerts before the damage spreads out of control.
Sure - your system engineer will undoubtedly point out that Terraform displays the full plan before executing an apply, and that execution must be confirmed by entering "yes".
What your engineer does not mention: they do not actually read the plan before allowing it to proceed.
“It'll be fine.”
Early Warning System: Automated Plan Analysis
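Terraform provides a way to evaluate plan information programmatically using the -json flag, for example by running terraform show -json against a saved plan file. This allows detection of planned deletions (the "delete" action in the JSON output) and the automated initiation of appropriate actions, such as a Slack alert or automatic termination of the CI/CD pipeline.
An alternative early indicator is the exit code of terraform plan -detailed-exitcode: 0 means no changes, 1 means the plan failed, and 2 means the plan contains changes. Note that exit code 2 covers any kind of change, including planned deletions but not limited to them, so it is a coarse signal rather than a dedicated destroy detector.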
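A minimal sketch of how these three exit codes could be mapped to labels that later pipeline steps act on. The function name `classify_plan_exit` and the labels it prints are hypothetical and only serve as illustration.

```bash
#!/bin/bash
# Hypothetical helper: translate the exit code of
# `terraform plan -detailed-exitcode` into a label for later pipeline steps.
classify_plan_exit() {
  case "$1" in
    0) echo "no-changes" ;;  # plan succeeded, nothing to do
    1) echo "error" ;;       # the plan itself failed
    2) echo "changes" ;;     # plan succeeded and the diff is non-empty
    *) echo "unknown" ;;     # anything else is unexpected
  esac
}

# Usage sketch (captures the code without aborting under `set -e`):
#   code=0
#   terraform plan -detailed-exitcode -out=tfplan || code=$?
#   classify_plan_exit "$code"
```

Because the "changes" label says nothing about what kind of change is planned, it is best treated as a trigger for the more detailed JSON analysis rather than as an alert in its own right.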
Example: Bash Script for Plan Evaluation
The following script can be integrated as a hook into the CI/CD pipeline. If planned deletions are detected, it sends a notification immediately and exits with a non-zero status, which the pipeline can use to stop the rollout.
An example script for reference:
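```bash
#!/bin/bash
# Automated plan analysis script
set -euo pipefail

# Create the plan. -detailed-exitcode returns 0 (no changes), 1 (error)
# or 2 (changes present), so capture the code without tripping `set -e`.
PLAN_EXIT_CODE=0
terraform plan -out=tfplan -detailed-exitcode || PLAN_EXIT_CODE=$?

if [ "$PLAN_EXIT_CODE" -eq 1 ]; then
  echo "Error: terraform plan failed" >&2
  exit 1
fi

# Exit code 2: changes are planned, so inspect them
if [ "$PLAN_EXIT_CODE" -eq 2 ]; then
  # Export the saved plan in JSON format
  terraform show -json tfplan > plan.json

  # Collect the addresses of all resources whose actions include "delete"
  # (this also covers replacements, which delete and re-create)
  DELETIONS=$(jq -r '.resource_changes[]? | select(.change.actions | index("delete")) | .address' plan.json)

  if [ -n "$DELETIONS" ]; then
    echo "BLAST RADIUS ALERT: Planned deletions detected:"
    echo "$DELETIONS" | while read -r resource; do
      echo "  - $resource"
    done

    # Build the payload with jq so that newlines and quotes are escaped correctly
    PAYLOAD=$(jq -n --arg ws "${WORKSPACE:-unknown}" --arg del "$DELETIONS" \
      '{text: "ALERT: Terraform Destroy detected in \($ws):\n\($del)"}')

    # Send the alert; a failed notification must not hide the finding
    if ! curl -f -X POST "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK" \
      -H 'Content-type: application/json' \
      --data "$PAYLOAD"; then
      echo "Warning: Failed to send Slack notification"
    fi
    exit 2
  fi
fi
```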
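One way to wire the script into a pipeline is to treat its exit status 2 as "deletions found" and block the rollout unless someone has explicitly approved it. The sketch below assumes the script is saved as `plan-check.sh` and uses a hypothetical `ALLOW_DESTROY` variable as the approval signal; both names are illustrative and not part of Terraform.

```bash
#!/bin/bash
# Hypothetical gate: decide whether the rollout may continue, based on the
# exit status of the plan analysis script and an explicit approval flag.
gate_decision() {
  local status="$1"          # exit status of the analysis script
  local allow="${2:-false}"  # value of the (hypothetical) ALLOW_DESTROY flag
  if [ "$status" -eq 0 ]; then
    echo "proceed"           # no deletions planned
  elif [ "$status" -eq 2 ] && [ "$allow" = "true" ]; then
    echo "proceed-approved"  # deletions planned, but explicitly signed off
  else
    echo "block"             # deletions without approval, or an error
  fi
}

# Usage sketch in a CI step:
#   status=0
#   ./plan-check.sh || status=$?
#   [ "$(gate_decision "$status" "${ALLOW_DESTROY:-false}")" != "block" ] || exit 1
```

Defaulting to "block" for every unexpected status keeps the gate fail-closed: a broken analysis step stops the rollout instead of silently letting it through.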
Cloud-based Log Monitoring with Alerting
For production environments, centralized monitoring is recommended. This can be implemented, for example, with Splunk running in your own data center or with cloud services such as AWS CloudWatch or Oracle Logging. The goal is to capture suspicious log entries containing destructive keywords like “destroy” and trigger real-time alerts.
Note: The following examples are provided for guidance and include the necessary resource declarations, but are not yet fully operational end-to-end. Missing elements such as versions.tf and variables.tf are left to the sufficiently skilled reader.
Example: AWS CloudWatch Integration
The alerts can be connected directly to an aws_sns_topic, which in turn can send notifications via email, Slack, PagerDuty or other systems. This ensures that no critical terraform destroy goes unnoticed.
```hcl
provider "aws" {
  region = "eu-central-1"
}

resource "aws_cloudwatch_log_group" "terraform_logs" {
  name              = "/terraform/cicd"
  retention_in_days = 7

  tags = {
    Environment = "production"
    Purpose     = "terraform-monitoring"
  }
}

resource "aws_cloudwatch_metric_filter" "terraform_destroy_filter" {
  name           = "terraform-destroy-keyword"
  log_group_name = aws_cloudwatch_log_group.terraform_logs.name
  pattern        = "\"destroy\""

  metric_transformation {
    name      = "DestroyMatches"
    namespace = "Terraform/CI"
    value     = "1"
    unit      = "Count"
  }
}

resource "aws_sns_topic" "alerts" {
  name = "terraform-blast-radius-alerts"

  tags = {
    Environment = "production"
    Purpose     = "terraform-alerts"
  }
}

resource "aws_sns_topic_subscription" "email_alert" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = var.alert_email
}

resource "aws_cloudwatch_metric_alarm" "blast_radius_alarm" {
  alarm_name          = "Terraform-Destroy-Detected"
  alarm_description   = "Detects destroy operations in Terraform CI output"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  threshold           = 0
  metric_name         = "DestroyMatches"
  namespace           = "Terraform/CI"
  statistic           = "Sum"
  period              = 60
  treat_missing_data  = "notBreaching"

  insufficient_data_actions = []
  alarm_actions             = [aws_sns_topic.alerts.arn]
  ok_actions                = [aws_sns_topic.alerts.arn]

  tags = {
    Environment = "production"
    Purpose     = "blast-radius-monitoring"
  }
}
```
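Before relying on this chain, it may be worth checking that a notification actually reaches a human. The sketch below uses the AWS CLI command `aws cloudwatch set-alarm-state` to force the alarm into the ALARM state once. The wrapper name `trigger_test_alarm` is hypothetical, and the call assumes credentials with permission to change the alarm state.

```bash
#!/bin/bash
# Hypothetical helper: force the alarm into ALARM once to test alert routing.
# CloudWatch returns the alarm to its real state at the next evaluation.
trigger_test_alarm() {
  local alarm_name="${1:-Terraform-Destroy-Detected}"
  aws cloudwatch set-alarm-state \
    --alarm-name "$alarm_name" \
    --state-value ALARM \
    --state-reason "Manual test of blast radius alert routing"
}

# Usage sketch:
#   trigger_test_alarm "Terraform-Destroy-Detected"
```

Keep in mind that an SNS email subscription only delivers messages after the recipient has confirmed it, so a missing test email often points to an unconfirmed subscription rather than a broken alarm.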
Example: OCI Logging with Alerting
In Oracle Cloud Infrastructure, use the Logging service in combination with a logging query, an alarm and the Notifications service. This allows you to detect destructive actions like terraform destroy based on keywords in the CI/CD pipeline log stream or in audit logs.
Configuration steps:
- Create a log group for your build logs or audit logs
- Define a Logging Search query such as data.message CONTAINS "destroy"
- Define an alarm that triggers on matches
- Connect the alarm to a notification topic (email, PagerDuty, etc.)
Example alarm using Terraform:
```hcl
resource "oci_logging_log_group" "terraform_logs" {
  display_name   = "terraform-ci-logs"
  compartment_id = var.compartment_id

  freeform_tags = {
    "Environment" = "production"
    "Purpose"     = "terraform-monitoring"
  }
}

resource "oci_logging_log" "cicd_log" {
  display_name = "terraform-cicd-log"
  log_group_id = oci_logging_log_group.terraform_logs.id
  log_type     = "CUSTOM"

  # A CUSTOM log receives entries pushed by your pipeline, so it takes no
  # service source configuration (that block applies to SERVICE logs only).

  is_enabled         = true
  retention_duration = 30
}

resource "oci_ons_notification_topic" "alerts" {
  name           = "terraform-destroy-alerts"
  compartment_id = var.compartment_id
  description    = "Alerts for blast-radius related events"

  freeform_tags = {
    "Environment" = "production"
    "Purpose"     = "terraform-alerts"
  }
}

resource "oci_ons_subscription" "email_subscription" {
  compartment_id = var.compartment_id
  topic_id       = oci_ons_notification_topic.alerts.id
  protocol       = "EMAIL"
  endpoint       = var.alert_email
}

resource "oci_monitoring_alarm" "terraform_destroy_alarm" {
  display_name          = "Terraform-Destroy-Detected"
  compartment_id        = var.compartment_id
  metric_compartment_id = var.compartment_id

  # namespace is a required argument; set it to the metric namespace
  # that your query reads from.

  # Illustrative query only. Monitoring alarms evaluate MQL expressions
  # against metrics, so the log matches must first be available as a metric
  # before this alarm can fire. Adapt the query to your setup.
  query = <<-EOQ
    LoggingAnalytics[1m]{
      logGroup = "${oci_logging_log_group.terraform_logs.display_name}",
      log = "${oci_logging_log.cicd_log.display_name}"
    } | where data.message =~ ".*destroy.*" | count()
  EOQ

  severity = "CRITICAL"
  body     = "Terraform destroy operation detected in CI/CD pipeline!"

  is_enabled                   = true
  pending_duration             = "PT1M"
  repeat_notification_duration = "PT15M"
  resolution                   = "1m"

  # Optional suppression for a planned maintenance window. Both timestamps
  # are required as soon as the block is used, so it is commented out here.
  # suppression {
  #   description         = "Planned maintenance window"
  #   time_suppress_from  = "..."
  #   time_suppress_until = "..."
  # }

  destinations = [oci_ons_notification_topic.alerts.id]

  freeform_tags = {
    "Environment" = "production"
    "Purpose"     = "blast-radius-monitoring"
  }
}
```
Note: The logging query uses simple text search. For production environments, you may want to use more precise filters - such as regular expressions or structured log fields, assuming your CI tools produce structured logs.
Alternatively, the simpler LoggingSearch query engine may be used if Logging Analytics is not enabled in your tenancy.
Additional benefit: This method in OCI can also be extended to detect apply actions, policy violations or drift, provided the logs are properly populated (e.g. via Terraform plan output, Sentinel warnings or audit events).
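To illustrate why a bare keyword match is too coarse: Terraform's plan summary line contains the word "destroy" even when nothing is destroyed ("Plan: 2 to add, 0 to change, 0 to destroy."). A more precise filter can be prototyped locally before it is encoded in a logging query. The sketch below is such a prototype. The function name `planned_destroys` is hypothetical, and the code assumes plain-text output, i.e. Terraform was run with -no-color.

```bash
#!/bin/bash
# Hypothetical helper: read Terraform plan output on stdin and print the
# number of resources scheduled for destruction, taken from the summary line
# (e.g. "Plan: 2 to add, 0 to change, 1 to destroy."). Prints 0 if no
# summary line is found.
planned_destroys() {
  local n
  # Keep only the number in front of "to destroy." on the summary line;
  # if several summary lines appear in the log, the last one wins.
  n=$(sed -nE 's/.*Plan: .*[^0-9]([0-9]+) to destroy\..*/\1/p' | tail -n 1)
  echo "${n:-0}"
}

# Usage sketch:
#   terraform plan -no-color | tee plan.log
#   if [ "$(planned_destroys < plan.log)" -gt 0 ]; then
#     echo "Deletions planned, raising alert"
#   fi
```

Alerting on a destroy count greater than zero instead of on the keyword itself avoids an alarm on every ordinary plan run.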
✅ Checklist: Blast Radius Readiness
This checklist can help you build your infrastructure to be as resilient as possible.
✅ Preventive Measures
- [ ] States segmented by blast radius impact
- [ ] Lifecycle rules implemented for critical resources
- [ ] Remote state validations in place
- [ ] Policy-as-Code for destroy operations
- [ ] Automated plan analysis enabled
- [ ] Cross-state dependency mapping created
🚨 Preparations for Emergencies
- [ ] State backup strategy implemented
- [ ] Import scripts for critical resources tested
- [ ] Incident response playbooks available
- [ ] Team training for state surgery completed
- [ ] Monitoring and alerting for blast radius events active
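As one possible building block for the state backup item above, the following sketch snapshots the current state with `terraform state pull` before a risky operation. The function name `backup_state` and the default directory `state-backups` are hypothetical. A real backup strategy would typically also rely on backend features such as bucket versioning.

```bash
#!/bin/bash
# Hypothetical helper: write a timestamped snapshot of the current state.
# `terraform state pull` prints the state of the configured backend to stdout.
backup_state() {
  local dir="${1:-state-backups}"  # target directory (illustrative default)
  local file
  mkdir -p "$dir" || return 1
  file="$dir/terraform-$(date -u +%Y%m%dT%H%M%SZ).tfstate"
  # Keep the file only if the pull succeeded and produced content
  if terraform state pull > "$file" && [ -s "$file" ]; then
    echo "$file"
  else
    rm -f "$file"
    return 1
  fi
}

# Usage sketch:
#   snapshot=$(backup_state /secure/backups) || exit 1
#   echo "State saved to $snapshot"
```

State files can contain sensitive values, so the backup location should be access-controlled in the same way as the backend itself.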
✍️ Careful Planning and Mindsets
Successful enterprise-level Terraform implementations also require:
- Proactive architecture: design states based on blast radius impact
- Defensive programming: implement guardrails and validations
- Monitoring and alerting: detect blast radius events early
- Recovery preparedness: be ready for critical situations
Conclusion: Controlled Explosions Instead of Chaos
Important: Blast radius management is not a one-time setup, but a continuous process.
The key is to strike the right balance between flexibility and control - just like the Goldilocks principle, which we have discussed in detail in a previous article.
Because the best explosion is the one that never happens.