Terraform @Scale - Part 3b: Blast Radius Recovery Strategies

Read time: 7 mins; Created: 04 July 2025

Despite careful blast radius minimisation, segmented states and lifecycle guardrails, sooner or later it can still happen: a terraform apply accidentally deletes production resources, or a terraform destroy affects more than intended.

What should you do once the damage is done?

In the previous article of this series, I explained how to minimise the blast radius. In this follow-up, I will show proven techniques for restoring damaged Terraform states and limiting the impact after an incident.

1. Open-heart first aid: State Surgery

If resources still physically exist but are no longer correctly referenced in the state file, the usual remedy is a surgical procedure on the command line. This so-called state surgery lets you manually clean up a Terraform state and realign it with reality.

⚠️ Important: This approach requires a deep understanding of Terraform's logic and of the actual state of the infrastructure. It is powerful, but also dangerous if used carelessly. Always create a full backup of the current state file before making manual changes!

You can back up the current state using the following CLI command:

$> terraform state pull > state-backup-$(date +%Y%m%d-%H%M%S).json
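A state pull that fails silently leaves you with an empty backup file, which you would only notice at the moment you need it. The following is a minimal sketch of a helper (the function name backup_state is hypothetical) that writes the same timestamped backup and refuses to report success if the pull failed or the file is empty. Should you ever need to roll back, terraform state push <file> uploads a backup again; note that push performs lineage and serial safety checks, so restoring an older backup after later state changes may require -force, which should be used with great care.

```shell
# Hypothetical helper: back up the current state to a timestamped file.
# Prints the file name on success; returns non-zero (and deletes the file)
# if the pull failed or produced an empty file.
backup_state() {
  local file
  file="state-backup-$(date +%Y%m%d-%H%M%S).json"

  if ! terraform state pull > "$file"; then
    echo "terraform state pull failed" >&2
    rm -f "$file"
    return 1
  fi

  # -s is true only if the file exists and is not empty
  if [ ! -s "$file" ]; then
    echo "backup $file is empty, refusing to continue" >&2
    rm -f "$file"
    return 1
  fi

  echo "$file"
}
```

In a recovery script, a line such as backup=$(backup_state) || exit 1 then stops everything before any state surgery takes place.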

Afterwards, you can selectively remove resources from the state without deleting the associated cloud resource. In this example, the affected resource is called aws_instance.problematic_instance:

$> terraform state rm aws_instance.problematic_instance

Next, add a new resource block for the recovered resource to your Terraform configuration (here named aws_instance.recovered_instance). Depending on what exactly was broken, this can be a minimal, empty block or a full copy of the previous one.
Then import the affected resource under the new name, using its unique ID (here, for example, i-1234567890abcdef0):

$> terraform import aws_instance.recovered_instance i-1234567890abcdef0
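Between the two commands the resource is briefly unmanaged, so it can help to script the sequence and report a failed import loudly instead of letting it disappear in the scrollback. The sketch below assumes that the new resource block already exists in the configuration; the function name reimport_resource is hypothetical.

```shell
# Hypothetical helper: detach a resource from the state and re-import it
# under a new address. The cloud resource itself is never touched.
# Usage: reimport_resource <old-address> <new-address> <resource-id>
reimport_resource() {
  local old="$1" new="$2" id="$3"

  # Remove the broken binding; the real resource keeps running.
  terraform state rm "$old" || return 1

  # Bind the existing resource to the new address in the configuration.
  if ! terraform import "$new" "$id"; then
    echo "WARNING: $old was removed from the state, but importing $id as $new failed." >&2
    echo "The resource is currently unmanaged; fix the configuration and re-run the import." >&2
    return 1
  fi

  # Confirm that the new address is now tracked in the state.
  terraform state show "$new" > /dev/null
}
```

With the names from this example, the call would be reimport_resource aws_instance.problematic_instance aws_instance.recovered_instance i-1234567890abcdef0.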

Alternatively, you can move a resource that already exists in the state to a new address, for example into another part of the configuration or into a different Terraform module:

$> terraform state mv aws_instance.misplaced_resource aws_instance.correct_scope
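terraform state mv also accepts a -dry-run flag that prints what would be moved without changing anything. A cautious pattern is to preview first and only then perform the move; the helper name safe_state_mv below is hypothetical. Remember to rename the resource block in the configuration as well, otherwise the next plan will try to destroy the resource at its new address and create a fresh one at the old address.

```shell
# Hypothetical helper: preview a state move with -dry-run, then perform it
# only after an explicit confirmation (type "y").
# Usage: safe_state_mv <source-address> <destination-address>
safe_state_mv() {
  local src="$1" dst="$2" answer

  # Show what would be moved without modifying the state.
  terraform state mv -dry-run "$src" "$dst" || return 1

  printf 'Apply this move? [y/N] '
  read -r answer
  case "$answer" in
    y|Y) ;;
    *) echo "aborted" >&2; return 1 ;;
  esac

  terraform state mv "$src" "$dst"
}
```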

2. Gradual Reconstruction: Partial Import Strategies

In complex scenarios, for example after a partial failure in a large state, a simple import is often not enough. Instead, a structured, step-by-step restoration using configuration-driven imports, i.e. import blocks (available since Terraform 1.5), is recommended.

Procedure

a. Identify affected resources using -detailed-exitcode

$> terraform plan -detailed-exitcode

The command terraform plan -detailed-exitcode is a special variant of the standard terraform plan command. It provides more detailed exit codes, which is especially useful in automation and CI/CD pipelines.

Functionality:

terraform plan normally shows which changes Terraform would make to your infrastructure without actually applying anything. Without the flag, the command exits with 0 whenever the plan succeeds, regardless of whether changes are pending.
With the -detailed-exitcode flag, the exit code distinguishes between these cases and reports the result of the plan more precisely:

Exit Code	Meaning
0	No changes; the infrastructure matches the configuration.
1	An error occurred (e.g., syntax error, provider failure, etc.).
2	Changes are pending; the proposed plan would modify the infrastructure.

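Note: An exit code of 2 means the plan contains changes, i.e. resources would be created, updated or destroyed. After an incident, these are exactly the resources that require special attention.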

This flag is not always mentioned in basic tutorials, but it is widely used by teams that automate Terraform workflows, and it enables robust and predictable infrastructure changes.
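To see which resources are affected, it can help to capture the plan output and extract the resource addresses together with the exit code. The sketch below is a quick-triage shortcut with a clear assumption: it parses the human-readable plan lines of the form "# <address> will be ..." or "# <address> must be replaced", which is convenient but not a stable interface. For anything beyond triage, terraform show -json on a saved plan file is the more robust source. The function name affected_resources is hypothetical.

```shell
# Hypothetical helper: run a plan, print the addresses of resources with
# pending actions, and pass through the -detailed-exitcode result
# (0 = no changes, 1 = error, 2 = changes pending).
affected_resources() {
  local out rc
  out=$(terraform plan -detailed-exitcode -input=false -no-color 2>&1)
  rc=$?

  if [ "$rc" -eq 1 ]; then
    # Show the full output so the error can be diagnosed.
    printf '%s\n' "$out" >&2
    return 1
  fi

  # Matches lines such as "  # aws_vpc.main will be created"
  # or "  # aws_instance.web must be replaced".
  printf '%s\n' "$out" \
    | grep -E '^ *# [^ ]+ (will be|must be) ' \
    | awk '{ print $2 }'

  return "$rc"
}
```

Addresses containing spaces (for example for_each keys with spaces) would be cut off by this simple parsing, which is another reason to switch to the JSON plan output for larger recoveries.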

b. Define import configuration

Next, write import blocks for the affected resources:

import {
  to = aws_vpc.recovered_vpc
  id = "vpc-12345678"
}

import {
  to = aws_subnet.recovered_subnet[0]
  id = "subnet-12345678"
}
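When many resources need to be recovered, writing import blocks by hand quickly becomes tedious and error-prone. A minimal sketch, assuming you have a plain-text list with one "<address> <id>" pair per line (the file name recovery-list.txt and the function name write_import_blocks are hypothetical), could generate blocks in the same format as above:

```shell
# Hypothetical helper: turn "<address> <id>" lines on stdin into import blocks.
# Blank lines and lines starting with "#" are skipped; a last line without
# a trailing newline is still processed.
write_import_blocks() {
  local addr id
  while read -r addr id || [ -n "$addr" ]; do
    case "$addr" in
      ''|'#'*) continue ;;
    esac
    printf 'import {\n  to = %s\n  id = "%s"\n}\n\n' "$addr" "$id"
  done
}
```

Usage: write_import_blocks < recovery-list.txt > imports.tf. The generated file should still be reviewed, because a wrong ID in the list leads to importing the wrong object.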

c. Execute automated import and config generation using -generate-config-out

The command terraform plan -generate-config-out=<file> is an experimental feature introduced in Terraform v1.5. It generates Terraform configuration (in HCL) for resources that are targeted by import blocks but do not yet have a resource block in your configuration.

This feature is particularly useful when importing existing infrastructure into Terraform, as it helps you generate the necessary configuration code based on the actual state of the resources.

$> terraform plan -generate-config-out=generated.tf
[...]
$> terraform apply

Terraform performs the following:

It examines the import blocks in the configuration.
For each resource to be imported that has no existing configuration, Terraform generates an HCL resource block with the most appropriate set of arguments and values.
The generated configuration is written to the specified file (generated.tf in this example). The path must point to a new file; if the file already exists, Terraform reports an error.
After executing the command, Terraform displays the plan for importing your resources and indicates where the generated configuration was saved.

In short, -generate-config-out makes it much easier to bring unmanaged infrastructure under Terraform management. Always review and adapt the generated code before applying it: it is a starting point, not a finished configuration.
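Because Terraform refuses to write generated configuration into an existing file, a small guard avoids a confusing error in scripted recoveries. The sketch below (the function name generate_import_config is hypothetical) checks the target path, runs the plan with -generate-config-out and then stops, so that the generated file can be reviewed before anything is applied:

```shell
# Hypothetical helper: generate HCL for resources referenced by import blocks.
# Stops after the plan so the generated file can be reviewed and edited
# before running "terraform apply".
# Usage: generate_import_config [output-file]
generate_import_config() {
  local target rc
  target="${1:-generated.tf}"

  # Terraform will not write generated configuration into an existing file.
  if [ -e "$target" ]; then
    echo "$target already exists; choose a new file name" >&2
    return 1
  fi

  terraform plan -input=false -generate-config-out="$target"
  rc=$?

  # Even if the plan reported problems, a written file needs review.
  if [ -f "$target" ]; then
    echo "Review and edit $target, then re-run 'terraform plan' before 'terraform apply'."
  fi
  return "$rc"
}
```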

d. Reality check: State reconciliation using -refresh-only and -refresh=true

If the Terraform state differs from reality but no resources are missing, reconciling the state with the actual infrastructure helps. This so-called refresh uncovers drift and updates the state to reflect the real values; it does not change the infrastructure itself.

# Update the state to match the real infrastructure (no resources are changed)
$> terraform apply -refresh-only

# Refresh, then compare the configuration with the real infrastructure
$> terraform plan -refresh=true
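Since terraform apply -refresh-only writes the refreshed values into the state, it is worth previewing them and keeping a backup first. A minimal sketch of that sequence (the function name reconcile_state is hypothetical) could look like this; the final apply still asks for interactive confirmation because -auto-approve is deliberately omitted:

```shell
# Hypothetical helper: preview drift, back up the state, then update the
# state to match reality. No infrastructure objects are changed.
reconcile_state() {
  local backup
  backup="state-backup-$(date +%Y%m%d-%H%M%S).json"

  # 1. Show what the refresh would change in the state, without writing it.
  terraform plan -refresh-only -input=false || return 1

  # 2. Keep a copy of the current state in case the refresh must be rolled back.
  terraform state pull > "$backup" || return 1
  echo "State backed up to $backup"

  # 3. Write the refreshed values; Terraform prompts for confirmation.
  terraform apply -refresh-only
}
```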

The command terraform plan -refresh=true creates an execution plan for your Terraform-managed infrastructure, explicitly instructing Terraform to update the state before planning (refresh).

Behaviour details

By default, terraform plan refreshes the state before evaluating changes, i.e. it reads the current attributes of all managed resources from the external infrastructure. The refreshed values are used only for the plan and are not written to the state file; persisting them is what terraform apply -refresh-only is for.
Setting -refresh=true explicitly requests this refresh step. Because it is the default, the flag is usually redundant, but it makes the intent explicit in scripts and ensures that changes made outside Terraform (e.g., manual changes in the cloud console) are detected and reflected in the plan.
During the refresh, Terraform queries all managed resources for their current status. It then compares this refreshed state with your configuration files to show which actions (add, change, delete) are needed to bring the actual infrastructure in line with the desired configuration.

Comparison with -refresh=false

Option	Behaviour
-refresh=true	State is updated with the real infrastructure before planning (default behaviour).
-refresh=false	State update is skipped, which can speed up planning but misses external changes.

e. Drift detection in scripted workflows using -detailed-exitcode

The command terraform plan -detailed-exitcode creates an execution plan and signals its result through the exit code. This makes it ideal for CI/CD pipelines or as a pre-check before a production terraform apply.

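#!/bin/bash

# -input=false stops Terraform from waiting for interactive input in a pipeline
terraform plan -detailed-exitcode -input=false
rc=$?

case $rc in
    0) echo "No drift detected" ;;
    2) echo "Configuration drift detected – manual review required"; exit 2 ;;
    *) echo "terraform plan failed with exit code $rc" >&2; exit 1 ;;
esac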

Explanation of exit codes

When you run terraform plan -detailed-exitcode, Terraform will:

Display the planned actions (add, change, delete) that would be required to align your infrastructure with the configuration, without applying any changes.
Return a specific exit code based on the result of the plan:

Exit Code	Meaning
0	No changes; the infrastructure matches the configuration.
1	An error occurred (e.g., syntax error, provider issue).
2	Changes need to be applied (resources will be added, modified, or deleted).
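In a setup with many segmented states, as recommended in the previous part of this series, the same check is typically run once per state directory. The sketch below assumes that each state lives in its own, already initialised directory (the directory names and the function name check_drift_all are hypothetical) and uses Terraform's global -chdir option instead of changing the working directory:

```shell
# Hypothetical helper: run a drift check in several Terraform root modules.
# Returns 0 if none drifted, 2 if at least one drifted, 1 if any check failed
# (errors take precedence over drift).
# Usage: check_drift_all <dir>...
check_drift_all() {
  local dir rc result=0

  for dir in "$@"; do
    terraform -chdir="$dir" plan -detailed-exitcode -input=false > /dev/null 2>&1
    rc=$?
    case "$rc" in
      0) echo "OK     $dir" ;;
      2) echo "DRIFT  $dir"; [ "$result" -eq 0 ] && result=2 ;;
      *) echo "ERROR  $dir (exit code $rc)"; result=1 ;;
    esac
  done

  return "$result"
}
```

A scheduled pipeline job could call check_drift_all network/ compute/ database/ and alert on any non-zero result.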

Conclusion

As this article has shown, even a runaway terraform destroy that slips past all precautions is not the end of the world.
In the pre-Terraform era, the impact was usually far worse, because tools like those Terraform integrates did not exist yet. Instead, it usually meant: "We’re offline for now, we’ll rebuild the service manually from scratch over the next one or two days, and then we’ll figure out what actually went wrong."

However, the fact remains that even with Terraform, restoring broken operational states and misconfigured or destroyed resources is still technically demanding and requires expert knowledge.

In the next part of this series, we will therefore look at the tools Terraform provides to detect and prevent blast radius issues early on.

Ralf Ramge

Founder, Cloud Architect & IT Consultant

ICT.technology
