
    Terraform @Scale - Part 3b: Blast Radius Recovery Strategies

    Despite careful blast radius minimisation, segmented states and lifecycle guardrails, it can happen sooner or later: a terraform apply accidentally deletes production resources, or a terraform destroy affects more than intended.

    What to do once the damage is already done?

    In the previous article of this series, I explained how to minimise the blast radius. In this follow-up, I will show proven techniques for restoring damaged Terraform states and limiting the impact after an incident.

    1. Open-heart first aid: State Surgery

    If resources still physically exist but are no longer properly referenced in the state file, the only remedy is a surgical procedure via the command line. So-called state surgery allows you to manually clean up Terraform states and realign them with reality.

    ⚠️ Important: This approach requires deep understanding of Terraform logic and the actual infrastructure state. It is powerful but also dangerous if used carelessly. Always create a full backup of the current state file before making manual changes!

    You can back up the current state using the following CLI command:


    $> terraform state pull > state-backup-$(date +%Y%m%d-%H%M%S).json
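
    The backup step can be wrapped so that nothing else runs unless the backup actually succeeded. The following is a minimal sketch, not an official workflow: backup_state is a hypothetical helper name, and treating an empty file as a failed backup is an assumption made here for safety.

```shell
# Sketch: pull the current state into a timestamped file and refuse to
# continue if the pull fails or the file is empty.
# backup_state is a hypothetical helper name, not a Terraform command.
backup_state() {
    backup_file="state-backup-$(date +%Y%m%d-%H%M%S).json"
    if ! terraform state pull > "$backup_file"; then
        echo "terraform state pull failed; aborting" >&2
        rm -f "$backup_file"
        return 1
    fi
    # Assumption: an empty backup is never what we want before state surgery.
    if [ ! -s "$backup_file" ]; then
        echo "backup $backup_file is empty; aborting" >&2
        rm -f "$backup_file"
        return 1
    fi
    # Print the file name so callers can log or archive it.
    echo "$backup_file"
}
```

    A typical call would be backup=$(backup_state) || exit 1, placed before any terraform state rm or terraform state mv.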
    

     Afterwards, you can selectively remove resources from the state without deleting the associated cloud resource. In this example, the affected resource is called aws_instance.problematic_instance:


    $> terraform state rm aws_instance.problematic_instance

    Now create a resource block in your Terraform configuration for the recovered resource (here we use the name aws_instance.recovered_instance). This can be an initially empty block or a full copy and paste of the previous one, depending on what exactly was broken. An empty block is enough for the import itself, but its arguments must be filled in before the next plan.
    Then import the affected resource using its unique ID (here as an example i-1234567890abcdef0) under the new name:


    $> terraform import aws_instance.recovered_instance i-1234567890abcdef0
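
    The remove-and-import sequence above can be combined into one guarded step. This is only a sketch under stated assumptions: reimport_resource is a hypothetical helper, the backup file name is made up for illustration, and the resource block for the new address is assumed to exist in the configuration already.

```shell
# Sketch: detach a resource from the state and re-import it under a new
# address. reimport_resource is a hypothetical helper name.
reimport_resource() {
    old_addr="$1"
    new_addr="$2"
    resource_id="$3"
    if [ -z "$old_addr" ] || [ -z "$new_addr" ] || [ -z "$resource_id" ]; then
        echo "usage: reimport_resource OLD_ADDR NEW_ADDR RESOURCE_ID" >&2
        return 2
    fi
    # Safety net first: never touch the state without a backup.
    terraform state pull > "pre-reimport-$(date +%Y%m%d-%H%M%S).json" || return 1
    # Remove only the state entry; the cloud resource itself is untouched.
    terraform state rm "$old_addr" || return 1
    # Attach the existing resource to the new address.
    terraform import "$new_addr" "$resource_id"
}
```

    With the names from this article, the call would be: reimport_resource aws_instance.problematic_instance aws_instance.recovered_instance i-1234567890abcdef0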
    

    Alternatively, you can move an existing resource within the state to a new address without touching the real infrastructure. The target can be another part of the configuration or a different Terraform module:


    $> terraform state mv aws_instance.misplaced_resource aws_instance.correct_scope
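
    Because a wrong move is hard to spot afterwards, it can help to preview it first. The sketch below uses the -dry-run option of terraform state mv and asks for confirmation before the real move; state_mv_checked is a hypothetical helper name and the prompt wording is an assumption.

```shell
# Sketch: preview a state move, then apply it only after confirmation.
# state_mv_checked is a hypothetical helper name.
state_mv_checked() {
    src="$1"
    dst="$2"
    # Preview only: -dry-run reports the move without modifying the state.
    terraform state mv -dry-run "$src" "$dst" || return 1
    printf 'Apply this move? [y/N] '
    read -r answer
    if [ "$answer" != "y" ]; then
        echo "aborted, state unchanged" >&2
        return 1
    fi
    # The actual move; only reached after an explicit "y".
    terraform state mv "$src" "$dst"
}
```

    For the example above: state_mv_checked aws_instance.misplaced_resource aws_instance.correct_scope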

     

    2. Gradual Reconstruction: Partial Import Strategies

    In complex scenarios - for example, after a partial failure in a large state - a simple import is often not sufficient. Instead, a structured, step-by-step restoration using import blocks in the configuration (available since Terraform 1.5) is recommended.

    Procedure

    a. Identify affected resources using -detailed-exitcode

    $> terraform plan -detailed-exitcode

    The command terraform plan -detailed-exitcode is a special variant of the standard terraform plan command. It provides more detailed exit codes, which is especially useful in automation and CI/CD pipelines.

    Functionality:

    • terraform plan normally shows which changes Terraform would make to your infrastructure without actually applying anything.
    • With the -detailed-exitcode flag, the behaviour of the exit codes changes so that the result of the plan is displayed more precisely:
    Exit Code   Meaning
    0           No changes; the infrastructure matches the configuration.
    1           An error occurred (e.g., syntax error, provider failure, etc.).
    2           Changes are pending; the proposed plan would modify the infrastructure.

     

    Note: An exit code of 2 indicates pending changes. After an incident, the resources listed in that plan require special attention.

    This function is not always mentioned in simple tutorials, but it is widely used in teams automating Terraform workflows and enables robust and predictable infrastructure changes.
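
    In a script, the three exit codes can be translated into labels that later steps branch on. This is a minimal sketch: plan_status and check_plan are hypothetical helper names, and the labels clean, error and changes are chosen here purely for illustration.

```shell
# Sketch: map the exit code of "terraform plan -detailed-exitcode" to a label.
# plan_status and check_plan are hypothetical helper names.
plan_status() {
    case "$1" in
        0) echo "clean" ;;    # no changes
        1) echo "error" ;;    # plan failed
        2) echo "changes" ;;  # plan succeeded, changes are pending
        *) echo "unknown" ;;
    esac
}

check_plan() {
    rc=0
    # Plan output goes to stderr so that stdout carries only the label.
    terraform plan -detailed-exitcode -input=false >&2 || rc=$?
    plan_status "$rc"
}
```

    A pipeline step could then use: [ "$(check_plan)" = "clean" ] || echo "review required"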

    b. Define import configuration

    Now create a configuration for the affected resources:


    import {
      to = aws_vpc.recovered_vpc
      id = "vpc-12345678"
    }
    
    import {
      to = aws_subnet.recovered_subnet[0]
      id = "subnet-12345678"
    }
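
    When many resources are affected, writing these blocks by hand becomes error-prone. The sketch below generates them from a plain list of address and ID pairs; make_import_blocks is a hypothetical helper, and the input format (one "ADDRESS ID" pair per line, with # for comments) is an assumption made for this example.

```shell
# Sketch: turn "ADDRESS ID" pairs read from stdin into import blocks.
# make_import_blocks is a hypothetical helper name.
make_import_blocks() {
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r addr id || [ -n "$addr" ]; do
        # Skip blank lines and comment lines.
        case "$addr" in
            ''|'#'*) continue ;;
        esac
        printf 'import {\n  to = %s\n  id = "%s"\n}\n\n' "$addr" "$id"
    done
}
```

    Usage, assuming a hypothetical list file named recover.txt: make_import_blocks < recover.txt > imports.tf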

    c. Execute automated import and config generation using -generate-config-out

    The command terraform plan -generate-config-out=<file> is an experimental feature introduced in Terraform v1.5. It allows you to generate Terraform configuration files (in HCL format) for resources that are defined in import blocks but do not yet exist in your configuration.

    This feature is particularly useful when importing existing infrastructure into Terraform, as it helps you generate the necessary configuration code based on the actual state of the resources.


    $> terraform plan -generate-config-out=generated.tf
    [...]
    $> terraform apply

     Terraform performs the following:

    • It examines the import blocks in the configuration.
    • For each resource to be imported that has no existing configuration, Terraform generates an HCL resource block with the most appropriate set of arguments and values.
    • The generated configuration is written to the specified file (in this example generated.tf). You must provide a new file path - using an existing file will cause an error.
    • After executing the command, Terraform displays the plan for importing your resources and indicates where the generated configuration was saved.

    terraform plan -generate-config-out=<file> is a powerful, experimental tool for generating Terraform configurations from existing resources defined in import blocks. It makes it easier to bring unmanaged infrastructure under Terraform management. Always review and adapt the generated code before using it.
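
    Since an existing target file causes an error, a small guard can catch that case before Terraform starts planning. The following is a sketch only: generate_import_config is a hypothetical helper name, and the default file name generated.tf is taken from the example above.

```shell
# Sketch: refuse to run config generation into a file that already exists.
# generate_import_config is a hypothetical helper name.
generate_import_config() {
    out_file="${1:-generated.tf}"
    # Fail early with a clear message instead of waiting for Terraform to do so.
    if [ -e "$out_file" ]; then
        echo "$out_file already exists; choose a new path" >&2
        return 1
    fi
    terraform plan -generate-config-out="$out_file" || return 1
    # The generated code is a starting point and needs a human review.
    echo "Review $out_file before running terraform apply" >&2
}
```

    The helper deliberately stops after the plan; running terraform apply remains a separate, manual decision.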

    d. Reality check: State reconciliation using -refresh-only and -refresh=true

    If the Terraform state differs from reality but no resources are missing, reconciling the state with the actual infrastructure helps. This so-called refresh can uncover drift and in some cases automatically correct it.


    # Refresh the state only; no infrastructure is changed
    $> terraform apply -refresh-only
    
    # Compare state and actual infrastructure
    $> terraform plan -refresh=true

    The command terraform plan -refresh=true creates an execution plan for your Terraform-managed infrastructure, explicitly instructing Terraform to update the state before planning (refresh).

    Behaviour details

    • By default, terraform plan refreshes its view of the state against the actual external infrastructure before evaluating changes. This refresh happens in memory for the duration of the plan; the stored state is only written by an apply, including terraform apply -refresh-only.
    • With the -refresh=true flag, you explicitly instruct Terraform to perform this refresh step. This is normally redundant since it is the default behaviour, but it ensures that all changes made outside Terraform (e.g., manual changes in the cloud console) are detected and reflected in the plan.
    • During the refresh, Terraform queries all managed resources for their current status. Then, Terraform compares this refreshed state with your configuration files to show which actions (add, change, delete) are needed to bring the actual infrastructure in line with the desired configuration.

    Comparison with -refresh=false

    Option           Behaviour
    -refresh=true    State is updated with the real infrastructure before planning (default behaviour).
    -refresh=false   State update is skipped, which can speed up planning but misses external changes.

     

    e. Drift detection in scripted workflows using -detailed-exitcode

    The command terraform plan -detailed-exitcode is used to create an execution plan in Terraform - with a special behaviour regarding return values (exit codes). This technique is ideal for use in CI/CD pipelines or as a pre-check before a production terraform apply.


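    #!/bin/bash
    # Run the plan and capture its exit code immediately
    terraform plan -detailed-exitcode
    rc=$?
    if [ "$rc" -eq 2 ]; then
        echo "Configuration drift detected – manual review required"
    elif [ "$rc" -eq 1 ]; then
        echo "terraform plan failed" >&2
        exit 1
    fi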

    Explanation of exit codes

    When you run terraform plan -detailed-exitcode, Terraform will:

    • Display the planned actions (add, change, delete) that would be required to align your infrastructure with the configuration, without applying any changes.
    • Return a specific exit code based on the result of the plan:
    Exit Code   Meaning
    0           No changes; the infrastructure matches the configuration.
    1           An error occurred (e.g., syntax error, provider issue).
    2           Changes need to be applied (resources will be added, modified, or deleted).

     

    Conclusion

    As this article has shown, a runaway terraform destroy that happens despite all precautions is not the end of the world.
    On the contrary, in the pre-Terraform IT age the impact was usually far worse, since recovery tooling of the kind built into Terraform did not yet exist. Instead, it usually meant: "We’re offline for now, rebuild the service manually from scratch over the next 1 or 2 days, and then we’ll figure out what actually went wrong."

    However, the fact remains that even with Terraform, restoring broken operational states and misconfigured or destroyed resources is still technically demanding and requires expert knowledge.

    In the next part of this article series, we will therefore look at which tools Terraform provides to help detect and prevent blast radius issues early on.