
    Terraform @ Scale - Part 5a: Understanding API Limits

    It is 2:30 p.m. on a regular Tuesday afternoon. The DevOps team of a Swiss financial services provider routinely launches its Terraform pipeline for the monthly disaster recovery testing. 300 virtual machines, 150 load balancer backends, 500 DNS entries, and countless network rules are to be provisioned in the backup region.
    After 5 minutes, the pipeline fails. HTTP 429: Too Many Requests.
    The team spends the next 3 hours manually cleaning up half-provisioned resources while management nervously watches the clock.
    The DR test has failed before it even began.

    What happened? The team had fallen into one of the most insidious traps of cloud automation: API rate limits. While Terraform was busily trying to create hundreds of resources in parallel, the cloud provider had pulled the emergency brake after the first 100 requests per minute. A problem that had never occurred with 10 VMs became an insurmountable wall with 300 VMs.

    The Invisible Wall: Understanding API limits

    Every major cloud provider implements API rate limits to ensure the stability of its services. These limits are not an arbitrary restriction, but a necessity. Without them, a single faulty Terraform run could bring down the API endpoints for all customers of a provider.

    Oracle Cloud Infrastructure implements API rate limiting per service; IAM, for example, returns HTTP 429 when its limits are exceeded. The specific limits vary by service and are not documented globally. In addition, OCI sets service limits for each tenancy, which are either agreed upon with the Oracle sales representative at purchase or applied automatically as standard or trial limits.
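
    The tenancy service limits, at least, can be inspected up front. The following is a minimal sketch using the OCI limits data sources; the service name, limit name, and variables shown here are illustrative and need to be adapted to your tenancy:


    # Illustrative check of remaining capacity against a tenancy service limit
    # before a large deployment (service and limit names are examples only).
    data "oci_limits_resource_availability" "e4_cores" {
      compartment_id      = var.tenancy_ocid          # service limits live at tenancy level
      service_name        = "compute"
      limit_name          = "standard-e4-core-count"  # hypothetical limit name
      availability_domain = var.availability_domain
    }

    output "e4_cores_available" {
      value = data.oci_limits_resource_availability.e4_cores.available
    }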

    AWS implements a token bucket system for EC2 API throttling. For RunInstances, the resource token bucket size is 1000 tokens with a refill rate of two tokens per second. This means you can immediately launch 1000 instances and then up to two instances per second. API actions are grouped into categories, with Non-mutating Actions (Describe*, List*, Search*, Get*) typically having the highest API throttling limits.
    [Note: Non-mutating actions are API operations that only read data and do not change the cloud state. AWS (and other providers) typically give them higher rate limits than mutating actions because they do not create, modify, or delete resources and are therefore less critical to the stability of the backend systems.]
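
    To get a feel for what the bucket means in practice, here is a purely illustrative back-of-the-envelope calculation, expressed as Terraform locals; the 1,500 launch requests are a hypothetical burst, not an AWS figure:


    # Hypothetical arithmetic for the RunInstances token bucket described above
    locals {
      bucket_size     = 1000  # resource tokens available for an immediate burst
      refill_per_sec  = 2     # tokens added back per second
      launch_requests = 1500  # hypothetical burst of RunInstances calls

      # Calls beyond the initial bucket have to wait for the refill
      throttled_calls = max(0, local.launch_requests - local.bucket_size)  # 500
      extra_wait_s    = local.throttled_calls / local.refill_per_sec       # 250 seconds
    }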

    Terraform itself exacerbates this problem through its parallelization. By default, Terraform attempts to create or modify up to 10 resources at the same time. For complex dependency graphs and many independent resources, this can quickly lead to dozens of simultaneous API calls. Multiplied by the data sources we discussed in previous articles in this series, this creates a so-called “perfect storm” of API requests.

    The parallelization paradox

    Terraform's default parallelization is a double-edged sword. On the one hand, it significantly accelerates deployments, since nobody wants to wait for hours while resources are created sequentially. On the other hand, this very optimization causes problems in large infrastructures.

    Let us look at a typical scenario: a Terraform configuration for a multi-tier application with web, app, and database layers. Each layer consists of several instances distributed across different availability domains:


    # A seemingly harmless root module, which will turn into an API bomb
    module "web_tier" {
      source = "./modules/compute-cluster"
    
      instance_count = 50
      instance_config = {
        shape  = "VM.Standard.E4.Flex"
        ocpus  = 2
        memory_in_gbs = 16
      }
    
      availability_domains = data.oci_identity_availability_domains.ads.availability_domains
      subnet_ids           = module.network.web_subnet_ids
    
      load_balancer_backend_sets = {
        for idx, val in range(50) :
        "web-${idx}" => module.load_balancer.backend_set_name
      }
    }
    
    module "app_tier" {
      source = "./modules/compute-cluster"
    
      instance_count = 30
      # ... similar configuration
    }
    
    module "monitoring" {
      source = "./modules/monitoring"
    
      # Create monitoring alarms for each instance
      monitored_instances = concat(
        module.web_tier.instance_ids,
        module.app_tier.instance_ids
      )
    
      alarms_per_instance = 5  # CPU, Memory, Disk, Network, Application
    }

    This harmless-looking configuration generates the following when executed:

    • 50 API calls for web tier instances
    • 30 API calls for app tier instances
    • 50 API calls for load balancer backend registrations
    • 400 API calls for monitoring alarms (80 instances × 5 alarms)

    That is 530 API calls that Terraform tries to execute with as much parallelism as possible. With a limit of 10 write operations per second, this would take just under a minute in the best case - and only if the calls were perfectly paced against that limit.

    In practice, however, throttling leads to retry loops, exponential backoffs, and in the worst case, to timeouts and an aborted run with partially created resources that then have to be manually cleaned up.

    Terraform -target: The wrong solution

    In desperation, many teams turn to terraform apply -target to create resources selectively. "We will just deploy the networks first, then the compute instances, then the monitoring" - that is the plan. However, this is a dangerous approach that creates more problems than it solves.

    The -target flag was designed for surgical interventions in emergencies, not for regular deployments. It bypasses Terraform's dependency management and can lead to inconsistent states.

    Even more problematic: it does not scale. With hundreds of resources, you would either have to target each resource type individually, which explodes the complexity, or write complex scripts that call Terraform multiple times with different targets.

    An anti-pattern I have seen in practice:


    #!/bin/bash
    # DON'T TRY THIS AT HOME - THIS IS AN ANTIPATTERN!
    # Note: This script requires Bash and seq (GNU coreutils).
    
    # Phase 1: Network
    terraform apply -target=module.network -auto-approve
    
    # Phase 2: Compute (in Batches)
    for i in {0..4}; do
      start=$((i*10))
      end=$((start+9))
      for j in $(seq "$start" "$end"); do
        terraform apply -target="module.web_tier.oci_core_instance.this[${j}]" -auto-approve
      done
      sleep 30  # "Cooldown" period
    done
    
    # Phase 3: Monitoring
    terraform apply -target=module.monitoring -auto-approve

    This script may work, but it is fragile, hard to maintain, and error-prone:

    1. It treats the symptoms, not the cause.
    2. You also lose Terraform's idempotence - if the script fails halfway through, it is unclear which resources have already been created.

    The right solution: Controlling parallelism

    The most elegant solution to API limit problems is to control Terraform's parallelization. Terraform provides the -parallelism flag for this purpose:


     terraform apply -parallelism=5
    

    This reduces the number of simultaneous operations from 10 to 5. For particularly sensitive environments, this value could be reduced even further:


    terraform apply -parallelism=1 
    

    Now all resources are created strictly sequentially. Of course, this takes a long time, which makes it more of an academic example than a practical setting.

    And in general, this is only the tip of the iceberg. For production environments, you may need a more sophisticated strategy.

    Here is an almost paranoid example of a wrapper script that dynamically adjusts parallelism based on the number of resources to be created:


    #!/bin/bash
    # Intelligent control of parallelism based on the number of planned operations
    # Requires: jq, GNU bash
    
    set -euo pipefail
    
    # Create a fresh plan and capture the exit code (0 = no changes, 2 = changes, anything else = error)
    create_plan() {
      local ec=0
      terraform plan -out=tfplan -detailed-exitcode || ec=$?
      case "$ec" in
        0|2) return 0 ;;                    # plan succeeded, with or without changes
        *)   echo "Plan failed (exit code ${ec})." >&2
             exit "$ec" ;;
      esac
    }
    
    # Retrieve the number of operations from the JSON plan
    get_resource_count() {
      # JSON output, no colors, parse via jq
      local count
      count=$(terraform show -no-color -json tfplan 2>/dev/null \
        | jq '[.resource_changes[]?
                | select(.change.actions != ["no-op"])]
               | length')
      echo "${count:-0}"
    }
    
    # Compute provider-aware parallelism (heuristic)
    calculate_parallelism() {
      local resource_count=$1
      local cloud_provider=${CLOUD_PROVIDER:-oci} # Set default value "oci" if unset
      cloud_provider="${cloud_provider,,}"  # Normalize to lower case for case-insensitive matching
    
      case "$cloud_provider" in
        oci)
          if   [ "$resource_count" -lt 20  ]; then echo 10
          elif [ "$resource_count" -lt 50  ]; then echo 5
          elif [ "$resource_count" -lt 100 ]; then echo 3
          else                                   echo 1
          fi
          ;;
        aws)
          if   [ "$resource_count" -lt 50  ]; then echo 10
          elif [ "$resource_count" -lt 100 ]; then echo 5
          else                                   echo 2
          fi
          ;;
        *)
          echo 5
          ;;
      esac
    }
    
    echo "Analyzing Terraform plan..."
    create_plan
    resource_count=$(get_resource_count)
    echo "Planned resource operations: ${resource_count}"
    
    if [ "$resource_count" -eq 0 ]; then
      echo "No changes necessary."
      rm -f tfplan
      exit 0
    fi
    
    parallelism=$(calculate_parallelism "$resource_count")
    echo "Using parallelism: ${parallelism}"
    
    # Execute apply against the saved plan with computed parallelism
    terraform apply -parallelism="${parallelism}" tfplan
    
    # Clean up plan file
    rm -f tfplan

    Provider-level throttling and retry mechanisms

    For self-developed OCI tools that use the official OCI SDK, you can control the retry behavior via environment variables. The SDK supports the following:


    export OCI_SDK_DEFAULT_RETRY_ENABLED=true
    export OCI_SDK_DEFAULT_RETRY_MAX_ATTEMPTS=5
    export OCI_SDK_DEFAULT_RETRY_MAX_WAIT_TIME=30
    

    These settings are no substitute for good architectural design, but they help cushion temporary throttling spikes.

    Retry options in the OCI Terraform Provider

    The OCI Terraform Provider supports two configuration options in the provider block that control the behavior of automatic retries for certain API errors:

    disable_auto_retries (boolean): If set to true, all automatic retries of the provider are disabled, regardless of the value of retry_duration_seconds. Default value: false.

    retry_duration_seconds (integer): Defines the minimum time window in seconds in which the provider attempts retries for certain HTTP errors. This option only applies to HTTP status 429 and 500. The specified period is interpreted as a minimum value; due to the quadratic backoff with full jitter used, the actual duration can be longer. If disable_auto_retries is set to true, this value is ignored. Default value: 600 seconds.

    Default behavior: Out of the box (disable_auto_retries = false, retry_duration_seconds = 600), the provider automatically retries on HTTP 429 and HTTP 500 responses. Other HTTP errors such as 400, 401, 403, 404, or 409 are not retried by this mechanism.

    Example configuration:


    provider "oci" {
      # Automatic retries are enabled by default
      disable_auto_retries   = false
      retry_duration_seconds = 600  # Minimum window for retries (in seconds)
    }

     

    Retry configuration in the AWS Terraform Provider

    With AWS, this topic varies greatly depending on the service. The provider's global configuration limits the number of retry attempts, while the backoff strategy itself is implemented by the AWS SDK and selected via the retry mode. In practice:


     provider "aws" {
      region = var.region
      # Maximum number of retry attempts if an API call fails
      max_retries = 5
      # Retry strategy:
      # "adaptive": adjusts dynamically to latency and error rates (good for throttling)
      # "standard": deterministic exponential backoff
      retry_mode = "adaptive"
    }

    Architecture pattern: Resource batching

    An advanced technique to avoid API limits is resource batching. This involves grouping resources into logical units whose creation is staggered over time instead of starting all at once. In the following example, we deliberately use the time provider and its time_sleep resource type to introduce an artificial delay per batch, so that each batch only begins once its scheduled delay has expired.


    terraform {
      required_version = ">= 1.5.0"
    
      required_providers {
        oci  = { source = "oracle/oci" }
        time = { source = "hashicorp/time" }
      }
    }
    
    variable "batch_size" {
      type        = number
      default     = 10
      description = "Number of instances per batch"
    }
    
    variable "batch_delay_seconds" {
      type        = number
      default     = 30
      description = "Wait time between batches (seconds)"
    }
    
    variable "total_instances" {
      type        = number
      description = "Total number of instances to create"
    }
    
    variable "name_prefix" {
      type        = string
      description = "Prefix for instance display_name"
    }
    
    variable "availability_domains" {
      type        = list(string)
      description = "ADs to spread instances across"
    }
    
    variable "subnet_ids" {
      type        = list(string)
      description = "Subnets to distribute NICs across"
    }
    
    variable "compartment_id" {
      type        = string
    }
    
    variable "instance_shape" {
      type        = string
    }
    
    variable "instance_ocpus" {
      type        = number
    }
    
    variable "instance_memory" {
      type        = number
    }
    
    variable "image_id" {
      type        = string
    }
    
    locals {
      batch_count = ceil(var.total_instances / var.batch_size)
    
      # Build batches with instance index metadata
      batches = {
        for batch_idx in range(local.batch_count) :
        "batch-${batch_idx}" => {
          index     = batch_idx
          instances = {
            for inst_idx in range(
              batch_idx * var.batch_size,
              min((batch_idx + 1) * var.batch_size, var.total_instances)
            ) :
            "instance-${inst_idx}" => {
              index = inst_idx
              batch = batch_idx
            }
          }
        }
      }
    }
    
    # Artificial delay resources for batch sequencing: batch 0 starts immediately,
    # each later batch is deferred by a cumulative, staggered delay. This staggers
    # the API calls over time; it assumes a batch finishes within its delay window
    # rather than waiting for the previous batch to complete.
    resource "time_sleep" "batch_delay" {
      for_each = local.batches
    
      create_duration = "${each.value.index * var.batch_delay_seconds}s"
    }
    
    resource "oci_core_instance" "batch" {
      for_each = merge([
        for _, batch_data in local.batches : batch_data.instances
      ]...)
    
      display_name        = "${var.name_prefix}-${each.key}"
      availability_domain = var.availability_domains[each.value.index % length(var.availability_domains)]
      compartment_id      = var.compartment_id
      shape               = var.instance_shape
    
      shape_config {
        ocpus         = var.instance_ocpus
        memory_in_gbs = var.instance_memory
      }
    
      source_details {
        source_type = "image"
        source_id   = var.image_id
      }
    
      create_vnic_details {
        subnet_id = var.subnet_ids[each.value.index % length(var.subnet_ids)]
      }
    
      # Ensure this instance respects its batch delay: referencing the batch's
      # time_sleep id creates an implicit per-batch dependency (depends_on cannot
      # use per-instance keys, so the reference goes through a freeform tag).
      freeform_tags = {
        "batch"      = tostring(each.value.batch)
        "batch_gate" = time_sleep.batch_delay["batch-${each.value.batch}"].id
      }
    
      lifecycle {
        precondition {
          condition     = each.value.index < var.total_instances
          error_message = "Instance index ${each.value.index} exceeds total_instances ${var.total_instances}"
        }
      }
    }
    
    data "oci_core_instance" "health_check" {
      for_each    = oci_core_instance.batch
      instance_id = each.value.id
    }
    
    check "instance_health" {
      assert {
        condition = alltrue([
          for _, d in data.oci_core_instance.health_check :
          d.state == "RUNNING"
        ])
        error_message = "Not all instances reached RUNNING state"
      }
    }

    This pattern allows large deployments to be broken down into manageable chunks without having to resort to -target. The batches are created automatically with delays to respect API limits. The trick is to schedule expensive API operations over time without artificially breaking up the architecture or modules.
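
    For illustration, a root-module call of this configuration might look like the following sketch. It assumes the batching code above is packaged as a module under a hypothetical ./modules/batched-compute path, reuses the web tier sizing from the opening scenario, and presumes that var.compartment_id, var.image_id, the ads data source, and module.network exist as in the earlier examples:


    # Hypothetical call of the batching module for the 50-instance web tier
    module "web_tier_batched" {
      source = "./modules/batched-compute"   # assumed module path

      total_instances     = 50
      batch_size          = 10    # five batches of ten instances
      batch_delay_seconds = 30

      name_prefix     = "web"
      compartment_id  = var.compartment_id
      image_id        = var.image_id
      instance_shape  = "VM.Standard.E4.Flex"
      instance_ocpus  = 2
      instance_memory = 16

      availability_domains = data.oci_identity_availability_domains.ads.availability_domains[*].name
      subnet_ids           = module.network.web_subnet_ids
    }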

    This article should be enough to provide first aid for API limit problems and to help you develop ideas for possible solutions. So far, however, we have been extremely reactive and even a little paranoid. Batching, in particular, is something that good providers should ideally handle themselves - at least that is what an end user subconsciously expects.
    In the next article, we will therefore look at the extent to which advanced methods such as API gateways, testing with ephemerals, and Sentinel policies can proactively diagnose such limits.