Terraform @ Scale - Part 5a: Understanding API Limits


It is 2:30 p.m. on a regular Tuesday afternoon. The DevOps team of a Swiss financial services provider routinely launches its Terraform pipeline for the monthly disaster recovery testing. 300 virtual machines, 150 load balancer backends, 500 DNS entries, and countless network rules are to be provisioned in the backup region.
After 5 minutes, the pipeline fails. HTTP 429: Too Many Requests.
The team spends the next 3 hours manually cleaning up half-provisioned resources while management nervously watches the clock.
The DR test has failed before it even began.

What happened? The team had fallen into one of the most insidious traps of cloud automation: API rate limits. While Terraform was busily trying to create hundreds of resources in parallel, the cloud provider had pulled the emergency brake after the first 100 requests per minute. A problem that had never occurred with 10 VMs became an insurmountable wall with 300 VMs.

The Invisible Wall: Understanding API Limits

Every major cloud provider implements API rate limits to ensure the stability of its services. These limits are not an arbitrary restriction, but a necessity. Without them, a single faulty Terraform run could bring down the API endpoints for all customers of a provider.

Oracle Cloud Infrastructure implements API rate limiting, with IAM returning a 429 error code when limits are exceeded. The specific limits vary depending on the service and are not documented globally, but implemented per service. OCI sets service limits for each tenancy, which are agreed upon with the Oracle sales representative at purchase or automatically applied as standard or trial limits.

AWS implements a token bucket system for EC2 API throttling. For RunInstances, the resource token bucket size is 1000 tokens with a refill rate of two tokens per second. This means you can immediately launch 1000 instances and then up to two instances per second. API actions are grouped into categories, with Non-mutating Actions (Describe*, List*, Search*, Get*) typically having the highest API throttling limits.
[Note: Non-mutating Actions are API operations that do not cause any changes to the cloud state but only read data. They are typically rate-limited higher by AWS (and other providers) than write actions, because they do not create, modify, or delete resources and are therefore less critical for the stability of backend systems.]
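The consequences of the token bucket are easy to work out by hand. The sketch below uses the documented RunInstances values (bucket size 1000, refill rate 2 tokens per second); the 1,300-instance burst is a made-up number for illustration:

```shell
# Back-of-the-envelope token bucket arithmetic (illustrative burst size)
BUCKET=1000      # maximum tokens (documented RunInstances bucket size)
REFILL=2         # tokens refilled per second
REQUESTED=1300   # hypothetical burst of instance launches

if [ "$REQUESTED" -le "$BUCKET" ]; then
  WAIT=0
else
  # The overflow beyond the bucket has to wait for refills
  WAIT=$(( (REQUESTED - BUCKET) / REFILL ))
fi
echo "Overflow of $((REQUESTED - BUCKET)) requests needs ~${WAIT}s of refill time"
```

The first 1,000 launches go through immediately; the remaining 300 are throttled and trickle in over roughly two and a half minutes.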

Terraform itself exacerbates this problem through its parallelization. By default, Terraform attempts to create or modify up to 10 resources at the same time. For complex dependency graphs and many independent resources, this can quickly lead to dozens of simultaneous API calls. Multiplied by the data sources we discussed in previous articles in this series, this creates a so-called “perfect storm” of API requests.

The parallelization paradox

Terraform's default parallelization is a double-edged sword. On the one hand, it significantly accelerates deployments, since nobody wants to wait for hours while resources are created sequentially. On the other hand, this very optimization causes problems in large infrastructures.

Let us look at a typical scenario: a Terraform configuration for a multi-tier application with web, app, and database layers. Each layer consists of several instances distributed across different availability domains:


# A seemingly harmless root module, which will turn into an API bomb
module "web_tier" {
  source = "./modules/compute-cluster"

  instance_count = 50
  instance_config = {
    shape  = "VM.Standard.E4.Flex"
    ocpus  = 2
    memory_in_gbs = 16
  }

  availability_domains = data.oci_identity_availability_domains.ads.availability_domains
  subnet_ids           = module.network.web_subnet_ids

  load_balancer_backend_sets = {
    for idx, val in range(50) :
    "web-${idx}" => module.load_balancer.backend_set_name
  }
}

module "app_tier" {
  source = "./modules/compute-cluster"

  instance_count = 30
  # ... similar configuration
}

module "monitoring" {
  source = "./modules/monitoring"

  # Create monitoring alarms for each instance
  monitored_instances = concat(
    module.web_tier.instance_ids,
    module.app_tier.instance_ids
  )

  alarms_per_instance = 5  # CPU, Memory, Disk, Network, Application
}

This harmless-looking configuration generates the following when executed:

  • 50 API calls for web tier instances
  • 30 API calls for app tier instances
  • 50 API calls for load balancer backend registrations
  • 400 API calls for monitoring alarms (80 instances × 5 alarms)

That is 530 API calls that Terraform tries to execute with as much parallelism as possible. With a limit of 10 write operations per second, this would take just under a minute in the best case - but only if everything were perfectly serialized.

In practice, however, throttling leads to retry loops, exponential backoffs, and in the worst case, to timeouts and an aborted run with partially created resources that then have to be manually cleaned up.
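The arithmetic is quickly verified; the 10-writes-per-second limit is the same assumption as above:

```shell
# Recomputing the API-call budget of the example configuration
WEB=50                      # web tier instances
APP=30                      # app tier instances
LB=50                       # load balancer backend registrations
ALARMS=$((80 * 5))          # 80 instances x 5 alarms each
TOTAL=$((WEB + APP + LB + ALARMS))
EST_SECONDS=$((TOTAL / 10)) # assumed limit: 10 write calls per second
echo "${TOTAL} API calls, ~${EST_SECONDS}s if perfectly serialized"
```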

Terraform -target: The wrong solution

In desperation, many teams turn to terraform apply -target to create resources selectively. "We will just deploy the networks first, then the compute instances, then the monitoring" - that is the plan. However, this is a dangerous approach that creates more problems than it solves.

The -target flag was designed for surgical interventions in emergencies, not for regular deployments.

It bypasses Terraform's dependency management and can lead to inconsistent states.

Even more problematic: it does not scale. With hundreds of resources, you would either have to target each resource type individually, which explodes the complexity, or write complex scripts that call Terraform multiple times with different targets.

An anti-pattern I have seen in practice:


#!/bin/bash
# DON'T TRY THIS AT HOME - THIS IS AN ANTI-PATTERN!

# Phase 1: Network
terraform apply -target=module.network -auto-approve

# Phase 2: Compute (in Batches)
for i in {0..4}; do
  start=$((i*10))
  end=$((start+9))
  for j in $(seq "$start" "$end"); do
    terraform apply -target="module.web_tier.oci_core_instance.this[${j}]" -auto-approve
  done
  sleep 30  # "Cooldown" period
done

# Phase 3: Monitoring
terraform apply -target=module.monitoring -auto-approve

This script may work, but it is fragile, hard to maintain, and error-prone:

  1. It treats the symptoms, not the cause.
  2. You also lose Terraform's idempotence - if the script fails halfway through, it is unclear which resources have already been created.

The right solution: Controlling parallelism

The most elegant solution to API limit problems is to control Terraform's parallelization. Terraform provides the -parallelism flag for this purpose:


terraform apply -parallelism=5

This reduces the number of simultaneous operations from 10 to 5. For particularly sensitive environments, this value could be reduced even further:


terraform apply -parallelism=1 

Now all resources are created strictly sequentially. This takes correspondingly long, which makes it more of an academic example than a practical default.
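In CI/CD pipelines, it is often easier to inject the flag via environment variable than to edit every invocation. Terraform appends the contents of TF_CLI_ARGS_apply to every terraform apply call:

```shell
# Every subsequent "terraform apply" in this shell picks up the flag
export TF_CLI_ARGS_apply="-parallelism=5"
echo "apply will run with: ${TF_CLI_ARGS_apply}"
# terraform apply   # now implicitly runs with -parallelism=5
```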

And in general, this is only the tip of the iceberg. For production environments, you may need a more sophisticated strategy.

Here is an almost paranoid example of a wrapper script that dynamically adjusts parallelism based on the number of resources to be created:


#!/bin/bash
# Intelligent control of parallelism based on the number of planned operations
# Requires: jq, GNU bash

set -euo pipefail

# Create a fresh plan and capture exit code (0=no changes, 2=changes, 1=error)
create_plan() {
  # Capture the exit code despite "set -e"; after a plain "if terraform plan"
  # without an else branch, $? would be reset to 0 and the code lost
  local ec=0
  terraform plan -out=tfplan -detailed-exitcode || ec=$?
  if [ "$ec" -eq 0 ] || [ "$ec" -eq 2 ]; then
    return 0
  fi
  echo "Plan failed." >&2
  exit "$ec"
}

# Retrieve the number of operations from the JSON plan
get_resource_count() {
  # JSON output, no colors, parse via jq
  local count
  count=$(terraform show -no-color -json tfplan 2>/dev/null \
    | jq '[.resource_changes[]
            | select(.change.actions != ["no-op"])]
           | length' || echo 0)
  echo "${count:-0}"
}

# Compute provider-aware parallelism (heuristic)
calculate_parallelism() {
  local resource_count=$1
  local cloud_provider=${CLOUD_PROVIDER:-oci} # default to "oci" if unset
  cloud_provider="${cloud_provider,,}"        # normalize to lower case (bash 4+)

  case "$cloud_provider" in
    oci)
      if   [ "$resource_count" -lt 20  ]; then echo 10
      elif [ "$resource_count" -lt 50  ]; then echo 5
      elif [ "$resource_count" -lt 100 ]; then echo 3
      else                                   echo 1
      fi
      ;;
    aws)
      if   [ "$resource_count" -lt 50  ]; then echo 10
      elif [ "$resource_count" -lt 100 ]; then echo 5
      else                                   echo 2
      fi
      ;;
    *)
      echo 5
      ;;
  esac
}

echo "Analyzing Terraform plan..."
create_plan
resource_count=$(get_resource_count)
echo "Planned resource operations: ${resource_count}"

if [ "$resource_count" -eq 0 ]; then
  echo "No changes necessary."
  rm -f tfplan
  exit 0
fi

parallelism=$(calculate_parallelism "$resource_count")
echo "Using parallelism: ${parallelism}"

# Execute apply against the saved plan with computed parallelism
terraform apply -parallelism="${parallelism}" tfplan

# Clean up plan file
rm -f tfplan

Provider-level throttling and retry mechanisms

For self-developed OCI tools using the official OCI SDK, you can control the behavior via environment variables. The SDK supports the following:


export OCI_SDK_DEFAULT_RETRY_ENABLED=true
export OCI_SDK_DEFAULT_RETRY_MAX_ATTEMPTS=5
export OCI_SDK_DEFAULT_RETRY_MAX_WAIT_TIME=30

These settings are certainly no substitute for good architecture design. But they help to cushion temporary throttling spikes.

Retry options in the OCI Terraform Provider

The OCI Terraform Provider supports two configuration options in the provider block that control the behavior of automatic retries for certain API errors:

disable_auto_retries (boolean): If set to true, all automatic retries of the provider are disabled, regardless of the value of retry_duration_seconds. Default value: false.

retry_duration_seconds (integer): Defines the minimum time window in seconds in which the provider attempts retries for certain HTTP errors. This option only applies to HTTP status 429 and 500. The specified period is interpreted as a minimum value; due to the quadratic backoff with full jitter used, the actual duration can be longer. If disable_auto_retries is set to true, this value is ignored. Default value: 600 seconds.

Default behavior: Out of the box (disable_auto_retries = false, retry_duration_seconds = 600), the provider automatically retries on HTTP 429 and HTTP 500 responses. Other HTTP errors such as 400, 401, 403, 404, or 409 are not retried by this mechanism.
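To see why the actual retry window can exceed the configured minimum, here is a rough sketch of quadratic backoff with full jitter. The exact constants OCI uses are not publicly specified, so the numbers are purely illustrative:

```shell
# Illustrative only: with full jitter, each attempt sleeps a random duration
# in [0, attempt^2] seconds, so the values below are worst-case waits
total_max=0
for attempt in 1 2 3 4 5; do
  max_wait=$((attempt * attempt))
  total_max=$((total_max + max_wait))
  echo "attempt ${attempt}: sleeps up to ${max_wait}s"
done
echo "worst case across 5 attempts: ${total_max}s"
```

The jitter is what makes the configured duration a minimum rather than an exact budget: the random draws can stretch the window well past it.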

Example configuration:


provider "oci" {
  # Automatic retries are enabled by default
  disable_auto_retries   = false
  retry_duration_seconds = 600  # Minimum window for retries (in seconds)
}

 

Retry configuration in the AWS Terraform Provider

With AWS, this topic varies greatly depending on the service. The global configuration in the provider limits the number of retries; the mode of the backoff strategy is determined by the AWS SDK. In practice:


provider "aws" {
  region = var.region
  # Maximum number of retry attempts if an API call fails
  max_retries = 5
  # Retry strategy:
  # "adaptive": adjusts dynamically to latency and error rates (good for throttling)
  # "standard": deterministic exponential backoff
  retry_mode = "adaptive"
}
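Both settings can alternatively come from the standard AWS SDK environment variables, which the provider and the underlying SDK honor when the arguments are not set in the block - convenient for per-pipeline tuning without touching the configuration:

```shell
# Picked up by the AWS provider / SDKs when not configured explicitly
export AWS_RETRY_MODE="adaptive"
export AWS_MAX_ATTEMPTS="5"
echo "retry mode: ${AWS_RETRY_MODE}, max attempts: ${AWS_MAX_ATTEMPTS}"
```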

Architecture pattern: Resource batching

An advanced technique to avoid API limits is resource batching. This involves grouping resources into logical units that are provisioned sequentially. In the following example, we deliberately use the time provider and its time_sleep resource type to enforce an artificial wait time between batches. This ensures that the next batch only begins after the delay period has expired.


terraform {
  required_version = ">= 1.5.0"

  required_providers {
    oci  = { source = "oracle/oci" }
    time = { source = "hashicorp/time" }
  }
}

variable "batch_size" {
  type        = number
  default     = 10
  description = "Number of instances per batch"
}

variable "batch_delay_seconds" {
  type        = number
  default     = 30
  description = "Wait time between batches (seconds)"
}

variable "total_instances" {
  type        = number
  description = "Total number of instances to create"
}

variable "name_prefix" {
  type        = string
  description = "Prefix for instance display_name"
}

variable "availability_domains" {
  type        = list(string)
  description = "ADs to spread instances across"
}

variable "subnet_ids" {
  type        = list(string)
  description = "Subnets to distribute NICs across"
}

variable "compartment_id" {
  type        = string
}

variable "instance_shape" {
  type        = string
}

variable "instance_ocpus" {
  type        = number
}

variable "instance_memory" {
  type        = number
}

variable "image_id" {
  type        = string
}

locals {
  batch_count = ceil(var.total_instances / var.batch_size)

  # Build batches with instance index metadata
  batches = {
    for batch_idx in range(local.batch_count) :
    "batch-${batch_idx}" => {
      instances = {
        for inst_idx in range(
          batch_idx * var.batch_size,
          min((batch_idx + 1) * var.batch_size, var.total_instances)
        ) :
        "instance-${inst_idx}" => {
          index = inst_idx
          batch = batch_idx
        }
      }
    }
  }
}

# One module instance per batch, expanded via for_each. Terraform tracks
# dependencies at the resource level, not per instance, which is why the
# delay gate (time_sleep) must live *inside* each batch's module instance:
# a single shared time_sleep resource would force every instance to wait
# for the longest delay, and depends_on cannot reference individual
# instances anyway.
module "compute_batch" {
  source   = "./modules/compute-batch"
  for_each = local.batches

  # Staircase delay: batch N starts roughly N * batch_delay_seconds into the apply
  delay_seconds = tonumber(trimprefix(each.key, "batch-")) * var.batch_delay_seconds

  instance_indices     = [for _, inst in each.value.instances : inst.index]
  name_prefix          = var.name_prefix
  compartment_id       = var.compartment_id
  availability_domains = var.availability_domains
  subnet_ids           = var.subnet_ids
  instance_shape       = var.instance_shape
  instance_ocpus       = var.instance_ocpus
  instance_memory      = var.instance_memory
  image_id             = var.image_id
}

# --- modules/compute-batch/main.tf (sketch; the variable declarations,
# --- which mirror the root module, are omitted for brevity) ---

resource "time_sleep" "gate" {
  create_duration = "${var.delay_seconds}s"
}

resource "oci_core_instance" "this" {
  for_each = { for idx in var.instance_indices : "instance-${idx}" => idx }

  display_name        = "${var.name_prefix}-${each.key}"
  availability_domain = var.availability_domains[each.value % length(var.availability_domains)]
  compartment_id      = var.compartment_id
  shape               = var.instance_shape

  shape_config {
    ocpus         = var.instance_ocpus
    memory_in_gbs = var.instance_memory
  }

  source_details {
    source_type = "image"
    source_id   = var.image_id
  }

  create_vnic_details {
    subnet_id = var.subnet_ids[each.value % length(var.subnet_ids)]
  }

  # A static reference is valid in depends_on and, scoped to this module
  # instance, gates only this batch
  depends_on = [time_sleep.gate]
}

output "instance_ids" {
  value = { for key, inst in oci_core_instance.this : key => inst.id }
}

# --- back in the root module ---

data "oci_core_instance" "health_check" {
  for_each    = merge([for batch in module.compute_batch : batch.instance_ids]...)
  instance_id = each.value
}

check "instance_health" {
  assert {
    condition = alltrue([
      for _, d in data.oci_core_instance.health_check :
      d.state == "RUNNING"
    ])
    error_message = "Not all instances reached RUNNING state"
  }
}

This pattern allows large deployments to be broken down into manageable chunks without having to resort to -target. The batches are created automatically with delays to respect API limits. The trick is to schedule expensive API operations over time without artificially breaking up the architecture or modules.
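Wired up from a consuming configuration, the pattern could look like this; the module path and all concrete values are illustrative:

```hcl
module "batched_compute" {
  source = "./modules/batched-compute" # hypothetical path to the pattern above

  total_instances     = 80
  batch_size          = 10
  batch_delay_seconds = 30

  name_prefix          = "app"
  compartment_id       = var.compartment_id
  availability_domains = data.oci_identity_availability_domains.ads.availability_domains[*].name
  subnet_ids           = module.network.app_subnet_ids

  instance_shape  = "VM.Standard.E4.Flex"
  instance_ocpus  = 2
  instance_memory = 16
  image_id        = var.image_id
}
```

With these values, the 80 instances roll out in eight batches of ten, spread over roughly three and a half minutes instead of one single burst.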

This article should now be enough to provide first aid in the event of API limit problems and help you develop ideas for possible solutions. So far, however, we have been extremely reactive and even a little paranoid. In addition, batching is something that should ideally already be handled by good providers - at least this is something an end user subconsciously expects.
In the next article, we will therefore look at the extent to which advanced methods such as API gateways, testing with ephemerals, and Sentinel policies can proactively diagnose such limits.