It is 2:30 p.m. on a regular Tuesday afternoon. The DevOps team of a Swiss financial services provider routinely launches its Terraform pipeline for the monthly disaster recovery test: 300 virtual machines, 150 load balancer backends, 500 DNS entries, and countless network rules are to be provisioned in the backup region.
After 5 minutes, the pipeline fails. HTTP 429: Too Many Requests.
The team spends the next three hours manually cleaning up half-provisioned resources while management nervously watches the clock.
The DR test has failed before it even began.
What happened? The team had fallen into one of the most insidious traps of cloud automation: API rate limits. While Terraform was busily trying to create hundreds of resources in parallel, the cloud provider had pulled the emergency brake after the first 100 requests per minute. A problem that had never occurred with 10 VMs became an insurmountable wall with 300 VMs.
The Invisible Wall: Understanding API limits
Every major cloud provider implements API rate limits to ensure the stability of its services. These limits are not an arbitrary restriction, but a necessity. Without them, a single faulty Terraform run could bring down the API endpoints for all customers of a provider.
Oracle Cloud Infrastructure implements API rate limiting as well, with IAM returning a 429 error code when limits are exceeded. The specific limits are not documented globally but are implemented per service. In addition, OCI sets service limits (resource quotas) for each tenancy; these are agreed upon with the Oracle sales representative at purchase or applied automatically as standard or trial limits.
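Before a large rollout, it is worth checking which limits actually apply to your tenancy. A minimal sketch using the OCI CLI's limits commands (the tenancy OCID is a placeholder, and the exact command options may differ between CLI versions):

# List the configured limits for the Compute service in a tenancy (sketch)
oci limits value list \
  --compartment-id "<tenancy-ocid>" \
  --service-name compute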
AWS implements a token bucket system for EC2 API throttling. For RunInstances, the resource token bucket size is 1000 tokens with a refill rate of two tokens per second. This means you can immediately launch 1000 instances and then up to two instances per second. API actions are grouped into categories, with Non-mutating Actions (Describe*, List*, Search*, Get*) typically having the highest API throttling limits.
[Note: Non-mutating Actions are API operations that do not cause any changes to the cloud state but only read data. They are typically rate-limited higher by AWS (and other providers) than write actions, because they do not create, modify, or delete resources and are therefore less critical for the stability of backend systems.]
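A quick back-of-the-envelope calculation shows what the token bucket means in practice (assuming no other automation is draining the same bucket): a burst of 300 RunInstances calls fits comfortably into the 1000-token bucket and goes through immediately, while a burst of 1120 calls empties the bucket after the first 1000 and the remaining 120 calls trickle in at the two-per-second refill rate, i.e. roughly another minute.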
Terraform itself exacerbates this problem through its parallelization. By default, Terraform attempts to create or modify up to 10 resources at the same time. For complex dependency graphs and many independent resources, this can quickly lead to dozens of simultaneous API calls. Multiplied by the data sources we discussed in previous articles in this series, this creates a so-called “perfect storm” of API requests.
The parallelization paradox
Terraform's default parallelization is a double-edged sword. On the one hand, it significantly accelerates deployments, since nobody wants to wait for hours while resources are created sequentially. On the other hand, this very optimization causes problems in large infrastructures.
Let us look at a typical scenario: a Terraform configuration for a multi-tier application with web, app, and database layers. Each layer consists of several instances distributed across different availability domains:
# A seemingly harmless root module, which will turn into an API bomb
module "web_tier" {
  source         = "./modules/compute-cluster"
  instance_count = 50

  instance_config = {
    shape         = "VM.Standard.E4.Flex"
    ocpus         = 2
    memory_in_gbs = 16
  }

  availability_domains = data.oci_identity_availability_domains.ads.availability_domains
  subnet_ids           = module.network.web_subnet_ids

  load_balancer_backend_sets = {
    for idx, val in range(50) :
    "web-${idx}" => module.load_balancer.backend_set_name
  }
}

module "app_tier" {
  source         = "./modules/compute-cluster"
  instance_count = 30
  # ... similar configuration
}

module "monitoring" {
  source = "./modules/monitoring"

  # Create monitoring alarms for each instance
  monitored_instances = concat(
    module.web_tier.instance_ids,
    module.app_tier.instance_ids
  )
  alarms_per_instance = 5 # CPU, Memory, Disk, Network, Application
}
This harmless-looking configuration generates the following when executed:
- 50 API calls for web tier instances
- 30 API calls for app tier instances
- 50 API calls for load balancer backend registrations
- 400 API calls for monitoring alarms (80 instances × 5 alarms)
That is 530 API calls that Terraform tries to execute with as much parallelism as possible. With a limit of 10 write operations per second, this would take just under a minute even in the best case - and only if the calls were perfectly paced to stay within the limit.
In practice, however, throttling leads to retry loops, exponential backoffs, and in the worst case, to timeouts and an aborted run with partially created resources that then have to be manually cleaned up.
Terraform -target: The wrong solution
In desperation, many teams turn to terraform apply -target to create resources selectively. "We will just deploy the networks first, then the compute instances, then the monitoring" - that is the plan. However, this is a dangerous approach that creates more problems than it solves.
The -target flag was designed for surgical interventions in emergencies, not for regular deployments.
It bypasses Terraform's dependency management and can lead to inconsistent states.
Even more problematic: it does not scale. With hundreds of resources, you would either have to target each resource type individually, which explodes the complexity, or write complex scripts that call Terraform multiple times with different targets.
An anti-pattern I have seen in practice:
#!/bin/bash
# DON'T TRY THIS AT HOME - THIS IS AN ANTIPATTERN!

# Phase 1: Network
terraform apply -target=module.network -auto-approve

# Phase 2: Compute (in batches)
for i in {0..4}; do
  start=$((i*10))
  end=$((start+9))
  for j in $(seq "$start" "$end"); do
    terraform apply -target="module.web_tier.oci_core_instance.this[${j}]" -auto-approve
  done
  sleep 30 # "Cooldown" period
done

# Phase 3: Monitoring
terraform apply -target=module.monitoring -auto-approve
This script may work, but it is fragile, hard to maintain, and error-prone:
- It treats the symptoms, not the cause.
- You also lose Terraform's idempotence - if the script fails halfway through, it is unclear which resources have already been created.
The right solution: Controlling parallelism
The most elegant solution to API limit problems is to control Terraform's parallelization. Terraform provides the -parallelism flag for this purpose:
terraform apply -parallelism=5
This reduces the number of simultaneous operations from 10 to 5. For particularly sensitive environments, this value could be reduced even further:
terraform apply -parallelism=1
Now all resources are created strictly one after another. This takes correspondingly long and is therefore mostly an academic example.
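For CI pipelines, it can be convenient to set the flag once per job instead of editing every command. Terraform reads additional arguments for apply from the TF_CLI_ARGS_apply environment variable; a minimal sketch:

# Pin a conservative parallelism for every apply in this job
export TF_CLI_ARGS_apply="-parallelism=3"
terraform apply -auto-approve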
And in general, this is only the tip of the iceberg. For production environments, you may need a more sophisticated strategy.
Here is an almost paranoid example of a wrapper script that dynamically adjusts parallelism based on the number of resources to be created:
#!/bin/bash
# Intelligent control of parallelism based on the number of planned operations
# Requires: jq, GNU bash 4+
set -euo pipefail

# Create a fresh plan and capture the exit code (0=no changes, 2=changes, 1=error)
create_plan() {
  local ec=0
  terraform plan -out=tfplan -detailed-exitcode || ec=$?
  if [ "$ec" -eq 0 ] || [ "$ec" -eq 2 ]; then
    return 0
  fi
  echo "Plan failed." >&2
  exit "$ec"
}

# Retrieve the number of planned operations from the JSON plan
get_resource_count() {
  # JSON output, no colors, parse via jq; count every resource that is not a no-op
  local count
  count=$(terraform show -no-color -json tfplan 2>/dev/null \
    | jq '[.resource_changes[] | select(any(.change.actions[]; . != "no-op"))] | length') || count=0
  echo "${count:-0}"
}

# Compute provider-aware parallelism (heuristic)
calculate_parallelism() {
  local resource_count=$1
  local cloud_provider=${CLOUD_PROVIDER:-oci} # Default to "oci" if unset
  cloud_provider="${cloud_provider,,}"        # Normalize to lower case

  case "$cloud_provider" in
    oci)
      if [ "$resource_count" -lt 20 ]; then
        echo 10
      elif [ "$resource_count" -lt 50 ]; then
        echo 5
      elif [ "$resource_count" -lt 100 ]; then
        echo 3
      else
        echo 1
      fi
      ;;
    aws)
      if [ "$resource_count" -lt 50 ]; then
        echo 10
      elif [ "$resource_count" -lt 100 ]; then
        echo 5
      else
        echo 2
      fi
      ;;
    *)
      echo 5
      ;;
  esac
}

echo "Analyzing Terraform plan..."
create_plan

resource_count=$(get_resource_count)
echo "Planned resource operations: ${resource_count}"

if [ "$resource_count" -eq 0 ]; then
  echo "No changes necessary."
  rm -f tfplan
  exit 0
fi

parallelism=$(calculate_parallelism "$resource_count")
echo "Using parallelism: ${parallelism}"

# Execute the apply against the saved plan with the computed parallelism
terraform apply -parallelism="${parallelism}" tfplan

# Clean up the plan file
rm -f tfplan
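Assuming the script is saved as apply-with-adaptive-parallelism.sh (the file name and the CLOUD_PROVIDER variable are simply the conventions of this sketch), a pipeline step could call it like this:

chmod +x apply-with-adaptive-parallelism.sh
CLOUD_PROVIDER=aws ./apply-with-adaptive-parallelism.sh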
Provider-level throttling and retry mechanisms
For self-developed OCI tools using the official OCI SDK, you can control the retry behavior via environment variables. The SDK supports the following:
export OCI_SDK_DEFAULT_RETRY_ENABLED=true
export OCI_SDK_DEFAULT_RETRY_MAX_ATTEMPTS=5
export OCI_SDK_DEFAULT_RETRY_MAX_WAIT_TIME=30
These settings are no substitute for good architectural design, but they help to cushion temporary throttling spikes.
Retry options in the OCI Terraform Provider
The OCI Terraform Provider supports two configuration options in the provider block that control the behavior of automatic retries for certain API errors:
disable_auto_retries (boolean): If set to true, all automatic retries of the provider are disabled, regardless of the value of retry_duration_seconds. Default value: false.
retry_duration_seconds (integer): Defines the minimum time window in seconds in which the provider attempts retries for certain HTTP errors. This option only applies to HTTP status 429 and 500. The specified period is interpreted as a minimum value; due to the quadratic backoff with full jitter used, the actual duration can be longer. If disable_auto_retries is set to true, this value is ignored. Default value: 600 seconds.
Default behavior: Out of the box (disable_auto_retries = false, retry_duration_seconds = 600), the provider automatically retries on HTTP 429 and HTTP 500 responses. Other HTTP errors such as 400, 401, 403, 404, or 409 are not retried by this mechanism.
Example configuration:
provider "oci" { # Automatic retries are enabled by default disable_auto_retries = false retry_duration_seconds = 600 # Minimum window for retries (in seconds) }
Retry configuration in the AWS Terraform Provider
With AWS, this topic varies greatly depending on the service. The global configuration in the provider limits the number of retries; the mode of the backoff strategy is determined by the AWS SDK. In practice:
provider "aws" { region = var.region # Maximum number of retry attempts if an API call fails max_retries = 5 # Retry strategy: # "adaptive": adjusts dynamically to latency and error rates (good for throttling) # "standard": deterministic exponential backoff retry_mode = "adaptive" }
Architecture pattern: Resource batching
An advanced technique to avoid API limits is resource batching. This involves grouping resources into logical units that are provisioned sequentially. In the following example, we deliberately use the time provider and its time_sleep resource type to enforce an artificial wait time between batches. This ensures that the next batch only begins after the delay period has expired.
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    oci = {
      source = "oracle/oci"
    }
    time = {
      source = "hashicorp/time"
    }
  }
}

variable "batch_size" {
  type        = number
  default     = 10
  description = "Number of instances per batch"
}

variable "batch_delay_seconds" {
  type        = number
  default     = 30
  description = "Wait time between batches (seconds)"
}

variable "total_instances" {
  type        = number
  description = "Total number of instances to create"
}

variable "name_prefix" {
  type        = string
  description = "Prefix for instance display_name"
}

variable "availability_domains" {
  type        = list(string)
  description = "ADs to spread instances across"
}

variable "subnet_ids" {
  type        = list(string)
  description = "Subnets to distribute NICs across"
}

variable "compartment_id" { type = string }
variable "instance_shape" { type = string }
variable "instance_ocpus" { type = number }
variable "instance_memory" { type = number }
variable "image_id" { type = string }

locals {
  batch_count = ceil(var.total_instances / var.batch_size)

  # Build batches with instance index metadata
  batches = {
    for batch_idx in range(local.batch_count) :
    "batch-${batch_idx}" => {
      instances = {
        for inst_idx in range(
          batch_idx * var.batch_size,
          min((batch_idx + 1) * var.batch_size, var.total_instances)
        ) :
        "instance-${inst_idx}" => {
          index = inst_idx
          batch = batch_idx
        }
      }
    }
  }
}

# Artificial delay resources for batch sequencing
resource "time_sleep" "batch_delay" {
  for_each = { for b, _ in local.batches : b => b }

  create_duration = each.key == "batch-0" ? "0s" : "${var.batch_delay_seconds}s"

  # Ensure delay starts only after the previous batch instances are created
  depends_on = [
    oci_core_instance.batch
  ]
}

resource "oci_core_instance" "batch" {
  for_each = merge([
    for _, batch_data in local.batches : batch_data.instances
  ]...)

  display_name        = "${var.name_prefix}-${each.key}"
  availability_domain = var.availability_domains[each.value.index % length(var.availability_domains)]
  compartment_id      = var.compartment_id
  shape               = var.instance_shape

  shape_config {
    ocpus         = var.instance_ocpus
    memory_in_gbs = var.instance_memory
  }

  source_details {
    source_type = "image"
    source_id   = var.image_id
  }

  create_vnic_details {
    subnet_id = var.subnet_ids[each.value.index % length(var.subnet_ids)]
  }

  # Ensure this instance respects its batch delay
  depends_on = [
    time_sleep.batch_delay["batch-${each.value.batch}"]
  ]

  lifecycle {
    precondition {
      condition     = each.value.index < var.total_instances
      error_message = "Instance index ${each.value.index} exceeds total_instances ${var.total_instances}"
    }
  }
}

data "oci_core_instance" "health_check" {
  for_each = oci_core_instance.batch

  instance_id = each.value.id
}

check "instance_health" {
  assert {
    condition = alltrue([
      for _, d in data.oci_core_instance.health_check : d.state == "RUNNING"
    ])
    error_message = "Not all instances reached RUNNING state"
  }
}
This pattern allows large deployments to be broken down into manageable chunks without having to resort to -target. The batches are created automatically with delays to respect API limits. The trick is to schedule expensive API operations over time without artificially breaking up the architecture or modules.
This article should provide first aid for API limit problems and help you develop ideas for possible solutions. So far, however, we have been extremely reactive and even a little paranoid. Besides, batching is something that a good provider should ideally handle on its own - at least, that is what an end user subconsciously expects.
In the next article, we will therefore look at the extent to which advanced methods such as API gateways, testing with ephemerals, and Sentinel policies can proactively diagnose such limits.