In the previous article (5a) we saw how quickly large Terraform rollouts hit API limits, for example when DR tests create hundreds of resources in parallel and 429 errors trigger an avalanche of retries. This follow-up picks up at that point and shows how you can use the API Gateway of Oracle Cloud Infrastructure and Amazon API Gateway to manage limits deliberately, ensure clean observability, and make deployments operationally robust via "Policy as Code".
API Gateway: The Ultima Ratio?
API gateways help make API limits predictable. Used correctly, they bundle API calls, enforce quotas and throttling, deliver consistent observability data, and create a central place for operations and governance.
For us, one aspect is especially relevant: a gateway does not just shift the rate limit problem, it enables active control of it per team, per deployment, and per route.
In Oracle Cloud Infrastructure you establish technical guardrails with usage plans and entitlements. These apply directly to API Gateway deployments, for example a hard rate per second as well as quotas per minute or per month. For enforcement and transparency, service-specific metrics such as HttpResponses are available together with the dimensions deploymentId and httpStatusCode, on which clean alarms can be defined. (Oracle Documentation)
The service log categories access and execution are the designated channels of the service; they are assigned directly to the API deployment and are the first choice compared to legacy bucket log archiving. (Oracle Documentation)
Here is an example for OCI (an example for AWS follows further below):
# Terraform >= 1.10, OCI provider >= 7.14.0
terraform {
  required_version = ">= 1.10"
  required_providers {
    oci = {
      source  = "oracle/oci"
      version = ">= 7.14.0"
    }
  }
}

provider "oci" {
  region = var.region
}

variable "region" {
  type        = string
  description = "OCI region, e.g., eu-frankfurt-1"
  validation {
    condition     = can(regex("^[a-z]+-[a-z0-9]+-[0-9]+$", var.region))
    error_message = "Region must match a pattern like 'eu-frankfurt-1'."
  }
}

variable "compartment_id" {
  type        = string
  description = "Compartment OCID used for gateway, logs, and alarms"
}

# Optional: Many organizations manage the API deployment separately.
# We intentionally reference it via a variable to keep the example focused.
variable "api_deployment_id" {
  type        = string
  description = "OCID of the API Gateway deployment"
  validation {
    condition     = can(regex("^ocid1\\..+", var.api_deployment_id))
    error_message = "api_deployment_id must be a valid OCID."
  }
}

# Enable service logs for 'access' and 'execution'
resource "oci_logging_log_group" "apigw" {
  compartment_id = var.compartment_id
  display_name   = "apigw-logs"
}

resource "oci_logging_log" "apigw_access" {
  log_group_id = oci_logging_log_group.apigw.id
  display_name = "apigateway-access"
  log_type     = "SERVICE"
  is_enabled   = true

  configuration {
    source {
      category    = "access"
      resource    = var.api_deployment_id
      service     = "apigateway"
      source_type = "OCISERVICE"
    }
  }
}

resource "oci_logging_log" "apigw_execution" {
  log_group_id = oci_logging_log_group.apigw.id
  display_name = "apigateway-execution"
  log_type     = "SERVICE"
  is_enabled   = true

  configuration {
    source {
      category    = "execution"
      resource    = var.api_deployment_id
      service     = "apigateway"
      source_type = "OCISERVICE"
    }
  }
}

# Usage plan with rate limit & minute quota
resource "oci_apigateway_usage_plan" "team_plan" {
  compartment_id = var.compartment_id
  display_name   = "team-standard-plan"

  entitlements {
    name        = "default"
    description = "Standard quota for CI runs"

    rate_limit {
      unit  = "SECOND"
      value = 50
    }

    quota {
      unit                = "MINUTE"
      value               = 2000
      reset_policy        = "CALENDAR"
      operation_on_breach = "REJECT"
    }

    targets {
      deployment_id = var.api_deployment_id
    }
  }

  lifecycle {
    prevent_destroy = true
  }
}
In Amazon API Gateway you combine three levers: stage and method throttling, usage plans with API keys, and rate-based rules in AWS WAF for IP aggregations. The CloudWatch metrics 4XXError and 5XXError provide a robust early warning system at the stage level.
Important: AWS WAFv2 can currently only be associated with REST API stages, not with HTTP APIs. (AWS Documentation, Terraform Registry)
# Amazon API Gateway (REST) – stage throttling, usage plan, WAF
terraform {
  required_version = ">= 1.10"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

variable "aws_region" {
  type        = string
  description = "AWS region, e.g., eu-central-1"
}

data "aws_region" "current" {}

resource "aws_api_gateway_rest_api" "tf_api" {
  name = "terraform-at-scale"
}

resource "aws_api_gateway_resource" "status" {
  rest_api_id = aws_api_gateway_rest_api.tf_api.id
  parent_id   = aws_api_gateway_rest_api.tf_api.root_resource_id
  path_part   = "status"
}

resource "aws_api_gateway_method" "get_status" {
  rest_api_id      = aws_api_gateway_rest_api.tf_api.id
  resource_id      = aws_api_gateway_resource.status.id
  http_method      = "GET"
  authorization    = "NONE"
  api_key_required = true # without this, usage plan keys are not enforced
}

resource "aws_api_gateway_integration" "get_status_mock" {
  rest_api_id = aws_api_gateway_rest_api.tf_api.id
  resource_id = aws_api_gateway_resource.status.id
  http_method = aws_api_gateway_method.get_status.http_method
  type        = "MOCK"
}

resource "aws_api_gateway_deployment" "this" {
  rest_api_id = aws_api_gateway_rest_api.tf_api.id
  depends_on  = [aws_api_gateway_integration.get_status_mock]
}

resource "aws_api_gateway_stage" "prod" {
  rest_api_id   = aws_api_gateway_rest_api.tf_api.id
  deployment_id = aws_api_gateway_deployment.this.id
  stage_name    = "prod"
}

# Throttling and logging are configured via a separate
# method settings resource, not on the stage itself
resource "aws_api_gateway_method_settings" "prod_all" {
  rest_api_id = aws_api_gateway_rest_api.tf_api.id
  stage_name  = aws_api_gateway_stage.prod.stage_name
  method_path = "*/*"

  settings {
    metrics_enabled        = true
    logging_level          = "INFO"
    data_trace_enabled     = false
    throttling_burst_limit = 100
    throttling_rate_limit  = 50
  }
}

resource "aws_api_gateway_usage_plan" "plan" {
  name = "team-standard-plan"

  api_stages {
    api_id = aws_api_gateway_rest_api.tf_api.id
    stage  = aws_api_gateway_stage.prod.stage_name
  }

  throttle_settings {
    burst_limit = 100
    rate_limit  = 50
  }

  quota_settings {
    limit  = 2000
    period = "DAY"
  }
}

resource "aws_api_gateway_api_key" "ci_key" {
  name    = "ci-runs"
  enabled = true
  # If 'value' is omitted, the service generates a secure key automatically.
}

resource "aws_api_gateway_usage_plan_key" "ci_key_bind" {
  key_id        = aws_api_gateway_api_key.ci_key.id
  key_type      = "API_KEY"
  usage_plan_id = aws_api_gateway_usage_plan.plan.id
}

# WAFv2 rate-based rule (REGIONAL) – only for REST API stages, not HTTP APIs
resource "aws_wafv2_web_acl" "apigw_waf" {
  name        = "apigw-waf"
  description = "Rate limit per source IP"
  scope       = "REGIONAL"

  default_action {
    allow {}
  }

  rule {
    name     = "rate-limit"
    priority = 1

    action {
      block {}
    }

    statement {
      rate_based_statement {
        limit              = 500
        aggregate_key_type = "IP"
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "apigw-waf-rate-limit"
      sampled_requests_enabled   = true
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "apigw-waf"
    sampled_requests_enabled   = true
  }
}

resource "aws_wafv2_web_acl_association" "stage_assoc" {
  resource_arn = "arn:aws:apigateway:${data.aws_region.current.name}::/restapis/${aws_api_gateway_rest_api.tf_api.id}/stages/${aws_api_gateway_stage.prod.stage_name}"
  web_acl_arn  = aws_wafv2_web_acl.apigw_waf.arn
}
The stage-wide throttles, usage plans, and the WAF association are the key building blocks on the AWS side. CloudWatch provides, among other things, the 4XXError metric with the dimensions ApiName and Stage, which simplifies alarming per stage. (AWS Documentation)
Testing and Validation with Terraform 1.10+
For fast, reproducible safety nets, the native testing framework of Terraform is recommended. Mock providers encapsulate external dependencies, while assertions check project-specific rules such as maximum batch sizes or the behavior when limits are set too low.
Pro tip: Deliberately use concise, meaningful tests that harden your modules against misconfigurations. (HashiCorp Developer)
# tests/api_limits.tftest.hcl

variables {
  # Global default variables for all runs in this test file
  max_batch_size = 50
}

# Example: The plan must never try to create more than 50 new resources.
# We assume the module under test exposes an output 'planned_create_count'
# that counts the resources it will create.
run "enforce_small_batches" {
  command = plan

  assert {
    condition     = output.planned_create_count <= var.max_batch_size
    error_message = "Too many new resources in a single run – split the deployment into smaller batches."
  }
}

# Example: We expect a named check to fail
# (the 'check' block is defined in your modules)
run "expect_precondition_failure" {
  command = plan

  expect_failures = [
    check.api_limits_reasonable
  ]
}
Practical notes:
- Assertion conditions are single boolean expressions.
- expect_failures refers to checkable objects (input variables, outputs, resources with custom conditions, and check blocks), not to general type errors.
- Ephemeral resources as of today (Terraform 1.12.0) are mainly useful for short-lived tokens and queries, but not as a universal replacement for mocks.
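Since the test file above runs entirely against real provider configurations, it is worth showing the mock provider idea mentioned earlier as well. The following sketch assumes a module that uses the oci provider; the assertion is illustrative:

```hcl
# tests/mocked_plan.tftest.hcl – minimal sketch of a mocked provider run
# Generated placeholder values stand in for real API responses,
# so the test never talks to the OCI control plane.
mock_provider "oci" {}

run "plan_without_cloud_access" {
  command = plan

  # With a mocked provider, the plan succeeds even without credentials;
  # assertions can then check purely structural properties.
  assert {
    condition     = var.compartment_id != ""
    error_message = "compartment_id must be set for all test runs."
  }
}
```

Mocked runs are fast and credential-free, which makes them well suited as a first gate in CI before any run touches real quotas.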
Monitoring + Alerting
Observability is the operational backbone of your API limit strategy.
On OCI you work most reliably directly with the service metrics of the API Gateway in combination with alarms of the monitoring platform. The dimensions deploymentId and httpStatusCode allow unambiguous filtering on 429 responses. The MQL syntax is as follows; make sure to use the correct dimension names: (Oracle Documentation)
# OCI: Alarm on sustained HTTP 429 responses at deployment level
variable "alert_email" {
  type        = string
  description = "Email address for operational alerts"
}

resource "oci_ons_notification_topic" "ops" {
  compartment_id = var.compartment_id
  name           = "ops-alerts"
}

resource "oci_ons_subscription" "ops_mail" {
  compartment_id = var.compartment_id
  topic_id       = oci_ons_notification_topic.ops.id
  protocol       = "EMAIL"
  endpoint       = var.alert_email
}

resource "oci_monitoring_alarm" "apigw_429" {
  compartment_id        = var.compartment_id
  metric_compartment_id = var.compartment_id
  display_name          = "APIGW 429 bursts"
  is_enabled            = true
  severity              = "CRITICAL"
  destinations          = [oci_ons_notification_topic.ops.id]
  message_format        = "ONS_OPTIMIZED"
  namespace             = "oci_apigateway"
  pending_duration      = "PT1M" # 1 minute
  resolution            = "1m"

  # Correct dimensions according to API Gateway metrics: deploymentId, httpStatusCode
  query = <<-EOT
    HttpResponses[1m]{deploymentId="${var.api_deployment_id}", httpStatusCode="429"}.sum() > 5
  EOT

  body = "Increased rate of HTTP 429 on API Gateway deployment: {{triggerValue}}/min"
}
On AWS you define simple, robust alarms on 4XXError and 5XXError, supplemented with stage-wide throttling. In practice, alarms on 4XXError report early and broadly, while WAF rate limits absorb load peaks as they occur. (AWS Documentation)
# AWS: CloudWatch alarm on 4XX errors (stage-wide)
resource "aws_cloudwatch_metric_alarm" "api_4xx_spike" {
  alarm_name          = "apigw-prod-4xx-spike"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  period              = 60
  statistic           = "Sum"
  threshold           = 50
  namespace           = "AWS/ApiGateway"
  metric_name         = "4XXError"

  dimensions = {
    ApiName = aws_api_gateway_rest_api.tf_api.name
    Stage   = aws_api_gateway_stage.prod.stage_name
  }

  alarm_description = "Elevated client errors on 'prod' stage"
}
Best Practices for Production Operations
Planning before Optimization
API gateways should fit into your architecture and operating model, not the other way around. The following practices have proven effective and build on article 5a of this series:
Staggered deployments: Separate foundation, platform, and application workloads so that individual runs remain small and quotas are not exceeded cumulatively.
Circuit breaker for IaC: Implement preconditions and checks that stop runs as soon as error rates rise. This way you do not wear down quotas of other teams.
Use time windows: Large rollouts should take place outside of peak load windows. CI schedules are operational tools, not cosmetics.
Provider timeouts and retries: Extend timeouts only in a targeted way instead of inflating them globally. For OCI resources you can set time limits per resource, for example during deployment:
resource "oci_apigateway_deployment" "depl" {
  # ... your configuration ...

  timeouts {
    create = "30m"
    update = "30m"
    delete = "30m"
  }
}
Consciously control parallelism: In Terraform Enterprise set TFE_PARALLELISM per workspace instead of hard-wiring -parallelism flags in command lines everywhere. This prevents uncontrolled load peaks and is auditable.
Graceful degradation: Build optional paths that fall back to simpler operating modes in case of limits, instead of failing the entire run.
Documented quotas: Centrally maintained quotas per provider and service are mandatory. Only those who know quotas can deploy within limits.
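The circuit-breaker idea from the list above can be sketched as a lifecycle precondition. The error-rate variable and its threshold are illustrative assumptions; in practice the value would be injected by the pipeline from your monitoring:

```hcl
# Circuit breaker sketch: abort the run before burning shared quota
variable "observed_429_rate" {
  type        = number
  description = "Recent 429 responses per minute, injected by the CI pipeline"
  default     = 0
}

resource "oci_apigateway_deployment" "guarded" {
  # ... your configuration ...

  lifecycle {
    precondition {
      condition     = var.observed_429_rate < 5
      error_message = "API error rate too high – aborting the run instead of exhausting shared quotas."
    }
  }
}
```

Because preconditions are evaluated during plan, a run that trips the breaker fails fast and cheaply, before any API-heavy apply phase begins.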
Policy as Code with Sentinel
Policies protect platform quality. The following Sentinel policy limits the maximum number of new resources per run. It can be stored as a must-have guardrail in Terraform Enterprise and emits a meaningful warning at elevated volumes, before the hard limit is reached.
# sentinel/policies/api_limit_guard.sentinel
import "tfplan/v2" as tfplan

max_resources_per_run = 50

resources_to_create = filter tfplan.resource_changes as _, rc {
	rc.change.actions contains "create"
}

main = rule {
	length(resources_to_create) <= max_resources_per_run
}

# print() returns true, so this rule passes while still emitting a warning
warn_high_resource_count = rule when length(resources_to_create) > 30 {
	print("WARNING: High resource volume detected. Consider reducing parallelism or splitting the deployment.")
}
Integration with Terraform Enterprise
Many of the measures discussed in article 5a only unfold their full effect in the pipeline.
Terraform Enterprise makes it possible to codify parallelism, runtime settings, and gateway client configurations as an organizational standard. For customers within the EU with requirements regarding data sovereignty, TFE is (currently) the only option.
terraform {
  required_version = ">= 1.10"
  required_providers {
    tfe = {
      source  = "hashicorp/tfe"
      version = ">= 0.65.0"
    }
  }
}

provider "tfe" {
  hostname = var.tfe_hostname # e.g., tfe.example.eu
  token    = var.tfe_token
}

resource "tfe_workspace" "prod" {
  name              = "production-infra"
  organization      = var.tfe_org
  queue_all_runs    = true # Consider 'false' if your maturity model requires manual gates
  terraform_version = "1.10.5"
  working_directory = "live/prod"
}

resource "tfe_variable_set" "api_limits" {
  name         = "api-limit-controls"
  description  = "Controls for parallelism and API client defaults"
  organization = var.tfe_org
}

# Attach the variable set to the workspace, otherwise it has no effect
resource "tfe_workspace_variable_set" "api_limits_prod" {
  workspace_id    = tfe_workspace.prod.id
  variable_set_id = tfe_variable_set.api_limits.id
}

# Control Terraform parallelism via TFE_PARALLELISM
resource "tfe_variable" "parallelism" {
  key             = "TFE_PARALLELISM"
  value           = "5"
  category        = "env"
  description     = "Terraform parallelism for API limit control"
  variable_set_id = tfe_variable_set.api_limits.id
}

# Example of passing a client header for downstream API gateway policies
# Note: timestamp() changes on every run, so this variable is updated each plan
resource "tfe_variable" "client_header" {
  key             = "TF_VAR_apigw_client_header"
  value           = "X-CI-Run: ${timestamp()}"
  category        = "env"
  description     = "Example header for downstream API gateway policies"
  variable_set_id = tfe_variable_set.api_limits.id
}
The control via TFE_PARALLELISM is documented and proven in practice. Keep the values conservative and measure the impact on plan and apply duration.
Attention: Blindly increasing it often leads to worse performance due to more frequent 429/5xx responses.
Conclusion: Respect for the API
API limits are often perceived as an obstacle, but in reality they are something like an operational contract between your code and the platform. A Terraform-centric approach with clear rate limits, quotas, and alerting at the gateway level brings predictability to CI pipelines, protects cross-team resources, and significantly increases the success rate of your runs.
The measures discussed in article 5a remain the first lever. Additional API gateways deepen control, harmonize observability, and anchor your rules centrally.
Remember: Those who respect limits deploy more sustainably and more robustly.