In the previous article (5a) we saw how quickly large Terraform rollouts hit API limits, for example when DR tests create hundreds of resources in parallel and 429 errors trigger an avalanche of retries. This follow-up picks up at that point and shows how you can use the API Gateway of Oracle Cloud Infrastructure and Amazon API Gateway to manage limits deliberately, ensure clean observability, and make deployments operationally robust via "Policy as Code".
API Gateway: The Ultima Ratio?
API gateways help make API limits predictable. Used correctly, they bundle API calls, enforce quotas and throttling, deliver consistent observability data, and create a central place for operations and governance.
For us, one aspect is especially relevant: a gateway does not just shift the rate limit problem, it enables active control of it per team, per deployment, and per route.
In Oracle Cloud Infrastructure you establish technical guardrails with usage plans and entitlements. These apply directly to API Gateway deployments, for example a hard rate per second as well as quotas per minute or per month. For enforcement and transparency, service-specific metrics such as HttpResponses are available together with the dimensions deploymentId and httpStatusCode, on which clean alarms can be defined. (Oracle Documentation)
The service log categories access and execution are the designated channels of the service; they are assigned directly to the API deployment and are the first choice compared to legacy bucket log archiving. (Oracle Documentation)
Here is an example for OCI (an example for AWS follows further below):
# Terraform >= 1.10, OCI provider >= 7.14.0
terraform {
  required_version = ">= 1.10"
  required_providers {
    oci = {
      source  = "oracle/oci"
      version = ">= 7.14.0"
    }
  }
}

provider "oci" {
  region = var.region
}

variable "region" {
  type        = string
  description = "OCI region, e.g., eu-frankfurt-1"
  validation {
    condition     = can(regex("^[a-z]+-[a-z0-9]+-[0-9]+$", var.region))
    error_message = "Region must match a pattern like 'eu-frankfurt-1'."
  }
}

variable "compartment_id" {
  type        = string
  description = "Compartment OCID used for gateway, logs, and alarms"
}

# Optional: Many organizations manage the API deployment separately.
# We intentionally reference it via a variable to keep the example focused.
variable "api_deployment_id" {
  type        = string
  description = "OCID of the API Gateway deployment"
  validation {
    condition     = can(regex("^ocid1\\..+", var.api_deployment_id))
    error_message = "api_deployment_id must be a valid OCID."
  }
}

# Enable service logs for 'access' and 'execution'
resource "oci_logging_log_group" "apigw" {
  compartment_id = var.compartment_id
  display_name   = "apigw-logs"
}

resource "oci_logging_log" "apigw_access" {
  log_group_id = oci_logging_log_group.apigw.id
  display_name = "apigateway-access"
  log_type     = "SERVICE"
  is_enabled   = true

  configuration {
    source {
      category    = "access"
      resource    = var.api_deployment_id
      service     = "apigateway"
      source_type = "OCISERVICE"
    }
  }
}

resource "oci_logging_log" "apigw_execution" {
  log_group_id = oci_logging_log_group.apigw.id
  display_name = "apigateway-execution"
  log_type     = "SERVICE"
  is_enabled   = true

  configuration {
    source {
      category    = "execution"
      resource    = var.api_deployment_id
      service     = "apigateway"
      source_type = "OCISERVICE"
    }
  }
}

# Usage plan with rate limit & minute quota
resource "oci_apigateway_usage_plan" "team_plan" {
  compartment_id = var.compartment_id
  display_name   = "team-standard-plan"

  entitlements {
    name        = "default"
    description = "Standard quota for CI runs"

    rate_limit {
      unit  = "SECOND"
      value = 50
    }

    quota {
      unit                = "MINUTE"
      value               = 2000
      reset_policy        = "CALENDAR"
      operation_on_breach = "REJECT"
    }

    targets {
      deployment_id = var.api_deployment_id
    }
  }

  lifecycle {
    prevent_destroy = true
  }
}
In Amazon API Gateway you combine three levers: stage and method throttling, usage plans with API keys, and rate-based rules in AWS WAF for IP aggregations. The CloudWatch metrics 4XXError and 5XXError provide a robust early warning system at the stage level.
Important: AWS WAFv2 can currently only be associated with REST API stages, not with HTTP APIs. (AWS Documentation, Terraform Registry)
# Amazon API Gateway (REST) – stage throttling, usage plan, WAF
terraform {
  required_version = ">= 1.10"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

variable "aws_region" {
  type        = string
  description = "AWS region, e.g., eu-central-1"
}

data "aws_region" "current" {}

resource "aws_api_gateway_rest_api" "tf_api" {
  name = "terraform-at-scale"
}

resource "aws_api_gateway_resource" "status" {
  rest_api_id = aws_api_gateway_rest_api.tf_api.id
  parent_id   = aws_api_gateway_rest_api.tf_api.root_resource_id
  path_part   = "status"
}

resource "aws_api_gateway_method" "get_status" {
  rest_api_id      = aws_api_gateway_rest_api.tf_api.id
  resource_id      = aws_api_gateway_resource.status.id
  http_method      = "GET"
  authorization    = "NONE"
  api_key_required = true # without this, usage plan keys are not enforced
}

resource "aws_api_gateway_integration" "get_status_mock" {
  rest_api_id = aws_api_gateway_rest_api.tf_api.id
  resource_id = aws_api_gateway_resource.status.id
  http_method = aws_api_gateway_method.get_status.http_method
  type        = "MOCK"
}

resource "aws_api_gateway_deployment" "this" {
  rest_api_id = aws_api_gateway_rest_api.tf_api.id
  depends_on  = [aws_api_gateway_integration.get_status_mock]
}

resource "aws_api_gateway_stage" "prod" {
  rest_api_id   = aws_api_gateway_rest_api.tf_api.id
  deployment_id = aws_api_gateway_deployment.this.id
  stage_name    = "prod"
}

# Throttling and logging are configured via a separate
# method settings resource, not on the stage itself
resource "aws_api_gateway_method_settings" "prod_all" {
  rest_api_id = aws_api_gateway_rest_api.tf_api.id
  stage_name  = aws_api_gateway_stage.prod.stage_name
  method_path = "*/*"

  settings {
    metrics_enabled        = true
    logging_level          = "INFO"
    data_trace_enabled     = false
    throttling_burst_limit = 100
    throttling_rate_limit  = 50
  }
}

resource "aws_api_gateway_usage_plan" "plan" {
  name = "team-standard-plan"

  api_stages {
    api_id = aws_api_gateway_rest_api.tf_api.id
    stage  = aws_api_gateway_stage.prod.stage_name
  }

  throttle_settings {
    burst_limit = 100
    rate_limit  = 50
  }

  quota_settings {
    limit  = 2000
    period = "DAY"
  }
}

resource "aws_api_gateway_api_key" "ci_key" {
  name    = "ci-runs"
  enabled = true
  # If 'value' is omitted, the service generates a secure key automatically.
}

resource "aws_api_gateway_usage_plan_key" "ci_key_bind" {
  key_id        = aws_api_gateway_api_key.ci_key.id
  key_type      = "API_KEY"
  usage_plan_id = aws_api_gateway_usage_plan.plan.id
}

# WAFv2 rate-based rule (REGIONAL) – only for REST API stages, not HTTP APIs
resource "aws_wafv2_web_acl" "apigw_waf" {
  name        = "apigw-waf"
  description = "Rate limit per source IP"
  scope       = "REGIONAL"

  default_action {
    allow {}
  }

  rule {
    name     = "rate-limit"
    priority = 1

    action {
      block {}
    }

    statement {
      rate_based_statement {
        limit              = 500
        aggregate_key_type = "IP"
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "apigw-waf-rate-limit"
      sampled_requests_enabled   = true
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "apigw-waf"
    sampled_requests_enabled   = true
  }
}

resource "aws_wafv2_web_acl_association" "stage_assoc" {
  resource_arn = "arn:aws:apigateway:${data.aws_region.current.name}::/restapis/${aws_api_gateway_rest_api.tf_api.id}/stages/${aws_api_gateway_stage.prod.stage_name}"
  web_acl_arn  = aws_wafv2_web_acl.apigw_waf.arn
}
The stage-wide throttles, usage plans, and the WAF association are the key building blocks on the AWS side. CloudWatch provides, among other things, the 4XXError metric with the dimensions ApiName and Stage, which simplifies alarming per stage. (AWS Documentation)
Testing and Validation with Terraform 1.10+
For fast, reproducible safety nets, the native testing framework of Terraform is recommended. Mock providers encapsulate external dependencies, while assertions check project-specific rules such as maximum batch sizes or the behavior when limits are set too low.
Pro tip: Deliberately use concise, meaningful tests that harden your modules against misconfigurations. (HashiCorp Developer)
# tests/api_limits.tftest.hcl

variables {
  # Global default variables for all runs in this test file
  max_batch_size = 50
}

# Example: The plan must never try to create more than 50 new resources.
# We assume the module under test exposes an output 'planned_create_count'
# that counts the resources it will create.
run "enforce_small_batches" {
  command = plan

  assert {
    condition     = output.planned_create_count <= var.max_batch_size
    error_message = "Too many new resources in a single run – split the deployment into smaller batches."
  }
}

# Example: We expect a named check to fail
# (the 'check' block is defined in your modules)
run "expect_precondition_failure" {
  command = plan

  expect_failures = [
    check.api_limits_reasonable
  ]
}
Practical notes:
- Assertion conditions are single boolean expressions.
- expect_failures refers to checkable objects (input variables, outputs, resources with custom conditions, and check blocks), not to general type errors.
- Ephemeral resources as of today (Terraform 1.12.0) are mainly useful for short-lived tokens and queries, but not as a universal replacement for mocks.
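Since the test file above runs entirely against real provider configurations, it is worth showing the mock provider idea mentioned earlier as well. The following sketch assumes a module that uses the oci provider; the assertion is illustrative:

```hcl
# tests/mocked_plan.tftest.hcl – minimal sketch of a mocked provider run
# Generated placeholder values stand in for real API responses,
# so the test never talks to the OCI control plane.
mock_provider "oci" {}

run "plan_without_cloud_access" {
  command = plan

  # With a mocked provider, the plan succeeds even without credentials;
  # assertions can then check purely structural properties.
  assert {
    condition     = var.compartment_id != ""
    error_message = "compartment_id must be set for all test runs."
  }
}
```

Mocked runs are fast and credential-free, which makes them well suited as a first gate in CI before any run touches real quotas.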
Monitoring + Alerting
Observability is the operational backbone of your API limit strategy.
On OCI you work most reliably directly with the service metrics of the API Gateway in combination with alarms of the monitoring platform. The dimensions deploymentId and httpStatusCode allow unambiguous filtering on 429 responses. The MQL syntax is as follows; make sure to use the correct dimension names: (Oracle Documentation)
# OCI: Alarm on sustained HTTP 429 responses at deployment level
variable "alert_email" {
  type        = string
  description = "Email address for operational alerts"
}

resource "oci_ons_notification_topic" "ops" {
  compartment_id = var.compartment_id
  name           = "ops-alerts"
}

resource "oci_ons_subscription" "ops_mail" {
  compartment_id = var.compartment_id
  topic_id       = oci_ons_notification_topic.ops.id
  protocol       = "EMAIL"
  endpoint       = var.alert_email
}

resource "oci_monitoring_alarm" "apigw_429" {
  compartment_id        = var.compartment_id
  metric_compartment_id = var.compartment_id
  display_name          = "APIGW 429 bursts"
  is_enabled            = true
  severity              = "CRITICAL"
  destinations          = [oci_ons_notification_topic.ops.id]
  message_format        = "ONS_OPTIMIZED"
  namespace             = "oci_apigateway"
  pending_duration      = "PT1M" # 1 minute
  resolution            = "1m"

  # Correct dimensions according to API Gateway metrics: deploymentId, httpStatusCode
  query = <<-EOT
    HttpResponses[1m]{deploymentId="${var.api_deployment_id}", httpStatusCode="429"}.sum() > 5
  EOT

  body = "Increased rate of HTTP 429 on API Gateway deployment: {{triggerValue}}/min"
}
On AWS you define simple, robust alarms on 4XXError and 5XXError, supplemented with stage-wide throttling. In practice, alarms on 4XXError report early and broadly, while WAF rate limits absorb load peaks as they occur. (AWS Documentation)
# AWS: CloudWatch alarm on 4XX errors (stage-wide)
resource "aws_cloudwatch_metric_alarm" "api_4xx_spike" {
  alarm_name          = "apigw-prod-4xx-spike"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  period              = 60
  statistic           = "Sum"
  threshold           = 50
  namespace           = "AWS/ApiGateway"
  metric_name         = "4XXError"

  dimensions = {
    ApiName = aws_api_gateway_rest_api.tf_api.name
    Stage   = aws_api_gateway_stage.prod.stage_name
  }

  alarm_description = "Elevated client errors on 'prod' stage"
}
Best Practices for Production Operations
Planning before Optimization
API gateways should fit into your architecture and operating model, not the other way around. The following practices have proven effective and build on article 5a of this series:
Staggered deployments: Separate foundation, platform, and application workloads so that individual runs remain small and quotas are not exceeded cumulatively.
Circuit breaker for IaC: Implement preconditions and checks that stop runs as soon as error rates rise. This way you do not wear down quotas of other teams.
Use time windows: Large rollouts should take place outside of peak load windows. CI schedules are operational tools, not cosmetics.
Provider timeouts and retries: Extend timeouts only in a targeted way instead of inflating them globally. For OCI resources you can set time limits per resource, for example during deployment:
resource "oci_apigateway_deployment" "depl" {
  # ... your configuration ...

  timeouts {
    create = "30m"
    update = "30m"
    delete = "30m"
  }
}
Consciously control parallelism: In Terraform Enterprise set TFE_PARALLELISM per workspace instead of hard-wiring -parallelism flags in command lines everywhere. This prevents uncontrolled load peaks and is auditable.
Graceful degradation: Build optional paths that fall back to simpler operating modes in case of limits, instead of failing the entire run.
Documented quotas: Centrally maintained quotas per provider and service are mandatory. Only those who know quotas can deploy within limits.
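The circuit-breaker idea from the list above can be sketched as a lifecycle precondition. The error-rate variable and its threshold are illustrative assumptions; in practice the value would be injected by the pipeline from your monitoring:

```hcl
# Circuit breaker sketch: abort the run before burning shared quota
variable "observed_429_rate" {
  type        = number
  description = "Recent 429 responses per minute, injected by the CI pipeline"
  default     = 0
}

resource "oci_apigateway_deployment" "guarded" {
  # ... your configuration ...

  lifecycle {
    precondition {
      condition     = var.observed_429_rate < 5
      error_message = "API error rate too high – aborting the run instead of exhausting shared quotas."
    }
  }
}
```

Because preconditions are evaluated during plan, a run that trips the breaker fails fast and cheaply, before any API-heavy apply phase begins.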
Policy as Code with Sentinel
Policies protect platform quality. The following Sentinel policy limits the maximum number of new resources per run. It can be stored as a must-have guardrail in Terraform Enterprise and emits a meaningful warning at elevated volumes, before the hard limit is reached.
# sentinel/policies/api_limit_guard.sentinel
import "tfplan/v2" as tfplan

max_resources_per_run = 50

resources_to_create = filter tfplan.resource_changes as _, rc {
	rc.change.actions contains "create"
}

main = rule {
	length(resources_to_create) <= max_resources_per_run
}

# print() returns true, so this rule passes while still emitting a warning
warn_high_resource_count = rule when length(resources_to_create) > 30 {
	print("WARNING: High resource volume detected. Consider reducing parallelism or splitting the deployment.")
}
Integration with Terraform Enterprise
Many of the measures discussed in article 5a only unfold their full effect in the pipeline.
Terraform Enterprise makes it possible to codify parallelism, runtime settings, and gateway client configurations as an organizational standard. For customers within the EU with requirements regarding data sovereignty, TFE is (currently) the only option.
terraform {
  required_version = ">= 1.10"
  required_providers {
    tfe = {
      source  = "hashicorp/tfe"
      version = ">= 0.65.0"
    }
  }
}

provider "tfe" {
  hostname = var.tfe_hostname # e.g., tfe.example.eu
  token    = var.tfe_token
}

resource "tfe_workspace" "prod" {
  name              = "production-infra"
  organization      = var.tfe_org
  queue_all_runs    = true # Consider 'false' if your maturity model requires manual gates
  terraform_version = "1.10.5"
  working_directory = "live/prod"
}

resource "tfe_variable_set" "api_limits" {
  name         = "api-limit-controls"
  description  = "Controls for parallelism and API client defaults"
  organization = var.tfe_org
}

# Attach the variable set to the workspace, otherwise it has no effect
resource "tfe_workspace_variable_set" "api_limits_prod" {
  workspace_id    = tfe_workspace.prod.id
  variable_set_id = tfe_variable_set.api_limits.id
}

# Control Terraform parallelism via TFE_PARALLELISM
resource "tfe_variable" "parallelism" {
  key             = "TFE_PARALLELISM"
  value           = "5"
  category        = "env"
  description     = "Terraform parallelism for API limit control"
  variable_set_id = tfe_variable_set.api_limits.id
}

# Example of passing a client header for downstream API gateway policies
# Note: timestamp() changes on every run, so this variable is updated each plan
resource "tfe_variable" "client_header" {
  key             = "TF_VAR_apigw_client_header"
  value           = "X-CI-Run: ${timestamp()}"
  category        = "env"
  description     = "Example header for downstream API gateway policies"
  variable_set_id = tfe_variable_set.api_limits.id
}
The control via TFE_PARALLELISM is documented and proven in practice. Keep the values conservative and measure the impact on plan and apply duration.
Attention: Blindly increasing it often leads to worse performance due to more frequent 429/5xx responses.
Conclusion: Respect for the API
API limits are often perceived as an obstacle, but in reality they are something like an operational contract between your code and the platform. A Terraform-centric approach with clear rate limits, quotas, and alerting at the gateway level brings predictability to CI pipelines, protects cross-team resources, and significantly increases the success rate of your runs.
The measures discussed in article 5a remain the first lever. Additional API gateways deepen control, harmonize observability, and anchor your rules centrally.
Remember: Those who respect limits deploy more sustainably and more robustly.