Terraform Data Sources are a popular way to pull real-world values from the cloud environment into a configuration at plan time. Using them in dynamic, module-heavy infrastructures, however, requires some foresight. A harmless data.oci_identity_availability_domains inside a module is enough - and suddenly every terraform plan takes minutes instead of seconds, because 100 module instances mean 100 API calls and your cloud provider starts throttling. Welcome to the world of unintended API amplification through Data Sources.
In this article, I will show you why Data Sources in Terraform modules can pose a scaling problem.
The Hidden Scaling Issue
Scenario: The 10-second trap
You have written a clean Terraform module for VM instances. For each VM you need an Availability Domain, so you use a Data Source:
data "oci_identity_availability_domains" "ads" { compartment_id = var.tenancy_ocid } resource "oci_core_instance" "this" { for_each = var.instances != null ? var.instances : {} availability_domain = data.oci_identity_availability_domains.ads.availability_domains[0].name compartment_id = var.compartment_id shape = each.value.shape create_vnic_details { subnet_id = each.value.subnet_id } }
At first glance, everything seems correct here, and the module works perfectly - for a single VM instance, and even for a handful of them.
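For a sense of scale, a call with a single instance might look like this - a minimal sketch in which the module name, the shape, and the placeholder variables are made up, while the attribute names inside instances follow the module code above:

module "single_vm" {
  source         = "./modules/compute-instance"
  compartment_id = var.compartment_id

  instances = {
    vm1 = {
      shape     = "VM.Standard.E4.Flex" # illustrative shape
      subnet_id = var.subnet_id         # placeholder
    }
  }
}

One module instance, one hidden call to oci_identity_availability_domains - at this size, nobody notices the lookup.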
But then your customer scales up to 50 VMs:
module "app_servers" { for_each = var.server_configs != null ? var.server_configs : {} source = "./modules/compute-instance" instances = each.value.instances compartment_id = each.value.compartment_id }
The result: 50 identical API calls to oci_identity_availability_domains - one per module instance - on every terraform plan and terraform apply.
What used to take seconds now takes minutes. With 100 or 200 instances, it becomes truly painful.
In many use cases this may still be tolerable when provisioning new VM instances - but what happens when resources with short life cycles are involved? Imagine the extra waiting time while provisioning cluster nodes to absorb a traffic spike, load balancer backends, DNS records, or resources that other resources depend on. This can lead to prolonged outages with external impact on your customers - and in the worst case, ones you cannot even predict.
Why Data Sources in Modules Are Problematic
Data Sources behave fundamentally differently from resources: their results are not cached between runs, and they are re-read during every terraform plan and terraform apply. Inside a module, this means:
Problem 1: API Amplification
Each module instance performs its own Data Source queries, even when the data is identical.
For example, these two modules both call the same child module ./modules/server:
module "web_servers" { for_each = var.web_configs != null ? var.web_configs : {} source = "./modules/server" compartment_id = each.value.compartment_id } module "app_servers" { for_each = var.app_configs != null ? var.app_configs : {} source = "./modules/server" compartment_id = each.value.compartment_id }
Inside the module ./modules/server, a Data Source query is hidden - invisible to the author of the root module:
data "oci_identity_availability_domains" "available" { compartment_id = var.compartment_id } resource "oci_core_instance" "this" { for_each = var.instances != null ? var.instances : {} availability_domain = data.oci_identity_availability_domains.available.availability_domains[0].name compartment_id = var.compartment_id }
If, for example, 10 app_servers and 50 web_servers are now deployed, this results in 60 (!) API queries that Terraform fires off as concurrently as it can. No cloud provider will allow that. Instead, your terraform apply runs into the API gateway's rate limits, and the deployment takes forever.

In the best case, Terraform's default parallelism of 10 concurrent operations means the calls are processed in waves of 10. Even then, the deployment already takes at least six times as long.
And this doesn't just affect the server instances shown in the code example. It also impacts all other resources still included in the plan. The API limit applies to everything, not just to a specific resource type like oci_core_instance. Other resources that would normally be accessed quickly or even in parallel now have to wait their turn in the queue. In the worst case, this can even lead to race conditions or timeouts.
Problem 2: Performance Degradation
As the number of module instances increases, planning time grows at least linearly - and once throttling and retries kick in, it gets considerably worse.
Take a look at this example, which reads the list of available OS images from OCI and filters for a specific release of the GPU-optimized variant of Oracle Linux 8:
data "oci_core_images" "compute_images" { compartment_id = var.compartment_id operating_system = "Oracle Linux" operating_system_version = "8" sort_by = "TIMECREATED" sort_order = "DESC" filter { name = "display_name" values = [".*GPU.*"] regex = true } } resource "oci_core_instance" "gpu_instances" { for_each = var.gpu_instances != null ? var.gpu_instances : {} source_details { source_id = data.oci_core_images.compute_images.images[0].id source_type = "image" } }
The list of available OS images is extremely long, and the Terraform provider reads it in full and filters it again for every module instance. This not only causes unnecessary API calls with very large responses, it also puts Terraform itself under strain.
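To make the cost more tangible, here is the same lookup again, annotated with where each part is evaluated - a sketch of my understanding of the OCI provider's behavior, in which the named arguments become API query parameters while the generic filter block is applied locally:

data "oci_core_images" "compute_images" {
  # Sent to the API as query parameters - these narrow the response server-side.
  compartment_id           = var.compartment_id
  operating_system         = "Oracle Linux"
  operating_system_version = "8"
  sort_by                  = "TIMECREATED"
  sort_order               = "DESC"

  # Applied locally by the provider - the (paginated) image list is downloaded
  # first and only then filtered with this regex inside Terraform.
  filter {
    name   = "display_name"
    values = [".*GPU.*"]
    regex  = true
  }
}

Repeated once per module instance, both the download and the local filtering add up.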
Problem 3: API Limits and Throttling
As already mentioned earlier in this article: Cloud providers impose API limits.
Too many Data Source calls therefore lead to:
- HTTP 429 (Too Many Requests) errors
- Exponential backoff and retry cycles (see the provider-level sketch after this list)
- Blocking of other Terraform operations
- Unstable CI/CD pipelines
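The retry behavior can at least be tuned at the provider level. A minimal sketch, assuming the OCI provider's retry_duration_seconds and disable_auto_retries settings; the values are purely illustrative, and tuning them treats the symptom rather than the cause:

provider "oci" {
  region = var.region

  # Keep retrying throttled requests (HTTP 429) for up to 10 minutes
  # instead of letting the run fail outright.
  retry_duration_seconds = 600

  # Alternatively, fail fast to surface the amplification problem immediately:
  # disable_auto_retries = true
}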
Anti-Pattern: Non-Scaling Examples
Example 1: The Availability Domain Dilemma
Problem: This module performs two separate API calls during each instantiation – one for Availability Domains and one for Subnets. In an environment with 20 database modules, this results in 40 redundant API calls for identical information.
Why it doesn't scale: Each module queries the same global Availability Domains, even though they change very rarely. Additionally, the same Subnet filter is executed in every module instance, which becomes particularly costly in complex VCNs with many subnets.
Impact: With 50 database instances = 100 API calls plus 50x filtering for data that changes at most once per year.
data "oci_identity_availability_domains" "available" { compartment_id = var.compartment_id } data "oci_core_subnets" "database" { compartment_id = var.compartment_id vcn_id = var.vcn_id filter { name = "display_name" values = ["*database*"] } } resource "oci_database_db_system" "main" { for_each = var.db_systems != null ? var.db_systems : {} availability_domain = data.oci_identity_availability_domains.available.availability_domains[0].name subnet_id = data.oci_core_subnets.database.subnets[0].id compartment_id = var.compartment_id }
Example 2: The Image Lookup Problem
Problem: This implementation performs a costly image lookup with regex filtering on every module invocation. The listing call is expensive on OCI's side, and the regex filtering and sorting of the result put additional load on Terraform itself.

Why it doesn't scale: Image lookups are especially slow, since they transfer large amounts of data and apply complex filters. Every module instance performs the same lookup, even though the result is identical for all of them. Local sorting with sort() amplifies the problem further.
Impact: With 100 VM instances = 100 expensive image API calls + 100 local filtering operations = several minutes of planning time for something that could be done in no time.
data "oci_core_images" "ol8_images" { compartment_id = var.tenancy_ocid operating_system = "Oracle Linux" operating_system_version = "8" shape = var.instance_shape filter { name = "display_name" values = ["Oracle-Linux-8.8-.*"] regex = true } } locals { latest_image_id = sort([for img in data.oci_core_images.ol8_images.images : img.id])[0] } resource "oci_core_instance" "this" { for_each = var.instances != null ? var.instances : {} source_details { source_id = local.latest_image_id source_type = "image" } }
Example 3: The Nested Data Source Problem
Problem: This module combines multiple scaling issues: cluster lookups, image lookups for worker nodes, and complex dependency chains between Data Sources. Each node pool creation triggers both Data Sources again.
Why it doesn't scale: Nested Data Sources create dependency chains that must be fully resolved for each module instance. Kubernetes-specific image lookups are especially slow because they rely on highly specific regex filters that the provider applies locally to the full result set.

Impact: With 10 node pools = 10 cluster lookups + 10 expensive OKE image lookups = 20 API calls and planning times that keep climbing with every additional node pool.
data "oci_containerengine_clusters" "existing" { compartment_id = var.compartment_id filter { name = "name" values = [var.cluster_name] } } data "oci_core_images" "worker_images" { compartment_id = var.compartment_id operating_system = "Oracle Linux" filter { name = "display_name" values = ["Oracle-Linux-.*-OKE-.*"] regex = true } } resource "oci_containerengine_node_pool" "workers" { for_each = var.node_pools != null ? var.node_pools : {} cluster_id = data.oci_containerengine_clusters.existing.clusters[0].id node_config_details { size = each.value.size } }
These examples show how not to use Data Sources - but that alone won't help you much. In the next article, I will show you how to avoid this trap elegantly through smart architecture and variable injection.
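As a small preview of the direction - a minimal sketch, assuming the compute module accepts an availability_domain input variable instead of running its own lookup (the full pattern follows in the next article):

# Root module: the lookup runs exactly once ...
data "oci_identity_availability_domains" "ads" {
  compartment_id = var.tenancy_ocid
}

# ... and only the resulting value is injected into every module instance.
module "app_servers" {
  for_each = var.server_configs != null ? var.server_configs : {}

  source         = "./modules/compute-instance"
  instances      = each.value.instances
  compartment_id = each.value.compartment_id

  # Hypothetical input variable - the module no longer needs its own Data Source.
  availability_domain = data.oci_identity_availability_domains.ads.availability_domains[0].name
}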
Because the best Data Source is the one that is never executed - or only executed once.