Terraform Data Sources are a popular way to pull real-world values from the cloud environment into a configuration at plan time. Using them in dynamic, module-heavy infrastructures, however, requires some foresight. A harmless data.oci_identity_availability_domains inside a module is enough - and suddenly every terraform plan takes minutes instead of seconds, because 100 module instances mean 100 API calls and your cloud provider starts throttling. Welcome to the world of unintended API amplification through Data Sources.
In this article, I will show you why Data Sources in Terraform modules can pose a scaling problem.
The Hidden Scaling Issue
Scenario: The 10-second trap
You have written a clean Terraform module for VM instances. For each VM you need an Availability Domain, so you use a Data Source:
data "oci_identity_availability_domains" "ads" { compartment_id = var.tenancy_ocid } resource "oci_core_instance" "this" { for_each = var.instances != null ? var.instances : {} availability_domain = data.oci_identity_availability_domains.ads.availability_domains[0].name compartment_id = var.compartment_id shape = each.value.shape create_vnic_details { subnet_id = each.value.subnet_id } }
At first glance, everything seems correct here, and the module works perfectly - for a single VM instance, and even for a handful of them.
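For a sense of scale, a call with a single instance might look like this - a minimal sketch in which the module name, the shape, and the placeholder variables are made up, while the attribute names inside instances follow the module code above:

module "single_vm" {
  source         = "./modules/compute-instance"
  compartment_id = var.compartment_id

  instances = {
    vm1 = {
      shape     = "VM.Standard.E4.Flex" # illustrative shape
      subnet_id = var.subnet_id         # placeholder
    }
  }
}

One module instance, one hidden call to oci_identity_availability_domains - at this size, nobody notices the lookup.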
But then your customer scales up to 50 VMs:
module "app_servers" { for_each = var.server_configs != null ? var.server_configs : {} source = "./modules/compute-instance" instances = each.value.instances compartment_id = each.value.compartment_id }
The result: 50 identical API calls to oci_identity_availability_domains - one per module instance - on every terraform plan and terraform apply.
What used to take seconds now takes minutes. With 100 or 200 instances, it becomes truly painful.
In many use cases this may still be tolerable when provisioning new VM instances - but what happens when resources with short life cycles are involved? Imagine the extra waiting time while provisioning cluster nodes to absorb a traffic spike, load balancer backends, DNS records, or resources that other resources depend on. This can lead to prolonged outages with external impact on your customers - and in the worst case, ones you cannot even predict.
Why Data Sources in Modules Are Problematic
Data Sources behave fundamentally differently from resources: their results are not cached between runs, and they are re-read during every terraform plan and terraform apply. Inside a module, this means:
Problem 1: API Amplification
Each module instance performs its own Data Source queries, even when the data is identical.
For example, these two modules both call the same child module ./modules/server:
module "web_servers" { for_each = var.web_configs != null ? var.web_configs : {} source = "./modules/server" compartment_id = each.value.compartment_id } module "app_servers" { for_each = var.app_configs != null ? var.app_configs : {} source = "./modules/server" compartment_id = each.value.compartment_id }
Inside the module ./modules/server, a Data Source query is hidden - invisible to the author of the root module:
data "oci_identity_availability_domains" "available" { compartment_id = var.compartment_id } resource "oci_core_instance" "this" { for_each = var.instances != null ? var.instances : {} availability_domain = data.oci_identity_availability_domains.available.availability_domains[0].name compartment_id = var.compartment_id }
If, for example, 10 app_servers and 50 web_servers are now deployed, this results in 60 (!) API queries that Terraform fires off as concurrently as it can. No cloud provider will allow that. Instead, your terraform apply runs into the API gateway's rate limits, and the deployment takes forever.

In the best case, Terraform's default parallelism of 10 concurrent operations means the calls are processed in waves of 10. Even then, the deployment already takes at least six times as long.
And this doesn't just affect the server instances shown in the code example. It also impacts all other resources still included in the plan. The API limit applies to everything, not just to a specific resource type like oci_core_instance. Other resources that would normally be accessed quickly or even in parallel now have to wait their turn in the queue. In the worst case, this can even lead to race conditions or timeouts.
Problem 2: Performance Degradation
As the number of module instances increases, planning time grows at least linearly - and once throttling and retries kick in, it gets considerably worse.
Take a look at this example, which reads the list of available OS images from OCI and filters for a specific release of the GPU-optimized variant of Oracle Linux 8:
data "oci_core_images" "compute_images" { compartment_id = var.compartment_id operating_system = "Oracle Linux" operating_system_version = "8" sort_by = "TIMECREATED" sort_order = "DESC" filter { name = "display_name" values = [".*GPU.*"] regex = true } } resource "oci_core_instance" "gpu_instances" { for_each = var.gpu_instances != null ? var.gpu_instances : {} source_details { source_id = data.oci_core_images.compute_images.images[0].id source_type = "image" } }
The list of available OS images is extremely long, and the Terraform provider reads it in full and filters it again for every module instance. This not only causes unnecessary API calls with very large responses, it also puts Terraform itself under strain.
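To make the cost more tangible, here is the same lookup again, annotated with where each part is evaluated - a sketch of my understanding of the OCI provider's behavior, in which the named arguments become API query parameters while the generic filter block is applied locally:

data "oci_core_images" "compute_images" {
  # Sent to the API as query parameters - these narrow the response server-side.
  compartment_id           = var.compartment_id
  operating_system         = "Oracle Linux"
  operating_system_version = "8"
  sort_by                  = "TIMECREATED"
  sort_order               = "DESC"

  # Applied locally by the provider - the (paginated) image list is downloaded
  # first and only then filtered with this regex inside Terraform.
  filter {
    name   = "display_name"
    values = [".*GPU.*"]
    regex  = true
  }
}

Repeated once per module instance, both the download and the local filtering add up.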
Problem 3: API Limits and Throttling
As already mentioned earlier in this article: Cloud providers impose API limits.
Too many Data Source calls therefore lead to:
- HTTP 429 (Too Many Requests) errors
- Exponential backoff and retry cycles (see the provider-level sketch after this list)
- Blocking of other Terraform operations
- Unstable CI/CD pipelines
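The retry behavior can at least be tuned at the provider level. A minimal sketch, assuming the OCI provider's retry_duration_seconds and disable_auto_retries settings; the values are purely illustrative, and tuning them treats the symptom rather than the cause:

provider "oci" {
  region = var.region

  # Keep retrying throttled requests (HTTP 429) for up to 10 minutes
  # instead of letting the run fail outright.
  retry_duration_seconds = 600

  # Alternatively, fail fast to surface the amplification problem immediately:
  # disable_auto_retries = true
}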
Anti-Pattern: Non-Scaling Examples
Example 1: The Availability Domain Dilemma
Problem: This module performs two separate API calls during each instantiation – one for Availability Domains and one for Subnets. In an environment with 20 database modules, this results in 40 redundant API calls for identical information.
Why it doesn't scale: Each module queries the same global Availability Domains, even though they change very rarely. Additionally, the same Subnet filter is executed in every module instance, which becomes particularly costly in complex VCNs with many subnets.
Impact: With 50 database instances = 100 API calls plus 50x filtering for data that changes at most once per year.
data "oci_identity_availability_domains" "available" { compartment_id = var.compartment_id } data "oci_core_subnets" "database" { compartment_id = var.compartment_id vcn_id = var.vcn_id filter { name = "display_name" values = ["*database*"] } } resource "oci_database_db_system" "main" { for_each = var.db_systems != null ? var.db_systems : {} availability_domain = data.oci_identity_availability_domains.available.availability_domains[0].name subnet_id = data.oci_core_subnets.database.subnets[0].id compartment_id = var.compartment_id }
Example 2: The Image Lookup Problem
Problem: This implementation performs a costly image lookup with regex filtering on every module invocation. The listing call is expensive on OCI's side, and the regex filtering and sorting of the result put additional load on Terraform itself.

Why it doesn't scale: Image lookups are especially slow, since they transfer large amounts of data and apply complex filters. Every module instance performs the same lookup, even though the result is identical for all of them. Local sorting with sort() amplifies the problem further.
Impact: With 100 VM instances = 100 expensive image API calls + 100 local filtering operations = several minutes of planning time for something that could be done in no time.
data "oci_core_images" "ol8_images" { compartment_id = var.tenancy_ocid operating_system = "Oracle Linux" operating_system_version = "8" shape = var.instance_shape filter { name = "display_name" values = ["Oracle-Linux-8.8-.*"] regex = true } } locals { latest_image_id = sort([for img in data.oci_core_images.ol8_images.images : img.id])[0] } resource "oci_core_instance" "this" { for_each = var.instances != null ? var.instances : {} source_details { source_id = local.latest_image_id source_type = "image" } }
Example 3: The Nested Data Source Problem
Problem: This module combines multiple scaling issues: cluster lookups, image lookups for worker nodes, and complex dependency chains between Data Sources. Each node pool creation triggers both Data Sources again.
Why it doesn't scale: Nested Data Sources create dependency chains that must be fully resolved for each module instance. Kubernetes-specific image lookups are especially slow because they rely on highly specific regex filters that the provider applies locally to the full result set.

Impact: With 10 node pools = 10 cluster lookups + 10 expensive OKE image lookups = 20 API calls and planning times that keep climbing with every additional node pool.
data "oci_containerengine_clusters" "existing" { compartment_id = var.compartment_id filter { name = "name" values = [var.cluster_name] } } data "oci_core_images" "worker_images" { compartment_id = var.compartment_id operating_system = "Oracle Linux" filter { name = "display_name" values = ["Oracle-Linux-.*-OKE-.*"] regex = true } } resource "oci_containerengine_node_pool" "workers" { for_each = var.node_pools != null ? var.node_pools : {} cluster_id = data.oci_containerengine_clusters.existing.clusters[0].id node_config_details { size = each.value.size } }
These examples show how not to use Data Sources - but that alone won't help you much. In the next article, I will show you how to avoid this trap elegantly through smart architecture and variable injection.
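As a small preview of the direction - a minimal sketch, assuming the compute module accepts an availability_domain input variable instead of running its own lookup (the full pattern follows in the next article):

# Root module: the lookup runs exactly once ...
data "oci_identity_availability_domains" "ads" {
  compartment_id = var.tenancy_ocid
}

# ... and only the resulting value is injected into every module instance.
module "app_servers" {
  for_each = var.server_configs != null ? var.server_configs : {}

  source         = "./modules/compute-instance"
  instances      = each.value.instances
  compartment_id = each.value.compartment_id

  # Hypothetical input variable - the module no longer needs its own Data Source.
  availability_domain = data.oci_identity_availability_domains.ads.availability_domains[0].name
}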
Because the best Data Source is the one that is never executed - or only executed once.