Terraform @ Scale - Part 4b: Best Practices for Scaling Data Sources

In the last part of this series, we showed how seemingly harmless data sources in Terraform modules can become a serious performance issue. Multi-minute terraform plan runtimes, unstable pipelines and uncontrollable API throttling effects were the result.

But how can you avoid this scalability trap in an elegant and sustainable way?

In this part, we present proven architectural patterns that allow you to centralize data sources, inject them efficiently and thereby achieve fast, stable and predictable Terraform executions even with hundreds of module instances.

Included: three scalable solution strategies, a practical step-by-step guide and a best practices checklist for production-ready infrastructure modules.

 

Best Practice: Scalable Alternatives

Solution 1 (simple scenarios): Variable Injection Pattern

Instead of using data sources in modules, query them once in the root module and inject the results into the modules as variables:


# Root module: each data source is queried exactly once
data "oci_identity_availability_domains" "available" {
  compartment_id = var.tenancy_ocid
}

data "oci_core_subnets" "database" {
  compartment_id = var.compartment_id
  vcn_id         = var.vcn_id

  # OCI filters compare values literally unless regex = true is set
  filter {
    name   = "display_name"
    values = [".*database.*"]
    regex  = true
  }
}

locals {
  availability_domains = data.oci_identity_availability_domains.available.availability_domains
  database_subnets     = data.oci_core_subnets.database.subnets
}

# Every module instance receives the same pre-fetched values
module "databases" {
  for_each = var.database_configs != null ? var.database_configs : {}

  source = "./modules/database"

  availability_domains = local.availability_domains
  subnet_ids           = [for subnet in local.database_subnets : subnet.id]

  name = each.key
  size = each.value.size
}

 


# ./modules/database: no data sources, only injected values
variable "availability_domains" {
  type = list(object({
    name = string
    id   = string
  }))
  description = "Available ADs for database placement"
}

variable "subnet_ids" {
  type        = list(string)
  description = "Database subnet IDs"
}

resource "oci_database_db_system" "main" {
  for_each = var.db_systems != null ? var.db_systems : {}

  availability_domain = var.availability_domains[0].name
  subnet_id           = var.subnet_ids[0]
  compartment_id      = var.compartment_id

  # Further required arguments (shape, hostname, db_home, ssh_public_keys, ...)
  # are omitted for brevity.
}
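For contrast, a minimal sketch of the pattern this replaces (reconstructed here for illustration, variable declarations omitted): the module performs the lookups itself, so every instance created by `for_each` in the root module repeats them. With 50 database modules, both data sources are read 50 times instead of once.

```hcl
# ./modules/database/main.tf -- ANTI-PATTERN, shown only for contrast.
# Each module instance evaluates these data sources again, so the number
# of lookups grows with the number of module instances.

data "oci_identity_availability_domains" "available" {
  compartment_id = var.tenancy_ocid
}

data "oci_core_subnets" "database" {
  compartment_id = var.compartment_id
  vcn_id         = var.vcn_id

  filter {
    name   = "display_name"
    values = [".*database.*"]
    regex  = true
  }
}

resource "oci_database_db_system" "main" {
  for_each = var.db_systems != null ? var.db_systems : {}

  # Values come from per-instance lookups instead of injected variables
  availability_domain = data.oci_identity_availability_domains.available.availability_domains[0].name
  subnet_id           = data.oci_core_subnets.database.subnets[0].id
  compartment_id      = var.compartment_id
  # Further required arguments omitted for brevity
}
```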

 

Solution 2 (complex scenarios): Structured Configuration Pattern

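For more complex scenarios, bundle the pre-resolved values into structured, typed configuration objects, so that each module receives one object per concern instead of many individual variables: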


# Root module: every lookup is resolved once
data "oci_identity_availability_domains" "ads" {
  compartment_id = var.tenancy_ocid
}

data "oci_core_vcn" "main" {
  vcn_id = var.vcn_id
}

data "oci_core_images" "ol8" {
  compartment_id           = var.tenancy_ocid
  operating_system         = "Oracle Linux"
  operating_system_version = "8"
}

locals {
  # One pre-resolved configuration object per shape
  compute_images = {
    "VM.Standard.E4.Flex" = {
      image_id = [
        for img in data.oci_core_images.ol8.images :
        img.id if can(regex(".*E4.*", img.display_name))
      ][0]
      boot_volume_size = 50
    }
    "BM.Standard3.64" = {
      image_id = [
        for img in data.oci_core_images.ol8.images :
        img.id if can(regex(".*Standard.*", img.display_name))
      ][0]
      boot_volume_size = 100
    }
  }

  network_config = {
    availability_domains = data.oci_identity_availability_domains.ads.availability_domains
    vcn_id               = data.oci_core_vcn.main.id
  }
}

module "compute_instances" {
  for_each = var.instance_configs != null ? var.instance_configs : {}

  source = "./modules/compute-instance"

  compute_config = local.compute_images[each.value.shape]
  network_config = local.network_config
}

 


variable "compute_config" {
  type = object({
    image_id         = string
    boot_volume_size = number
  })
  description = "Pre-resolved compute configuration"
}

variable "network_config" {
  type = object({
    availability_domains = list(object({
      name = string
      id   = string
    }))
    vcn_id = string
  })
  description = "Pre-resolved network configuration"
}

resource "oci_core_instance" "this" {
  for_each = var.instances != null ? var.instances : {}

  availability_domain = var.network_config.availability_domains[0].name
  compartment_id      = var.compartment_id

  source_details {
    source_id               = var.compute_config.image_id
    source_type             = "image"
    boot_volume_size_in_gbs = var.compute_config.boot_volume_size
  }

  # shape and networking (create_vnic_details) omitted for brevity
}
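Structured objects are easier to evolve when attributes with sensible defaults are declared with `optional()`. A sketch of a variant of the `compute_config` variable above, assuming Terraform 1.3 or later (where `optional()` accepts a default value); the 50 GB default and lower bound are illustrative, not OCI-mandated values.

```hcl
# ./modules/compute-instance/variables.tf (variant of compute_config above)
# optional() with a default value requires Terraform >= 1.3.
variable "compute_config" {
  type = object({
    image_id         = string
    boot_volume_size = optional(number, 50) # illustrative default, in GB
  })
  description = "Pre-resolved compute configuration"

  validation {
    condition     = can(regex("^ocid1\\.image\\.", var.compute_config.image_id))
    error_message = "compute_config.image_id must be an OCI image OCID."
  }

  validation {
    # Illustrative lower bound; align it with your platform's limits
    condition     = var.compute_config.boot_volume_size >= 50
    error_message = "compute_config.boot_volume_size must be at least 50 GB."
  }
}
```

With this declaration, the root module may pass an object containing only `image_id`; Terraform fills in `boot_volume_size = 50` during type conversion, so the module always sees a complete object.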

Solution 3 (very complex scenarios): Data Proxy Pattern

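For very complex scenarios, create a dedicated "Data Proxy" module whose only job is to run the data source lookups and expose the results as outputs. The root module calls it once and hands its output to every consuming module: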


# ./modules/data-proxy: the only place these lookups are executed
data "oci_core_images" "oracle_linux" {
  compartment_id           = var.tenancy_ocid
  operating_system         = "Oracle Linux"
  operating_system_version = "8"
}

data "oci_core_vcn" "main" {
  vcn_id = var.vcn_id
}

data "oci_core_security_lists" "web" {
  compartment_id = var.compartment_id
  vcn_id         = var.vcn_id

  # Without regex = true, OCI filters compare values literally
  filter {
    name   = "display_name"
    values = [".*web.*"]
    regex  = true
  }
}

output "platform_data" {
  value = {
    image_id = data.oci_core_images.oracle_linux.images[0].id
    vcn_id   = data.oci_core_vcn.main.id
    
    instance_shapes = {
      small  = "VM.Standard.E3.Flex"
      medium = "VM.Standard.E4.Flex"
      large  = "VM.Standard3.Flex"
    }
  }
}

 


# Root module: the proxy is called once, without for_each
module "platform_data" {
  source = "./modules/data-proxy"

  tenancy_ocid   = var.tenancy_ocid
  compartment_id = var.compartment_id
  vcn_id         = var.vcn_id
}

# Every web server instance reuses the proxy's single result
module "web_servers" {
  for_each = var.web_server_configs != null ? var.web_server_configs : {}

  source = "./modules/oci-instance"

  platform_data = module.platform_data.platform_data

  name          = each.key
  instance_type = each.value.size
}
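The snippets above leave out the module interfaces. A minimal sketch, assuming the input names from the module call and the structure of the `platform_data` output: the proxy declares its three inputs, and the consuming module declares one typed `platform_data` variable that mirrors the output. Attributes that the proxy outputs but this type does not list are not available inside the consuming module. The `instance_type` values (`small`, `medium`, `large`) are an assumption based on the keys of `instance_shapes`.

```hcl
# ./modules/data-proxy/variables.tf
variable "tenancy_ocid" {
  type        = string
  description = "Tenancy OCID, used for the image lookup"
}

variable "compartment_id" {
  type        = string
  description = "Compartment containing the VCN and its security lists"
}

variable "vcn_id" {
  type        = string
  description = "OCID of the VCN to look up"
}

# ./modules/oci-instance/variables.tf
variable "platform_data" {
  type = object({
    image_id        = string
    vcn_id          = string
    instance_shapes = map(string)
  })
  description = "Pre-resolved platform data from the data-proxy module"
}

variable "name" {
  type        = string
  description = "Instance name"
}

variable "instance_type" {
  type        = string
  description = "Key into platform_data.instance_shapes"

  validation {
    condition     = contains(["small", "medium", "large"], var.instance_type)
    error_message = "instance_type must be small, medium or large."
  }
}
```

Inside the consuming module, the shape can then be resolved without any lookup, for example as `var.platform_data.instance_shapes[var.instance_type]`.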

Performance Comparison

A concrete example from a customer project deploying 50 VM instances illustrates the dramatic difference:

 
Before: Data Sources in Modules

  • Number of API calls: 150
  • Time required:

$ time terraform plan
...
Plan: 50 to add, 0 to change, 0 to destroy.
real 4m23.415s
user 0m12.484s
sys 0m2.108s

After: Variable Injection

  • Number of API calls: 3
  • Time required:

$ time terraform plan
...
Plan: 50 to add, 0 to change, 0 to destroy.
real 0m18.732s
user 0m8.234s
sys 0m1.456s

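Result: 93% less planning time (from 4m23s to about 19s) and 98% fewer API calls (from 150 to 3). CPU time (user + sys) stays below 15 seconds in both runs, so almost all of the original 4m23s was spent waiting for API responses.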
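The percentages follow directly from the measurements. Writing N for the number of module instances and M for the number of data sources each module queries (M = 3 is an assumption that matches 150 calls for 50 instances):

$$
N \times M = 50 \times 3 = 150 \;\longrightarrow\; M = 3, \qquad 1 - \frac{3}{150} = 0.98
$$

$$
4\,\text{min}\;23.415\,\text{s} = 263.415\,\text{s}, \qquad 1 - \frac{18.732\,\text{s}}{263.415\,\text{s}} \approx 1 - 0.071 = 0.929
$$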

Variable Injection: Step-by-Step Guide

Step 1: Centralize Data Sources

Goal: Remove all data sources from modules and centralize them in the root module to consolidate API calls and establish a single source of truth.

How: Move all data sources used by modules into the root module. This ensures that each piece of information is queried only once, regardless of how many modules require that data. In doing so, you reduce the number of API calls from N×M (number of module instances × number of data sources) to just M (number of data sources).


data "oci_identity_availability_domains" "ads" {
  compartment_id = var.tenancy_ocid
}

data "oci_core_images" "ol8" {
  compartment_id           = var.tenancy_ocid
  operating_system         = "Oracle Linux"  
  operating_system_version = "8"
}

data "oci_core_vcn" "main" {
  vcn_id = var.vcn_id
}

 

Step 2: Process Data in Locals

Goal: Transform raw data source results into a consumable format while keeping complexity out of the modules.

How: Use locals to filter, sort and convert data source results into structured data formats. This allows you to handle complex logic centrally and supply modules with already processed, clean data. With for-loops and conditional expressions, you can also implement fallback mechanisms and validation logic at the same time.


locals {
  availability_domains = [
    for ad in data.oci_identity_availability_domains.ads.availability_domains : ad.name
  ]
  
  compute_images = {
    standard = [
      for img in data.oci_core_images.ol8.images :
      img.id if can(regex(".*Standard.*", img.display_name))
    ][0]
    
    gpu = [
      for img in data.oci_core_images.ol8.images :
      img.id if can(regex(".*GPU.*", img.display_name))  
    ][0]
  }
  
  network_config = {
    vcn_id               = data.oci_core_vcn.main.id
    vcn_cidr            = data.oci_core_vcn.main.cidr_block
    availability_domains = local.availability_domains
  }
}
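The `[0]` index in the locals above aborts the plan with an index error as soon as a filter matches no image. A sketch of the fallback mechanism mentioned in the text, using `try()`, which returns the first of its arguments that evaluates without an error. The fallback choices are assumptions: a generic Oracle Linux 8 image is treated as an acceptable substitute for the standard image, while a missing GPU image yields `null`, so that the image ID validation shown in Step 3 rejects it instead of silently deploying an image without GPU support.

```hcl
# Root module: fallback variant of the image selection (illustrative names)
locals {
  ol8_image_ids = [for img in data.oci_core_images.ol8.images : img.id]

  standard_image_ids = [
    for img in data.oci_core_images.ol8.images :
    img.id if can(regex(".*Standard.*", img.display_name))
  ]

  gpu_image_ids = [
    for img in data.oci_core_images.ol8.images :
    img.id if can(regex(".*GPU.*", img.display_name))
  ]

  compute_images_with_fallback = {
    # No "Standard" match: use the first Oracle Linux 8 image instead.
    # If the lookup returned no images at all, try() still fails the plan.
    standard = try(local.standard_image_ids[0], local.ol8_image_ids[0])

    # No GPU match: null instead of a generic image, so that the
    # module's input validation fails with a clear message.
    gpu = try(local.gpu_image_ids[0], null)
  }
}
```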

 

Step 3: Define Variables in Modules

Goal: Create clear interfaces for passing data to modules while ensuring type safety and validation.

How: Replace data sources in modules with typed variables that include descriptive documentation and validation rules. The type definitions ensure consistency and robustness, while validation blocks guarantee that only valid data is passed into the modules. This makes modules more testable and independent of the cloud provider API.


variable "availability_domains" {
  type        = list(string)
  description = "List of available availability domains"
  
  validation {
    condition     = length(var.availability_domains) > 0
    error_message = "At least one availability domain must be provided."
  }
}

variable "compute_images" {
  type        = map(string)
  description = "Map of compute images by type"
  
  validation {
    condition = alltrue([
      for image_id in values(var.compute_images) :
      can(regex("^ocid1\\.image\\.", image_id))
    ])
    error_message = "All image IDs must be valid OCI OCIDs."
  }
}
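The module call in Step 5 also passes `network_config`, which is not declared above. A sketch of a matching declaration, assuming the three attributes built in `local.network_config` in Step 2. A variable may carry several validation blocks, and each reports its own error message, which makes failures easier to diagnose than one combined condition.

```hcl
variable "network_config" {
  type = object({
    vcn_id               = string
    vcn_cidr             = string
    availability_domains = list(string)
  })
  description = "Pre-resolved network data from the root module"

  validation {
    condition     = can(regex("^ocid1\\.vcn\\.", var.network_config.vcn_id))
    error_message = "network_config.vcn_id must be a VCN OCID."
  }

  validation {
    # cidrhost() fails on anything that is not a valid CIDR block
    condition     = can(cidrhost(var.network_config.vcn_cidr, 0))
    error_message = "network_config.vcn_cidr must be a valid CIDR block."
  }

  validation {
    condition     = length(var.network_config.availability_domains) > 0
    error_message = "network_config.availability_domains must not be empty."
  }
}
```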

 

Step 4: Implement Modules Without Data Sources

Goal: Completely decouple modules from external API calls and turn them into pure resource definition containers.

How: Replace all data source references in modules with variable references. This makes modules deterministic and predictable, as they only operate on the parameters passed in and do not make any unexpected API calls. At the same time, it makes modules independently testable, since you can inject mock data through the variables.


resource "oci_core_instance" "this" {
  for_each = var.instances != null ? var.instances : {}
  
  availability_domain = var.availability_domains[each.value.ad_index]
  compartment_id      = var.compartment_id
  shape              = each.value.shape
  
  create_vnic_details {
    subnet_id = each.value.subnet_id
  }
  
  source_details {
    source_id   = var.compute_images[each.value.image_type]
    source_type = "image"
  }
  
  metadata = {
    ssh_authorized_keys = var.ssh_public_key
  }
}
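Because the module now depends only on its inputs, it can be exercised with `terraform test`. A minimal sketch, assuming Terraform 1.7 or later (for `mock_provider`), the module code above, and that the module declares `instances`, `compartment_id` and `ssh_public_key` with the attributes used there; add values for any other required variables your module declares. The file name and all OCIDs are placeholders. The mock provider supplies fake values for computed attributes, so no OCI credentials or API calls are needed, although `terraform init` must still install the provider for its schema.

```hcl
# tests/injection.tftest.hcl (hypothetical file name)
# Requires Terraform >= 1.7 for mock_provider.
mock_provider "oci" {}

variables {
  availability_domains = ["AD-1", "AD-2"]
  compute_images = {
    standard = "ocid1.image.oc1..placeholder"
  }
  compartment_id = "ocid1.compartment.oc1..placeholder"
  ssh_public_key = "ssh-ed25519 AAAAplaceholder test@example"
  instances = {
    web1 = {
      ad_index   = 1
      shape      = "VM.Standard.E4.Flex"
      subnet_id  = "ocid1.subnet.oc1..placeholder"
      image_type = "standard"
    }
  }
}

run "uses_injected_values" {
  command = plan

  # ad_index = 1 must select the second injected availability domain
  assert {
    condition     = oci_core_instance.this["web1"].availability_domain == "AD-2"
    error_message = "The instance must use the injected availability domain."
  }

  assert {
    condition     = oci_core_instance.this["web1"].source_details[0].source_id == "ocid1.image.oc1..placeholder"
    error_message = "The instance must use the injected image."
  }
}
```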

 

Step 5: Call Modules with Injected Data

Goal: Establish the connection between centrally retrieved data and modules to implement a clean data flow pattern.

How: Pass the data processed in locals as parameters to the modules. This completes the variable injection loop: data is retrieved centrally once, processed and then explicitly distributed to the modules. This explicit data transfer creates clear dependencies that are understandable both for humans and for Terraform itself.


module "web_servers" {
  for_each = var.web_server_configs != null ? var.web_server_configs : {}
  
  source = "./modules/compute"
  
  availability_domains = local.availability_domains
  compute_images      = local.compute_images  
  network_config      = local.network_config
  
  instances      = each.value.instances
  compartment_id = each.value.compartment_id
  ssh_public_key = var.ssh_public_key
}
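The call above passes `instances`, `compartment_id` and `ssh_public_key`, which the module code in Step 4 uses but Step 3 does not declare. A sketch of matching declarations, assuming the object attributes referenced in Step 4. `nullable = false` (Terraform 1.1+) turns an explicit `null` into the default `{}`. The `image_type` check hardcodes the keys from Step 2 because, before Terraform 1.9, a validation rule can only refer to the variable it validates, not to `var.compute_images`.

```hcl
variable "instances" {
  type = map(object({
    ad_index   = number
    shape      = string
    subnet_id  = string
    image_type = string
  }))
  default     = {}
  nullable    = false
  description = "Instances to create, keyed by name"

  validation {
    condition = alltrue([
      for inst in values(var.instances) :
      contains(["standard", "gpu"], inst.image_type)
    ])
    error_message = "Each image_type must be a key of compute_images (standard or gpu)."
  }
}

variable "compartment_id" {
  type        = string
  description = "Compartment in which the instances are created"
}

variable "ssh_public_key" {
  type        = string
  description = "Public key written to the ssh_authorized_keys metadata"
}
```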

 

Best Practices Checklist

✅ Do's: Scalable Patterns

  • [ ] Central Data Sources: Define all data sources in the root module
  • [ ] Variable Injection: Pass data to modules via variables
  • [ ] Structured Objects: Organize complex data in typed objects
  • [ ] Validation Rules: Implement variable validations for injected data
  • [ ] Documentation: Write variable descriptions for injected data
  • [ ] Local Processing: Process data in locals in the root module
  • [ ] Data Proxy Pattern: Use separate data modules for very complex scenarios

❌ Don'ts: Avoid Anti-Patterns

  • [ ] Data Sources in Modules: Never use data sources in reusable modules
  • [ ] Redundant Lookups: Identical data sources in multiple modules
  • [ ] Complex Filtering: Costly filter operations in every module
  • [ ] Nested Data Sources: Data sources depending on other data sources
  • [ ] Dynamic References: for_each on data source results within modules
  • [ ] Missing Validation: Using injected data without validation

Monitoring and Debugging

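To monitor data source performance, enable Terraform's debug logging and evaluate the resulting log for data source entries and API requests. Because TF_LOG_PATH redirects the debug output to a file, search that file rather than the terminal output of terraform plan: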


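export TF_LOG=DEBUG
export TF_LOG_PATH=./terraform.log
rm -f terraform.log   # fresh log, so counts refer to this run only
terraform plan

# Rough indicator: log lines mentioning data sources or HTTP requests
grep -cE "(data\.|GET|POST)" terraform.log

# Log lines per data source address, most frequent first
grep -oE "data\.[a-z0-9_]+\.[A-Za-z0-9_-]+" terraform.log | sort | uniq -c | sort -rn

unset TF_LOG TF_LOG_PATH   # switch debug logging off again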

Conclusion: Performant Modules Through Intentional Architecture

Data sources are a powerful Terraform feature - but in modules, they can become a performance trap. The variable injection pattern offers an elegant solution:

Advantages:

  • Drastically reduced API calls (95%+ savings possible)
  • Lookup cost independent of the number of module instances (M API calls instead of N×M), rather than growing with every instance
  • Centralized data logic for better maintainability
  • Explicit dependencies instead of hidden data source calls
  • Better testability through injectable mock data

The key lies in a paradigm shift: Instead of fetching data when needed, fetch it once centrally and distribute it in a targeted manner.

At ICT.technology, we have reduced Terraform planning times from minutes to seconds - even with hundreds of module instances - by consistently applying these patterns.